You have to fall in love with the Insights, not with the Models (or with Coding)
"It is essential to remember that when it comes to data science, the goal should not be to fall in love with the models or coding, but instead to fall in love with the insights that can be gained from the data. Models and coding are simply tools that allow us to gain those insights, so it is important to focus on the end goal of uncovering useful information and knowledge from the data." This was written entirely by AI, using openai.com.
As in all data analysis, context is important. A weekend in Madrid: tapas, cañas, nerds, and talk about how data science is changing. The interesting thing is that the group was a real cross-section: engineers, statisticians, economists, managers, human resources people, and data science outliers. The geographic spread was also fairly balanced, at least between Latin Americans and Europeans. So we could draw some conclusions with statistical weight, at least within this set of friends.
So here are some of the conclusions I drew, and they really worried me:
Obviously, the previous comments are biased. It wasn't all bad news, but it does worry me. Data science has been around for a long time, and it has always been about finding patterns and giving decision makers facts. In fact, some decisions are so simple and routine that they could be systematized with prescriptive analytics, leaving decision-making time for the ones that are ad hoc or require more creativity.
Coding is not the point. Comparing models is not the key to data science. People ask you: What model do you use? Have you used XGBoost for modelling? What libraries do you use? I think it's a huge waste of time, and it is not a predictor of anything. What if I tell you that I use SPSS Modeler, and that once I load the database and define the target, it automatically recommends all the possible models? I press run and it gives me a report with the performance of each model. Does that make me a data scientist? And if I do the same thing with Python and the result is the same, am I a better data scientist?
Look at the models that are used depending on where you work (industry, academia or research).
The problem is that many people who work in industry want to use the models that people in academia use. I'm not saying that people in industry don't innovate in data science (or in models). What I'm saying is that innovation in industry lies in the 4Ds that I raised in this post (here). Design the problem in an innovative way (churn, default, etc. are not innovative). Define what Data you are going to use. Are you going to use differentiated, alternative, complementary data, or will you use the company's own (biased) data and add data from the yellow pages?
Regarding Development, how are you going to develop the algorithms? Today in industry everything falls within a fairly small margin. Believe me, in the last 3 or 4 years I have developed and deployed a number of models, all based on code (mainly in R, because I needed more statistical power).
At an academic level, for my MSc thesis in Statistics I developed 5 different survival models to see how they performed. In my PhD thesis I used a difference-in-differences model to analyze the impact of (tax) incentives on investment decisions. I used autoregressive moving average models to understand the behavior of Covid in Uruguay. I used ANN, RNN and CNN to develop an Income Predictor for the entire population of Uruguay. I used ANN and MLR to understand the propensity of the clients of a financial institution (+200k clients). I used MLR to infer the price of a head of cattle in auction processes. I used CausalImpact to find out whether the UK government change in October had a major impact on the pound (here). And believe me, I can go on.
In fact, I leave here a comparison of models for predicting whether a stock is going to pay dividends or not. All the code is there. It's free. Use it. No company is going to generate competitive advantage based on this code, but if you are an SME and need my help, send me a DM and I'll help you for free.
Innovation in data science is not in how many libraries you use, or whether you use Python, or whether you code or use low-code, or whether you use SAS, SPSS, or Excel. It's somewhere else.
It is in understanding that you have a problem if you draw conclusions without considering ergodicity; without knowing what moral hazard and adverse selection are; without understanding that you cannot model a chaotic experiment based on Bayes; without knowing what Entropy implies for a database; without understanding that information asymmetry can be seen from different perspectives (as George A. Akerlof, A. Michael Spence and Joseph E. Stiglitz did). It is in developing nested models capable of telling us whether the WHO (people or companies) will do the WHAT (propensity, origination, default, collection, churn, etc.) and also WHEN they will do it. It is in knowing that within value theory we have at least 3 stages: generating, appropriating and distributing value. These are just some of the concepts that make up data science in business (in other fields the knowledge is different, but the idea is identical).
And the truth is that this has nothing to do with Python/R libraries, it has to do with creativity, with the evolution of a discipline that is based on being able to find patterns, to generate insights, to make better decisions, to optimize the theory of value, and to be able to generate dynamic and sustainable competitive advantages.
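To make one of the concepts above concrete, here is a minimal sketch of what entropy can tell you about a column in a database. It assumes Shannon entropy of the empirical distribution is the intended notion, and the column names and values are hypothetical, invented only for illustration. Note that it needs nothing beyond the standard library, which is rather the point: the value is in knowing what the number means, not in the tooling.

```python
from collections import Counter
from math import log2


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1 / p) over every distinct value observed.
    return sum((c / total) * log2(total / c) for c in counts.values())


# Hypothetical columns from a customer table (illustrative data only).
constant_column = ["active"] * 8               # carries no information
balanced_column = ["active", "churned"] * 4    # maximally uncertain for 2 classes
skewed_column = ["active"] * 7 + ["churned"]   # mostly predictable

for name, column in [
    ("constant", constant_column),
    ("balanced", balanced_column),
    ("skewed", skewed_column),
]:
    print(f"{name}: {shannon_entropy(column):.3f} bits")
```

A column with zero entropy cannot help any model distinguish one client from another, while a balanced binary column carries a full bit per record. Knowing which case you are in is a question about the data, not about the library.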