You have to fall in love with the Insights not with the Models (or with Coding)

Diego Vallarino, PhD (he/him)

Global AI & Data Strategy Leader | Quantitative Finance Analyst | Risk & Fraud ML-AI Specialist | Ex-Executive at Coface, Scotiabank & Equifax | Board Member | PhD, MSc, MBA | EB1A Green Card Holder

Published Dec 5, 2022

"It is essential to remember that when it comes to data science, the goal should not be to fall in love with the models or coding, but instead to fall in love with the insights that can be gained from the data. Models and coding are simply tools that allow us to gain those insights, so it is important to focus on the end goal of uncovering useful information and knowledge from the data." This paragraph was written entirely by AI, using openai.com.

As in all data analysis, context matters. A weekend in Madrid: tapas, cañas, nerds, and conversations about how data science is changing. Interestingly, the group was a real cross-section: engineers, statisticians, economists, managers, HR people, and data science outliers. The geographic mix was also well balanced, at least between Latin Americans and Europeans. So we could draw some conclusions with statistical weight, at least within this circle of friends.

Here are some of the conclusions I drew, and they genuinely worried me:

There are people who love coding much more than the problem they want to solve.
There are people who want to use the latest algorithm from MIT, Stanford or Google in a 5,000-year-old sector, in a company whose culture belongs to the 20th century.
There are people who love the model more than solving the decision-making problem.
Some people ask you what library you use before asking what you want to solve.
There are people who prefer XGBoost because they read about it in the latest technical post, over another model that may be slightly less accurate but can be deployed much faster.
There are people who write code when all they want is for it to do something similar to (and generally worse than) what Excel, SAS or SPSS already do.
There are people who pay a lot of attention to the technological infrastructure and forget that business results are needed. Without revenue, it is hard to justify higher costs.
There are people who are more afraid of low-code than of Freddy Krueger.
There are people who launch products as if they were unique. One example is QuantumBlack (McKinsey's AI arm) with its CausalNex (here), based on Google's CausalImpact library (here): 90% of what they do is the same.

Obviously, the previous comments are biased. It wasn't all bad news, but it does worry me. Data science has been around for a long time, and it has always been about finding patterns and giving decision makers facts. In fact, some decisions are so simple and routine that they could be systematized with prescriptive analytics, leaving decision-making time for the ad hoc decisions or those that require more creativity.
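To make "systematized with prescriptive analytics" a little more concrete, here is a minimal sketch of an automated rule for a routine decision: given a predicted probability of default (assumed to come from some upstream model), approve a loan only when the expected profit is clearly positive, decline when it is clearly negative, and send close calls to a human. The margin, loss and review-band values are hypothetical numbers chosen for illustration, not figures from any real portfolio.

```python
from dataclasses import dataclass

# Hypothetical business parameters, for illustration only.
MARGIN_RATE = 0.10   # profit on a repaid loan, as a share of the amount
LOSS_RATE = 0.60     # share of the amount lost if the borrower defaults
REVIEW_BAND = 0.02   # |expected profit| at or below this share of the amount goes to a human


@dataclass
class LoanRequest:
    amount: float     # requested amount
    p_default: float  # probability of default from an upstream model (assumed to exist)


def expected_profit(req: LoanRequest) -> float:
    """Expected profit of approving the request."""
    gain = (1 - req.p_default) * MARGIN_RATE * req.amount
    loss = req.p_default * LOSS_RATE * req.amount
    return gain - loss


def decide(req: LoanRequest) -> str:
    """Prescriptive rule: approve, decline, or escalate close calls to an analyst."""
    ev = expected_profit(req)
    if abs(ev) <= REVIEW_BAND * req.amount:
        return "review"  # close call: this is where human decision time is spent
    return "approve" if ev > 0 else "decline"


if __name__ == "__main__":
    for p in (0.05, 0.14, 0.30):
        print(p, decide(LoanRequest(amount=1000, p_default=p)))
    # 0.05 approve / 0.14 review / 0.3 decline
```

With these made-up parameters the break-even probability of default is about 14.3%, so only requests near that point consume an analyst's time.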

Coding is not the point. Comparing models is not the key to data science. People ask you: What model do you use? Have you used XGBoost? What libraries do you use? I think that is a huge waste of time, and the answers predict nothing. What if I tell you that I use SPSS Modeler, and that when I load the database and define the target, it automatically recommends all the possible models? I press Run and it gives me a report with the performance of each model. Is that a data scientist? And if I do the same thing in Python and get the same result, am I a better data scientist?
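For illustration, here is roughly what "the same thing in Python" could look like: fit several candidate models on the same split and print a performance report. To keep the sketch self-contained it uses three deliberately tiny hand-written classifiers (a majority-class baseline, a one-feature decision stump and 1-nearest neighbour) on a made-up toy dataset. They stand in for whatever models a tool like SPSS Modeler or a library such as scikit-learn would try; all function names and data are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Row = Tuple[Tuple[float, ...], int]             # (features, binary label)
Predictor = Callable[[Tuple[float, ...]], int]


def fit_majority(train: List[Row]) -> Predictor:
    """Baseline: always predict the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def fit_stump(train: List[Row]) -> Predictor:
    """Decision stump: the single-feature threshold rule with most training hits."""
    best = (-1, 0, 0.0, 1)  # (hits, feature index, threshold, label when above)
    for j in range(len(train[0][0])):
        for t in sorted({x[j] for x, _ in train}):
            for above in (0, 1):
                hits = sum((above if x[j] > t else 1 - above) == y for x, y in train)
                if hits > best[0]:
                    best = (hits, j, t, above)
    _, j, t, above = best
    return lambda x: above if x[j] > t else 1 - above


def fit_1nn(train: List[Row]) -> Predictor:
    """1-nearest neighbour with squared Euclidean distance."""
    def predict(x: Tuple[float, ...]) -> int:
        nearest = min(train, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x)))
        return nearest[1]
    return predict


def compare(models: Dict[str, Callable[[List[Row]], Predictor]],
            train: List[Row], test: List[Row]) -> Dict[str, float]:
    """Fit every candidate on `train` and report its accuracy on `test`."""
    report = {}
    for name, fit in models.items():
        predict = fit(train)
        report[name] = sum(predict(x) == y for x, y in test) / len(test)
    return report


# Made-up toy data: (feature_1, feature_2) -> label.
TRAIN = [((1, 2), 0), ((2, 1), 0), ((3, 4), 0), ((4, 9), 0),
         ((6, 5), 1), ((7, 3), 1), ((8, 8), 1)]
TEST = [((2, 3), 0), ((7, 7), 1), ((5.5, 1), 1), ((9, 2), 1)]

if __name__ == "__main__":
    report = compare({"majority": fit_majority, "stump": fit_stump, "1-NN": fit_1nn},
                     TRAIN, TEST)
    for name, acc in report.items():
        print(f"{name:10s} accuracy = {acc:.2f}")
    # majority 0.25, stump 1.00, 1-NN 1.00
```

Writing the loop is the easy part: the report says nothing about whether the target was framed well or which model can actually be deployed.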

Look at the models that are used depending on where you work (industry, academia or research).

[Image: models used in industry, academia and research. Source: KDnuggets & Forrester]

The problem is that many people who work in industry want to use the models that people in academia use. I'm not saying that people in industry don't innovate with data science (or with models); what I'm saying is that innovation in industry lies in the 4Ds that I described in this post (here). Design the problem in an innovative way (churn, default, etc. are not innovative). Define what Data you are going to use. Are you going to use differentiated, alternative or complementary data, or will you use the company's (biased) data and add data from the yellow pages?

Regarding Development: how are you going to develop the algorithms? Today in industry, the results of most approaches fall within a fairly small margin of one another. Believe me: in the last 3 or 4 years I have developed and deployed a number of models, all based on code (mainly in R, because I needed more statistical power).

At an academic level, for my MSc thesis in Statistics I developed 5 different survival models to see how they performed. In my PhD thesis I used a difference-in-differences (DiD) model to analyze the impact of (tax) incentives on investment decisions. I used an autoregressive moving average (ARMA) model to understand the behavior of Covid in Uruguay. I used ANN, RNN and CNN to develop an Income Predictor for the entire population of Uruguay. I used ANN and MLR to understand the propensity of the clients of a financial institution (more than 200,000 clients). I used MLR to infer the price of a head of cattle in auction processes. I used CausalImpact to find out whether the change of UK government in October had a major impact on the pound (here). And believe me, I could go on.
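As a reminder of how simple the core of one of these models is, here is a minimal sketch of the classic 2x2 difference-in-differences estimate on group means: the change in the treated group minus the change in the control group. It is not the specification used in the thesis; the function name and the investment figures are hypothetical, and the estimate is only meaningful under the parallel-trends assumption.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post) -> float:
    """2x2 difference-in-differences on group means.

    effect = (post - pre change in the treated group)
           - (post - pre change in the control group)
    The control group's change stands in for what would have happened to the
    treated group without the treatment (the parallel-trends assumption).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


if __name__ == "__main__":
    # Made-up investment figures for firms with and without a tax incentive.
    effect = did_estimate(
        treated_pre=[10, 12, 11], treated_post=[15, 17, 16],
        control_pre=[9, 10, 11], control_post=[11, 12, 13],
    )
    print(effect)  # treated change 5 minus control change 2
```

In practice the same number is usually obtained as the coefficient on a treated-by-post interaction in a regression, which also yields standard errors and room for controls.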

In fact, I leave here a comparison of models for predicting whether a stock was going to pay dividends or not, with all the code. It's free. Use it. No company is going to gain a competitive advantage from this code, but if you are an SME and need my help, send me a DM and I'll help you for free.

Innovation in data science does not lie in how many libraries you use, in whether you use Python, in whether you code or use low-code, or in whether you code or use SAS, SPSS or Excel. It lies somewhere else.

It lies in understanding that you have a problem if you draw conclusions without considering ergodicity; without knowing what moral hazard and adverse selection are; without understanding that you cannot model a chaotic experiment with Bayes; without knowing what entropy implies for a database; or without understanding that information asymmetry can be seen from different perspectives (as George A. Akerlof, A. Michael Spence and Joseph E. Stiglitz did). It lies in developing nested models capable of telling us not only WHO (people or companies) will do WHAT (propensity, origination, default, collection, churn, etc.) but also WHEN they will do it, and in knowing that value theory has at least three stages: generating, appropriating and distributing value. These, among many other concepts, are what make up data science in business (in other fields the knowledge is different, but the concept is identical).
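Two of the concepts named above can be made concrete with a few lines of arithmetic. The sketch below is an illustration, not the author's method: the +50%/-40% bet is a standard textbook example of non-ergodicity (the ensemble average grows each round, while the long-run growth of a single player who keeps betting is below 1), and `shannon_entropy` measures how much information a database column carries. All function names are hypothetical.

```python
import math
from collections import Counter


def ensemble_growth(up: float, down: float, p: float = 0.5) -> float:
    """Ensemble (expected-value) growth factor per round of a multiplicative bet."""
    return p * up + (1 - p) * down


def time_average_growth(up: float, down: float, p: float = 0.5) -> float:
    """Long-run growth factor per round for a single player who keeps betting:
    the geometric mean exp(E[log factor])."""
    return math.exp(p * math.log(up) + (1 - p) * math.log(down))


def shannon_entropy(values) -> float:
    """Shannon entropy, in bits, of the empirical distribution of a column:
    sum of p * log2(1 / p) over the observed categories."""
    counts = Counter(values)
    n = sum(counts.values())
    return sum((c / n) * math.log2(n / c) for c in counts.values())


if __name__ == "__main__":
    # Textbook bet: +50% or -40% with equal probability.
    print(round(ensemble_growth(1.5, 0.6), 4))      # 1.05   -> the average grows
    print(round(time_average_growth(1.5, 0.6), 4))  # 0.9487 -> a single player shrinks
    # A constant column carries no information; a balanced binary one carries 1 bit.
    print(shannon_entropy(["A"] * 8), shannon_entropy(["yes", "no"] * 4))  # 0.0 1.0
```

A model that reports only the expected value of the bet would recommend playing it; the time-average view shows why almost every individual who does so ends up poorer.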

And the truth is that this has nothing to do with Python or R libraries. It has to do with creativity and with the evolution of a discipline whose purpose is to find patterns, generate insights, make better decisions, optimize value (in the sense of value theory), and build dynamic and sustainable competitive advantages.

Porandu
