Analyze real data, data scientists!

Data is always messy, and analyzing it requires smart choices. That’s why AI will never replace human market researchers. In both academia and industry, smart consumers of market research therefore demand transparency about the exact steps researchers took with the data. The peer review process in top journals does a decent, though imperfect, job of catching the choices that would change the actionable insights; a Nature article claimed it would have prevented the Theranos fraud.

What is definitely NOT a good way of dealing with data messiness is to just make up data. In academia, we have seen our share of scandals. In his memoir “Faking Science: A True Story of Academic Fraud”, former psychology professor Diederik Stapel explains how he got annoyed with data that did not support his brilliant ideas at the 95% statistical confidence level, and started fabricating data to fit those ideas. In his words, he “became impatient, overambitious, reckless. I wanted to go faster and better and higher and smarter, all the time.” Fortunately, academia has gotten much better at catching and reporting fraud.

How about faking data in industry? This week’s lawsuit from JPMorgan Chase against Frank is likely just the tip of the iceberg. According to the filing, Frank’s founder and CEO Charlie Javice lied “about Frank’s success, Frank’s size, and the depth of Frank’s market penetration in order to induce JPMC to purchase Frank for $175 million”. Specifically, Javice used “synthetic data” techniques to create a list of 4.265 million “students” who did not actually exist. When a Frank engineer declined to create that list, Javice “turned to a data science professor at a New York City area college who advertised his ‘creative solutions’ to data problems.” Starting from a list of 293,192 actual students who had started or submitted a FAFSA application through Frank, Javice directed the Data Science Professor to generate 4.265 million customer names, email addresses, birthdays, and other personal information. Interestingly, the filing cites emails between Javice and the Data Science Professor that show they understood what they were doing. The Data Science Professor wrote:

(1) “[f]or names, our plan was to sample first name and last name independently and then ensure none of the sampled names are real”, and

(2) “I can’t seem to find addresses in my raw files . . . . Should I attempt to fabricate them?”

Moreover, when reviewing the synthetic data, the Data Science Professor noted that many entries confusingly had customers living, attending high school, and attending college in the same town and state, and concluded that the list “would look fishy to [him] if [he] were to audit it.”
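One way an auditor might put a number on that kind of “fishy” pattern is to measure how often the location fields coincide and compare the result with a list known to be genuine. The Python sketch below is only an illustration under stated assumptions: the field names (`home_town`, `high_school_town`, `college_town`) and the sample rows are made up for this example, and nothing here reflects the actual checks or data in the case.

```python
def same_location_share(records,
                        fields=("home_town", "high_school_town", "college_town")):
    """Return the fraction of records whose location fields are all identical.

    The field names are hypothetical; adapt them to the list being audited.
    Comparison ignores case and surrounding whitespace.
    """
    records = list(records)
    if not records:
        return 0.0
    matches = 0
    for rec in records:
        # Normalize each field, then count distinct values in this record.
        values = {str(rec[f]).strip().lower() for f in fields}
        if len(values) == 1:
            matches += 1
    return matches / len(records)


# Made-up rows, for illustration only.
sample = [
    {"home_town": "Albany", "high_school_town": "Albany", "college_town": "Albany"},
    {"home_town": "Ithaca", "high_school_town": "ithaca ", "college_town": "Ithaca"},
    {"home_town": "Buffalo", "high_school_town": "Buffalo", "college_town": "Boston"},
    {"home_town": "Utica", "high_school_town": "Utica", "college_town": "Utica"},
]

share = same_location_share(sample)
print(f"Share of records with identical locations: {share:.0%}")
```

A high share is not proof of fabrication on its own. It becomes informative when compared with the same statistic computed on a trusted baseline, such as a verified subset of real customers.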


Beyond such million-dollar monetary consequences, faking data can cost far more. It can lead to failed harvests and famine if, for example, someone “changed centrally held figures for a key metric such as soil fertility that many arable farmers use to organize their planting schedules”. It can also cost lives in medical trials, as when Dr. Werner Bezwoda faked results showing that high-dose chemotherapy was successful in treating women with high-risk breast cancer. Other researchers build on such work and can waste years, sometimes setting a field back by decades. Fraud cases also give people an excuse not to take actual science seriously. For instance, a January 11th study debunks the misconception that COVID trials cut clinical corners.


The bottom line? Stay vigilant as a consumer of scientific studies and market research: ask the tough questions and demand answers. Stay patient as a researcher: it is the dynamic dialogue between theory and empirics that drives science forward.

Raphael Fitoussi

Freelance Community Manager and Graphic Designer | I help SMEs and freelancers stand out through effective digital strategies and high-impact designs

1mo

Hey Koen! It's crucial to shed light on the dark side of data science and the industry issues you're addressing really hit home. How do you reckon we can ensure more integrity and transparency in both academic and industry sectors? The messy nature of data definitely requires sharp minds to tame it, and your insights highlight the human touch that AI can't replace. Looking forward to more of your insights into the evolving landscape and ways we can push boundaries while avoiding the traps of fraud!

Byron Sharp

Research Professor (Marketing Science), Director Ehrenberg-Bass Institute, Adelaide University of South Australia.

2y

Never trust a single study, especially a single set of data. Use it as an interesting starting point.

Luc Wathieu

Professor @ Georgetown | Behavioral Economics | Consumer Empowerment | Product Management | Customer Analytics

2y

Nice overview!

Dr. Augustine Fou

FouAnalytics - "see Fou yourself" with better analytics

2y

interesting details. thank you for posting.
