The Limitations of the Data in Predictive Analytics

As with many aspects of any business system, data is a human creation — so it’s apt to have some limits on its usability when you first obtain it. Here’s an overview of some limitations you’re likely to encounter:

  • The data could be incomplete. Missing values, or even a missing section or substantial part of the data, could limit its usability.
    For example, your data might cover only one or two conditions of a larger set that you’re trying to model. A model built to analyze stock market performance that only has data from the past 5 years is skewed, along with its data, toward the assumption of a bull market. The moment the market undergoes a correction that leads to a bear market, the model fails to adapt, simply because it wasn’t trained and tested with data representing a bear market.
    Make sure you’re looking at a timeframe that gives you a complete picture of the natural fluctuations of your data; your data shouldn’t be limited by seasonality.
  • If you’re using data from surveys, keep in mind that people don’t always provide accurate information. Not everyone will answer truthfully about (say) how many times they exercise — or how many alcoholic beverages they consume — per week. People may not be dishonest so much as self-conscious, but the data is still skewed.
  • Data collected from different sources can vary in quality and format. Data from such diverse sources as surveys, e-mails, data-entry forms, and the company website will have different attributes and structures, and the data fields may not be compatible with one another. Expect differences in formatting, duplicate records, and inconsistencies across merged data fields. Such data requires major preprocessing before it’s analysis-ready: expect to spend a long time cleaning it, and even longer validating its reliability.
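
The multi-source problem above can be made concrete with a small sketch. It assumes two hypothetical sources, a survey export and a website export, that name their fields differently and write dates in different formats. All field names, sample values, and helper functions here are illustrative assumptions, not part of any particular tool.

```python
from datetime import date, datetime

# Hypothetical sample records; the field names and date formats are assumptions.
survey_rows = [
    {"Email": "Ann@Example.com ", "signup": "2021-03-05"},
    {"Email": "bob@example.com", "signup": "2021-04-10"},
]
web_rows = [
    {"email_address": "ann@example.com", "created": "05/03/2021"},
    {"email_address": "cara@example.com", "created": "12/06/2021"},
]


def from_survey(row):
    """Map a survey row (ISO dates) onto the shared schema."""
    return {
        "email": row["Email"].strip().lower(),
        "signup_date": date.fromisoformat(row["signup"]),
        "source": "survey",
    }


def from_web(row):
    """Map a website row (day/month/year dates) onto the shared schema."""
    return {
        "email": row["email_address"].strip().lower(),
        "signup_date": datetime.strptime(row["created"], "%d/%m/%Y").date(),
        "source": "web",
    }


def merge_sources(*batches):
    """Keep the first record per email; report records that disagree on the date."""
    merged = {}
    conflicts = []
    for batch in batches:
        for record in batch:
            kept = merged.setdefault(record["email"], record)
            if kept is not record and kept["signup_date"] != record["signup_date"]:
                conflicts.append((kept, record))
    return list(merged.values()), conflicts


records, conflicts = merge_sources(
    [from_survey(r) for r in survey_rows],
    [from_web(r) for r in web_rows],
)
print(len(records), "unique records,", len(conflicts), "conflicts to review")
```

Mapping every source onto one schema before merging keeps the format differences in one place. The conflicts list separates the records that still need a human decision from the duplicates that can be dropped safely.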

To determine the limitations of your data, be sure to:

  • Verify all the variables you’ll use in your model.
  • Assess the scope of the data, especially over time, so your model can avoid the seasonality trap.
  • Check for missing values, identify them, and assess their impact on the overall analysis.
  • Watch out for extreme values (outliers) and decide whether to include them in the analysis.
  • Confirm that the pool of training and test data is large enough.
  • Make sure each data type (integer, decimal, character, and so forth) is correct, and set the upper and lower bounds of possible values.
  • Pay extra attention to data integration when your data comes from multiple sources. Be sure you understand your data sources and their impact on the overall quality of your data.
  • Choose a relevant dataset that is representative of the whole population.
  • Choose the right parameters for your analysis.
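
Several of the checks in this list (missing values, data types, bounds, and outliers) can be automated for a single variable. The following is a minimal sketch rather than a complete audit. The `audit_column` helper, the sample ages, and the 1.5 × interquartile-range rule for flagging outliers are assumptions chosen for illustration; other outlier rules may suit your data better.

```python
from statistics import quantiles


def audit_column(values, expected_type, lower, upper):
    """Summarize missing, wrongly typed, out-of-bounds, and outlying values."""
    present = [v for v in values if v is not None]
    wrong_type = [v for v in present if not isinstance(v, expected_type)]
    typed = [v for v in present if isinstance(v, expected_type)]
    out_of_bounds = [v for v in typed if not lower <= v <= upper]
    in_bounds = [v for v in typed if lower <= v <= upper]

    outliers = []
    if len(in_bounds) >= 4:
        # Flag values beyond 1.5 * IQR from the quartiles (a common rule of thumb).
        q1, _, q3 = quantiles(in_bounds, n=4)
        spread = q3 - q1
        outliers = [
            v for v in in_bounds
            if v < q1 - 1.5 * spread or v > q3 + 1.5 * spread
        ]

    return {
        "missing": len(values) - len(present),
        "wrong_type": wrong_type,
        "out_of_bounds": out_of_bounds,
        "outliers": outliers,
    }


# Hypothetical "age" variable with one gap, one text entry, and two suspect values.
ages = [34, 29, None, 41, "n/a", 38, 250, 31, 36, 95]
report = audit_column(ages, expected_type=(int, float), lower=0, upper=120)
print(report)
```

The report only flags values. Whether an outlier such as a very old respondent stays in the analysis is still a judgment call.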

Even after all this care and attention, don’t be surprised if your data still needs preprocessing before you can analyze it accurately. Preprocessing often takes a long time and significant effort because it has to address several issues in the original data, including:

  • Any values missing from the data.
  • Any inconsistencies and/or errors existing in the data.
  • Any duplicates or outliers in the data.
  • Any normalization or other transformation of the data.
  • Any derived data needed for the analysis.
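
The preprocessing issues listed above can be sketched as a short pipeline. This is one possible approach under stated assumptions: the sample rows, the choice of the median to fill gaps, min-max scaling as the normalization, and BMI as the derived value are all illustrative, and the helper names are hypothetical.

```python
from statistics import median

# Hypothetical rows with one duplicate record and one missing weight.
rows = [
    {"id": 1, "height_cm": 170.0, "weight_kg": 65.0},
    {"id": 2, "height_cm": 180.0, "weight_kg": None},
    {"id": 2, "height_cm": 180.0, "weight_kg": None},  # duplicate of id 2
    {"id": 3, "height_cm": 160.0, "weight_kg": 55.0},
    {"id": 4, "height_cm": 175.0, "weight_kg": 85.0},
]


def drop_duplicates(rows, key):
    """Keep the first row seen for each key value."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(dict(row))
    return unique


def fill_missing(rows, field):
    """Replace missing values with the median of the values that are present."""
    fallback = median([r[field] for r in rows if r[field] is not None])
    return [{**r, field: fallback if r[field] is None else r[field]} for r in rows]


def add_bmi(rows):
    """Derive body mass index from weight (kg) and height (cm)."""
    return [
        {**r, "bmi": r["weight_kg"] / (r["height_cm"] / 100) ** 2} for r in rows
    ]


def min_max_scale(rows, field):
    """Add a copy of the field rescaled to the range 0..1."""
    low = min(r[field] for r in rows)
    high = max(r[field] for r in rows)
    span = high - low
    return [
        {**r, f"{field}_scaled": (r[field] - low) / span if span else 0.0}
        for r in rows
    ]


clean = drop_duplicates(rows, "id")
clean = fill_missing(clean, "weight_kg")
clean = add_bmi(clean)
clean = min_max_scale(clean, "weight_kg")
for row in clean:
    print(row)
```

The order matters in this sketch: duplicates are removed before the median is computed so that a repeated record doesn’t influence the fill value, and the derived field is computed only after the gaps are filled.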

ABOUT THE AUTHOR

Vartul Mittal is a Technology & Innovation Specialist who focuses on helping clients accelerate their digital transformation journey. He has 14+ years of global business transformation experience in management consulting and global in-house centres, managing technology and business teams in Intelligent Automation, Advanced Analytics, and Cloud Adoption. He is passionate about extending customer relationships beyond the current project, with the longer-term goal of becoming a trusted adviser and bringing greater value to businesses via digital disruption.

He has lived and worked across multiple countries and cultures, working with senior client stakeholders from industries such as Financial Services, FMCG, and Retail. He has delivered engagements for Fortune 100 organizations such as Coca Cola India, Kotak Mahindra Bank, IBM, Royal Bank of Scotland, Standard Life Insurance, Citibank, and Barclays. Vartul Mittal is also a renowned speaker on Analytics, Automation, AI, and Innovation at top global universities and international conferences.
