Key questions for finding data to build ML models

Since data is the essential ingredient that powers ML algorithms, it always helps to understand the patterns in the data before attempting to train models. Exploratory data analysis (EDA) techniques can help build hypotheses about the data, identify data cleaning requirements, and inform the selection of potentially significant features. EDA can be carried out visually for intuitive insight, and it leads naturally into feature engineering and feature selection. Feature engineering is the process of taking raw data from the selected datasets and transforming it into “features” that better represent the underlying problem to be solved.
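
As a minimal sketch of these two steps, the snippet below summarizes one numeric column (a basic EDA check that also exposes a cleaning requirement) and then derives a new feature from two raw columns. The records, the column names, and the helpers `summarize` and `add_spend_per_visit` are all hypothetical and chosen only for illustration; a real project would more likely use a dataframe library and plots.

```python
from statistics import mean, median

# Hypothetical raw records; column names and values are made up for illustration.
records = [
    {"customer_id": 1, "total_spend": 120.0, "visits": 4},
    {"customer_id": 2, "total_spend": 0.0, "visits": 0},
    {"customer_id": 3, "total_spend": None, "visits": 3},
    {"customer_id": 4, "total_spend": 300.0, "visits": 6},
]


def summarize(rows, column):
    """Basic EDA summary for one numeric column: missing count, mean, median."""
    values = [r[column] for r in rows if r[column] is not None]
    return {
        "missing": len(rows) - len(values),
        "mean": mean(values),
        "median": median(values),
    }


def add_spend_per_visit(rows):
    """Feature engineering: derive spend per visit; None where it is undefined."""
    out = []
    for r in rows:
        spend, visits = r["total_spend"], r["visits"]
        # Undefined when spend is missing or the customer has zero visits.
        feature = spend / visits if spend is not None and visits else None
        out.append({**r, "spend_per_visit": feature})
    return out


spend_summary = summarize(records, "total_spend")
features = add_spend_per_visit(records)
```

Here the summary flags one missing `total_spend` value that needs a cleaning decision, while `spend_per_visit` is an engineered feature that may describe customer behavior better than either raw column alone.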

Finding data to build ML models, within the constraints of data governance, raises questions like:

  1. What relevant datasets are available?
  2. Is this data sufficiently accurate and reliable?
  3. How can stakeholders get access to this data?
  4. What data properties (features) can be made available by combining multiple sources of data?
  5. Will this data be available in real time?
  6. Is there a need to label some of the data with the “ground truth” that is to be predicted, or does unsupervised learning make sense? If so, how much will this cost in terms of time and resources?
  7. What platform should be used?
  8. How will data be updated once the model is deployed?
  9. Will the use of the model itself reduce the representativeness of the data?
  10. How will the KPIs, which were established along with the business objectives, be measured?
  11. Can the selected datasets be used for this purpose?
  12. What are the terms of use?
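
Questions 2 and 4 in particular can be explored with a few lines of code. The sketch below, under stated assumptions, counts duplicate keys and rows with missing values in one source and then combines two sources on a shared key. The `crm` and `orders` sources, their fields, and the helpers `quality_report` and `inner_join` are hypothetical, not taken from the referenced book.

```python
# Two hypothetical sources keyed by customer_id; all names and values are illustrative.
crm = [
    {"customer_id": 1, "segment": "retail"},
    {"customer_id": 2, "segment": "business"},
    {"customer_id": 2, "segment": "business"},  # duplicate row
    {"customer_id": 3, "segment": None},        # missing value
]
orders = [
    {"customer_id": 1, "order_count": 5},
    {"customer_id": 3, "order_count": 2},
    {"customer_id": 4, "order_count": 1},
]


def quality_report(rows, key):
    """Count duplicate keys and rows with at least one missing value."""
    keys = [r[key] for r in rows]
    return {
        "rows": len(rows),
        "duplicate_keys": len(keys) - len(set(keys)),
        "rows_with_missing": sum(any(v is None for v in r.values()) for r in rows),
    }


def inner_join(left, right, key):
    """Combine two sources on a shared key, keeping the first row seen per key."""
    right_by_key = {}
    for r in right:
        right_by_key.setdefault(r[key], r)
    joined, seen = [], set()
    for row in left:
        k = row[key]
        if k in right_by_key and k not in seen:
            seen.add(k)
            joined.append({**row, **right_by_key[k]})
    return joined


crm_report = quality_report(crm, "customer_id")
combined = inner_join(crm, orders, "customer_id")
```

Only customers present in both sources survive the join, so comparing `len(combined)` with the size of each source gives a quick indication of how much data is lost when the sources are combined.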

Reference: Mark Treveil and the Dataiku Team, Introducing MLOps: How to Scale Machine Learning in the Enterprise.
