Key questions for finding data to build ML models
Because data is the essential ingredient that powers ML algorithms, it helps to understand the patterns in the data before attempting to train models. Exploratory data analysis (EDA) techniques can help build hypotheses about the data, identify data cleaning requirements, and inform the selection of potentially significant features. EDA can be carried out visually for intuitive insight, and it leads naturally into feature engineering and feature selection. Feature engineering is the process of taking raw data from the selected datasets and transforming it into “features” that better represent the underlying problem to be solved.
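As a rough illustration of the two steps described above, the sketch below first summarizes a tiny dataset (missing values, central tendency, category frequencies) and then derives features from the raw fields. Everything in it is an assumption made for the example: the records, the field names (signup, plan, monthly_spend), and the helper functions summarize and engineer_features are hypothetical and do not come from the book. It uses only the Python standard library; in practice a dataframe library and plots would usually do this work.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, median

# Hypothetical raw records; field names and values are invented for illustration.
raw_rows = [
    {"signup": "2021-03-01", "plan": "basic", "monthly_spend": 20.0},
    {"signup": "2021-03-06", "plan": "pro", "monthly_spend": 55.0},
    {"signup": "2021-03-07", "plan": "basic", "monthly_spend": None},
    {"signup": "2021-03-10", "plan": "pro", "monthly_spend": 65.0},
]


def summarize(rows, numeric_field, category_field):
    """Basic EDA: missing-value count, central tendency, category frequencies."""
    present = [r[numeric_field] for r in rows if r[numeric_field] is not None]
    return {
        "missing": len(rows) - len(present),
        "mean": mean(present),
        "median": median(present),
        "category_counts": Counter(r[category_field] for r in rows),
    }


def engineer_features(row, fill_value):
    """Turn one raw record into numeric features a model could consume."""
    signup = datetime.strptime(row["signup"], "%Y-%m-%d")
    spend = row["monthly_spend"]
    return {
        # weekday() returns 0 for Monday through 6 for Sunday.
        "signup_is_weekend": int(signup.weekday() >= 5),
        "plan_is_pro": int(row["plan"] == "pro"),
        # Impute missing spend and keep a flag recording that it was missing.
        "monthly_spend": fill_value if spend is None else spend,
        "spend_was_missing": int(spend is None),
    }


summary = summarize(raw_rows, "monthly_spend", "plan")
features = [engineer_features(r, summary["median"]) for r in raw_rows]

print(summary)
for feature_row in features:
    print(feature_row)
```

The summary step is what surfaces the cleaning requirement (one missing monthly_spend value), and the feature step acts on it by imputing the median and adding a flag. Median imputation is only one possible choice here, picked for simplicity.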
The constraints of data governance bring questions like:
Reference: Introducing MLOps: How to Scale Machine Learning in the Enterprise, by Mark Treveil and the Dataiku Team