Understanding feature engineering from a mathematical perspective
Background
In this article, we explore feature engineering from a mathematical perspective. The primary goal of feature engineering is to transform raw data into a representation that captures the underlying patterns in the problem more effectively. In a previous post, I shared that feature engineering and model evaluation can be thought of as a feedback loop. In this post, we extend this idea further.
From a mathematical perspective, feature engineering is about transforming raw data into a set of variables that better represent the underlying problem to predictive models, thereby improving their performance. This involves applying mathematical transformations and extracting meaningful patterns or new information from the data.
Mathematically, feature engineering transforms the data into a space where it can be more effectively and efficiently processed by machine learning algorithms. This involves a variety of linear and nonlinear transformations, scaling, and extraction techniques designed to expose the underlying patterns in the data to the predictive models.
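As a minimal sketch of this idea, the snippet below uses scikit-learn and purely synthetic data (an assumption made for illustration): adding a squared term moves the data into a space where a simple linear model can capture a relationship it otherwise misses.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data: y depends on x quadratically, plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = 2.0 * x**2 + rng.normal(scale=0.5, size=200)

# Raw representation: a linear model on x alone fits poorly.
raw = x.reshape(-1, 1)
print("R^2 on raw x:      ", LinearRegression().fit(raw, y).score(raw, y))

# Engineered representation: adding x^2 exposes the underlying pattern.
engineered = np.column_stack([x, x**2])
print("R^2 with x and x^2:", LinearRegression().fit(engineered, y).score(engineered, y))
```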
Overview of feature engineering
Feature engineering involves creating features (input variables) from raw data that make machine learning algorithms work more effectively. Typically, feature engineering uses domain knowledge to derive features from the raw data. These engineered features can be used to improve the performance of machine learning algorithms. The goal is to provide meaningful information through these engineered features that the model can use to make accurate predictions or classifications.
Well-designed features can improve the predictive power of machine learning models by capturing important information in the data. Better features can allow a simpler model to perform well, reducing the need for more complex algorithms that are harder to interpret and maintain. By creating features that capture the underlying structure of the data, models are less likely to overfit to noise in the training set and are better at generalizing to new examples.
In this sense, feature engineering is crucial for machine learning.
Components of feature engineering
There are three primary components of feature engineering:
Feature Transformation
Feature transformation involves changing the format or the scale of the data without altering its content. This process can make the data more suitable for modeling by changing its distribution or scaling. Common feature transformations include log, square-root, and power (e.g., Box-Cox) transformations.
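A minimal sketch of such transformations, assuming a hypothetical right-skewed numeric column and using NumPy together with SciPy's Box-Cox utility, might look like this:

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed feature, e.g. incomes or transaction amounts.
rng = np.random.default_rng(1)
income = rng.lognormal(mean=10, sigma=1.0, size=1000)

# Log transform: compresses the long right tail without changing the ordering.
log_income = np.log1p(income)

# Box-Cox: a parametric power transform; lambda is estimated from the data.
boxcox_income, fitted_lambda = stats.boxcox(income)

print("skewness raw:    ", stats.skew(income))
print("skewness log:    ", stats.skew(log_income))
print("skewness box-cox:", stats.skew(boxcox_income))
```

Note that the content of the column is unchanged in the sense that the ordering of observations is preserved; only its distribution is reshaped.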
Feature Scaling
Feature scaling is a technique for standardizing the independent features in the data to a fixed range. It is a part of feature transformation but focuses specifically on altering the scale of features so that they can be compared on common ground. This is particularly important for models that rely on the magnitude of the data, such as distance-based algorithms like K-Nearest Neighbors (KNN) and gradient-based algorithms like linear regression. Examples of feature scaling include standardization (z-score scaling) and min-max normalization.
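As a rough illustration, the sketch below applies scikit-learn's StandardScaler and MinMaxScaler to a small table of ages and incomes; the values are made up for illustration only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Hypothetical features on very different scales: age in years, income in dollars.
X = np.array([[25,  40_000],
              [32, 120_000],
              [47,  65_000],
              [51, 300_000]], dtype=float)

# Standardization: each column rescaled to zero mean and unit variance.
X_standardized = StandardScaler().fit_transform(X)

# Min-max scaling: each column mapped to the [0, 1] range.
X_minmax = MinMaxScaler().fit_transform(X)

print(X_standardized)
print(X_minmax)
```

Without scaling, the income column would dominate any Euclidean distance computed by KNN simply because its numbers are larger.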
Feature Extraction
Feature extraction is the process of creating new features from existing data that capture essential information in a more useful or composite form. This is particularly important for unstructured data types like text and images. Examples of feature extraction include principal component analysis (PCA) for numeric data and bag-of-words or TF-IDF representations for text.
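A minimal sketch of both kinds of extraction, using scikit-learn's TfidfVectorizer for text and PCA for correlated numeric columns (the documents and data below are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Text: TF-IDF turns raw documents into a weighted term matrix.
docs = ["feature engineering improves models",
        "models learn from engineered features"]
tfidf = TfidfVectorizer().fit_transform(docs)
print(tfidf.shape)  # (2 documents, vocabulary size)

# Numeric: PCA compresses correlated columns into a few composite features.
rng = np.random.default_rng(2)
base = rng.normal(size=(100, 1))
X = np.hstack([base,
               base * 2 + rng.normal(scale=0.1, size=(100, 1)),
               rng.normal(size=(100, 1))])
X_reduced = PCA(n_components=2).fit_transform(X)
print(X_reduced.shape)  # (100, 2)
```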
Interpreting feature engineering as understanding the underlying distribution of data for inference
You can also think of feature engineering as the ability to understand the underlying distribution of the data for the purpose of inference. The underlying distribution of a phenomenon refers to the statistical properties (mean, variance, skewness, relationships, etc.) of the process that generates the observations. Inference in machine learning refers to the model’s ability to make predictions or draw conclusions from data. We can therefore think of feature engineering as a bridge between the underlying statistical properties of the data (as represented by its distribution) and the model’s predictive performance.
Exploratory Data Analysis (EDA) is often the starting point, revealing insights into the data’s distribution and informing feature engineering strategies.
Feature engineering is, in essence, the process of translating the insights from EDA into features that improve the model’s inference capabilities.
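As a small illustration of this EDA-to-feature loop, the sketch below assumes a pandas DataFrame with a hypothetical price column: summary statistics and skewness serve as the EDA step, and a log-scaled version of the column is engineered in response.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with a skewed price-like column.
rng = np.random.default_rng(3)
df = pd.DataFrame({"price": rng.lognormal(mean=12, sigma=0.8, size=500)})

# EDA step: summary statistics and skewness reveal a long right tail.
print(df["price"].describe())
print("skewness:", df["price"].skew())

# Feature-engineering step informed by the EDA: work on a log scale instead.
df["log_price"] = np.log1p(df["price"])
print("skewness after log:", df["log_price"].skew())
```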
Feature engineering impacts inference in several ways:
Capturing Informative Features: Features that represent the phenomenon's underlying distribution enable the model to learn meaningful relationships and patterns. Example: for predicting house prices, creating features like "proximity to schools" or "number of rooms per occupant" might align better with the real-world distribution of housing prices than the raw input variables (see the sketch after this list).
Reducing Noise: Feature engineering helps isolate the signal from noise by emphasizing variables that are statistically significant or have predictive power. This leads to more reliable inference because the model focuses on aspects of the data that matter.
Addressing Non-linear Relationships: Many real-world phenomena have non-linear or complex relationships between variables. Feature engineering can transform data (e.g., log, polynomial, interaction terms) to capture these relationships, making the inference process more robust.
Improving Generalization: By representing the distribution well, the engineered features help models generalize better to unseen data. This reduces overfitting and ensures the model's inferences are applicable beyond the training dataset.
Matching the Model to the Distribution: Feature engineering is also about ensuring that the data aligns with the assumptions of the chosen model, given its distribution; for example, log-transforming a heavily skewed variable so that it better satisfies the linearity and normality assumptions of linear regression.
Overcoming Distribution-Inference Misalignment: We can think of feature engineering as correcting the misalignment between how the data are distributed and what the model needs for inference, for instance by rescaling features, transforming skewed variables, or explicitly encoding interactions and ratios the model cannot learn on its own.
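To make these points concrete, here is a minimal pandas sketch using hypothetical housing columns (total_rooms, occupants, distance_to_school_km, lot_area); the column names and values are assumptions made for illustration, not from any specific dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical housing data; column names and values are illustrative only.
rng = np.random.default_rng(4)
df = pd.DataFrame({
    "total_rooms": rng.integers(2, 10, size=1000),
    "occupants": rng.integers(1, 6, size=1000),
    "distance_to_school_km": rng.uniform(0.1, 15, size=1000),
    "lot_area": rng.lognormal(mean=6, sigma=0.7, size=1000),
})

# Informative feature: a ratio that reflects crowding better than either raw count.
df["rooms_per_occupant"] = df["total_rooms"] / df["occupants"]

# Non-linear relationship: proximity matters most at short distances,
# so an inverse of distance is often more informative than the raw value.
df["school_proximity"] = 1.0 / (1.0 + df["distance_to_school_km"])

# Matching model assumptions: log-transform a heavily skewed input
# so that linear models see a more symmetric distribution.
df["log_lot_area"] = np.log1p(df["lot_area"])

print(df.head())
```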
Conclusion
Understanding the underlying distribution of a phenomenon allows us to engineer features that effectively represent the data's structure and relationships. This representation directly influences the model's ability to make accurate inferences, as the quality and relevance of the features determine the model's success in capturing the true essence of the phenomenon. Feature engineering techniques thus act as the bridge that connects statistical understanding to predictive performance.
If you want to study with us, please see our course on #AI at #universityofoxford (almost full) https://lnkd.in/dcdrjSC2