True numerical data
can be meaningfully multiplied. For example, consider a
model that predicts the value of a house based on its area.
Note that a useful model for evaluating house prices typically relies on
hundreds of features. That said, all else being equal, a house of 200 square
meters should be roughly twice as valuable as an identical house of 100 square
meters.
Oftentimes, you should represent features that contain integer values as
categorical data instead of numerical data. For example, consider a postal
code feature in which the values are integers. If you represent this
feature numerically rather than categorically, you're asking the model
to find a numeric relationship
between different postal codes. That is, you're telling the model to
treat postal code 20004 as twice (or half) as large a signal as postal code
10002. Representing postal codes as categorical data lets the model
weight each individual postal code separately.
Encoding
Encoding means converting categorical or other data to numerical vectors
that a model can train on. This conversion is necessary because models can
only train on floating-point values; models can't train on strings such as
"dog" or "maple". This module explains different
encoding methods for categorical data.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2024-10-09 UTC."],[[["This module focuses on differentiating between categorical and numerical data within machine learning."],["You will learn how to represent categorical data using one-hot vectors and address common issues associated with it."],["The module covers encoding techniques for converting categorical data into numerical vectors suitable for model training."],["Feature crosses, a method for combining categorical features to capture interactions, are also discussed."],["It is assumed you have prior knowledge of introductory machine learning and working with numerical data."]]],[]]