SLEP013: n_features_out_
attribute
- Author:
Adrin Jalali
- Status:
Rejected
- Type:
Standards Track
- Created:
2020-02-12
Abstract
This SLEP proposes the introduction of a public n_features_out_
attribute
for most transformers (where relevant).
Motivation
Knowing the number of features that a transformer outputs is useful for inspection purposes. This is in conjunction with *SLEP010: ``n_features_in_``*.
Take the following piece as an example:
X, y = fetch_openml("titanic", version=1, as_frame=True, return_X_y=True)
# We will train our classifier with the following features:
# Numeric Features:
# - age: float.
# - fare: float.
# Categorical Features:
# - embarked: categories encoded as strings {'C', 'S', 'Q'}.
# - sex: categories encoded as strings {'female', 'male'}.
# - pclass: ordinal integers {1, 2, 3}.
# We create the preprocessing pipelines for both numeric and categorical data.
numeric_features = ['age', 'fare']
numeric_transformer = Pipeline(steps=[
('imputer', SimpleImputer(strategy='median')),
('scaler', StandardScaler())])
categorical_features = ['embarked', 'sex', 'pclass']
categorical_transformer = Pipeline(steps=[
('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
('onehot', OneHotEncoder(handle_unknown='ignore'))])
preprocessor = ColumnTransformer(
transformers=[
('num', numeric_transformer, numeric_features),
('cat', categorical_transformer, categorical_features)])
# Append classifier to preprocessing pipeline.
# Now we have a full prediction pipeline.
clf = Pipeline(steps=[('preprocessor', preprocessor),
('classifier', LogisticRegression())])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf.fit(X_train, y_train)
The user could then inspect the number of features going out from each step:
# Total number of output features from the `ColumnTransformer`
clf[0].n_features_out_
# Number of features as a result of the numerical pipeline:
clf[0].named_transformers_['num'].n_features_out_
# Number of features as a result of the categorical pipeline:
clf[0].named_transformers_['cat'].n_features_out_
Solution
The proposed solution is for the n_features_out_
attribute to be set once a
call to fit
is done. In many cases the value of n_features_out_
is the
same as some other attribute stored in the transformer, e.g.
n_components_
, and in these cases a Mixin
such as a ComponentsMixin
can delegate n_features_out_
to those attributes.
Testing
A test to the common tests is added to ensure the presence of the attribute or
property after calling fit
.
Considerations
The main consideration is that the addition of the common test means that
existing estimators in downstream libraries will not pass our test suite,
unless the estimators also have the n_features_out_
attribute.
The newly introduced checks will only raise a warning instead of an exception for the next 2 releases, so this will give more time for downstream packages to adjust.
There are other minor considerations:
In some meta-estimators, this is delegated to the sub-estimator(s). The
n_features_out_
attribute of the meta-estimator is thus explicitly set to that of the sub-estimator, either via a@property
, or directly infit()
.Some transformers such as
FunctionTransformer
may not know the number of output features since arbitrary arrays can be passed totransform
. In such casesn_features_out_
is set toNone
.
Copyright
This document has been placed in the public domain. [1]