Do You Really Need a Feature Store?

Welcome to Continual Learnings

A weekly newsletter for practitioners building ML-powered products. You can subscribe to the email version here or follow us on Twitter.

What we're reading this week

Stability API docs: Stability AI (the creators of Stable Diffusion) released their first API. More competition in the API market is good for ML product developers. 

GPT-JT release: This open source model is remarkable not for its performance (which is reasonably strong but not state of the art), but for how it was trained. Rather than training on a large, state-of-the-art GPU cluster with fast data transfer, it was trained on idle GPU cycles from heterogeneous academic GPU clusters.

Patching open-vocabulary models by interpolating weights: Model patching is a promising approach to fixing acute model errors more quickly and reliably than fine-tuning. This is one of the simpler approaches I’ve seen, so it could be worth a try.

Production ML papers to know

In this series, we cover important papers to know if you build ML-powered products.

Why you probably don’t need a feature store

If you’re building out your ML stack, you’ve probably considered implementing a feature store. Used by companies like Uber, Netflix, Airbnb and Google, a feature store is often presented as a necessary part of production ML.

But this article asks whether you really need a feature store, and its answer is 'no'. At least, not unless specific circumstances justify the additional complexity that using one will create.

This article sets out what those circumstances are, and the alternative approaches that could serve your needs just as well.

What is a Feature Store?

The first mention of feature stores was in this Uber blog describing their ML platform, Michelangelo.

A feature store is, simply, a repository for storing and serving ML features. It is a key-value store: a client provides a timestamp and a key - an entity_id such as a user - and the corresponding feature values are passed to the model for either training or prediction. The features can be ingested from various data sources, and transformed as necessary prior to ingestion.
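The key-value interface described above can be sketched in a few lines. This is a toy illustration, not a real feature store implementation; all names (`FeatureStore`, `ingest`, `get_features`) are hypothetical:

```python
from datetime import datetime

class FeatureStore:
    """Toy feature store: a point-in-time key-value lookup
    from (entity_id, timestamp) to feature values."""

    def __init__(self):
        # {entity_id: [(timestamp, {feature_name: value}), ...]}, sorted by time
        self._store = {}

    def ingest(self, entity_id, timestamp, features):
        """Write a row of feature values for an entity at a point in time."""
        self._store.setdefault(entity_id, []).append((timestamp, features))
        self._store[entity_id].sort(key=lambda row: row[0])

    def get_features(self, entity_id, timestamp):
        """Return the latest feature values at or before `timestamp`
        (a point-in-time lookup, which avoids leaking future data)."""
        rows = [r for r in self._store.get(entity_id, []) if r[0] <= timestamp]
        return rows[-1][1] if rows else None

store = FeatureStore()
store.ingest("user_42", datetime(2022, 12, 1), {"visits_7d": 3})
store.ingest("user_42", datetime(2022, 12, 5), {"visits_7d": 5})
store.get_features("user_42", datetime(2022, 12, 3))  # {"visits_7d": 3}
```

The point-in-time lookup is the important detail: a training job asking "what did this feature look like on December 3rd?" gets the same answer the serving path would have seen at that moment.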

The problem a feature store attempts to solve is something called training-serving skew. A model that is trained on processed data needs to make predictions on production data that has been processed in an identical way - if the processing differs, we can't be confident our model predictions will be as good.

Uber's basis for a feature store was that it had an offline training process and an online prediction process - an internal feature store kept the two in sync.

But feature stores might add unnecessary complexity, and they are not the only way of addressing training-serving skew.

What are the alternatives to a Feature Store?

Let’s take a look at the alternate approaches outlined in the article.

The first, and simplest, alternative is to incorporate the preprocessing steps within the model function. Both training data and prediction data are passed to the model in a raw state; this data is processed and a prediction returned by the modeling function.

Incorporating the preprocessing code into the model function. Image from the article.

This approach is simple and versatile. Because preprocessing code is part of the model function, no extra infrastructure is required. The model can be deployed on the edge or in the cloud relatively easily.

But preprocessing steps will be repeated each time data is sent to the model, which can be computationally expensive. In addition, we reduce flexibility by having to implement the preprocessing code in the same framework as the ML model.

The second approach is to use a transform function to preprocess data prior to passing the data to the model for training or for making a prediction.

Encapsulate the preprocessing code into a transform function that is applied to both the raw dataset and to prediction requests. Image from the article.

This approach requires an additional step to be inserted between the input and the model, and for this to be invoked for both the training and prediction code.

This step might be encapsulated within a container or an SQL clause. While this improves efficiency, it can also add complexity, so this approach should only be used if the extra infrastructural and bookkeeping overhead is worth it.
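The second approach can be sketched like this. All names (`transform`, `build_training_set`, `serve_prediction`) are hypothetical; the essential property is that one shared function sits in front of both the training and the prediction path:

```python
def transform(raw_row):
    """The single, shared transform applied on both code paths."""
    return {"visits_scaled": raw_row["visits"] / 100.0}

def build_training_set(raw_rows, labels):
    # Training path: transform the raw dataset before fitting.
    return [(transform(row), label) for row, label in zip(raw_rows, labels)]

def serve_prediction(model_fn, raw_request):
    # Serving path: the SAME transform is applied to each request.
    return model_fn(transform(raw_request))

training = build_training_set([{"visits": 40}], [1])
pred = serve_prediction(lambda feats: feats["visits_scaled"] > 0.3,
                        {"visits": 40})
```

Because both paths call the same `transform`, a change to the preprocessing logic automatically reaches training and serving together - which is exactly the skew guarantee a feature store provides, at a fraction of the infrastructure.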

So when should we use a Feature Store?

The article contends that these two approaches should be sufficient for most features - but also that there are times when a feature store might be invaluable.

In particular, we will need a feature store if the feature value is not known by the client (for example, a mobile app), and instead has to be computed on the server side and injected into prediction requests. An example is the number of visitors to a hotel, which could be a feature of a dynamic pricing model, and which will vary over time.
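The hotel example can be made concrete with a small sketch. The names (`handle_pricing_request`, `current_occupancy`) and the pricing function are invented for illustration; `current_occupancy` stands in for a feature-store lookup:

```python
# Server-side state the mobile client cannot know about
# (a stand-in for a feature store lookup).
current_occupancy = {"hotel_7": 182}

def handle_pricing_request(model_fn, request):
    """Merge a server-side feature into the client's request
    before calling the model."""
    features = dict(request)
    # Inject the feature the client could not have supplied.
    features["occupancy"] = current_occupancy[request["hotel_id"]]
    return model_fn(features)

price = handle_pricing_request(
    lambda f: 100 + 0.5 * f["occupancy"],
    {"hotel_id": "hotel_7", "nights": 2},
)  # 191.0
```

Neither of the two simpler approaches covers this case, because the raw input arriving from the client simply doesn't contain the occupancy signal - it has to come from somewhere server-side.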

We might also need a feature store to prevent unnecessary copies of the data, such as when a feature is computationally expensive and used by multiple ML models. It might be more efficient and maintainable to store it centrally.

The diagram below illustrates an example provided in the article, where a feature used by many models - in this case, the output of an embedding algorithm - is updated daily. The models are re-trained regularly, and the feature store ensures that the embedding feature is provided efficiently, and is aligned with the training labels and timestamp required by the models.

Feature Store use case example. Image from the article.

In summary, a feature store is particularly useful for hard-to-compute features that are not available on the client side, are frequently updated, and are used by multiple models.

So what?

A lesson I’ve learned again and again in ML is that complexity should be earned, not assumed. ML systems are prone to bugs and long development times. It’s best to take the shortcuts you can to get a minimum viable model into production quickly, and iterate on your approach from there.

Through that lens, this article reminds us that there's no one-size-fits-all solution to feature serving. Simpler alternatives will usually meet our requirements without the additional complexity of a feature store.

The article is available here.
