A Beginner’s Guide To Understanding Feature Stores

Feature stores are rapidly gaining popularity in the machine learning community. When I first set out to learn more about them, I had a number of questions. Answering those questions helped me understand what feature stores are and why they are becoming so important in machine learning. I have listed the questions and their answers here as a guide for beginners who want to learn about feature stores.

1. What is the history of the feature store?

Uber first introduced the concept of a feature store with the launch of its Michelangelo machine learning platform in 2017. The feature store played an essential part in helping Uber operationalise its machine learning projects. Since then, a growing number of venture-backed startups, along with tech giants such as Google, AWS and Databricks, have launched feature stores as part of their platforms. The timeline below summarises the key feature store milestones.

 

2. What is the meaning of features?

In machine learning, features are the inputs to a machine learning model. They are arguably the most critical ingredient for successful machine learning. Features are also sometimes referred to as “variables” or “attributes.” For example, the tabular data below is a dataset used to build a machine learning model that predicts the future sales of a store. The features are the columns in the table that are used as inputs to the model.
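To make this concrete, here is a minimal sketch in plain Python of splitting a tabular dataset into feature columns and the value to predict. The rows and column names (`day_of_week`, `promo`, `is_holiday`, `sales`) are made up for illustration and are not taken from the table in this post.

```python
# Hypothetical store-sales rows; values and column names are illustrative only.
rows = [
    {"store_id": 1, "day_of_week": 5, "promo": 1, "is_holiday": 0, "sales": 5263},
    {"store_id": 2, "day_of_week": 5, "promo": 0, "is_holiday": 0, "sales": 6064},
    {"store_id": 1, "day_of_week": 6, "promo": 1, "is_holiday": 1, "sales": 8314},
]

# Columns used as model inputs (features) and the column to predict (target).
FEATURE_COLUMNS = ["day_of_week", "promo", "is_holiday"]
TARGET_COLUMN = "sales"


def split_features_target(rows, feature_columns, target_column):
    """Return (X, y): a list of feature vectors and a list of target values."""
    X = [[row[col] for col in feature_columns] for row in rows]
    y = [row[target_column] for row in rows]
    return X, y


X, y = split_features_target(rows, FEATURE_COLUMNS, TARGET_COLUMN)
```

Each inner list in `X` is one row of features, and `y` holds the sales figure the model would learn to predict.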

 

3. We have now defined features, but what is feature engineering, and why is it so important?

Feature engineering is the process of deriving new features from a raw dataset. It ranges from basic transformations, such as aggregations, to more advanced ones, such as word embeddings produced by machine learning algorithms. The goal of feature engineering is to create a better dataset that improves the performance of machine learning algorithms.
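As a rough illustration of the "basic transformation" end of that range, the sketch below aggregates raw transactions into per-store features. The record layout and the feature names (`txn_count`, `total_amount`, `avg_amount`) are assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: one record per transaction.
raw_transactions = [
    {"store_id": 1, "amount": 20.0},
    {"store_id": 1, "amount": 30.0},
    {"store_id": 2, "amount": 10.0},
]


def aggregate_store_features(transactions):
    """Turn raw transactions into one row of aggregate features per store."""
    amounts_by_store = defaultdict(list)
    for txn in transactions:
        amounts_by_store[txn["store_id"]].append(txn["amount"])

    return {
        store_id: {
            "txn_count": len(amounts),     # number of transactions
            "total_amount": sum(amounts),  # total revenue
            "avg_amount": mean(amounts),   # average transaction size
        }
        for store_id, amounts in amounts_by_store.items()
    }


store_features = aggregate_store_features(raw_transactions)
```

The raw dataset has one row per transaction, while the engineered dataset has one row per store with three new features that a model can consume directly.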

Data scientists spend considerable time in the feature engineering phase, generating valuable features. I still recall my first lesson in Artificial Intelligence, when I heard the classic machine learning cliché, “Garbage in, garbage out!”. The saying applies directly to feature engineering: a machine learning model is only as good as the features fed into it. If you have a state-of-the-art machine learning algorithm but poor features, your model will perform poorly. In fact, many Kaggle grandmasters say that the secret to winning competitions lies in the feature engineering phase!

 

4. What are feature stores?

A feature store is a centralised repository that stores curated features. It is a data management layer that allows data scientists, machine learning engineers and data engineers to collaborate on, share and discover features. Another way to think about it is that a feature store sits between the raw data and the models. It takes raw data and transforms it into features, which are then used for both model training and inference. This ensures that the features used in training and in inference are consistent.
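The idea can be sketched with a toy in-memory class. This is only a mental model under simplifying assumptions: `InMemoryFeatureStore` and its methods are hypothetical names invented for this example and do not correspond to any vendor's API, and a real feature store adds persistence, point-in-time correctness, access control and more.

```python
class InMemoryFeatureStore:
    """Toy feature store: one place to define, compute and read features."""

    def __init__(self):
        self._transforms = {}  # feature name -> function(raw_record) -> value
        self._values = {}      # entity id -> {feature name: value}

    def register_feature(self, name, transform):
        """Define a feature once so every consumer shares the same logic."""
        self._transforms[name] = transform

    def ingest(self, entity_id, raw_record):
        """Compute all registered features for one entity from its raw data."""
        self._values[entity_id] = {
            name: transform(raw_record)
            for name, transform in self._transforms.items()
        }

    def get_features(self, entity_id, names):
        """Read features for a single entity (the inference path)."""
        stored = self._values[entity_id]
        return [stored[name] for name in names]

    def get_training_rows(self, names):
        """Read features for all entities (the training path)."""
        return [self.get_features(eid, names) for eid in sorted(self._values)]


store = InMemoryFeatureStore()
store.register_feature("avg_basket", lambda r: r["revenue"] / r["transactions"])
store.register_feature("is_weekend", lambda r: int(r["day_of_week"] >= 6))

# Hypothetical raw records keyed by store.
store.ingest("store_1", {"revenue": 1000.0, "transactions": 50, "day_of_week": 6})
store.ingest("store_2", {"revenue": 300.0, "transactions": 20, "day_of_week": 2})

names = ["avg_basket", "is_weekend"]
training_rows = store.get_training_rows(names)      # used to train a model
serving_row = store.get_features("store_1", names)  # used at inference time
```

Because both read paths go through the same stored values, the row served for `store_1` at inference time is identical to the row the model saw for `store_1` during training.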

 

5. What are the benefits of having a feature store?

To understand why feature stores are beneficial, let’s look at the machine learning lifecycle. It has four main phases, listed below:

1.      Understand business objectives: Understand the problem and the business objectives

2.      Data acquisition: Acquire, explore and clean data

3.      Modelling: Build the model by selecting a suitable ML algorithm, training and evaluating the model.

4.      Production: Deploy and monitor the model.

Features for a machine learning model are required in both the modelling and production (deployment) phases of the lifecycle. During modelling, we use features that have gone through appropriate transformations as inputs to train the machine learning model. After training and validation, the best model is deployed into production for inference.

Without a feature store, however, machine learning engineers have to re-implement in production the feature engineering steps used during model training (see the illustration below). This leads to duplicate code: one copy computes features for training and another computes them for inference. The sketch below shows how deploying Model 3 would require duplicated code in a machine learning pipeline without a feature store.

This causes several issues. First, duplicated code is more error-prone: non-DRY (Don't Repeat Yourself) code increases the risk that the two copies drift apart, which in turn makes troubleshooting cumbersome. Second, it leads to "online/offline skew", a gap between model performance during training and during inference that arises when the features computed in the two settings differ. Incorporating a feature store largely removes both problems, because training and inference draw their features from the same definitions.
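A small sketch shows how this skew can creep in. All function names here are hypothetical, and the "bug" (the serving copy truncating to an integer) is contrived purely to show two copies of the same feature drifting apart.

```python
def avg_basket_training(record):
    """Feature logic as written in the training pipeline."""
    return record["revenue"] / record["transactions"]


def avg_basket_serving_copy(record):
    """Duplicated logic in the serving code that has drifted:
    it truncates the result to an integer."""
    return int(record["revenue"] / record["transactions"])


def avg_basket(record):
    """Single shared definition, as a feature store would provide."""
    return record["revenue"] / record["transactions"]


# Hypothetical raw record for one store.
record = {"revenue": 105.0, "transactions": 10}

# Duplicated code: the model is trained on 10.5 but served 10.
duplicated_skew = abs(avg_basket_training(record) - avg_basket_serving_copy(record))

# Shared definition: training and serving see exactly the same value.
shared_skew = abs(avg_basket(record) - avg_basket(record))
```

With duplicated code the model sees a different value in production than it was trained on, while the shared definition cannot disagree with itself.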

To summarise, the following are the benefits of using a feature store:

  • Consistency between model training and inference, which improves model accuracy and eliminates online/offline skew

  • Share and reuse features across models and teams, ensuring faster model development

  • Provide feature versioning, lineage and regulatory compliance

 

6. Do you need a feature store?

It depends! Feature stores are a must-have when you are working on a project where the intention is to deploy models at scale. However, if you are working within a small team on a small project, perhaps a proof of concept, then you could do without a feature store.

 

7. Who offers feature stores as a tool?

If you would like to start exploring or using feature stores, the following are a few vendors, listed in alphabetical order, that offer this functionality.

 

Hopefully, this article will help you on your learning journey and accelerate your understanding of feature stores, their importance within a machine learning pipeline and the numerous benefits they have to offer.