Feature stores ingest data from multiple sources, transform it into features, and store those features in a central place. Training pipelines then read these features to train machine learning models. Feature stores are becoming increasingly popular in machine learning and MLOps.
The common architecture of a feature store is as follows:
Feature stores offer several benefits. We primarily use them because they provide:
Collaboration: Different ML teams can share centrally stored features and use them to develop different ML models.
Reusability: Teams can reuse existing features across different problems instead of creating them from scratch.
Consistency: Training and serving read features computed from the same definitions, which reduces the gap between a model's offline (training) and online (production) performance.
Acceleration: Reusable features and automated pipelines help teams speed up the model development process.
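The reusability and consistency benefits can be illustrated with a minimal sketch. It assumes a hypothetical in-memory FeatureRegistry class and made-up feature names (avg_order_value, is_frequent_buyer); none of these come from a real feature store library. The point is only that a feature is defined once and the same definition is used for both training and serving.

```python
from typing import Callable, Dict, List

# Hypothetical type alias: a feature is a function from a raw record to a number.
FeatureFn = Callable[[dict], float]


class FeatureRegistry:
    """Toy, in-memory registry of shared feature definitions (illustrative only)."""

    def __init__(self) -> None:
        self._features: Dict[str, FeatureFn] = {}

    def register(self, name: str, fn: FeatureFn) -> None:
        # Refuse duplicates so two teams cannot silently redefine a feature.
        if name in self._features:
            raise ValueError(f"feature {name!r} is already registered")
        self._features[name] = fn

    def compute(self, names: List[str], record: dict) -> Dict[str, float]:
        # Apply each requested feature definition to the raw record.
        return {name: self._features[name](record) for name in names}


registry = FeatureRegistry()
registry.register(
    "avg_order_value",
    lambda r: r["total_spent"] / max(r["order_count"], 1),
)
registry.register("is_frequent_buyer", lambda r: float(r["order_count"] >= 10))

FEATURES = ["avg_order_value", "is_frequent_buyer"]


def build_training_rows(records: List[dict]) -> List[Dict[str, float]]:
    # Training path: compute features for a batch of historical records.
    return [registry.compute(FEATURES, r) for r in records]


def build_serving_row(record: dict) -> Dict[str, float]:
    # Serving path: compute features for one live record with the same definitions.
    return registry.compute(FEATURES, record)


record = {"total_spent": 200.0, "order_count": 4}
training_rows = build_training_rows([record])
serving_row = build_serving_row(record)
```

Because both paths call the same registered definitions, the same raw record yields identical feature values in training and in serving, which is the consistency property described above.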
The type of store we use can depend on the application. There are two major types of feature stores:
Offline stores: Hold features that change infrequently and therefore don't need to be updated in real time. For instance, the number of students in a class rarely changes. This data is usually kept in a data warehouse or in object storage such as IBM Cloud Object Storage or an Amazon S3 bucket.
Online stores: Hold features that change rapidly and must be updated in real time. For instance, stock market data must be updated in real time. This data is usually stored in a low-latency database such as Redis or Apache Cassandra for fast access. It can also be stored in real-time data warehouses.
Note: Data warehousing is the collection and integration of data from multiple sources. To read more about it, see our Answer "What are the stages of data warehousing?".
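The difference between the two store types can be sketched in a few lines. This is a toy, in-memory illustration under stated assumptions, not how any particular product works: the ToyFeatureStore class, its method names, and the sample entity "stock_42" are all hypothetical. It assumes the offline side keeps the full history of each feature, while the online side keeps only the latest value for fast lookup.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (entity_id, feature_name)


@dataclass
class ToyFeatureStore:
    # Offline side (assumption): full history of (timestamp, value) per key.
    offline: Dict[Key, List[Tuple[int, float]]] = field(default_factory=dict)
    # Online side (assumption): only the most recent value per key.
    online: Dict[Key, float] = field(default_factory=dict)

    def write(self, entity_id: str, feature: str, timestamp: int, value: float) -> None:
        key = (entity_id, feature)
        history = self.offline.setdefault(key, [])
        history.append((timestamp, value))
        history.sort()  # keep history ordered by timestamp
        # The online store always reflects the value with the latest timestamp.
        self.online[key] = history[-1][1]

    def get_online(self, entity_id: str, feature: str) -> Optional[float]:
        # Fast path for serving: a single dictionary lookup.
        return self.online.get((entity_id, feature))

    def get_historical(self, entity_id: str, feature: str, as_of: int) -> Optional[float]:
        # Training path: the value that was current at time `as_of`.
        history = self.offline.get((entity_id, feature), [])
        candidates = [value for ts, value in history if ts <= as_of]
        return candidates[-1] if candidates else None


store = ToyFeatureStore()
store.write("stock_42", "last_price", timestamp=100, value=10.0)
store.write("stock_42", "last_price", timestamp=200, value=12.5)
store.write("stock_42", "last_price", timestamp=300, value=11.0)

latest = store.get_online("stock_42", "last_price")
past = store.get_historical("stock_42", "last_price", as_of=250)
```

Here get_online returns the newest value for low-latency serving, while get_historical looks back through stored history, which is the kind of access a training pipeline typically needs.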
As adoption has grown, several feature stores have become available, including:
Feathr: An open-source, scalable feature store, originally developed at LinkedIn, for creating and sharing features.
Hopsworks: Provides a centralized platform for storing and managing features.
Databricks Feature Store: A cloud-based feature store provided by Databricks (available on Microsoft Azure, among other clouds) and used for feature versioning, data exploration, and feature standardization.
Amazon SageMaker Feature Store: A cloud-based service provided by Amazon Web Services (AWS) to manage features in a centralized location.
Tecton: A managed feature platform that lets data scientists focus on developing ML models rather than on building and maintaining feature pipelines.
Feature stores are becoming increasingly popular in machine learning and data science. Reusability and consistency are their key benefits, and both can accelerate your development process.