Inference
Explore the process of inference in machine learning, focusing on scaling prediction workloads using aggregators and worker pools. Understand strategies for serving multiple models, managing data distribution shifts, and applying techniques like Thompson Sampling to balance exploration and exploitation in dynamic environments.
Inference is the process of using a trained machine learning model to make predictions. Below are some techniques for scaling inference in a production environment.
1. Imbalanced workloads
- During inference, a common pattern is to split the workload across multiple inference servers, much like the architecture used by load balancers. This component is sometimes called an Aggregator Service.
- Clients (upstream processes) send requests to the ...
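The aggregator pattern above can be sketched as follows. This is a minimal, hypothetical illustration: the `predict` function stands in for a real model's forward pass, and the round-robin dispatch policy is one simple choice among many (a production aggregator might use load-aware routing instead).

```python
# Minimal sketch of an aggregator fanning requests out to a worker pool.
# `predict` is a placeholder (an assumption) for a real model inference call.
from concurrent.futures import ThreadPoolExecutor
import itertools


def predict(worker_id, request):
    # Placeholder for a model forward pass on one inference server.
    return {"worker": worker_id, "input": request}


class Aggregator:
    """Round-robin dispatcher over a fixed pool of inference workers."""

    def __init__(self, num_workers=4):
        self.pool = ThreadPoolExecutor(max_workers=num_workers)
        self._ids = itertools.cycle(range(num_workers))

    def handle(self, request):
        # Pick the next worker and submit the request asynchronously.
        worker_id = next(self._ids)
        return self.pool.submit(predict, worker_id, request)


agg = Aggregator(num_workers=2)
futures = [agg.handle(f"req-{i}") for i in range(4)]
results = [f.result() for f in futures]
print([r["worker"] for r in results])  # requests alternate between the two workers
```

Because the aggregator returns futures, clients can issue many requests concurrently and collect results as they complete, which is what makes this pattern scale horizontally.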