Inference

Learn common techniques to scale inference in production environments.

Inference

Inference is the process of using a trained machine learning model to make a prediction. Below are some of the techniques to scale inference in the production environment.

1. Imbalance workload

  • During inference, one common pattern is to split workloads onto multiple inference servers. We use similar architecture in Load Balancers. It is also sometimes called an Aggregator Service.

Get hands-on with 1200+ tech skills courses.