
Working with Different Providers and Models

Explore how to use Llama Stack's provider abstraction to integrate different AI providers and models. Learn to configure and customize your setup so you can develop locally, deploy to the cloud, and switch between the two seamlessly. Build the skills to optimize cost, performance, and scalability by defining agents with varied backends and managing complex workflows.

Deploying AI applications effectively means navigating a diverse landscape of cost, latency, and hardware considerations. You might begin testing on a local CPU and then need the performance of hosted GPUs for production. Perhaps embeddings can run offline to save costs, while core inference tasks require cloud scalability, or you need to switch between these configurations seamlessly. This constant re-evaluation and adaptation can be a major development bottleneck.

Llama Stack simplifies this complexity. Instead of rewriting your application for each new setup, it allows you to abstract these infrastructure differences through a system of providers and distributions. You define your application's needs (like inference, retrieval, or safety), and Llama Stack manages which underlying system fulfills those needs, streamlining your path from development to deployment.
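To make that concrete, here is a minimal sketch of what the abstraction looks like from the application side, assuming the llama-stack-client Python SDK, a stack server running on the default port 8321, and an illustrative model identifier (exact method and parameter names can vary between SDK versions). The point is that this code does not change when you swap the provider behind the inference API.

```python
# Minimal sketch: the application talks only to the stack's inference API;
# whether a local CPU model or a hosted GPU endpoint serves the request is
# decided by the distribution's provider configuration, not by this code.
# Assumes the llama-stack-client SDK; names may differ slightly by version.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Discover whatever models the configured providers expose.
for model in client.models.list():
    print(model.identifier)

# The same call works regardless of which inference provider backs it.
# The model identifier below is illustrative; use one your stack serves.
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Summarize what a provider is."}],
)
print(response.completion_message.content)
```

Because the provider is resolved by the stack configuration rather than the client code, moving from local development to a cloud deployment is a configuration change, which is exactly what the next step sets up.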

Generating a configuration

We’ll start by initializing a fresh Llama Stack distribution using the CLI:

llama stack build

This will open an interactive session where we can pick and choose our providers for the various APIs. In the ...