Setting Up a Development Environment for Llama Stack
Understand the process of installing Llama Stack, setting up a local inference backend, and running your first test application using the Python SDK.
We'll cover the following...
- Why local-first development?
- Why use uv?
- Step 1: Install uv
- Step 2: Initialize your project environment
- Step 3: Install the Llama Stack Python SDK
- Step 4: Install and run Ollama for local inference
- Step 5: Run the Llama Stack server with Ollama configuration
- Step 6: Testing the Llama Stack server
- Closing thoughts
Getting started with Llama Stack doesn’t require a GPU cluster or managed cloud infrastructure. The stack’s design philosophy encourages starting small—on your laptop, using lightweight providers—and scaling up only when your application demands it. This local-first mindset is ideal for rapid prototyping, debugging, and experimentation.
In this lesson, you’ll use uv, a fast Python package and environment manager, to set up a clean development workspace. Then you’ll install the core components of Llama Stack, set up Ollama as your inference backend, and run your first inference call through the SDK. You’ll have a working dev environment ready for more advanced builds by the end.
Why local-first development?
Llama Stack was built with local development in mind. This differentiates it from frameworks that assume access to high-powered GPUs or cloud credits. Local setups are:
- Faster to iterate: You can try, break, and rerun without waiting on remote servers.
- More transparent: You can access logs, models, and configurations without abstraction layers.
- Easier to control: No external dependencies, rate limits, or vendor lock-in.
For this reason, our initial setup will use:
- Ollama for running inference locally via Llama 3 models
- Llama Stack Python SDK for interacting with the APIs
You’ll eventually be able to swap these out with remote providers, but the interface and logic will remain consistent.
Why use uv?
While traditional pip and venv workflows are common, uv provides a faster, more modern alternative with better dependency resolution and caching. It combines the functionality of a virtual environment manager and a Python package installer.
Benefits of using uv include:
- Fast dependency resolution and installation.
- Automatically manages virtual environments.
- Compatible with pip commands, but faster and cleaner.
- Officially used in Llama Stack’s development workflows.
You’ll use uv throughout this course to install, manage, and run Llama Stack apps and providers.
The installation instructions provided here are just for reference. The setup has already been done for you on Educative!
Step 1: Install uv
First, install uv globally. You only need to do this once:
curl -Ls https://astral.sh/uv/install.sh | sh
You can then verify the installation using:
uv --version
You should see an output like uv 0.x.x.
Step 2: Initialize your project environment
Create a folder for your Llama Stack project:
mkdir llama-stack-app
cd llama-stack-app
Initialize the environment using uv:
uv venv
source .venv/bin/activate
You now have a clean, isolated Python environment ready to go.
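If you want to confirm that the activated environment is the one uv just created, you can check which Python interpreter your shell resolves to (on macOS/Linux):
which python
The path should point to .venv/bin/python inside your project folder.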
Step 3: Install the Llama Stack Python SDK
With your environment activated, install the Llama Stack SDK using uv pip:
uv pip install -U llama-stack
This package includes:
- The core client classes to interact with the Llama Stack server.
- Data types and utilities for building agents, inserting documents, etc.
- Support for connecting to local or remote stacks.
If needed, you can verify your local installation by running:
echo "from llama_stack_client import LlamaStackClient; print('Llama Stack SDK installed.')" | uv run -
Step 4: Install and run Ollama for local inference
Ollama is a local LLM runtime that allows you to run Llama models on CPU or GPU. It supports models like llama4, mistral, and gemma. You can visit ollama.com/download and install the appropriate package for your system.
Ollama can also be installed using the following command:
curl -fsSL https://ollama.com/install.sh | sh
Once installed, we will need to start the Ollama server. This will allow us to pull and run models. We can leave Ollama running in the background as a service, or use the following command to keep it available for API usage:
ollama serve
Since we are working with a single terminal, we can append > /dev/null 2>&1 & to the ollama serve command to run it in the background, as shown below.
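Putting that together, the full command to start the Ollama server in the background is:
ollama serve > /dev/null 2>&1 &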
Now that Ollama is running, we can pull the model we want to use. For our testing, we have chosen the Llama 3.2-1B model, a small model that can run on very modest hardware. We can use the run command if we want to interact with the model; run will automatically pull (download) the model for us.
ollama run llama3.2:1b
This will download and start the Llama 3.2-1B model. You can test it interactively in the terminal below to confirm it’s working; we have already started Ollama in the background for you. To exit the chat, type /exit.
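If you prefer to download the model without opening an interactive chat session, you can pull it explicitly instead:
ollama pull llama3.2:1b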
Now that the model is ready, let’s set the INFERENCE_MODEL environment variable to llama3.2:1b. This environment variable is used by Llama Stack internally.
export INFERENCE_MODEL=llama3.2:1b
We have already exported the environment variable for you.
Step 5: Run the Llama Stack server with Ollama configuration
Now that Ollama is running, we’ll configure Llama Stack to use it for inference.
Llama Stack runs as a server exposing multiple APIs, and you connect to it using the client SDK. We can build and run the Llama Stack server using a YAML configuration file. The YAML configuration file will allow us to customize the server to our liking; however, we can use a provided template for a quick start with Ollama.
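For orientation, here is a rough, illustrative sketch of the kind of configuration such a template produces. The field names below are an assumption based on the general shape of Llama Stack run configurations and change between releases, so treat the template generated by the build command below as the source of truth:
# Illustrative sketch only: exact field names vary by Llama Stack release
apis:
  - inference
providers:
  inference:
    - provider_id: ollama
      provider_type: remote::ollama   # assumed provider type for the Ollama backend
      config:
        url: http://localhost:11434   # default local Ollama endpoint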
Run the following command within an activated virtual environment to build and run the server using the Ollama template:
uv run --with llama-stack llama stack build --template ollama --image-type venv --run
The command may seem long and complex, so let’s break it down:
- uv run: Executes a command or script inside the managed environment.
- --with llama-stack: Tells uv to make the llama-stack package available in the environment it uses to run the command.
- llama stack build --template ollama --image-type venv --run: This is the actual command being run by uv. It breaks down into:
  - llama: The llama CLI tool helps us set up and use Llama Stack. It was installed when we installed the llama-stack package.
  - stack build: Calls the build subcommand of the stack command group. This builds our application stack.
  - --template ollama: Specifies the template to use. Since we are using Ollama, we build from an existing template by setting the template parameter to ollama.
  - --image-type venv: Specifies the image type to use when running the stack; here we set it to venv. It can also be set to a Conda environment or a container (e.g., Docker).
  - --run: Once the stack is built, it immediately runs it.
Once you run this command, you should see logs confirming the server is running on http://0.0.0.0:8321.
Great! Now that our Llama Stack server is running, we can use the Llama Stack client to access its APIs.
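For example, as a quick sanity check, you can point the Python client at the local server and list the models it knows about. The snippet below is a minimal sketch; it assumes the client exposes a models.list() method, and exact method names can differ between llama-stack-client versions:
from llama_stack_client import LlamaStackClient

# Connect to the locally running Llama Stack server from Step 5
client = LlamaStackClient(base_url="http://localhost:8321")

# List the models the server has registered (method name assumed; see note above)
for model in client.models.list():
    print(model)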
Step 6: Testing the Llama Stack server
At this point, we’ve set up:
- A local LLM runtime (Ollama) serving a Llama 3.2 model
- A Llama Stack server configured to use that runtime for inference
Now, let’s test if everything is working as expected. We can use the llama-stack-client CLI from the terminal to test the server. Here’s a simple command that will generate a response from the model.
llama-stack-client inference chat-completion --message "tell me a joke"
Here, we use the inference command followed by chat-completion to send a message to the model. Don’t worry if you’re unfamiliar with inference or chat completion; we will discuss them later in the course. For now, just try it out in the terminal below.
Before running the command above, please wait a moment to ensure that the LLM runtime and Llama Stack server are fully set up in the background. This setup happens automatically but may take 10 to 20 seconds.
You should see a ChatCompletionResponse object as a result. This object will have a lot of parameters that you may or may not be familiar with. Do not worry! We will dive into these parameters soon.
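The same request can also be made from Python with the SDK. The snippet below is a minimal sketch of that call; the method and parameter names (inference.chat_completion, model_id, messages) reflect common llama-stack-client versions and may differ in newer releases, so check your SDK reference if the call fails:
from llama_stack_client import LlamaStackClient

# Connect to the local Llama Stack server started in Step 5
client = LlamaStackClient(base_url="http://localhost:8321")

# Mirror the CLI command above: ask the model for a joke
# (method/parameter names assumed; see the note before this snippet)
response = client.inference.chat_completion(
    model_id="llama3.2:1b",
    messages=[{"role": "user", "content": "tell me a joke"}],
)

# Print the full ChatCompletionResponse object
print(response)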
Closing thoughts
You now have a fully functional Llama Stack development environment. You’ve configured a local inference provider, connected through the SDK, and issued your first API call. Most importantly, this setup will continue to work as you introduce new components like retrieval, safety, tools, and evaluation. The goal moving forward is to build layer by layer, adding richer logic, memory, safety filters, and eventually deployment workflows. But the foundation you’ve just created will remain consistent, even as you scale up.