Multi-Turn Document Q&A System with LlamaIndex

Learn how to build a conversational assistant that answers questions about uploaded documents using memory and semantic retrieval.

In this lesson, we will build an interactive system that allows users to upload PDF documents and ask natural language questions about their content. The system will retrieve relevant information from the uploaded documents and generate accurate, conversational answers.

In addition to answering individual questions, the system will support multi-turn interactions by remembering prior queries. It will also include the ability to summarize an entire document and display internal reasoning steps—allowing developers or users to understand how each response was generated.

This type of document-aware assistant is useful in real-world scenarios such as reviewing lease agreements, insurance policies, academic syllabi, or company procedures.

[Image: Application interface]

Note: This application uses RAG to retrieve document content, memory to support follow-up questions, prompt construction to combine memory and retrieved context for multi-turn interaction and summarization, and basic tracing for observability.
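
To make the prompt construction step concrete, the hypothetical helper below shows one way memory and retrieved context can be merged into a single prompt. The function name and template wording are illustrative, not the lesson's exact code:

def build_prompt(history: str, context: str, question: str) -> str:
    # Combine prior turns (memory) and retrieved document chunks
    # (context) into a single prompt for the LLM.
    return (
        f"Conversation so far:\n{history}\n\n"
        f"Relevant document excerpts:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using only the excerpts above."
    )
A hypothetical sketch of prompt construction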

To implement this application, we will use the following modules and libraries:

Modules and Libraries

| Library/Module | Purpose                                           |
| -------------- | ------------------------------------------------- |
| LlamaIndex     | Indexing, retrieval, memory, and LLM integration  |
| Streamlit      | Front-end interface for user interaction          |
| Ollama         | Local embedding model for document vectors        |
| Groq           | LLM backend to generate conversational responses  |

Let’s start implementing our application step by step.

Setting up the Streamlit interface and RAG pipeline

To make the document Q&A system interactive, we use Streamlit to build a simple web-based interface. Users can upload one or more PDF files and type natural language questions. When a question is submitted, the system retrieves relevant content from the uploaded documents and generates a response using a language model.

We start by importing the necessary libraries:

import streamlit as st
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.groq import Groq
import os
import tempfile
Import required modules
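
A note on the os and tempfile imports: st.file_uploader returns in-memory file objects, while SimpleDirectoryReader reads from disk, so uploads are typically written to a temporary directory before indexing. The helper below is a minimal sketch under that assumption; index_uploaded_pdfs and its parameters are illustrative names, not part of the lesson's code:

def index_uploaded_pdfs(uploaded_files, embed_model):
    # Write each uploaded PDF to a temporary directory on disk
    with tempfile.TemporaryDirectory() as tmp_dir:
        for f in uploaded_files:
            path = os.path.join(tmp_dir, f.name)
            with open(path, "wb") as out:
                out.write(f.getbuffer())
        # Load the PDFs and keep their parsed contents in memory
        documents = SimpleDirectoryReader(tmp_dir).load_data()
    # Build a vector index over the loaded documents
    return VectorStoreIndex.from_documents(documents, embed_model=embed_model)
A sketch of indexing uploaded PDFs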

Next, we initialize the language model and the embedding model. The embedding model will convert the document content into vector representations, and the language model will generate conversational answers.

# Initialize the Groq LLM
llm = Groq(
    model="llama3-70b-8192",
    api_key="YOUR_GROQ_API_KEY"  # Replace with your actual API key
)

# Initialize the embedding model
embedding_model = OllamaEmbedding(model_name="nomic-embed-text")
Initialize the embedding model and LLM
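
One common way to wire these models into LlamaIndex is to register them on the global Settings object, so that indexes and query engines pick them up automatically. The lesson may instead pass them explicitly when building the index, so treat this as an optional sketch:

from llama_index.core import Settings

# Make the Groq LLM and the Ollama embeddings the defaults for all
# LlamaIndex components created afterwards
Settings.llm = llm
Settings.embed_model = embedding_model
Register the models globally (optional)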

Now, we set up the Streamlit interface. We display a title, a description, a file uploader for PDFs, a text input for user ...