Practice Using ChromaDB for Multimodal Embeddings
Learn to use Chroma, an open-source vector database, to store and query data.
So far in this chapter, we’ve explored vector databases and their importance in efficiently storing and retrieving high-dimensional data. In this lesson, we’ll dive deeper into using an open-source vector database by practicing with Chroma DB. Having the same
Import necessary libraries and modules
First, we import chromadb to manage embeddings and collections.
We can generate embeddings outside Chroma or use the embedding functions in Chroma’s embedding_functions module. We have already explored the first approach. Chroma also supports multimodal embedding functions, which embed data from different modalities into a single embedding space. So, we’ll use the multimodal embedding model from Chroma’s embedding_functions module to generate embeddings for our multimodal data. To do this, we import OpenCLIPEmbeddingFunction from chromadb.utils.embedding_functions.
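To make the idea of a unified embedding space concrete, here is a toy sketch. The three-dimensional vectors and file names below are made up for illustration and do not come from any real model; an actual multimodal model such as OpenCLIP produces much higher-dimensional vectors. The point is only that, once text and images live in the same space, a text query can be compared directly against image embeddings.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical image embeddings, keyed by made-up file names.
image_embeddings = {
    "images/cat.jpg": [0.9, 0.1, 0.0],
    "images/car.jpg": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a text query such as "a photo of a cat".
text_query_embedding = [0.8, 0.2, 0.1]

# Because both modalities share one space, we can rank images by
# their similarity to the text query.
best_match = max(
    image_embeddings,
    key=lambda uri: cosine_similarity(text_query_embedding, image_embeddings[uri]),
)
print(best_match)
```

In practice, we would not compute these similarities by hand; the vector database performs the nearest-neighbor search for us.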
We’ll store the embeddings in Chroma while the data itself stays outside Chroma. For data kept outside Chroma, Chroma provides data loaders for loading and saving that data via URIs. Chroma does not store such data directly; instead, it stores the URI and loads the data from that URI as needed. So, to make this data available to Chroma when needed, we import ImageLoader from chromadb.utils.data_loaders.
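The URI indirection described above can be illustrated with a small, self-contained sketch. This is a toy stand-in, not Chroma’s implementation: the UriStore class, the load_bytes function, and the file name are all hypothetical. It shows only the general pattern of keeping a URI in the store and reading the underlying data on demand.

```python
import os
import tempfile


class UriStore:
    """Toy store that keeps URIs only and loads the data when asked."""

    def __init__(self, loader):
        self._loader = loader  # callable that turns a URI into data
        self._uris = {}        # item id -> URI; the data itself is never stored

    def add(self, item_id, uri):
        self._uris[item_id] = uri

    def get(self, item_id):
        uri = self._uris[item_id]
        # The data is read from the URI at retrieval time.
        return uri, self._loader(uri)


def load_bytes(uri):
    """Hypothetical loader: treat the URI as a local file path."""
    with open(uri, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a placeholder file that stands in for an image on disk.
    path = os.path.join(tmp_dir, "example.bin")
    with open(path, "wb") as f:
        f.write(b"fake image bytes")

    store = UriStore(loader=load_bytes)
    store.add("img-1", path)
    stored_uri, loaded_data = store.get("img-1")
```

Chroma’s ImageLoader plays a role similar to load_bytes here, except that it returns image data rather than raw bytes.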
We import the os module for interacting with the operating system, particularly for file handling, and pandas for data manipulation and loading CSV files.
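Putting the imports described in this section together gives roughly the following. This is a minimal sketch that assumes the chromadb and pandas packages are installed; the try/except wrapper and the imports_ok flag are our own additions so that a missing package produces a readable message instead of a traceback.

```python
import os  # file handling and paths

try:
    import chromadb      # manages embeddings and collections
    import pandas as pd  # data manipulation and CSV loading

    # Loads images from URIs when Chroma needs them.
    from chromadb.utils.data_loaders import ImageLoader

    # Multimodal embedding function for text and images.
    from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction

    imports_ok = True
except ImportError as exc:
    # Assumption: the packages are installable with pip under these names.
    imports_ok = False
    print(f"Missing dependency ({exc}); try: pip install chromadb pandas")
```

Note that creating an OpenCLIPEmbeddingFunction or ImageLoader instance later may require additional packages, such as an OpenCLIP implementation and an imaging library, even if these imports succeed.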