How to implement the SmartDataframe of PandasAI

PandasAI is a Python library that extends pandas with natural language capabilities. It uses a large language model (LLM) to generate Python code that answers questions about data, performs data analysis, and generates visualizations. In this answer, we will learn how to use PandasAI to analyze a dataframe.

What is SmartDataframe?

SmartDataframe is a class in PandasAI that provides a high-level interface to the library. It wraps a dataframe so that we can ask questions about the data, perform data analysis, and generate visualizations in natural language.

Implementation

Let’s see how to use SmartDataframe in Python to query data in natural language.

import pandas as pd
from pandasai import SmartDataframe

# Sample data, wrapped in a pandas DataFrame
df = pd.DataFrame({
    "Movie Title": ["The Shawshank Redemption", "The Godfather", "Pulp Fiction", "The Dark Knight", "Forrest Gump", "Inception", "Schindler's List", "The Matrix", "Fight Club", "The Lord of the Rings: The Fellowship of the Ring"],
    "Year": [1994, 1972, 1994, 2008, 1994, 2010, 1993, 1999, 1999, 2001],
    "IMDb Rating": [9.3, 9.2, 8.9, 9.0, 8.8, 8.8, 8.9, 8.7, 8.8, 8.8],
    "Runtime (minutes)": [142, 175, 154, 152, 142, 148, 195, 136, 139, 178],
    "Genre": ["Drama", "Crime", "Crime", "Action", "Drama", "Action", "Biography", "Action", "Drama", "Adventure"]
})

from pandasai.llm import OpenAI
llm = OpenAI(api_token="OPENAI_API_KEY")  # placeholder, not a real key

sdf = SmartDataframe(df, config={"llm": llm})  # attach the LLM to the data
answer = sdf.chat('What are the five best movies?')  # ask in plain English
print(answer)
Implementation of SmartDataframe

Note: Make sure to replace the placeholder string passed to api_token with your actual OpenAI API key.
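
Hardcoding the key in source code makes it easy to leak. As a minimal sketch, assuming the key is stored in an environment variable named OPENAI_API_KEY, it can be read at runtime instead. The helper name load_api_key and the variable name are illustrative choices and are not part of PandasAI.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The helper name and the default variable name are assumptions made
    for this sketch; they are not part of PandasAI.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Possible usage with the listing above (commented out so the sketch
# runs without PandasAI installed or network access):
# llm = OpenAI(api_token=load_api_key())
```

With this helper, the hardcoded string in the listing above would be replaced by a call to load_api_key(). The explanation below still refers to the line numbers of the original listing.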

Code explanation

Line 2: We import SmartDataframe from pandasai. This is the class we use to ask questions about our data.

Lines 5–11: We define the sample movie data: each movie's title, release year, IMDb rating, runtime in minutes, and genre.

Lines 13–14: We import the OpenAI language model from the pandasai.llm module and initialize it (referred to as llm here).

Lines 16–18: We instantiate a SmartDataframe object with the data and the LLM, ask it a question in natural language using chat(), and print the answer.
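
To judge whether the model's answer is sensible, it helps to compute the same result directly. The sketch below assumes that "best" means the highest IMDb rating, which is only one reasonable reading of the prompt; the code that PandasAI generates may differ. It uses plain pandas, so it runs without an API key.

```python
import pandas as pd

# Same sample data as above, reduced to the two columns the question needs.
movies = pd.DataFrame({
    "Movie Title": ["The Shawshank Redemption", "The Godfather", "Pulp Fiction",
                    "The Dark Knight", "Forrest Gump", "Inception",
                    "Schindler's List", "The Matrix", "Fight Club",
                    "The Lord of the Rings: The Fellowship of the Ring"],
    "IMDb Rating": [9.3, 9.2, 8.9, 9.0, 8.8, 8.8, 8.9, 8.7, 8.8, 8.8],
})

# Assumption: "best" is interpreted as "highest IMDb rating".
top_five = movies.nlargest(5, "IMDb Rating")
print(top_five.to_string(index=False))
```

Because the question does not say what "best" means, the LLM has to pick an interpretation. A more specific prompt, such as asking for the five movies with the highest IMDb rating, leaves less room for ambiguity.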

Now, let's try a new prompt and observe the response generated by the LLM. The code is the same as before; only the question passed to chat() changes.

import pandas as pd
from pandasai import SmartDataframe

# Sample data, wrapped in a pandas DataFrame
df = pd.DataFrame({
    "Movie Title": ["The Shawshank Redemption", "The Godfather", "Pulp Fiction", "The Dark Knight", "Forrest Gump", "Inception", "Schindler's List", "The Matrix", "Fight Club", "The Lord of the Rings: The Fellowship of the Ring"],
    "Year": [1994, 1972, 1994, 2008, 1994, 2010, 1993, 1999, 1999, 2001],
    "IMDb Rating": [9.3, 9.2, 8.9, 9.0, 8.8, 8.8, 8.9, 8.7, 8.8, 8.8],
    "Runtime (minutes)": [142, 175, 154, 152, 142, 148, 195, 136, 139, 178],
    "Genre": ["Drama", "Crime", "Crime", "Action", "Drama", "Action", "Biography", "Action", "Drama", "Adventure"]
})

from pandasai.llm import OpenAI
llm = OpenAI(api_token="OPENAI_API_KEY")  # placeholder, not a real key

sdf = SmartDataframe(df, config={"llm": llm})  # attach the LLM to the data
answer = sdf.chat('Which is the second best movie of Crime genre?')  # new question
print(answer)
Implementation with a dataframe
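
As a cross-check for this second prompt, here is a minimal plain-pandas sketch of one way to compute the answer by hand. It assumes that movies are ranked by IMDb rating within the Crime genre; this is an interpretation of the prompt, not necessarily the code PandasAI produces.

```python
import pandas as pd

# Same sample data as above, reduced to the columns the question needs.
movies = pd.DataFrame({
    "Movie Title": ["The Shawshank Redemption", "The Godfather", "Pulp Fiction",
                    "The Dark Knight", "Forrest Gump", "Inception",
                    "Schindler's List", "The Matrix", "Fight Club",
                    "The Lord of the Rings: The Fellowship of the Ring"],
    "IMDb Rating": [9.3, 9.2, 8.9, 9.0, 8.8, 8.8, 8.9, 8.7, 8.8, 8.8],
    "Genre": ["Drama", "Crime", "Crime", "Action", "Drama", "Action",
              "Biography", "Action", "Drama", "Adventure"],
})

# Keep only Crime movies and sort them from highest to lowest rating.
crime = movies[movies["Genre"] == "Crime"].sort_values("IMDb Rating", ascending=False)

# Position 1 is the second row, i.e. the second-highest-rated Crime movie.
second_best = crime.iloc[1]["Movie Title"]
print(second_best)
```

With the sample data, the Crime genre contains only The Godfather (9.2) and Pulp Fiction (8.9), so this prints Pulp Fiction. If the response from SmartDataframe differs, the model has likely interpreted "best" in another way.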