Code generation and explanation with LLMs

Using LLMs to generate code snippets can greatly assist developers. While LLMs cannot perform as well as a senior engineer (yet), they can create code skeletons or provide useful starting points.

To make the most of this approach, start by setting the context with a clear problem statement or an outline of the code you need. Then ask specific questions about that code, such as function definitions, loop structures, or syntax usage.

If the initial response is unsatisfactory, provide additional context or ask follow-up questions to guide the LLM in the right direction. Once the model generates a code snippet, review it for correctness and adapt it to your requirements.

Note: While LLMs can be a valuable resource, applying your programming expertise to validate and refine the generated snippets is essential.

Our task

Let's try to generate some code using an LLM. For this example, we will be using ChatGPT and asking the LLM to help us read the following CSV file in Python.

Name,Age,City
John,25,New York
Emily,30,London
Michael,35,Paris
Sophia,28,Tokyo
Daniel,32,Sydney
Olivia,27,Berlin
David,29,Toronto
Emma,31,Rome
Liam,26,Madrid
The CSV file that we want to read
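To follow along on your own machine, you need a data.csv file with the contents above. A minimal sketch that writes it with the standard library's csv.writer, assuming the file should be created in the current working directory; write_sample_csv is a hypothetical helper name:

```python
import csv

# The sample rows shown above, header first
ROWS = [
    ["Name", "Age", "City"],
    ["John", 25, "New York"],
    ["Emily", 30, "London"],
    ["Michael", 35, "Paris"],
    ["Sophia", 28, "Tokyo"],
    ["Daniel", 32, "Sydney"],
    ["Olivia", 27, "Berlin"],
    ["David", 29, "Toronto"],
    ["Emma", 31, "Rome"],
    ["Liam", 26, "Madrid"],
]


def write_sample_csv(path="data.csv"):
    """Write ROWS to path; the csv docs recommend newline="" when opening the file."""
    with open(path, "w", newline="") as file:
        csv.writer(file).writerows(ROWS)


if __name__ == "__main__":
    write_sample_csv()
```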

The first prompt

A prompt is an input or query that guides the model to generate a relevant response or output. Let’s start with a short and simple prompt.

Prompt: How can I read a CSV file in Python?

Output:

import csv
filename = "data.csv"
with open(filename, "r") as file:
    reader = csv.reader(file)
    for row in reader:
        # Process each row here
        print(row)
Output code for the first prompt
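The generic code uses csv.reader, which returns each row as a list of strings, so the header row Name,Age,City is printed like any other row. A minimal sketch of skipping it with next(), assuming positional access to columns is acceptable; read_rows is a hypothetical helper, and io.StringIO stands in for data.csv so the snippet runs without a file:

```python
import csv
import io

# In-memory stand-in for data.csv (first rows only)
SAMPLE = "Name,Age,City\nJohn,25,New York\nEmily,30,London\n"


def read_rows(file):
    """Return (header, rows) from a CSV file object; every value is a string."""
    reader = csv.reader(file)
    header = next(reader)  # the first row holds the column names
    rows = list(reader)    # remaining rows, e.g. ["John", "25", "New York"]
    return header, rows


header, rows = read_rows(io.StringIO(SAMPLE))
print(header)
for row in rows:
    print(row)
```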

The model returned some generic code, which was expected. Let's provide additional context so the model can fill in the # Process each row here comment. We will pass our CSV file to it as context. This works here because our file is small; for larger files, sending just the header row and the data type of each column might be better.
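For a large file, a compact summary of the header and column types can replace the full contents in the prompt. A minimal sketch, assuming types can be guessed from the first few rows by trying int, then float, and falling back to str; describe_csv and infer_type are hypothetical helpers, not part of the csv module:

```python
import csv
import io
from itertools import islice


def infer_type(values):
    """Guess a column's type from sample values: int, then float, otherwise str."""
    if not values:
        return "str"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for value in values:
                caster(value)
            return name
        except (ValueError, TypeError):
            continue
    return "str"


def describe_csv(file, sample_rows=20):
    """Summarize a CSV's header and guessed column types for use as prompt context."""
    reader = csv.DictReader(file)
    sample = list(islice(reader, sample_rows))  # only the first rows are inspected
    columns = [f"{col}: {infer_type([row[col] for row in sample])}" for col in reader.fieldnames]
    return "Columns:\n" + "\n".join(columns)


sample = "Name,Age,City\nJohn,25,New York\nEmily,30,London\n"
print(describe_csv(io.StringIO(sample)))
```

The resulting text (one line per column, such as Age: int) could be pasted into a prompt in place of the whole file. Guessing from a sample is a heuristic: a column that turns non-numeric after the sampled rows would be mislabeled.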

Newer models such as Google’s Gemini offer context windows of up to 2 million tokens, which can be enough to send entire codebases as context!

A more contextual prompt

Prompt: Process the following CSV file to extract the name and city for each row:

Name,Age,City
John,25,New York
Emily,30,London
Michael,35,Paris
Sophia,28,Tokyo
Daniel,32,Sydney
Olivia,27,Berlin
David,29,Toronto
Emma,31,Rome
Liam,26,Madrid

Output:

import csv
filename = "data.csv"
# Reading data from CSV file
with open(filename, "r") as file:
    reader = csv.DictReader(file)
    for row in reader:
        name = row["Name"]
        city = row["City"]
        print(f"Name: {name}, City: {city}")
Output code for the second prompt

The LLM has used the additional context we provided to process each row based on our requirements. The code appears to be correct; however, let’s validate the code by running it.

While LLMs typically generate safe code, it is good practice to review generated code thoroughly before executing it. The generated code could be unsafe or could result in harmful or unexpected behavior.
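Reviewing the code is the main safeguard. As an extra precaution when trying it out, you can run it in a separate interpreter process with a time limit, so a runaway loop cannot hang your session. A minimal sketch using the standard subprocess module; run_generated_script is a hypothetical helper, and this is not a security sandbox: the script still has your user's file and network access, so untrusted code belongs in an isolated environment such as a container or VM.

```python
import subprocess
import sys


def run_generated_script(path, timeout=10):
    """Run a Python file in a separate interpreter and capture its output.

    This only separates the process and bounds its runtime; it is NOT a
    security sandbox.
    """
    result = subprocess.run(
        [sys.executable, path],  # same interpreter that runs this helper
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
    )
    return result.returncode, result.stdout, result.stderr
```

For example, run_generated_script("main.py") returns the exit code together with everything the script printed to stdout and stderr.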

Validating the code

To validate the generated code, save it as main.py in the same directory as data.csv and run it.

import csv
filename = "data.csv"
# Reading data from CSV file
with open(filename, "r") as file:
    reader = csv.DictReader(file)
    for row in reader:
        name = row["Name"]
        city = row["City"]
        print(f"Name: {name}, City: {city}")

Voilà! The code works as expected, printing the name and city from each row. We can also tune the output format to suit our needs.
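One way to tune the format, and to make the result easier to check automatically, is to separate extraction from printing. A minimal sketch, assuming two left-aligned columns are the desired format; extract_name_city and format_table are hypothetical names, and io.StringIO stands in for data.csv:

```python
import csv
import io


def extract_name_city(file):
    """Return (name, city) tuples from a CSV file object with Name and City columns."""
    return [(row["Name"], row["City"]) for row in csv.DictReader(file)]


def format_table(pairs):
    """Format (name, city) pairs as two left-aligned columns under a header."""
    width = max([len("Name")] + [len(name) for name, _ in pairs])
    lines = [f"{'Name':<{width}}  City"]
    lines += [f"{name:<{width}}  {city}" for name, city in pairs]
    return "\n".join(lines)


# Usage with an in-memory copy of the first rows; pass open("data.csv") for the real file
sample = "Name,Age,City\nJohn,25,New York\nEmily,30,London\n"
print(format_table(extract_name_city(io.StringIO(sample))))
```

Because extract_name_city returns data instead of printing it, its result can be checked with a simple assertion, which is a lightweight way to validate generated code.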

Generating a code explanation

Modern LLMs excel at explaining code as well as generating it. Let’s prompt the LLM to explain the code it generated.

Prompt: Please explain the following code line by line:

import csv
filename = "data.csv"
# Reading data from CSV file
with open(filename, "r") as file:
    reader = csv.DictReader(file)
    for row in reader:
        name = row["Name"]
        city = row["City"]
        print(f"Name: {name}, City: {city}")

Output:

Certainly! Here's an explanation of each line of the code:

  1. import csv: This line imports the csv module, which provides functionality for working with CSV files in Python.

  2. filename = "data.csv": This line assigns the filename “data.csv” to the variable filename. This is the name of the CSV file that we want to read.

  3. with open(filename, "r") as file: This line opens the CSV file specified by the filename variable in read mode ("r"). The with statement ensures that the file is properly closed after it is processed, even if there are exceptions.

  4. reader = csv.DictReader(file): This line creates a DictReader object called reader using the csv.DictReader class. The DictReader treats the first row of the CSV file as the column headers and allows us to access the data using column names as keys.

  5. for row in reader: This line starts a loop that iterates over each row in the CSV file. The reader object acts as an iterator, and each iteration provides a dictionary-like row object representing a row of data.

  6. name = row["Name"]: This line retrieves the value of the “Name” column from the current row using the column name as the key. It assigns the value to the variable name.

  7. city = row["City"]: This line retrieves the value of the “City” column from the current row using the column name as the key. It assigns the value to the variable city.

  8. print(f"Name: {name}, City: {city}"): This line prints the extracted values of name and city using an f-string. It displays the name and city information for each row in the CSV file.

By executing this code, it reads the CSV file, iterates through each row, extracts the “Name” and “City” values from each row, and prints them out in the specified format.


As you can see, the LLM generated a very verbose explanation of the code. Let's try a prompt that asks for a more concise explanation.

Prompt: Please give a brief overview (50 words) of the following code:

import csv
filename = "data.csv"
# Reading data from CSV file
with open(filename, "r") as file:
    reader = csv.DictReader(file)
    for row in reader:
        name = row["Name"]
        city = row["City"]
        print(f"Name: {name}, City: {city}")

Output:

This code imports the CSV module and assigns a file name data.csv. It then opens the file and reads its contents using the DictReader function from the CSV module. The code iterates over each row in the file and extracts the values of the “Name” and “City” columns. It then prints the name and city for each row in the file.
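Both explanations say that DictReader uses the header row as keys and yields one dictionary-like row per line. These claims are easy to confirm directly, and doing so surfaces a detail neither explanation mentions: every value is read as a string, so a numeric column like Age needs an explicit conversion. A quick check, using io.StringIO as an in-memory stand-in for data.csv:

```python
import csv
import io

# In-memory stand-in for the first rows of data.csv
SAMPLE = "Name,Age,City\nJohn,25,New York\nEmily,30,London\n"

reader = csv.DictReader(io.StringIO(SAMPLE))
print(reader.fieldnames)  # column names read from the header row

first = next(reader)
print(first)  # a plain dict keyed by column name (Python 3.8+)

# Values are strings, so numeric columns need an explicit conversion
ages = [int(first["Age"])] + [int(row["Age"]) for row in reader]
print(sum(ages) / len(ages))
```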


Conclusion

Code generation with LLMs involves giving the LLM a prompt that describes the desired code’s functionality. The LLM then generates code that (hopefully) matches your description. It's like giving an AI assistant instructions on what program to write, so learning how to write prompts effectively is a key skill when using LLMs. This approach can save developers time and even inspire new coding ideas. However, LLMs are still evolving, so generated code might require review and refinement for accuracy and security.

Copyright ©2024 Educative, Inc. All rights reserved