Code generation and explanation with LLMs
Using LLMs to generate code snippets can greatly assist developers. While LLMs cannot yet perform as well as a senior engineer, they can create code skeletons or provide useful starting points.
To make the most of this approach, start by setting the context with a clear problem statement or outline of your required code. Proceed by asking specific questions about the code you need, such as function definitions, loop structures, or syntax usage.
If the initial response is unsatisfactory, provide additional context or ask follow-up questions to guide the LLM in the right direction. Once the model generates a code snippet, review it for correctness and adapt it to your requirements.
Note: While LLMs can be a valuable resource, applying your programming expertise to validate and refine the generated snippets is essential.
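The workflow above (state the problem, add context, then ask specific questions) can be sketched as a small helper that assembles the prompt text. This is only an illustration: build_prompt and its parameters are hypothetical names, and no particular LLM API is assumed.

```python
def build_prompt(problem, context="", questions=()):
    """Assemble a prompt from a problem statement, optional context,
    and a list of specific questions.

    Hypothetical helper for illustration; it only builds a string.
    """
    parts = [f"Problem: {problem.strip()}"]
    if context.strip():
        parts.append(f"Context:\n{context.strip()}")
    for number, question in enumerate(questions, start=1):
        parts.append(f"Question {number}: {question.strip()}")
    # Blank lines between parts keep the sections visually distinct.
    return "\n\n".join(parts)


prompt = build_prompt(
    problem="Read a CSV file in Python and print two of its columns.",
    context="Columns: Name, Age, City",
    questions=[
        "Which standard-library module should I use?",
        "How do I access a column by its header name?",
    ],
)
print(prompt)
```

If the first response is unsatisfactory, the same helper can be called again with more context or a sharper question rather than rewriting the whole prompt by hand.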
Our task
Let's try to generate some code using an LLM. For this example, we will be using ChatGPT and asking the LLM to help us read the following CSV file in Python.
    Name,Age,City
    John,25,New York
    Emily,30,London
    Michael,35,Paris
    Sophia,28,Tokyo
    Daniel,32,Sydney
    Olivia,27,Berlin
    David,29,Toronto
    Emma,31,Rome
    Liam,26,Madrid
The first prompt
A prompt is an input or query that guides the model to generate a relevant response or output. Let’s start with a short and simple prompt.
Prompt: How can I read a CSV file in Python?
Output:
    import csv

    filename = "data.csv"

    with open(filename, "r") as file:
        reader = csv.reader(file)
        for row in reader:
            # Process each row here
            print(row)
The model returned some generic code, which was expected. Let's give the model additional context so that it can fill in the # Process each row here comment. We will pass it our CSV file as context. This works in our case because the file is small; for larger files, sending just the header and the data type of each column might be better.
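For a larger file, the header-and-types summary could be produced along these lines. This is a minimal sketch: infer_type and summarize_csv are hypothetical helpers, the type guess only distinguishes int, float, and str, and it is based on a handful of sample rows rather than the whole file.

```python
import csv
import io
import itertools


def infer_type(values):
    """Guess a simple type name for a column from its sample values."""
    if not values:
        return "str"
    for type_name, caster in (("int", int), ("float", float)):
        try:
            for value in values:
                caster(value)
            return type_name
        except (TypeError, ValueError):
            continue  # Not every value converts; try the next type.
    return "str"


def summarize_csv(text, sample_rows=5):
    """Describe the columns of CSV text as 'Name (type), ...'."""
    reader = csv.DictReader(io.StringIO(text))
    # Only the first few rows are read, so the cost stays small.
    rows = list(itertools.islice(reader, sample_rows))
    columns = reader.fieldnames or []
    return ", ".join(
        f"{column} ({infer_type([row[column] for row in rows])})"
        for column in columns
    )


sample = "Name,Age,City\nJohn,25,New York\nEmily,30,London\nMichael,35,Paris\n"
summary = summarize_csv(sample)
print(summary)
```

The resulting one-line summary can be pasted into the prompt in place of the full file contents.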
Newer models such as Google’s Gemini offer context windows of up to 2 million tokens! This can allow us to send entire code bases as context!
A more contextual prompt
Prompt: Process the following CSV file to extract the name and city for each row:
    Name,Age,City
    John,25,New York
    Emily,30,London
    Michael,35,Paris
    Sophia,28,Tokyo
    Daniel,32,Sydney
    Olivia,27,Berlin
    David,29,Toronto
    Emma,31,Rome
    Liam,26,Madrid
Output:
    import csv

    filename = "data.csv"

    # Reading data from CSV file
    with open(filename, "r") as file:
        reader = csv.DictReader(file)
        for row in reader:
            name = row["Name"]
            city = row["City"]
            print(f"Name: {name}, City: {city}")
The LLM has used the additional context we provided to process each row based on our requirements. The code appears to be correct; however, let’s validate the code by running it.
While LLMs typically generate safe code, it is good practice to review the code thoroughly before execution. The generated code may be unsafe or could result in harmful or unexpected behavior.
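As one small aid to that review, the snippet can be parsed without being executed, and the modules it imports and the functions it calls can be listed. The sketch below uses Python's standard ast module; list_imports_and_calls is a hypothetical helper, and a listing like this is a reading aid, not a security check or a substitute for actually reading the code.

```python
import ast


def list_imports_and_calls(source):
    """Parse source without running it and return (imports, calls).

    ast.parse raises SyntaxError if the snippet is not valid Python,
    which also catches slips such as a missing bracket.
    """
    tree = ast.parse(source)
    imports, calls = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            imports.add(node.module or "")
        elif isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):
                calls.add(node.func.id)  # e.g. open(...)
            elif isinstance(node.func, ast.Attribute):
                calls.add(node.func.attr)  # e.g. csv.DictReader(...)
    return sorted(imports), sorted(calls)


generated = '''
import csv

with open("data.csv", "r") as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row["Name"], row["City"])
'''

imports, calls = list_imports_and_calls(generated)
print("Imports:", imports)
print("Calls:", calls)
```

An unexpected import (for example, a networking or subprocess module in a script that should only read a file) is a cue to look more closely before running anything.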
Validating the code
The generated code is reproduced below. Run it against data.csv to check that it works.
    import csv

    filename = "data.csv"

    # Reading data from CSV file
    with open(filename, "r") as file:
        reader = csv.DictReader(file)
        for row in reader:
            name = row["Name"]
            city = row["City"]
            print(f"Name: {name}, City: {city}")
Voilà! The code works as expected and prints each name and city. We can also tune the output format to suit our needs.
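As an illustration of tuning the output format, the variant below prints the names in a left-aligned column followed by the city. It is a sketch, not part of the LLM's response: format_rows is a hypothetical helper, and the CSV text is embedded as a string (a shortened copy of our data) so the example runs without data.csv on disk.

```python
import csv
import io

# A shortened copy of the lesson's data, embedded for self-containment.
csv_text = """Name,Age,City
John,25,New York
Emily,30,London
Michael,35,Paris
"""


def format_rows(text):
    """Return one 'Name | City' line per row, with names padded to equal width."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # Pad every name to the length of the longest one.
    width = max((len(row["Name"]) for row in rows), default=0)
    return [f"{row['Name']:<{width}} | {row['City']}" for row in rows]


for line in format_rows(csv_text):
    print(line)
```

Only the f-string changes between formats, so the same loop can emit whatever layout the task calls for.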
Generating a code explanation
Modern LLMs excel at code explanation as well as code generation. Let’s prompt the LLM to explain the code it generated.
Prompt: Please explain the following code line by line:
    import csv

    filename = "data.csv"

    # Reading data from CSV file
    with open(filename, "r") as file:
        reader = csv.DictReader(file)
        for row in reader:
            name = row["Name"]
            city = row["City"]
            print(f"Name: {name}, City: {city}")
Output:
Certainly! Here's an explanation of each line of the code:
import csv: This line imports the csv module, which provides functionality for working with CSV files in Python.
filename = "data.csv": This line assigns the filename “data.csv” to the variable filename. This is the name of the CSV file that we want to read.
with open(filename, "r") as file: This line opens the CSV file specified by the filename variable in read mode ("r"). The with statement ensures that the file is properly closed after it is processed, even if there are exceptions.
reader = csv.DictReader(file): This line creates a DictReader object called reader using the csv.DictReader class. The DictReader treats the first row of the CSV file as the column headers and allows us to access the data using column names as keys.
for row in reader: This line starts a loop that iterates over each row in the CSV file. The reader object acts as an iterator, and each iteration provides a dictionary-like row object representing a row of data.
name = row["Name"]: This line retrieves the value of the “Name” column from the current row using the column name as the key. It assigns the value to the variable name.
city = row["City"]: This line retrieves the value of the “City” column from the current row using the column name as the key. It assigns the value to the variable city.
print(f"Name: {name}, City: {city}"): This line prints the extracted values of name and city using an f-string. It displays the name and city information for each row in the CSV file.
By executing this code, it reads the CSV file, iterates through each row, extracts the “Name” and “City” values from each row, and prints them out in the specified format.
As you can see, the LLM is able to generate a very verbose explanation for the code. Let's try a more concise explanation prompt.
Prompt: Please give a brief overview (50 words) of the following code:
    import csv

    filename = "data.csv"

    # Reading data from CSV file
    with open(filename, "r") as file:
        reader = csv.DictReader(file)
        for row in reader:
            name = row["Name"]
            city = row["City"]
            print(f"Name: {name}, City: {city}")
Output:
This code imports the CSV module and assigns a file name data.csv. It then opens the file and reads its contents using the DictReader function from the CSV module. The code iterates over each row in the file and extracts the values of the “Name” and “City” columns. It then prints the name and city for each row in the file.
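A length request such as "50 words" is treated by the model as a rough target rather than a hard limit: by a simple whitespace count, the overview above runs to about 60 words. When length matters, a quick check like the following sketch can flag responses that overshoot. The helpers word_count and within_budget, and the 10% tolerance, are illustrative assumptions.

```python
def word_count(text):
    """Count whitespace-separated words, the simplest reasonable definition."""
    return len(text.split())


def within_budget(text, limit, tolerance=0.1):
    """Return True if the text has at most `limit` words, plus a tolerance fraction."""
    return word_count(text) <= limit * (1 + tolerance)


# The overview returned by the LLM, copied from the lesson.
overview = (
    "This code imports the CSV module and assigns a file name data.csv. "
    "It then opens the file and reads its contents using the DictReader "
    "function from the CSV module. The code iterates over each row in the "
    "file and extracts the values of the “Name” and “City” columns. "
    "It then prints the name and city for each row in the file."
)

print(word_count(overview), within_budget(overview, limit=50))
```

If the check fails, a follow-up prompt asking the model to shorten its answer is usually simpler than trimming the text by hand.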
Conclusion
Code generation with LLMs involves giving the LLM a prompt describing the desired code’s functionality. The LLM then generates code that (hopefully) matches your description. Learning how to use prompts effectively is a key skill when it comes to using LLMs. It's like giving an AI assistant instructions on what program to write. This can save time for developers and even inspire new coding ideas. However, LLMs are still under development, so generated code might require review and refinement for accuracy and security.