How to insert a column in a Julia DataFrame at a particular position
A DataFrame is a two-dimensional size-mutable tabular data structure similar to a table or spreadsheet. A DataFrame is mainly used for storing a set of related data values of various data types in tabular form with labeled rows and columns. Each row represents an observation or a data point. A column represents an observation's properties or attributes.
In Julia's ecosystem, DataFrames.jl is a fast, flexible, open-source package for data analysis and manipulation. It provides numerous functions for working with tabular data, which makes it a good general-purpose tool for data wrangling and analysis.
Column insertion in DataFrame
Adding a new column to an existing DataFrame is a common data analysis operation. The insertcols! function of the DataFrames.jl package is useful when the column must be inserted at a specific position or index. As the trailing ! suggests, it modifies the DataFrame in place.
Syntax
The syntax of the insertcols! function is the following:
insertcols!(df, col, (name=>val); after=false, makeunique=false, copycols=true)
The main parameters of this function are listed below.
df: The DataFrame to be processed.
col: The position where we want to insert the new column.
name: The name of the column to add.
val: Either an AbstractVector (the supertype for one-dimensional arrays) holding the contents of the new column, or a value of any type other than AbstractArray (the supertype for N-dimensional arrays), which will be repeated to fill a new vector.
after: When true, the new columns are inserted after col.
makeunique: Specifies the action to take if the new column name already exists in the DataFrame. When set to false, an error is thrown; otherwise, a unique name is assigned to the added column by appending a suffix to it.
copycols: Specifies whether vectors should be copied when they are passed as columns.
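As a rough illustration of how these parameters interact, the sketch below uses a small made-up DataFrame. The column names a, b, flag, c, and w are purely illustrative and do not come from the examples that follow. It shows a scalar val being repeated, the effect of after=true, and what copycols=false means for the stored vector.

```julia
using DataFrames

# Illustrative data only; not part of the article's examples.
df = DataFrame(a = [1, 2, 3], b = ["x", "y", "z"])

# A scalar (non-array) value is repeated to fill every row.
insertcols!(df, 2, :flag => true)
println(names(df))   # column order after the first insertion

# With after=true the new column is placed after position 1, not at it.
insertcols!(df, 1, :c => [10, 20, 30], after = true)
println(names(df))

# With copycols=false the DataFrame stores the passed vector itself,
# so later changes to v are visible through df.w (and vice versa).
v = [0.1, 0.2, 0.3]
insertcols!(df, 1, :w => v, copycols = false)
println(df.w === v)
```

Leaving copycols at its default of true is usually the safer choice, since the DataFrame then owns an independent copy of the data.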
Let's go over the following examples showing how to insert a column in a Julia DataFrame at a particular position.
Code example 1
This example illustrates how to insert a new column containing values at a specific position in a DataFrame.
using DataFrames

df = DataFrame(
    id = [1,2,3],
    name = ["Bassem","Celeste","Dominique"],
    salary = [3000,5000,4000]
)

print("The Dataframe before adding the new column:")
print(df)

arr = [45,35,28]
insertcols!(df, 3, :age => arr)

print("\n The Dataframe after adding the new column:")
print(df)
Code explanation
Let's examine the code above:
Lines 3–7: Create a simple DataFrame of 3 rows containing employee information.
Lines 9–10: Display a message and print out the created DataFrame.
Line 12: Define a vector containing the values of the column to add.
Line 13: Invoke the function insertcols! to create a new column named age at the third position (indexing starts at 1), using the vector defined earlier as its values.
Lines 15–16: Display a message and print out the processed DataFrame.
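To confirm where the column ended up, one option is to inspect names(df) after the call. The sketch below repeats the insertion from this example and then, as an assumption about a reasonably recent DataFrames.jl version, passes a column name instead of an integer as the position. The department column and its values are hypothetical and added only for illustration.

```julia
using DataFrames

df = DataFrame(
    id = [1, 2, 3],
    name = ["Bassem", "Celeste", "Dominique"],
    salary = [3000, 5000, 4000]
)

# Same insertion as in the example: age becomes the third column.
insertcols!(df, 3, :age => [45, 35, 28])
age_position = findfirst(==("age"), names(df))
println("age is column number ", age_position)

# Hypothetical extra column: the position is given by name, so the new
# column takes the place of salary and salary shifts one step right.
insertcols!(df, :salary, :department => ["IT", "HR", "Sales"])
println(names(df))
```

Referring to the position by name can be more robust than a hard-coded index when the column order may change.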
Code example 2
This example explores two additional parameters of the function insertcols!: after and makeunique.
using DataFrames

df = DataFrame(
    id = [1,2,3],
    name = ["Bassem","Celeste","Dominique"],
    salary = [3000,5000,4000]
)

print("The Dataframe before adding the new column:")
print(df)

rows = size(df)[1]
insertcols!(df, 1, :id => 1:rows, after=true, makeunique=true)

print("\n The Dataframe after adding the new column:")
print(df)
Code explanation
The code above matches the first example except for:
Line 12: Get the number of rows of the DataFrame previously defined.
Line 13: Execute the function insertcols! with the following arguments:
df: The DataFrame defined earlier.
1: The reference position for the insertion.
:id => 1:rows: The new column name and its content.
after=true: The new column is added after the specified index (after the first column).
makeunique=true: Since the DataFrame already includes a column named id, a new name (id_1) is automatically assigned to the new column.
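The effect of makeunique can be seen by trying the same insertion with and without it. This is a minimal sketch on a trimmed-down version of the example DataFrame; the rejected variable is a helper introduced here only to record whether the first call failed.

```julia
using DataFrames

df = DataFrame(id = [1, 2, 3], name = ["Bassem", "Celeste", "Dominique"])

# Default makeunique=false: inserting a second :id column is rejected.
rejected = try
    insertcols!(df, 1, :id => 1:nrow(df), after = true)
    false
catch err
    println("Insertion rejected with ", typeof(err))
    true
end

# makeunique=true: the duplicate name receives a suffix instead.
insertcols!(df, 1, :id => 1:nrow(df), after = true, makeunique = true)
println(names(df))
```

Choosing a distinct name up front is often clearer than relying on the generated suffix, but makeunique is convenient when column names are produced programmatically.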
Code example 3
This example demonstrates how to insert an empty column, that is, a column filled with missing values, at a specific position in a DataFrame.
using DataFrames

df = DataFrame(
    id = [1,2,3],
    name = ["Bassem","Celeste","Dominique"],
    salary = [3000,5000,4000]
)

print("The Dataframe before adding the new column:")
print(df)

insertcols!(df, 3, :age => missings(Int64))

print("\n The Dataframe after adding the new column:")
print(df)
Code explanation
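The code above resembles the previous examples except for:
Line 12: When calling the function insertcols!, we pass missings(Int64) as the value for the column age. Called without a length, missings(Int64) produces a zero-dimensional array holding missing, which insertcols! unwraps and repeats for every row, so the new column is initialized with the value missing. To guarantee that the column can later hold Int64 values, a full-length vector such as missings(Int64, nrow(df)) can be passed instead.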
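If the column is meant to be filled in later, it can help to make its element type explicit. The following sketch, on a shortened version of the example DataFrame, assumes a full-length vector from missings(Int64, nrow(df)) is acceptable in place of the zero-dimensional form; the value 45 assigned afterwards is only illustrative.

```julia
using DataFrames

df = DataFrame(id = [1, 2, 3], salary = [3000, 5000, 4000])

# One missing per row; the element type is Union{Missing, Int64}.
insertcols!(df, 2, :age => missings(Int64, nrow(df)))
println(eltype(df.age))

# Because Int64 is an allowed element type, values can be filled in later.
df.age[1] = 45
println(df.age)
```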