A DataFrame is one of the most popular data structures for manipulating tabular data. When we read data into a DataFrame, it is organized into rows and columns, which makes it easy to analyze and work with.
In Julia, several ways exist to select only a subset of DataFrame columns, which we will cover in this Answer.
We can select a subset of columns using their actual column names, as shown below:
df = df[:, [:A, :B]]
The above code selects the columns named A and B from df.
using DataFrames
df = DataFrame(student_id = [1, 2, 3, 4, 5],
               name = ["Amy", "Jane", "John", "Nancy", "Peter"],
               marks = [50, 60, 40, 47, 30],
               age = [15, 16, 19, 18, 15])

df = df[:, [:name, :age]]  # keep only the name and age columns
println(df)
Let’s explain the code provided above.
Line 1: We load the DataFrames package.
Lines 2–5: We create a DataFrame with four columns and five rows that holds students’ information.
Line 7: We select only the name and age columns and assign the result back to df, which replaces the original DataFrame.
Line 8: We print the new DataFrame.
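If we want to keep the original DataFrame, we can assign the selection to a different variable instead. The following is a minimal sketch of that idea, assuming a reasonably recent DataFrames.jl release in which column names can be given either as Symbols or as strings. The variable names by_symbol and by_string are only illustrative.

```julia
using DataFrames

df = DataFrame(student_id = [1, 2, 3],
               name = ["Amy", "Jane", "John"],
               marks = [50, 60, 40])

# Column names given as Symbols
by_symbol = df[:, [:name, :marks]]

# Column names given as strings select the same columns
by_string = df[:, ["name", "marks"]]

println(names(by_symbol))        # column names of the subset
println(by_symbol == by_string)  # compares the two selections
println(ncol(df))                # the original DataFrame keeps all of its columns
```

Because the result is stored under a new name, df itself is left untouched and can still be used later.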
We can select a subset of columns by specifying their index numbers. Here’s an example:
df = df[:,[1,3]]
The code df = df[:, [1, 3]] selects the first and third columns of the DataFrame df (Julia indexes from 1). The resulting DataFrame contains only those columns, a subset of the original DataFrame.
using DataFrames
df = DataFrame(student_id = [1, 2, 3, 4, 5],
               name = ["Amy", "Jane", "John", "Nancy", "Peter"],
               marks = [50, 60, 40, 47, 30],
               age = [15, 16, 19, 18, 15])

df = df[:, [1, 3]]
println(df)
Line 7: We select the columns at index 1 (student_id) and 3 (marks), which returns a new DataFrame with only these columns. We assign this DataFrame back to the variable df.
select() or select!()
We can also use the select() or select!() function to select a subset of DataFrame columns, as explained below.
Option 1
select!(df, [:A, :B])
The select!() function selects the columns A and B and modifies the original DataFrame, df. This is referred to as modifying in place.
Option 2
df = select(df, [:A, :B])
The select() function selects columns A and B from the original DataFrame and returns them in a new DataFrame, leaving the original untouched. We can assign this new DataFrame to any variable. Here, we reuse the name df.
using DataFrames
df = DataFrame(student_id = [1, 2, 3, 4, 5],
               name = ["Amy", "Jane", "John", "Nancy", "Peter"],
               marks = [50, 60, 40, 47, 30],
               age = [15, 16, 19, 18, 15])

# using select
df1 = select(df, [:student_id, :marks])
println(df1)

println("-------------------------")
# using select!
select!(df, [:name, :age])
println(df)
Let’s explain the code provided above.
Lines 8–9: We use select() to subset the columns, assign the new DataFrame to a variable named df1, and then print df1.
Lines 13–14: We use select!() to select the columns name and age. select!() modifies the original DataFrame, df, so no variable assignment is needed. We then print the modified df.
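To make the difference between the two functions easier to see, the following minimal sketch checks the number of columns in df before and after each call. The variable names selected and n_before are only illustrative.

```julia
using DataFrames

df = DataFrame(student_id = [1, 2, 3],
               name = ["Amy", "Jane", "John"],
               marks = [50, 60, 40])

n_before = ncol(df)                     # number of columns before any selection

selected = select(df, [:name, :marks])  # returns a new DataFrame
println(names(selected))
println(ncol(df) == n_before)           # df is unchanged at this point

select!(df, [:name, :marks])            # drops the other columns from df itself
println(names(df))
```

After the select!() call, df holds only the selected columns, so the dropped columns can no longer be reached through df.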
We can use boolean indexing, where we specify one true or false value per column, to subset columns in a DataFrame.
df = df[:,[true,false]]
The code above keeps the first column and drops the second column of a two-column DataFrame.
using DataFrames
df = DataFrame(student_id = [1, 2, 3, 4, 5],
               name = ["Amy", "Jane", "John", "Nancy", "Peter"],
               marks = [50, 60, 40, 47, 30],
               age = [15, 16, 19, 18, 15])

df = df[:, [true, false, true, true]]
println(df)
Line 7: We use boolean indexing to select three columns, where true keeps a column and false omits it. Consequently, we keep only the columns student_id, marks, and age. The resulting DataFrame is then assigned back to the variable df.
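Writing the true and false values by hand becomes error-prone as the number of columns grows. As a minimal sketch, the mask can instead be built from the column names. The list wanted and the variable mask are hypothetical names introduced only for this illustration.

```julia
using DataFrames

df = DataFrame(student_id = [1, 2, 3],
               name = ["Amy", "Jane", "John"],
               marks = [50, 60, 40],
               age = [15, 16, 19])

wanted = ["student_id", "marks", "age"]  # hypothetical list of columns to keep

# One Bool per column, in column order
mask = [n in wanted for n in names(df)]

println(mask)
println(df[:, mask])
```

Since the mask is derived from names(df), it always has exactly one entry per column, which is what boolean indexing expects.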