How to select a subset of DataFrame columns in Julia

A DataFrame is one of the most popular data structures for manipulating tabular data. In Julia, it is provided by the DataFrames.jl package. Data read into a DataFrame is organized into named columns and rows, which makes it easy to analyze and work with.

Julia offers several ways to select a subset of a DataFrame's columns. This Answer covers four of them.

Method 1: Using column names

We can select a subset of columns using their actual column names, as shown below:

df = df[:, [:A, :B]]

The above code selects the columns named A and B from df. The : in the row position keeps every row, and the result is assigned back to df.

Example
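
A minimal sketch of this method, assuming a hypothetical students DataFrame. The column names (student_id, name, marks, age) follow the explanation below, their order is an assumption, and the sample values are made up for illustration. The lines are arranged to match the line numbers the explanation refers to.

```julia
using DataFrames  # load the DataFrames package
df = DataFrame(student_id = [101, 102, 103, 104, 105],  # hypothetical sample data
               name = ["Ana", "Ben", "Chloe", "Dev", "Ema"],
               marks = [88, 92, 75, 64, 81],
               age = [20, 21, 19, 22, 20])

df = df[:, [:name, :age]]  # keep every row, but only the name and age columns
println(df)
```

Column names can also be given as strings, for example df[:, ["name", "age"]], which selects the same two columns.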

Explanation

Let’s explain the code provided above.

Line 1: We load the DataFrames package.
Lines 2–5: We create a DataFrame with four columns and five rows, where each row contains one student's information.
Line 7: We select only the name and age columns and assign the resulting DataFrame back to df.
Line 8: We print the new DataFrame.

Method 2: Using column index

We can select a subset of columns by specifying their index numbers. Here’s an example:

df = df[:, [1, 3]]

The code df = df[:, [1, 3]] selects the columns at positions 1 and 3 of the DataFrame df. Julia uses 1-based indexing, so these are the first and third columns. The resulting DataFrame contains only the selected columns.

Example
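
A minimal sketch of index-based selection, assuming the same hypothetical students DataFrame with student_id as the first column and marks as the third, as the explanation below describes. The positions of name and age and all sample values are assumptions made for illustration.

```julia
using DataFrames  # load the DataFrames package
df = DataFrame(student_id = [101, 102, 103, 104, 105],  # hypothetical sample data
               name = ["Ana", "Ben", "Chloe", "Dev", "Ema"],
               marks = [88, 92, 75, 64, 81],
               age = [20, 21, 19, 22, 20])

df = df[:, [1, 3]]  # columns 1 and 3 (1-based): student_id and marks
println(df)
```

Selecting by position is compact, but it depends on the column order, so selecting by name is usually safer when the layout of the data may change.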

Explanation

Line 7: We select the columns at index 1 (student_id) and index 3 (marks), which returns a new DataFrame with only these columns. We assign this DataFrame back to df.

Method 3: Using `select()` or `select!()`

We can also use the select() or select!() function to select a subset of DataFrame columns, as explained below.

Option 1

select!(df, [:A, :B])

The select!() function keeps only the columns A and B and modifies the original DataFrame, df. This is referred to as modifying in place.

Option 2

df = select(df, [:A, :B])

The select() function selects columns A and B and returns a new DataFrame, leaving the original unchanged. Here, we rebind the name df to the new DataFrame. Assigning the result to a different variable instead keeps both the original and the subset available.

Example
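
A minimal sketch contrasting the two functions, assuming the same hypothetical students DataFrame as before (made-up sample values) and assuming that both calls select name and age. The lines are arranged to match the line numbers the explanation below refers to.

```julia
using DataFrames  # load the DataFrames package
df = DataFrame(student_id = [101, 102, 103, 104, 105],  # hypothetical sample data
               name = ["Ana", "Ben", "Chloe", "Dev", "Ema"],
               marks = [88, 92, 75, 64, 81],
               age = [20, 21, 19, 22, 20])

# select() returns a new DataFrame; df still has all four columns
df1 = select(df, [:name, :age])
println(df1)

# select!() removes the unselected columns from df itself,
# so no assignment is needed
select!(df, [:name, :age])
println(df)
```

Use select() when the original DataFrame is still needed afterward, and select!() when the dropped columns can be discarded.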

Explanations

Let’s explain the code provided above.

Lines 8–9: We use select() to subset the columns, assign the new DataFrame to a variable named df1, and then print df1.
Lines 13–14: We use select!() to select the columns name and age. select!() modifies the original DataFrame, df, so no variable assignment is needed. We then print the modified df.

Method 4: Using boolean indexing

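We can use boolean indexing, where we pass a vector of true or false values (lowercase in Julia), one for each column, to subset columns in a DataFrame. A column is kept when its value is true.

df = df[:, [true, false]]

The code above keeps the first column and drops the second column of a two-column DataFrame. The vector must have exactly as many elements as the DataFrame has columns.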
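
A minimal sketch of boolean indexing, assuming a hypothetical two-column DataFrame with made-up values. The variable names mask, first_only, and without_age are illustrative only. The second part shows one possible way to build the mask from the column names instead of typing it by hand.

```julia
using DataFrames

# Hypothetical two-column DataFrame
df = DataFrame(name = ["Ana", "Ben", "Chloe"], age = [20, 21, 19])

# One Bool per column: true keeps the column, false drops it
mask = [true, false]
first_only = df[:, mask]
println(names(first_only))

# The mask can also be computed, here by comparing each column name to "age"
computed_mask = names(df) .!= "age"
without_age = df[:, computed_mask]
println(names(without_age))
```

Computing the mask is helpful when the columns to keep follow a rule, such as a naming pattern, because the mask then adapts to the columns the DataFrame actually has.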
