How to remove columns by name in a Julia DataFrame
Julia is a powerful data science language known for its robust capabilities in numerical computations.
While working with DataFrames, data manipulation is inevitable. One such data manipulation is removing columns that aren’t needed in the analysis.
There are several ways of removing columns by name in a Julia DataFrame, and in this Answer, we will review a few ways to do so.
Method 1: Using the select!() function
We can combine the select!() function with the Not() selector to specify which columns to remove. Not() selects all columns except the ones listed. Because select!() modifies the original DataFrame directly, this is referred to as modification in place.
Here’s the syntax for using the select!() function:
select!(df,Not([:A,:B]))
Alternatively, we can use select(), which returns a new DataFrame with the chosen columns and leaves the original unchanged. The result therefore has to be assigned to a variable.
We can use the select() function in the following way:
df = select(df,Not([:A,:B]))
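The difference between the two calls can be checked with a small sketch. The data and the variable name trimmed below are made up for illustration; select(), select!(), Not(), and names() are the DataFrames.jl functions discussed in this Answer.

```julia
using DataFrames

# Hypothetical data, used only for illustration
df = DataFrame(A = [1, 2], B = [3, 4], C = [5, 6])

# select() returns a new DataFrame and leaves df untouched
trimmed = select(df, Not([:A, :B]))
println(names(trimmed))  # column names of the new DataFrame
println(names(df))       # df still has all three columns

# select!() removes the columns from df itself
select!(df, Not([:A, :B]))
println(names(df))
```

After the last call, df and trimmed hold the same single column, but only select!() changed df itself.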
Let’s understand the select!() method using the code example below.
Example
using DataFrames
df = DataFrame(student_id=[1,2,3,4,5],
               name = ["Amy","Jane","John","Nancy","Peter"],
               marks=[50,60,40,47,30],
               age=[15,16,19,18,15])
select!(df, Not([:name, :age]))
println(df)
Let’s explain the code provided above.
- Line 1: We load the DataFrames library.
- Lines 2–5: We create a DataFrame with four columns, namely student_id, name, marks, and age, and five rows, where each row represents one student’s information.
- Line 6: We use select!() to keep all columns in the DataFrame except name and age.
- Line 7: We print out the modified DataFrame.
Method 2: Using select!() and setdiff()
The select!() and setdiff() functions are used together in the following way:
select!(df, Not(setdiff(names(df), ["A", "B"])))
The setdiff() function returns the elements of the first array that are not in the second, so setdiff(names(df), ["A", "B"]) returns the names of all columns in df except A and B. Because names(df) returns the column names as strings, A and B are written as strings here rather than as symbols. Not() then negates that selection, so only columns A and B are kept and every other column is removed. Finally, select!() applies the change to the original DataFrame.
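To see why the double negation keeps A and B, it can help to inspect the intermediate result. This is a minimal sketch with made-up data; the variable name others is hypothetical and only serves the illustration.

```julia
using DataFrames

# Hypothetical data, used only for illustration
df = DataFrame(A = [1, 2], B = [3, 4], C = [5, 6], D = [7, 8])

# names(df) returns the column names as strings,
# so the list passed to setdiff() contains strings too
others = setdiff(names(df), ["A", "B"])
println(others)  # every column name except A and B

# Not(others) excludes those columns, so only A and B remain
select!(df, Not(others))
println(names(df))
```

If symbols such as [:A, :B] were passed to setdiff() instead, none of them would match the string names, so nothing would be subtracted and Not() would then exclude every column. When symbols are preferred, propertynames(df) returns the column names as symbols.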
Example
using DataFrames
df = DataFrame(student_id=[1,2,3,4,5],
               name = ["Amy","Jane","John","Nancy","Peter"],
               marks=[50,60,40,47,30],
               age=[15,16,19,18,15])
select!(df, Not(setdiff(names(df), ["name", "age"])))
println(df)
In the code above:
- Line 6: We use setdiff() to collect all column names in the DataFrame except name and age. Not() then negates this selection, so select!() keeps only name and age and removes the other columns.
Method 3: Indexing with Not()
We can also use Not() directly when indexing the DataFrame. In this case, Not() excludes the columns we don’t need, and the indexing expression returns the remaining columns as a new DataFrame, as shown below:
Example
using DataFrames
df = DataFrame(student_id=[1,2,3,4,5],
               name = ["Amy","Jane","John","Nancy","Peter"],
               marks=[50,60,40,47,30],
               age=[15,16,19,18,15])
df = df[:, Not([:name, :age])]
println(df)
In the code above:
- Line 6: We index the original DataFrame with Not() to select all columns except name and age, and reassign the result to df.
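As a quick check, indexing with Not() and calling select() should produce the same result. The sketch below uses a shortened version of the example data, and the variable names by_index and by_select are hypothetical.

```julia
using DataFrames

# Shortened version of the example data above
df = DataFrame(student_id = [1, 2, 3],
               name = ["Amy", "Jane", "John"],
               marks = [50, 60, 40],
               age = [15, 16, 19])

# Indexing with `:` for the rows builds a new DataFrame; df is not modified
by_index = df[:, Not([:name, :age])]

# select() also returns a new DataFrame
by_select = select(df, Not([:name, :age]))

println(names(by_index))
println(by_index == by_select)  # compares column names and values
println(names(df))              # the original still has four columns
```

Neither expression changes df here, because the results are assigned to new variables instead of back to df.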