Columns

Get hands-on practice performing various operations on the columns of a DataFrame.

We'll cover the following...

Listing all columns
Accessing a single column
Manipulating columns
Sorting columns

Spark lets us manipulate individual DataFrame columns using relational or computational expressions. Conceptually, a column represents a type of field, much like a column in pandas, an R DataFrame, or a relational table. In Spark’s supported languages, columns are represented by the type Column. Let’s look at some examples of working with columns.

Listing all columns

We’ll assume we have already read the file BollywoodMovieDetail.csv into the DataFrame variable movies, as shown in the previous lesson. We can list the columns as follows:

scala> movies.columns
res2: Array[String] = Array(imdbId, title, releaseYear, releaseDate, genre, writers, actors, directors, sequel, hitFlop)
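
For readers working outside the Spark shell, the same call can be sketched as a small standalone program. This is only an illustration under stated assumptions: the object name ListColumns, the local master setting, the relative file path, and the header and inferSchema read options are not taken from the lesson and may differ from how movies was actually loaded.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone wrapper around the movies.columns call shown above.
object ListColumns {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this small file.
    val spark = SparkSession.builder()
      .appName("ListColumns")
      .master("local[*]")
      .getOrCreate()

    // Assumption: the CSV sits in the working directory and has a header row.
    val movies = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("BollywoodMovieDetail.csv")

    // columns returns the column names as an Array[String].
    val names: Array[String] = movies.columns
    println(names.mkString(", "))

    // printSchema shows each column name together with its data type.
    movies.printSchema()

    spark.stop()
  }
}
```

Because columns returns a plain Array[String], the usual Scala collection methods apply to it, for example movies.columns.contains("title") to check whether a column exists before using it.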

Accessing a single column

We can access a particular column from the DataFrame by using the col() method, which is a standard built-in function that ...
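
As a rough sketch of what column access with col() can look like, the program below obtains the title column in a few equivalent ways and selects it. The object name AccessColumn, the session settings, the file path, and the read options are assumptions made for this illustration, not details from the lesson.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical standalone example of obtaining a Column from a DataFrame.
object AccessColumn {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session and a CSV with a header row in the working directory.
    val spark = SparkSession.builder()
      .appName("AccessColumn")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // enables the $"name" column syntax

    val movies = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("BollywoodMovieDetail.csv")

    // Each of these expressions yields a Column referring to "title".
    val viaMethod: Column = movies.col("title") // bound to this DataFrame
    val viaApply: Column = movies("title")      // shorthand for movies.col
    val viaFunction: Column = col("title")      // resolved when used in a query
    val viaDollar: Column = $"title"            // implicit string interpolator

    // A Column is only an expression; select applies it to the data.
    movies.select(viaFunction).show(5, truncate = false)

    spark.stop()
  }
}
```

The first two forms are tied to the movies DataFrame, which can help disambiguate columns in joins, while col("title") and $"title" are resolved against whichever DataFrame they are eventually used with.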
