Conditional DataFrame column operations

Cleaning Data with PySpark

Mike Metzger

Data Engineering Consultant

Conditional clauses

Conditional Clauses are:

  • Inline version of if / then / else
  • .when()
  • .otherwise()

Conditional example

.when(<if condition>, <then x>)

df.select(df.Name, df.Age, F.when(df.Age >= 18, "Adult"))

Name     Age
Alice    14
Bob      18   Adult
Candice  38   Adult

Another example

Multiple .when()

df.select(df.Name, df.Age,
          F.when(df.Age >= 18, "Adult")
           .when(df.Age < 18, "Minor"))

Name     Age
Alice    14   Minor
Bob      18   Adult
Candice  38   Adult

Otherwise

.otherwise() is like else

df.select(df.Name, df.Age,
          F.when(df.Age >= 18, "Adult")
           .otherwise("Minor"))

Name     Age
Alice    14   Minor
Bob      18   Adult
Candice  38   Adult

Let's practice!
