Cleaning Data with PySpark
Mike Metzger
Data Engineering Consultant
Conditional Clauses are:
.when()
.otherwise()
.when(<if condition>, <then x>)
import pyspark.sql.functions as F
df.select(df.Name, df.Age, F.when(df.Age >= 18, "Adult"))
Name | Age | |
---|---|---|
Alice | 14 | |
Bob | 18 | Adult |
Candice | 38 | Adult |
Multiple .when()
df.select(df.Name, df.Age,
          F.when(df.Age >= 18, "Adult")
           .when(df.Age < 18, "Minor"))
Name | Age | |
---|---|---|
Alice | 14 | Minor |
Bob | 18 | Adult |
Candice | 38 | Adult |
.otherwise()
Works like else: supplies the value for rows that match no .when() condition
df.select(df.Name, df.Age,
          F.when(df.Age >= 18, "Adult")
           .otherwise("Minor"))
Name | Age | |
---|---|---|
Alice | 14 | Minor |
Bob | 18 | Adult |
Candice | 38 | Adult |