Cleaning Data with PySpark
Mike Metzger
Data Engineering Consultant
Conditional Clauses are:
.when()
.otherwise()
Syntax: .when(<if condition>, <then x>)
import pyspark.sql.functions as F
df.select(df.Name, df.Age, F.when(df.Age >= 18, "Adult"))
| Name | Age | |
|---|---|---|
| Alice | 14 | |
| Bob | 18 | Adult |
| Candice | 38 | Adult |
Multiple .when()
df.select(df.Name, df.Age,
    F.when(df.Age >= 18, "Adult")
     .when(df.Age < 18, "Minor"))
| Name | Age | |
|---|---|---|
| Alice | 14 | Minor |
| Bob | 18 | Adult |
| Candice | 38 | Adult |
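The chained form behaves like an if / elif ladder: conditions are checked in order, the first one that holds supplies the value, and a row that matches nothing gets null. The plain-Python sketch below models only that behavior; it is not how Spark implements it, and the helper name case_when is hypothetical.

```python
def case_when(branches, row, default=None):
    """Return the value of the first branch whose condition holds for row.

    Plain-Python model of chained .when() calls (hypothetical helper,
    not part of PySpark). `default` plays the role of .otherwise().
    """
    for condition, value in branches:
        if condition(row):
            return value
    return default  # None stands in for Spark's null


people = [
    {"Name": "Alice", "Age": 14},
    {"Name": "Bob", "Age": 18},
    {"Name": "Candice", "Age": 38},
]

branches = [
    (lambda r: r["Age"] >= 18, "Adult"),
    (lambda r: r["Age"] < 18, "Minor"),
]

# A single .when(): rows that fail the condition get None (null).
adult_only = [case_when(branches[:1], p) for p in people]

# Two chained .when() calls: every row matches one branch.
both = [case_when(branches, p) for p in people]

print(adult_only)
print(both)
```

With only the first branch, Alice has no matching condition, which is why her cell is empty in the single .when() table.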
.otherwise() is like else
df.select(df.Name, df.Age,
    F.when(df.Age >= 18, "Adult")
     .otherwise("Minor"))
| Name | Age | |
|---|---|---|
| Alice | 14 | Minor |
| Bob | 18 | Adult |
| Candice | 38 | Adult |