Cleaning Data with PySpark
Mike Metzger
Data Engineering Consultant
User-defined functions, or UDFs
The pyspark.sql.functions.udf method
Define a Python method
def reverseString(mystr):
    return mystr[::-1]
Wrap the function and store it as a variable
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
udfReverseString = udf(reverseString, StringType())
Use with Spark
user_df = user_df.withColumn('ReverseName',
                             udfReverseString(user_df.Name))
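The three steps above (define, wrap, use) can be put together in one minimal sketch. The sample rows, the local Spark session settings, the `demo()` helper, and the `None` guard are assumptions added for illustration; they are not part of the slides.

```python
def reverseString(mystr):
    # Spark passes None for NULL cells, so guard before slicing
    # (this guard is an addition, not in the original slide).
    return mystr[::-1] if mystr is not None else None


def demo():
    # Hypothetical helper: PySpark is imported here so the plain
    # Python function above can be tried without a Spark installation.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    spark = (SparkSession.builder
             .master("local[1]")
             .appName("udf-sketch")
             .getOrCreate())

    # Made-up sample data; the real user_df comes from elsewhere.
    user_df = spark.createDataFrame(
        [("Alice", 14), ("Bob", 18), (None, 63)],
        ["Name", "Age"],
    )

    # Wrap the Python function, declaring the Spark return type.
    udfReverseString = udf(reverseString, StringType())

    # Apply the UDF column-wise, like any built-in Spark function.
    user_df = user_df.withColumn("ReverseName",
                                 udfReverseString(user_df.Name))
    user_df.show()
    spark.stop()
```

Calling `demo()` on a machine with PySpark installed should print the DataFrame with the extra `ReverseName` column.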
import random

# A UDF can also take no arguments
def sortingCap():
    return random.choice(['G', 'H', 'R', 'S'])
udfSortingCap = udf(sortingCap, StringType())
user_df = user_df.withColumn('Class', udfSortingCap())
| Name | Age | Class |
|---|---|---|
| Alice | 14 | H |
| Bob | 18 | S |
| Candice | 63 | G |
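Because `sortingCap` returns a random value, Spark's optimizer may treat repeated calls as interchangeable unless told otherwise. One possible refinement, not shown in the slides, is to mark the UDF as non-deterministic with `asNondeterministic()` (available since Spark 2.3). The `demo()` helper and the sample rows below are hypothetical.

```python
import random


def sortingCap():
    # Pick one of four class letters at random.
    return random.choice(['G', 'H', 'R', 'S'])


def demo():
    # Hypothetical helper: PySpark is imported lazily so the function
    # above can be exercised without a Spark installation.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    spark = (SparkSession.builder
             .master("local[1]")
             .appName("udf-random-sketch")
             .getOrCreate())

    # Made-up rows mirroring the Name and Age columns of the table above.
    user_df = spark.createDataFrame(
        [("Alice", 14), ("Bob", 18), ("Candice", 63)],
        ["Name", "Age"],
    )

    # Tell Spark the result may differ between calls on the same input.
    udfSortingCap = udf(sortingCap, StringType()).asNondeterministic()

    user_df = user_df.withColumn("Class", udfSortingCap())
    user_df.show()
    spark.stop()
```

The values in the `Class` column will differ from run to run, so the table above is only one possible outcome.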