Introduction to Spark SQL in Python
Mark Plutowski
Data Scientist
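from pyspark.sql.functions import lit

# Positive examples: rows whose endword is one of six pronouns
df_true = df.where("endword in ('she', 'he', 'hers', 'his', 'her', 'him')")\
    .withColumn('label', lit(1))
# Negative examples: every other endword
df_false = df.where("endword not in ('she', 'he', 'hers', 'his', 'her', 'him')")\
    .withColumn('label', lit(0))
# Combine both classes, then split 60/40 into training and evaluation sets (seed 42)
df_examples = df_true.union(df_false)
df_train, df_eval = df_examples.randomSplit((0.60, 0.40), 42)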
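The labeling rule in the two filters above can be restated in plain Python, which makes two edge cases easier to see. This is only a sketch of the rule, not part of the Spark pipeline: `PRONOUNS`, `label_for`, and the sample values are hypothetical names introduced for illustration. It assumes Spark's default case-sensitive string comparison.

```python
# Plain-Python restatement of the labeling rule; no Spark required.
PRONOUNS = {'she', 'he', 'hers', 'his', 'her', 'him'}

def label_for(endword):
    """Return 1 or 0 as the two filters would, or None if the row is dropped."""
    if endword is None:
        # In SQL, NULL IN (...) and NULL NOT IN (...) both evaluate to NULL,
        # so a row with a null endword passes neither where() filter.
        return None
    # Assumption: comparison is case-sensitive, so 'She' is not a match.
    return 1 if endword in PRONOUNS else 0

# Hypothetical sample values, for illustration only.
samples = ['she', 'house', 'him', None, 'She']
labels = [label_for(w) for w in samples]
print(labels)
```

Rows with a null endword end up in neither df_true nor df_false, so df_examples can hold fewer rows than df.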
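from pyspark.ml.classification import LogisticRegression
# Elastic-net regularized logistic regression; reads the 'features' and 'label' columns by default
logistic = LogisticRegression(maxIter=50, regParam=0.6, elasticNetParam=0.3)
model = logistic.fit(df_train)
# Number of iterations the optimizer ran (at most maxIter)
print("Training iterations: ", model.summary.totalIterations)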