Predicting and evaluating

Introduction to Spark SQL in Python

Mark Plutowski

Data Scientist

Applying a model to evaluation data

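# model is the fitted classifier; transform() appends prediction columns to df_test
predicted = model.transform(df_test)

# first() is a method call; it returns the first Row of the result
x = predicted.first()
print("Right!" if x.label == int(x.prediction) else "Wrong")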
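The check above inspects a single row. Here is a minimal pure-Python sketch of the same comparison applied to every row. It assumes the (label, prediction) pairs have already been collected locally, for example with `predicted.select("label", "prediction").collect()`. The `accuracy` helper and the sample rows are hypothetical, for illustration only.

```python
def accuracy(pairs):
    """Fraction of (label, prediction) pairs where the prediction matches.

    `pairs` is a hypothetical list of (label, prediction) tuples; the
    prediction is cast with int() exactly as in the single-row check.
    """
    if not pairs:
        raise ValueError("need at least one (label, prediction) pair")
    correct = sum(1 for label, prediction in pairs if label == int(prediction))
    return correct / len(pairs)


# Made-up example rows: Spark stores predictions as doubles such as 1.0.
rows = [(1, 1.0), (0, 0.0), (1, 0.0), (0, 0.0)]
print("Accuracy: %.2f" % accuracy(rows))  # Accuracy: 0.75
```

Casting with `int()` mirrors the slide's comparison, because Spark's `prediction` column holds doubles.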

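# evaluate() scores df_eval with the fitted model and returns a summary object
model_stats = model.evaluate(df_eval)

type(model_stats)

pyspark.ml.classification.BinaryLogisticRegressionSummary

# areaUnderROC is a property of the binary summary, not a method
print("\nPerformance: %.2f" % model_stats.areaUnderROC)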
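The area under the ROC curve measures ranking quality. It is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting half. The pure-Python sketch below computes this quantity by comparing every pair, which is practical only for small illustrative inputs. Spark computes `areaUnderROC` from the ROC curve over the full dataset and does not use this double loop. The `roc_auc` function and the sample scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts, over every (positive, negative) pair, how often the positive
    example has the higher score; ties contribute one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and positive-class probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
print("AUC: %.2f" % roc_auc(labels, scores))  # AUC: 0.89 (8 of 9 pairs ranked correctly)
```

A score of 1.0 means every positive outranks every negative. Scores assigned at random give about 0.5.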

Positive labels:
- ['her', 'him', 'he', 'she', 'them', 'us', 'they', 'himself', 'herself', 'we']
Number of examples: 5746
Number of examples: 2873 positive, 2873 negative
Number of training examples: 4607
Number of test examples: 1139
training iterations: 21
Test AUC: 0.87
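The counts in this output are mutually consistent. 2873 positive plus 2873 negative examples equals 5746 in total, and 4607 training plus 1139 test examples also equals 5746. That puts roughly 80% of the data in training, which fits, but does not prove, an approximate 80/20 split such as one made with `randomSplit([0.8, 0.2])`. The sketch below checks that arithmetic. It also shows a hypothetical labeling rule, `label_for`, which assumes an example is positive when its target word appears in the positive-label list. That rule is an assumption for illustration, not the course's confirmed preprocessing.

```python
# The positive-label words, copied from the output above.
POSITIVE_WORDS = {'her', 'him', 'he', 'she', 'them', 'us', 'they',
                  'himself', 'herself', 'we'}


def label_for(word):
    """Hypothetical labeling rule: 1 if the target word is a positive-label word."""
    return 1 if word in POSITIVE_WORDS else 0


# Counts reported in the output above.
n_total, n_pos, n_neg = 5746, 2873, 2873
n_train, n_test = 4607, 1139

assert n_pos + n_neg == n_total      # classes are exactly balanced
assert n_train + n_test == n_total   # every example lands in one split
print("Training fraction: %.3f" % (n_train / n_total))  # Training fraction: 0.802
print(label_for("she"), label_for("table"))             # 1 0
```

Because the classes are balanced, always predicting a single class would be right only about half the time. That baseline helps put the reported test AUC of 0.87 in context.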
