Evaluation metrics for text classification

Deep Learning for Text with PyTorch

Shubham Jain

Instructor

Why evaluation metrics matter

Spotlight on Book Reviews:

  • Imagine a model that assesses the sentiment of book reviews
  • The model claims a best-selling novel is poorly reviewed. Do we accept this?
  • Use evaluation metrics

Book review


Evaluating RNN models

# Initialize model, criterion, and optimizer
rnn_model = RNNModel(input_size, hidden_size, num_layers, num_classes)
...
# Model training
for epoch in range(10): 
    outputs = rnn_model(X_train)
    ...
    print(f'Epoch: {epoch+1}, Loss: {loss.item()}')

# Evaluate on the test set
outputs = rnn_model(X_test)
_, predicted = torch.max(outputs, 1)
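For reference, a minimal sketch of what the elided setup and training steps could look like; the cross-entropy loss, Adam optimizer, and y_train labels here are assumptions, not shown on the slide:

import torch
import torch.nn as nn

# Assumed choices: cross-entropy loss and Adam (the slide elides these)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(rnn_model.parameters(), lr=0.001)

for epoch in range(10):
    optimizer.zero_grad()               # reset accumulated gradients
    outputs = rnn_model(X_train)        # forward pass
    loss = criterion(outputs, y_train)  # y_train: assumed training labels
    loss.backward()                     # backpropagation
    optimizer.step()                    # weight update
    print(f'Epoch: {epoch+1}, Loss: {loss.item()}')

# Switch off training-only behavior and gradient tracking before scoring
rnn_model.eval()
with torch.no_grad():
    outputs = rnn_model(X_test)
    _, predicted = torch.max(outputs, 1)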

Accuracy

  • The ratio of correct predictions to the total predictions
import torch
from torchmetrics import Accuracy

actual = torch.tensor([0, 1, 1, 0, 1, 0])
predicted = torch.tensor([0, 0, 1, 0, 1, 1])
accuracy = Accuracy(task="binary", num_classes=2)
acc = accuracy(predicted, actual)
print(f"Accuracy: {acc}")
Accuracy: 0.6666666666666666
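As a sanity check, the same number can be computed directly in plain PyTorch; acc_manual is a hypothetical name added for this illustration:

# Accuracy by hand: fraction of predictions that match the labels
acc_manual = (predicted == actual).float().mean()
print(f"Manual accuracy: {acc_manual}")  # 4 of 6 correct = 0.6667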

Beyond accuracy

  • 10,000 reviews: 9,800 are positive
    • A model that always predicts positive: 98% accuracy
      • Yet it fails to classify negative reviews (see the sketch after this list)

 

  • Precision: confidence in labeling a review as negative
  • Recall: how well the model spots negative reviews
  • F1 Score: balance between precision and recall
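A quick sketch makes the 9,800-positive example concrete; the tensors below are constructed for illustration, and the negative class is scored by flipping the labels:

import torch
from torchmetrics import Accuracy, Recall

# 10,000 reviews: 9,800 positive (1), 200 negative (0)
labels = torch.cat([torch.ones(9800), torch.zeros(200)]).long()
always_positive = torch.ones(10000).long()  # model that always predicts positive

accuracy = Accuracy(task="binary")
print(accuracy(always_positive, labels))  # 0.98 -- looks impressive

# Treat 'negative' as the class of interest by flipping both tensors
recall_negative = Recall(task="binary")
print(recall_negative(1 - always_positive, 1 - labels))  # 0.0 -- misses every negative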

Precision and Recall

  • Precision: correctly predicted positive observations / total predicted positives
  • Recall: correctly predicted positive observations / all observations in the positive class
from torchmetrics import Precision, Recall

precision = Precision(task="binary", num_classes=2)
recall = Recall(task="binary", num_classes=2)
prec = precision(predicted, actual)
rec = recall(predicted, actual)
print(f"Precision: {prec}")
print(f"Recall: {rec}")
Precision: 0.6666666666666666
Recall: 0.6666666666666666
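Tracing the confusion counts by hand shows where both numbers come from (a verification snippet added for illustration, using the tensors defined earlier):

# Confusion counts for the six examples above
tp = ((predicted == 1) & (actual == 1)).sum()  # 2 (indices 2 and 4)
fp = ((predicted == 1) & (actual == 0)).sum()  # 1 (index 5)
fn = ((predicted == 0) & (actual == 1)).sum()  # 1 (index 1)
print(tp / (tp + fp))  # precision = 2/3
print(tp / (tp + fn))  # recall = 2/3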

Precision and Recall

Precision: 0.6666666666666666
Recall: 0.6666666666666666
  • Precision: 66.67% of the reviews predicted as positive were truly positive
  • Recall: the model captured 66.67% of the actual positive reviews

F1 score

  • Harmonic mean of precision and recall: F1 = 2 × (precision × recall) / (precision + recall)
  • Better measure for imbalanced classes
from torchmetrics import F1Score
f1 = F1Score(task="binary", num_classes=2)
f1_score = f1(predicted, actual)
print(f"F1 Score: {f1_score}")
F1 Score: 0.6666666666666666
  • F1 Score of 1 = perfect precision and recall
  • F1 Score of 0 = worst performance
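Plugging the earlier precision and recall into the harmonic-mean formula reproduces the score by hand (an added check, reusing prec and rec from the previous slide):

# F1 = 2 * (precision * recall) / (precision + recall)
#    = 2 * (2/3 * 2/3) / (2/3 + 2/3) = 2/3
f1_manual = 2 * (prec * rec) / (prec + rec)
print(f"Manual F1: {f1_manual}")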

Considerations

  • Multiclass scores may be identical across metrics (see the sketch below)

    • Can indicate good model performance
  • Always consider the problem when interpreting results!
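One common way identical scores arise: with micro averaging in a single-label multiclass task, accuracy, precision, recall, and F1 coincide by construction. A brief illustration with made-up three-class tensors:

import torch
from torchmetrics import Accuracy, Precision, Recall, F1Score

# Hypothetical 3-class predictions: 5 of 6 correct
labels = torch.tensor([0, 1, 2, 2, 1, 0])
preds = torch.tensor([0, 1, 2, 1, 1, 0])

for Metric in (Accuracy, Precision, Recall, F1Score):
    metric = Metric(task="multiclass", num_classes=3, average="micro")
    print(Metric.__name__, metric(preds, labels))  # all four print 0.8333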


Let's practice!
