Training and testing a classification model with scikit-learn

Introduction to Natural Language Processing in Python

Katharine Jarmul

Founder, kjamistan

Naive Bayes classifier

  • Naive Bayes Model
    • Commonly used for testing NLP classification problems
    • Basis in probability
  • Given a particular piece of data, how likely is a particular outcome?
  • Examples:
    • If the plot has a spaceship, how likely is it to be sci-fi?
    • Given a spaceship and an alien, how likely now is it sci-fi?
  • Each word from CountVectorizer acts as a feature
  • Naive Bayes: Simple and effective
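The spaceship question above can be made concrete with a toy calculation. The sketch below uses a made-up four-plot corpus (hypothetical, not the course dataset) and estimates the probabilities as plain conditional frequencies. This is a simplification: a real Naive Bayes model combines per-word likelihoods under an independence assumption and does not count exact matches. The helper name p_genre_given_words is illustrative only.

```python
from collections import Counter

# Hypothetical toy corpus: (tokens in the plot, genre). Not the course data.
plots = [
    (["spaceship", "alien", "planet"], "sci-fi"),
    (["spaceship", "crew", "mission"], "sci-fi"),
    (["car", "chase", "explosion"], "action"),
    (["spaceship", "chase", "explosion"], "action"),
]

def p_genre_given_words(words, genre):
    """Share of plots containing all `words` that belong to `genre`."""
    matching = [g for tokens, g in plots if all(w in tokens for w in words)]
    return Counter(matching)[genre] / len(matching)

# If the plot has a spaceship, how likely is it to be sci-fi?
p_spaceship = p_genre_given_words(["spaceship"], "sci-fi")

# Given a spaceship and an alien, how likely is it now?
p_spaceship_alien = p_genre_given_words(["spaceship", "alien"], "sci-fi")

print(p_spaceship, p_spaceship_alien)
```

In this toy corpus the second probability is higher than the first: adding "alien" as evidence makes sci-fi more likely, which is the intuition the examples describe.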

Naive Bayes with scikit-learn

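from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

# Fit the model on the training word counts and their labels
nb_classifier = MultinomialNB()
nb_classifier.fit(count_train, y_train)

# Predict labels for the test word counts and score them against y_test
pred = nb_classifier.predict(count_test)
metrics.accuracy_score(y_test, pred)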
0.85841849389820424
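The snippet above relies on count_train, count_test, y_train and y_test already existing. A minimal end-to-end sketch, assuming they come from a CountVectorizer fitted on the training text, might look like the following. The plot strings and labels are invented for illustration (0 = Action, 1 = Sci-Fi) and are not the course dataset; stop_words="english" is likewise an assumption.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

# Hypothetical toy data: 0 = Action, 1 = Sci-Fi
train_plots = [
    "a cop chases a bomber through the city",
    "a soldier fights in an explosive car chase",
    "a spaceship crew meets an alien on a distant planet",
    "an alien robot travels through space on a spaceship",
]
y_train = [0, 0, 1, 1]

test_plots = [
    "the alien spaceship lands on the planet",
    "a car chase ends in a fight in the city",
]
y_test = [1, 0]

# Learn the vocabulary on the training text only, then reuse it for the test text
count_vectorizer = CountVectorizer(stop_words="english")
count_train = count_vectorizer.fit_transform(train_plots)
count_test = count_vectorizer.transform(test_plots)

# Same steps as on the slide: fit, predict, score
nb_classifier = MultinomialNB()
nb_classifier.fit(count_train, y_train)
pred = nb_classifier.predict(count_test)
accuracy = metrics.accuracy_score(y_test, pred)
print(pred, accuracy)
```

Calling transform (not fit_transform) on the test text keeps the feature columns aligned with the ones the classifier was trained on.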

Confusion matrix

metrics.confusion_matrix(y_test, pred, labels=[0,1])
array([[6410,  563],
       [ 864, 2242]])
Rows are true labels, columns are predicted labels:

               Predicted Action   Predicted Sci-Fi
True Action    6410               563
True Sci-Fi    864                2242
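The accuracy score reported earlier can be recovered from this confusion matrix. The sketch below assumes scikit-learn's convention that rows are true labels and columns are predicted labels, in the order given by labels=[0,1] (Action, then Sci-Fi), and uses plain Python so it runs without the trained model. The per-class recall values are an extra illustration, not something the slides report.

```python
# Confusion matrix from the slide: rows = true label, columns = predicted label
cm = [[6410, 563],
      [864, 2242]]
genres = ["Action", "Sci-Fi"]

total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(len(cm)))  # diagonal = correct predictions

# Accuracy: share of all plots that were classified correctly
accuracy = correct / total

# Recall per genre: correct predictions divided by the number of true members
recall = {genres[i]: cm[i][i] / sum(cm[i]) for i in range(len(cm))}

print(accuracy)
print(recall)
```

The accuracy works out to 8652 / 10079, matching the 0.858 from accuracy_score. Recall is about 0.92 for Action but only about 0.72 for Sci-Fi, so the model misses Sci-Fi plots more often than Action plots.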

Let's practice!
