Question similarity and grammatical correctness

Natural Language Processing (NLP) in Python

Fouad Trad

Machine Learning Engineer

Question similarity

  • Identifies when two questions are paraphrases
  • Useful for:
    • Deduplication
    • Clustering similar questions
    • Improving search accuracy
  • Done with models trained on the Quora Question Pairs (QQP) dataset

Image showing three people asking questions.


QQP pipeline

from transformers import pipeline

qqp_pipeline = pipeline(
    task="text-classification",
    model="textattack/bert-base-uncased-QQP"
)
question1 = "How can I learn Python?"
question2 = "What is the best way to study Python?"
result = qqp_pipeline({"text": question1, "text_pair": question2})
print(result)
{'label': 'LABEL_1', 'score': 0.6853412985801697}
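
The raw label is not self-explanatory, so a small helper can make the result easier to read. The sketch below is illustrative only: the label meanings are an assumption inferred from the outputs shown on these slides (LABEL_1 for paraphrases, LABEL_0 otherwise), and `describe_qqp` and its `threshold` parameter are hypothetical names, not part of the transformers library.

```python
# Assumed label meanings, inferred from the slide outputs rather than
# from the model's documentation.
QQP_LABELS = {"LABEL_1": "duplicate", "LABEL_0": "not duplicate"}


def describe_qqp(result, threshold=0.5):
    """Turn one QQP result into (verdict, confident).

    `threshold` is a hypothetical confidence cut-off chosen by the caller.
    """
    # Results may arrive as a dict or as a one-element list of dicts.
    if isinstance(result, list):
        result = result[0]
    verdict = QQP_LABELS.get(result["label"], "unknown")
    confident = result["score"] >= threshold
    return verdict, confident


# Result copied from the slide above, checked against a stricter cut-off.
print(describe_qqp({"label": "LABEL_1", "score": 0.6853412985801697}, threshold=0.8))
```

With a cut-off of 0.8, the pair above is still labelled a duplicate but is reported as not confident, which may be a reason to review it manually.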

QQP pipeline

from transformers import pipeline
qqp_pipeline = pipeline(
    task="text-classification",
    model="textattack/bert-base-uncased-QQP"
)
question1 = "How can I learn Python?"
question2 = "What is the capital of France?"
result = qqp_pipeline({"text": question1, "text_pair": question2})
print(result)
{'label': 'LABEL_0', 'score': 0.9999338388442993}
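
One way the pairwise check could support deduplication is sketched below. It is a minimal greedy approach under stated assumptions, not the course's method: `deduplicate`, `make_qqp_predicate`, and `same_words` are hypothetical helpers, LABEL_1 is assumed to mean "duplicate", and a toy word-set comparison stands in for the model so the example runs without a download.

```python
def deduplicate(questions, is_duplicate):
    """Keep the first question from each group of paraphrases.

    Greedy: each question is compared with every question kept so far,
    so the number of comparisons grows quadratically.
    """
    kept = []
    for question in questions:
        if not any(is_duplicate(question, earlier) for earlier in kept):
            kept.append(question)
    return kept


def make_qqp_predicate(qqp_pipeline, threshold=0.5):
    """Wrap a text-pair classifier as a yes/no duplicate check."""
    def is_duplicate(q1, q2):
        result = qqp_pipeline({"text": q1, "text_pair": q2})
        if isinstance(result, list):
            result = result[0]
        # Assumption: LABEL_1 marks a duplicate pair.
        return result["label"] == "LABEL_1" and result["score"] >= threshold
    return is_duplicate


def same_words(q1, q2):
    """Toy stand-in for the model: same words, ignoring case and order."""
    return set(q1.lower().split()) == set(q2.lower().split())


questions = [
    "How can I learn Python?",
    "how can i learn python?",
    "What is the capital of France?",
]
print(deduplicate(questions, same_words))
```

To use the real model, `make_qqp_predicate(qqp_pipeline)` could be passed in place of `same_words`; every comparison then costs one model call.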

Assessing grammatical correctness

  • Assesses whether a text is grammatically acceptable
  • Useful for:
    • Educational tools
    • Grammar checkers
    • Writing assistants
  • Done with models trained on the Corpus of Linguistic Acceptability (CoLA) dataset

Image showing a person assessing the correctness of a written text.


CoLA pipeline

from transformers import pipeline
cola_classifier = pipeline(
  task="text-classification", 
  model="textattack/distilbert-base-uncased-CoLA"
)

result = cola_classifier("The cat sat on the mat.")
print(result)
[{'label': 'LABEL_1', 'score': 0.9918296933174133}]

CoLA pipeline

from transformers import pipeline
cola_classifier = pipeline(
  task="text-classification", 
  model="textattack/distilbert-base-uncased-CoLA"
)
result = cola_classifier("The cat on sat mat the.")
print(result)
[{'label': 'LABEL_0', 'score': 0.9628171324729919}]
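
A grammar checker would usually score many sentences and report only the problematic ones. The sketch below shows one possible shape for that step. It assumes LABEL_1 means "acceptable" and LABEL_0 "unacceptable" (inferred from the two outputs above), and that the classifier accepts a list of strings and returns one result dict per string. `flag_unacceptable` and `replay_classifier` are hypothetical names; the latter simply replays the two slide results so the example runs without a model.

```python
# Assumed label meanings, inferred from the slide outputs.
COLA_LABELS = {"LABEL_1": "acceptable", "LABEL_0": "unacceptable"}


def flag_unacceptable(sentences, classifier, threshold=0.5):
    """Return (sentence, score) pairs that the classifier marks as unacceptable."""
    sentences = list(sentences)
    results = classifier(sentences)  # one result dict per sentence
    flagged = []
    for sentence, result in zip(sentences, results):
        label = COLA_LABELS.get(result["label"])
        if label == "unacceptable" and result["score"] >= threshold:
            flagged.append((sentence, result["score"]))
    return flagged


def replay_classifier(sentences):
    """Stand-in classifier that replays the results shown on the slides."""
    canned = {
        "The cat sat on the mat.": {"label": "LABEL_1", "score": 0.9918296933174133},
        "The cat on sat mat the.": {"label": "LABEL_0", "score": 0.9628171324729919},
    }
    return [canned[sentence] for sentence in sentences]


flagged = flag_unacceptable(
    ["The cat sat on the mat.", "The cat on sat mat the."],
    replay_classifier,
)
for sentence, score in flagged:
    print(f"{sentence!r} flagged with score {score:.2f}")
```

Passing `cola_classifier` instead of `replay_classifier` would run the same filter against the real model.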

Let's practice!
