Readability tests

Feature Engineering for NLP in Python

Rounak Banik

Data Scientist

Overview of readability tests

  • Determine readability of an English passage
  • Scale ranging from primary school up to college graduate level
  • A mathematical formula utilizing word, syllable and sentence count
  • Used in fake news and opinion spam detection

Readability test examples

  • Flesch reading ease
  • Gunning fog index
  • Simple Measure of Gobbledygook (SMOG)
  • Dale-Chall score

Flesch reading ease

  • One of the oldest and most widely used tests
  • The greater the average sentence length, the harder the text is to read
    • "This is a short sentence."
    • "This is a longer sentence with more words and it is harder to follow than the first sentence."
  • The greater the average number of syllables per word, the harder the text is to read
    • "I live in my home."
    • "I reside in my domicile."
  • The higher the score, the greater the readability
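
The two factors above can be sketched in plain Python using the standard Flesch reading ease formula, 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words). This is a minimal illustration, not a library implementation: the helper names are hypothetical, and the vowel-group syllable counter is a rough assumption that will miscount some words.

```python
import re

def count_syllables(word):
    """Rough heuristic (an assumption): count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent (e.g. "home", "live"), except in "-le" endings
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text):
    """Flesch reading ease from sentence, word and syllable counts."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    avg_sentence_length = len(words) / len(sentences)
    avg_syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables_per_word

short = "This is a short sentence."
long_ = ("This is a longer sentence with more words and it is harder "
         "to follow than the first sentence.")
simple = "I live in my home."
formal = "I reside in my domicile."

# Longer sentences and longer words should both lower the score
print(flesch_reading_ease(short) > flesch_reading_ease(long_))
print(flesch_reading_ease(simple) > flesch_reading_ease(formal))
```

With very short, simple inputs such as these, the raw formula can produce values above 100; the 0-100 bands are a guide for typical prose.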

Flesch reading ease score interpretation

Reading ease score   Grade level
90-100               5
80-90                6
70-80                7
60-70                8-9
50-60                10-12
30-50                College
0-30                 College Graduate
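
As a sketch, the interpretation table can be turned into a small lookup. The function name is hypothetical, and because the bands share their endpoints (for example 80-90 and 90-100), this version assumes each lower bound is inclusive. Scores above 100 or below 0 fall into the first and last bands respectively.

```python
def flesch_grade_level(score):
    """Map a Flesch reading ease score to the grade level in the table.

    Assumption: the lower bound of each band is inclusive.
    """
    bands = [
        (90, "5"),
        (80, "6"),
        (70, "7"),
        (60, "8-9"),
        (50, "10-12"),
        (30, "College"),
    ]
    for lower_bound, grade in bands:
        if score >= lower_bound:
            return grade
    # Anything below 30 is the hardest band
    return "College Graduate"

print(flesch_grade_level(65))
print(flesch_grade_level(12.5))
```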

Gunning fog index

  • Developed in 1952 by Robert Gunning
  • Also dependent on average sentence length
  • The greater the percentage of complex words, the harder the text is to read
  • The higher the index, the lower the readability
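
A minimal sketch of the index, using the standard formula 0.4 × (average sentence length + percentage of complex words), where a complex word has three or more syllables. The helper names are hypothetical, the syllable counter is a rough vowel-group assumption, and the usual exclusions (proper nouns, compound words, common suffixes) are ignored here for brevity.

```python
import re

def count_syllables(word):
    """Rough heuristic (an assumption): count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in "-le" endings
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def gunning_fog(text):
    """Gunning fog index from sentence length and share of complex words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    # Simplification: every word with 3+ estimated syllables counts as complex
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_length = len(words) / len(sentences)
    percent_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_length + percent_complex)

plain = "The cat sat on the mat."
dense = "The feline occupied the decorative rectangular textile."

# More complex words should raise the index
print(gunning_fog(dense) > gunning_fog(plain))
```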

Gunning fog index interpretation

Fog index   Grade level
17          College graduate
16          College senior
15          College junior
14          College sophomore
13          College freshman
12          High school senior
11          High school junior
10          High school sophomore
9           High school freshman
8           Eighth grade
7           Seventh grade
6           Sixth grade
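
The table can be sketched as a dictionary lookup. Several choices here are assumptions rather than part of the original table: the function name is hypothetical, fractional indices are rounded to the nearest whole number, values above 17 are treated as college graduate level, and values below 6 get a placeholder label.

```python
FOG_GRADE_LEVELS = {
    17: "College graduate",
    16: "College senior",
    15: "College junior",
    14: "College sophomore",
    13: "College freshman",
    12: "High school senior",
    11: "High school junior",
    10: "High school sophomore",
    9: "High school freshman",
    8: "Eighth grade",
    7: "Seventh grade",
    6: "Sixth grade",
}

def fog_grade_level(index):
    """Map a Gunning fog index to the grade level in the table."""
    level = round(index)  # assumption: round to the nearest table row
    if level >= 17:
        return FOG_GRADE_LEVELS[17]
    # "Below sixth grade" is a placeholder label, not part of the table
    return FOG_GRADE_LEVELS.get(level, "Below sixth grade")

print(fog_grade_level(16.26))
```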

The readability library

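# Download the NLTK punkt tokenizer data
import nltk
nltk.download('punkt_tab')

# Import the Readability class
from readability import Readability

# Create a Readability object
# (text is a string holding the passage; the library expects at least 100 words)
readability_scores = Readability(text)

# Generate scores; score is an attribute of the result, not a method
gf = readability_scores.gunning_fog()
print(gf.score)
16.26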

Let's practice!
