Intermediate Regular Expressions in R
Angelo Zehr
Data Journalist
Regular Levenshtein distance:
stringdist(a, b, method = "lv")
Damerau-Levenshtein distance:
stringdist(a, b, method = "dl")
Optimal String Alignment distance:
stringdist(a, b, method = "osa")
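The three edit distances above only differ in how they treat swapped neighbouring characters. A minimal sketch, assuming the stringdist package is installed; the example strings are illustrative and not taken from the slides:

```r
library(stringdist)

# One swap of adjacent characters ("bc" -> "cb")
stringdist("abc", "acb", method = "lv")   # 2: Levenshtein needs two substitutions
stringdist("abc", "acb", method = "osa")  # 1: a single transposition
stringdist("abc", "acb", method = "dl")   # 1: a single transposition

# OSA may not edit the same substring twice; full Damerau-Levenshtein may
stringdist("ca", "abc", method = "osa")   # 3
stringdist("ca", "abc", method = "dl")    # 2: "ca" -> "ac" -> "abc"
```

For most typo-matching tasks the three methods return the same value; they diverge only when transpositions are involved.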
qgrams("Honolulu", "Hanolulu", q = 2)
Returns:
   Ho on ul no ol lu Ha an
V1  1  1  1  1  1  2  0  0
V2  0  0  1  1  1  2  1  1
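To see where such counts come from, the q-grams can also be built by hand. A rough base R sketch; `count_qgrams` is a hypothetical helper written for illustration, not a function of the stringdist package, and it assumes the string has at least q characters:

```r
# Hypothetical helper: slide a window of width q over the string
# and count how often each q-gram occurs.
count_qgrams <- function(x, q = 2) {
  starts <- seq_len(nchar(x) - q + 1)            # start index of every q-gram
  grams  <- substring(x, starts, starts + q - 1) # e.g. "Ho", "on", "no", ...
  table(grams)                                   # occurrences per distinct q-gram
}

count_qgrams("Honolulu")  # "lu" occurs twice, every other bigram once
count_qgrams("Hanolulu")  # same, but with "Ha" and "an" instead of "Ho" and "on"
```

The order of the columns printed by `table()` can differ from the order used by `qgrams()`; only the counts matter.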
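q-gram distance: number of q-grams that are not shared (sum of the absolute differences between the two count vectors)
stringdist(a, b, method = "qgram", q = 2) # equals 4 (the default is q = 1)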
Jaccard distance: distinct q-grams that are not shared, divided by the total number of distinct q-grams
stringdist(a, b, method = "jaccard", q = 2) # equals 0.5
Cosine distance: 1 minus the cosine of the angle between the two q-gram count vectors
stringdist(a, b, method = "cosine", q = 2) # equals 0.22
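The three q-gram based results can be checked by hand from the bigram counts of "Honolulu" and "Hanolulu". The following is a base R sketch under that assumption; `bigrams` is a hypothetical helper, and this is not how stringdist computes the values internally:

```r
# Hypothetical helper: all q-grams of a string (assumes nchar(x) >= q).
bigrams <- function(x, q = 2) {
  starts <- seq_len(nchar(x) - q + 1)
  substring(x, starts, starts + q - 1)
}

g1 <- bigrams("Honolulu")
g2 <- bigrams("Hanolulu")

# Count both strings over the same set of distinct bigrams.
all_grams <- union(g1, g2)
v1 <- as.numeric(table(factor(g1, levels = all_grams)))
v2 <- as.numeric(table(factor(g2, levels = all_grams)))

# q-gram distance: sum of absolute count differences
qgram_dist <- sum(abs(v1 - v2))                                  # 4

# Jaccard distance: 1 - shared distinct bigrams / all distinct bigrams
jaccard_dist <- 1 - sum(v1 > 0 & v2 > 0) / length(all_grams)     # 0.5

# Cosine distance: 1 - cosine similarity of the count vectors
cosine_dist <- 1 - sum(v1 * v2) / sqrt(sum(v1^2) * sum(v2^2))    # 0.222...
```

Note that the Jaccard distance ignores how often a bigram occurs, while the q-gram and cosine distances use the counts.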