Custom Fuzzy Matching

Intermediate Regular Expressions in R

Angelo Zehr

Data Journalist

Combining two fuzzy matches

[Figure: the two movie tables to be combined]



[Figure: the same tables with the columns to be matched highlighted]


Fuzzy matches: Helper functions

For the string comparison:

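library(stringdist)

# TRUE when two titles are at most 5 edits apart (stringdist's default method, "osa")
small_str_distance <- function(left, right) {
  stringdist(left, right) <= 5
}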
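To get a feel for the cutoff of 5, the sketch below re-creates the title helper with base R's utils::adist(), so it runs without installing extra packages. This is a swapped-in technique: adist() computes the Levenshtein distance, while stringdist() defaults to optimal string alignment (method = "osa"), which counts a swap of two adjacent characters as one edit rather than two, so the two can disagree on such pairs. The name small_str_distance_base and the sample titles are hypothetical.

```r
# Hypothetical base-R variant of small_str_distance():
# compares left[i] with right[i] and returns TRUE when they are at most 5 edits apart.
# utils::adist() (Levenshtein) stands in for stringdist::stringdist() (OSA by default).
small_str_distance_base <- function(left, right, max_dist = 5) {
  dists <- vapply(
    seq_along(left),
    function(i) utils::adist(left[i], right[i])[1, 1],
    numeric(1)
  )
  dists <= max_dist
}

# Hypothetical titles: the first pair is 3 insertions apart, the second far more
small_str_distance_base(
  c("The Matrix", "Alien"),
  c("The Matrix!!!", "Aliens vs. Predator")
)
# returns TRUE FALSE
```

The cutoff is inclusive: a pair exactly 5 edits apart still counts as a match.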

For the number comparison:

# TRUE when two years differ by at most 3
close_to_each_other <- function(left, right) {
  abs(left - right) <= 3
}
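fuzzyjoin passes vectors of values to each match function, so the helpers have to be vectorized and compare element by element. A quick check of close_to_each_other() on hypothetical release years shows this, and also that the cutoff is inclusive: a gap of exactly 3 years still counts as a match.

```r
# close_to_each_other() as defined above:
# element-wise TRUE when two years differ by at most 3 (inclusive)
close_to_each_other <- function(left, right) {
  abs(left - right) <= 3
}

# Hypothetical release years with gaps of 0, 3 and 4 years
close_to_each_other(c(1999, 2001, 2010), c(1999, 2004, 2006))
# returns TRUE TRUE FALSE
```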

The fuzzy join

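library(fuzzyjoin)

# Keep every row of a; attach rows of b whose title AND year are both close
fuzzy_left_join(
  a, b,
  # Pair each column of a with its counterpart in b
  by = c(
    "title" = "prod_title",
    "year" = "prod_year"
  ),
  # One comparison function per column pair, named after the columns of a
  match_fun = list(
    "title" = small_str_distance,
    "year" = close_to_each_other
  )
)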
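To make the mechanics concrete, here is a rough base-R sketch of what a fuzzy left join with two match functions does: test every pair of rows from a and b, keep a pair only when every match function returns TRUE, and keep the rows of a that matched nothing, padded with NA. fuzzy_left_join_sketch(), small_str_distance_base() (which uses utils::adist() in place of stringdist()) and the sample rows are hypothetical; this is not fuzzyjoin's implementation, and its row order may differ from fuzzyjoin's.

```r
# Hypothetical base-R sketch of a fuzzy left join (NOT fuzzyjoin's implementation)
# by:        named character vector, names = columns of x, values = columns of y
# match_fun: named list of vectorized functions, names = columns of x
fuzzy_left_join_sketch <- function(x, y, by, match_fun) {
  # Every combination of a row of x (i) with a row of y (j)
  pairs <- expand.grid(i = seq_len(nrow(x)), j = seq_len(nrow(y)))
  keep <- rep(TRUE, nrow(pairs))
  for (col in names(by)) {
    f <- match_fun[[col]]
    # A pair survives only if every match function returns TRUE for it
    keep <- keep & f(x[[col]][pairs$i], y[[by[[col]]]][pairs$j])
  }
  pairs <- pairs[which(keep), , drop = FALSE]  # which() also drops NA results
  matched <- cbind(x[pairs$i, , drop = FALSE], y[pairs$j, , drop = FALSE])
  # Left join: rows of x without any partner are kept, with NA in y's columns
  lonely <- setdiff(seq_len(nrow(x)), pairs$i)
  unmatched <- cbind(
    x[lonely, , drop = FALSE],
    y[rep(NA_integer_, length(lonely)), , drop = FALSE]
  )
  out <- rbind(matched, unmatched)
  rownames(out) <- NULL
  out
}

# Hypothetical helpers: utils::adist() (Levenshtein) stands in for stringdist()
small_str_distance_base <- function(left, right) {
  d <- vapply(seq_along(left), function(i) utils::adist(left[i], right[i])[1, 1], numeric(1))
  d <= 5
}
close_to_each_other <- function(left, right) abs(left - right) <= 3

# Hypothetical rows with a typo ("The Matrx") and shifted years
a <- data.frame(
  title = c("The Matrix", "Alien", "Heat"),
  year  = c(1999, 1979, 1995),
  stringsAsFactors = FALSE
)
b <- data.frame(
  prod_title = c("The Matrx", "Alien", "Casino Royale"),
  prod_year  = c(2000, 1986, 1995),
  stringsAsFactors = FALSE
)

joined <- fuzzy_left_join_sketch(
  a, b,
  by = c("title" = "prod_title", "year" = "prod_year"),
  match_fun = list("title" = small_str_distance_base, "year" = close_to_each_other)
)
joined
```

With these rows only "The Matrix" finds a partner. "Alien" has an identical title in b, but the years are 7 apart, so the year check rejects the pair; "Alien" and "Heat" therefore come back with NA in prod_title and prod_year, as unmatched rows do in any left join.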

The fuzzy join: The result

[Figure: the joined table]


Let's practice!
