Cleaning Data in R
Maggie Matsui
Content Developer @ DataCamp



What is the distance between typhoon and baboon?
How many typos does it take to turn one string into another?










Total: 4
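
The total of 4 can be checked without any extra packages. This is a minimal sketch using base R's `adist()`, which computes Levenshtein distance (insertions, deletions and substitutions only). One edit sequence that reaches 4 is three substitutions (b to t, a to y, b to p) plus one insertion (h). The variable name `d` is only illustrative.

```r
# Base R: generalized Levenshtein distance between the two strings.
d <- adist("baboon", "typhoon")
d[1, 1]  # 4

# counts = TRUE also records how many insertions, deletions and
# substitutions were used, stored in the "counts" attribute.
attr(adist("baboon", "typhoon", counts = TRUE), "counts")
```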

Which distance measure is best?
library(stringdist)
# Damerau-Levenshtein
stringdist("baboon", "typhoon",
           method = "dl")
4

# LCS
stringdist("baboon", "typhoon",
method = "lcs")
7
# Jaccard
stringdist("baboon", "typhoon",
method = "jaccard")
0.75
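
To see where the other two numbers come from, the sketch below recomputes the Jaccard distance by hand in base R. It assumes `stringdist()` is using its default `q = 1`, so each string is treated as a set of single characters. The names `a`, `b` and `jaccard_dist` are illustrative only.

```r
# Unique characters in each string.
a <- unique(strsplit("baboon", "")[[1]])   # b, a, o, n
b <- unique(strsplit("typhoon", "")[[1]])  # t, y, p, h, o, n

# Jaccard distance = 1 - |intersection| / |union| = 1 - 2/8
jaccard_dist <- 1 - length(intersect(a, b)) / length(union(a, b))
jaccard_dist  # 0.75

# LCS distance follows from the longest common subsequence "oon" (length 3):
# 6 + 7 - 2 * 3 = 7
nchar("baboon") + nchar("typhoon") - 2 * nchar("oon")
```

The three methods measure different things, so their values are not comparable with each other: 4 edits, 7 unmatched characters, and a 0.75 share of non-overlapping letters.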
"EU", "eur", "Europ" $\rightarrow$ "Europe"
"EU", "eur", "Europ", "Europa", "Erope", "Evropa", ... $\rightarrow$ "Europe"?
survey
        city move_score
1     chicgo          4
2 los angles          4
3    chicogo          5
4    new yrk          5
5  new yoork          2
6   seatttle          3
7 losangeles          4
8    seeatle          2
...
cities
         city
1    new york
2     chicago
3 los angeles
4     seattle
library(fuzzyjoin)
stringdist_left_join(survey, cities, by = "city", method = "dl")
      city.x move_score      city.y
1     chicgo          4     chicago
2 los angles          4 los angeles
3    chicogo          5     chicago
4    new yrk          5    new york
5  new yoork          2    new york
6   seatttle          3     seattle
7 losangeles          4 los angeles
8    seeatle          2     seattle
9    siattle          1     seattle
...
stringdist_left_join(survey, cities, by = "city", method = "dl", max_dist = 1)
      city.x move_score      city.y
1     chicgo          4     chicago
2 los angles          4 los angeles
3    chicogo          5     chicago
4    new yrk          5    new york
5  new yoork          2    new york
6   seatttle          3     seattle
7 losangeles          4 los angeles
8    seeatle          2        <NA>
...
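
The effect of `max_dist` can be illustrated without `fuzzyjoin`. The following is a rough sketch in base R, not how `stringdist_left_join()` is implemented. It uses `adist()` (plain Levenshtein rather than the Damerau-Levenshtein distance selected by `method = "dl"`) and keeps only the single closest city, whereas the fuzzy join returns every match within `max_dist`. The three-row `survey` subset and the names `closest`, `closest_dist` and `city_clean` are made up for the illustration.

```r
# A small subset of the survey data and the reference list of cities.
survey <- data.frame(
  city = c("chicgo", "los angles", "seeatle"),
  move_score = c(4, 4, 2),
  stringsAsFactors = FALSE
)
cities <- data.frame(
  city = c("new york", "chicago", "los angeles", "seattle"),
  stringsAsFactors = FALSE
)

# Distance matrix: one row per survey value, one column per valid city.
d <- adist(survey$city, cities$city)

# For each survey value, find the nearest valid city and its distance.
closest <- apply(d, 1, which.min)
closest_dist <- d[cbind(seq_len(nrow(d)), closest)]

# Accept the match only if it is within max_dist edits; otherwise NA.
max_dist <- 1
survey$city_clean <- cities$city[closest]
survey$city_clean[closest_dist > max_dist] <- NA
survey
```

With `max_dist <- 1`, "seeatle" is left unmatched because it is two edits away from "seattle", which mirrors the `<NA>` in the join output above.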