Cleaning and preprocessing text

Text Mining with Bag-of-Words in R

Ted Kwartler

Instructor

Common preprocessing functions
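The slide for this heading is not reproduced here, so the following is only a sketch of functions often used for this step, not a record of what the slide listed. It assumes the tm package is installed; the sample sentence is hypothetical. All of these functions accept a plain character vector, which makes them easy to try out before applying them to a corpus.

```r
library(tm)

# Hypothetical sample sentence, not taken from the coffee tweets
text <- "Coffee at 6 AM?   Yes, please!"

lowered    <- tolower(text)            # base R: convert to lowercase
no_punct   <- removePunctuation(text)  # tm: drop punctuation characters
no_numbers <- removeNumbers(text)      # tm: drop digits
squeezed   <- stripWhitespace(text)    # tm: collapse runs of whitespace
no_words   <- removeWords(text, c("at", "please"))  # tm: delete listed words

squeezed
```

Note that removeWords() leaves the surrounding spaces behind, so stripWhitespace() is usually applied after it.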


Preprocessing in practice


# Load tm for the corpus tools and qdap for replace_abbreviation()
library(tm)
library(qdap)

# Make a vector source: coffee_source
coffee_source <- VectorSource(coffee_tweets)

# Make a volatile corpus: coffee_corpus
coffee_corpus <- VCorpus(coffee_source)

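# Apply various preprocessing functions
# tm_map() returns a new corpus; assign the result to keep the change
tm_map(coffee_corpus, removeNumbers)
tm_map(coffee_corpus, removePunctuation)

# Functions from outside tm must be wrapped in content_transformer()
tm_map(coffee_corpus, content_transformer(replace_abbreviation))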
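The calls above can be chained into one reusable function. This is a minimal sketch, assuming the tm package is installed; clean_corpus and the two sample tweets are hypothetical names and data chosen for illustration, and the choice and order of steps is just one reasonable option.

```r
library(tm)

# Hypothetical stand-in for coffee_tweets
tweets <- c("I had 2 cups of COFFEE today!!!",
            "Coffee, coffee, coffee...   at 7am")

# Hypothetical helper: apply each step and keep the result
clean_corpus <- function(corpus) {
  # tolower() is a base R function, so it needs content_transformer()
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removeNumbers)
  corpus <- tm_map(corpus, removePunctuation)
  # Strip whitespace last to clean up gaps left by earlier steps
  corpus <- tm_map(corpus, stripWhitespace)
  corpus
}

corpus  <- VCorpus(VectorSource(tweets))
cleaned <- clean_corpus(corpus)

# Inspect the cleaned text of the first document
content(cleaned[[1]])
```

Because each tm_map() result is assigned back to corpus, every step builds on the previous one rather than on the original corpus.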

Another preprocessing step: word stemming

# Stem words
stem_words <- stemDocument(c("complicatedly", "complicated", "complication"))
stem_words

"complic" "complic" "complic"

# Complete words using single word dictionary
stemCompletion(stem_words, c("complicate"))

     complic      complic      complic 
"complicate" "complicate" "complicate"

# Complete words using entire corpus
stemCompletion(stem_words, my_corpus)
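The same two calls can be applied to a whole sentence. The sketch below assumes the tm and SnowballC packages are installed; the sentence and the completion dictionary are hypothetical. The sentence is split into words first so that each word is stemmed on its own, whichever tm version is in use.

```r
library(tm)

# Hypothetical sentence and completion dictionary for illustration
sentence   <- "the complication was complicated"
dictionary <- c("the", "complicate", "was")

# Split on spaces to get one word per vector element
words_vec <- unlist(strsplit(sentence, " "))

# Stem each word, then map each stem back to a dictionary word
stems     <- stemDocument(words_vec)
completed <- stemCompletion(stems, dictionary)

completed
```

The result is a named character vector: the names are the stems and the values are the completed words.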

Let's practice!
