Introduction to Text Analysis in R
Maham Faisal Khan
Senior Data Science Content Developer, DataCamp
library(tidyverse)
-- Attaching packages --------------------------------------- tidyverse 1.2.1 --
v ggplot2 3.0.0 v purrr 0.2.5
v tibble 2.0.0 v dplyr 0.7.8
v tidyr 0.8.2 v stringr 1.3.1
v readr 1.1.1 v forcats 0.3.0
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
review_data <- read_csv("Roomba Reviews.csv")
review_data
# A tibble: 1,833 x 4
date product stars review
<chr> <chr> <dbl> <chr>
1 2/28/15 iRobot Roomba 650 fo… 5 You would not believe how well...
2 1/12/15 iRobot Roomba 650 fo… 4 You just walk away and it does...
3 12/26/13 iRobot Roomba 650 fo… 5 You have to Roomba proof your...
4 8/4/13 iRobot Roomba 650 fo… 3 Yes, its a fascinating, albeit...
# … with 1,829 more rows
review_data %>%
  filter(product == "iRobot Roomba 650 for Pets") %>%
  summarize(stars_mean = mean(stars))
# A tibble: 1 x 1
stars_mean
<dbl>
1 4.49
review_data %>%
  group_by(product) %>%
  summarize(stars_mean = mean(stars))
# A tibble: 2 x 2
product stars_mean
<chr> <dbl>
1 iRobot Roomba 650 for Pets 4.49
2 iRobot Roomba 880 for Pets and Allergies 4.42
review_data %>%
  group_by(product) %>%
  summarize(review_mean = mean(review))
Warning messages:
1: In mean.default(review) :
argument is not numeric or logical: returning NA
2: In mean.default(review) :
argument is not numeric or logical: returning NA
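The warnings above appear because review is a character column, and mean() only accepts numeric or logical input. The following is a minimal sketch of one possible workaround: derive a numeric feature from the text first, then summarize it. The toy_reviews tibble and the chars_mean column are hypothetical stand-ins invented for illustration; they are not the Roomba data, and character count is only one of many features that could be extracted.

```r
library(dplyr)

# Hypothetical stand-in for review_data (not the real Roomba reviews)
toy_reviews <- tibble(
  product = c("A", "A", "B"),
  stars   = c(5, 4, 3),
  review  = c("Works well", "Does the job", "Fascinating but noisy")
)

# mean() needs numeric or logical input, so check the column type first
review_is_numeric <- is.numeric(toy_reviews$review)

# Derive a numeric feature from the text (characters per review),
# then summarize it by product just like the stars column
review_length <- toy_reviews %>%
  group_by(product) %>%
  summarize(chars_mean = mean(nchar(review)))

review_length
```

Character counts say little about what a review actually means, which is why text generally needs to be restructured before it can be summarized in a useful way.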