Exploratory Data Analysis in R
Andrew Bray
Assistant Professor, Reed College
str(cars)
'data.frame': 428 obs. of 19 variables:
$ name : chr "Chevrolet Aveo 4dr" "Chevrolet Aveo LS 4dr hatch" "Chevrolet Cavalier 2dr" ...
$ sports_car : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ suv : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ wagon : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ minivan : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ pickup : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ all_wheel : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ rear_wheel : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ msrp : int 11690 12585 14610 14810 16385 13670 15040 13270 13730 15460 ...
$ dealer_cost: int 10965 11802 13697 13884 15357 12849 14086 12482 12906 14496 ...
$ eng_size : num 1.6 1.6 2.2 2.2 2.2 2 2 2 2 2 ...
$ ncyl : int 4 4 4 4 4 4 4 4 4 4 ...
$ horsepwr : int 103 103 140 140 140 132 132 130 110 130 ...
$ city_mpg : int 28 28 26 26 26 29 29 26 27 26 ...
$ hwy_mpg : int 34 34 37 37 37 36 36 33 36 33 ...
$ weight : int 2370 2348 2617 2676 2617 2581 2626 2612 2606 2606 ...
$ wheel_base : int 98 98 104 104 104 105 105 103 103 103 ...
$ length : int 167 153 183 183 183 174 174 168 168 168 ...
$ width : int 66 66 69 68 69 67 67 67 67 67 ...
ggplot(cars, aes(x = weight)) +
  geom_dotplot(dotsize = 0.4)
ggplot(cars, aes(x = weight)) +
  geom_histogram()
ggplot(cars, aes(x = weight)) +
  geom_density()
ggplot(cars, aes(x = 1, y = weight)) +
  geom_boxplot() +
  coord_flip()
ggplot(cars, aes(x = hwy_mpg)) +
  geom_histogram() +
  facet_wrap(~pickup)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning message:
Removed 14 rows containing non-finite values (stat_bin).
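The message and warning above point at two things worth handling explicitly: the default of 30 bins and the rows with missing hwy_mpg. A minimal sketch of one way to address both, assuming ggplot2 is installed. The toy_cars data frame and its values are hypothetical stand-ins for the course's cars data, and binwidth = 2 is an arbitrary choice for illustration, not a recommended value.

```r
library(ggplot2)

# Hypothetical stand-in for the cars data (values invented for illustration)
toy_cars <- data.frame(
  hwy_mpg = c(34, 37, 36, NA, 33, 24, 22, NA, 26, 21),
  pickup  = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)
)

# Count the missing values that geom_histogram() would otherwise drop with a warning
n_missing <- sum(is.na(toy_cars$hwy_mpg))

# Keep only the rows with an observed hwy_mpg
toy_complete <- toy_cars[!is.na(toy_cars$hwy_mpg), ]

# Set binwidth explicitly instead of relying on the default of 30 bins
p <- ggplot(toy_complete, aes(x = hwy_mpg)) +
  geom_histogram(binwidth = 2) +
  facet_wrap(~pickup)
print(p)
```

Counting the missing values first makes it clear how many rows are excluded, rather than leaving that to the warning.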