Case Study: Exploratory Data Analysis in R
Dave Robinson
Chief Data Scientist, DataCamp
library(dplyr)
# Keep only Afghanistan's rows from the per-year, per-country summary
afghanistan <- by_year_country %>%
  filter(country == "Afghanistan")
afghanistan
# A tibble: 34 × 4
    year     country total percent_yes
   <dbl>       <chr> <int>       <dbl>
 1  1947 Afghanistan    34   0.3823529
 2  1949 Afghanistan    51   0.6078431
 3  1951 Afghanistan    25   0.7600000
 4  1953 Afghanistan    26   0.7692308
 5  1955 Afghanistan    37   0.7297297
 6  1957 Afghanistan    34   0.5294118
 7  1959 Afghanistan    54   0.6111111
 8  1961 Afghanistan    76   0.6052632
 9  1963 Afghanistan    32   0.7812500
10  1965 Afghanistan    40   0.8500000
# ... with 24 more rows
# Fit a linear regression of the share of "yes" votes on year
model <- lm(percent_yes ~ year, data = afghanistan)
summary(model)
Call:
lm(formula = percent_yes ~ year, data = afghanistan)

Residuals:
      Min        1Q    Median        3Q       Max
-0.254667 -0.038650 -0.001945  0.057110  0.140596

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.106e+01  1.471e+00  -7.523 1.44e-08 ***
year         6.009e-03  7.426e-04   8.092 3.06e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.08497 on 32 degrees of freedom
Multiple R-squared:  0.6717,  Adjusted R-squared:  0.6615
F-statistic: 65.48 on 1 and 32 DF,  p-value: 3.065e-09
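As a sanity check on how the `year` row of this output fits together, the sketch below recomputes the t value, the two-sided p-value and the F-statistic from the rounded estimate and standard error printed above. It assumes the usual t test with 32 residual degrees of freedom; because the inputs are rounded to four significant figures, the results match the printout only approximately.

```r
# Values copied from the `year` row of the coefficient table above
estimate  <- 6.009e-03
std_error <- 7.426e-04
df_resid  <- 32

# t value: the estimate divided by its standard error (about 8.09)
t_value <- estimate / std_error

# Two-sided p-value from a t distribution with 32 residual degrees of freedom
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# With a single predictor, the overall F-statistic equals the squared t value
f_stat <- t_value^2

print(c(t = t_value, p = p_value, F = f_stat))
```

The same relationship explains why the coefficient p-value (3.06e-09) and the F-test p-value (3.065e-09) agree: with one predictor they test the same hypothesis.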
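The `year` coefficient is positive (6.009e-03): the model estimates that Afghanistan's share of "yes" votes rose by about 0.6 percentage points per year.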
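Its p-value is about 3e-09 (= 0.000000003), so an upward trend this strong would be very unlikely to appear by chance if there were no real trend.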
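Rather than reading the slope and p-value off the printout, they can be extracted from the summary object. The sketch below is self-contained, so it does not use the full `by_year_country` data: it rebuilds only the ten Afghanistan rows printed earlier into a data frame named `afghanistan_head` (a hypothetical name), which means its estimates differ from the 34-row model above.

```r
# The ten Afghanistan rows shown in the tibble printout (a subset of all 34)
afghanistan_head <- data.frame(
  year = c(1947, 1949, 1951, 1953, 1955, 1957, 1959, 1961, 1963, 1965),
  percent_yes = c(0.3823529, 0.6078431, 0.7600000, 0.7692308, 0.7297297,
                  0.5294118, 0.6111111, 0.6052632, 0.7812500, 0.8500000)
)

fit <- lm(percent_yes ~ year, data = afghanistan_head)

# Coefficient matrix: rows are terms, columns are
# "Estimate", "Std. Error", "t value", "Pr(>|t|)"
coefs <- coef(summary(fit))

slope   <- coefs["year", "Estimate"]
p_value <- coefs["year", "Pr(>|t|)"]

print(c(slope = slope, p_value = p_value))
```

`coef(summary(fit))` returns the same matrix for any `lm` fit, so the indexing above works unchanged on `model`. The broom package's `tidy()` returns this table as a data frame with columns `term`, `estimate`, `std.error`, `statistic` and `p.value`, which is easier to combine across many models.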