Working with many tidy models

Case Study: Exploratory Data Analysis in R

Dave Robinson

Chief Data Scientist, DataCamp

We have a model for each country

country_coefficients
# A tibble: 399 × 6
       country        term      estimate    std.error statistic      p.value
         <chr>       <chr>         <dbl>        <dbl>     <dbl>        <dbl>
1  Afghanistan (Intercept) -11.063084650 1.4705189228 -7.523252 1.444892e-08
2  Afghanistan        year   0.006009299 0.0007426499  8.091698 3.064797e-09
3    Argentina (Intercept)  -9.464512565 2.1008982371 -4.504984 8.322481e-05
4    Argentina        year   0.005148829 0.0010610076  4.852773 3.047078e-05
5    Australia (Intercept)  -4.545492536 2.1479916283 -2.116159 4.220387e-02
6    Australia        year   0.002567161 0.0010847910  2.366503 2.417617e-02
7      Belarus (Intercept)  -7.000692717 1.5024232546 -4.659601 5.329950e-05
8      Belarus        year   0.003907557 0.0007587624  5.149908 1.284924e-05
9      Belgium (Intercept)  -5.845534016 1.5153390521 -3.857575 5.216573e-04
10     Belgium        year   0.003203234 0.0007652852  4.185673 2.072981e-04
# ... with 389 more rows

Filter for the year term (slope)

country_coefficients %>%
  filter(term == "year")
# A tibble: 199 × 6
                           country  term    estimate    std.error statistic      p.value
                             <chr> <chr>       <dbl>        <dbl>     <dbl>        <dbl>
1                      Afghanistan  year 0.006009299 0.0007426499  8.091698 3.064797e-09
2                        Argentina  year 0.005148829 0.0010610076  4.852773 3.047078e-05
3                        Australia  year 0.002567161 0.0010847910  2.366503 2.417617e-02
4                          Belarus  year 0.003907557 0.0007587624  5.149908 1.284924e-05
5                          Belgium  year 0.003203234 0.0007652852  4.185673 2.072981e-04
6  Bolivia, Plurinational State of  year 0.005802864 0.0009657515  6.008651 1.058595e-06
7                           Brazil  year 0.006107151 0.0008167736  7.477164 1.641169e-08
8                           Canada  year 0.001515867 0.0009552118  1.586943 1.223590e-01
9                            Chile  year 0.006775560 0.0008220463  8.242310 2.045608e-09
10                        Colombia  year 0.006157755 0.0009645084  6.384346 3.584226e-07
# ... with 189 more rows
  • Apply a multiple hypothesis correction: with this many tests, some p-values will fall below .05 by chance alone
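Why the correction matters can be seen in a small simulation. This is only a sketch on made-up data, not the course dataset: the variable names, the seed, and the choice of 200 tests are assumptions for illustration. It draws p-values for tests in which no real effect exists, then counts how many look "significant" before and after p.adjust().

```r
# Simulate 200 p-values from tests where no real effect exists.
# Under the null hypothesis, p-values are uniformly distributed on [0, 1].
set.seed(42)  # arbitrary seed, used only for reproducibility
null_p_values <- runif(200)

# Count how many fall below .05 purely by chance
# (about 5% of the tests, so roughly 10, are expected on average)
n_raw <- sum(null_p_values < .05)

# p.adjust() uses the Holm method by default, which controls
# the family-wise error rate across all 200 tests
n_adjusted <- sum(p.adjust(null_p_values) < .05)

print(c(raw = n_raw, adjusted = n_adjusted))
```

Because Holm-adjusted p-values are never smaller than the raw ones, the adjusted count can only stay the same or shrink.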

Filtered by adjusted p-value

country_coefficients %>%
  filter(term == "year") %>%
  filter(p.adjust(p.value) < .05)
# A tibble: 61 × 6
                           country  term    estimate    std.error statistic      p.value
                             <chr> <chr>       <dbl>        <dbl>     <dbl>        <dbl>
1                      Afghanistan  year 0.006009299 0.0007426499  8.091698 3.064797e-09
2                        Argentina  year 0.005148829 0.0010610076  4.852773 3.047078e-05
3                          Belarus  year 0.003907557 0.0007587624  5.149908 1.284924e-05
4                          Belgium  year 0.003203234 0.0007652852  4.185673 2.072981e-04
5  Bolivia, Plurinational State of  year 0.005802864 0.0009657515  6.008651 1.058595e-06
6                           Brazil  year 0.006107151 0.0008167736  7.477164 1.641169e-08
7                            Chile  year 0.006775560 0.0008220463  8.242310 2.045608e-09
8                         Colombia  year 0.006157755 0.0009645084  6.384346 3.584226e-07
9                       Costa Rica  year 0.006539273 0.0008119113  8.054171 3.391094e-09
10                            Cuba  year 0.004610867 0.0007205029  6.399512 3.431579e-07
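A variation on the filter above is to keep the adjusted p-value as its own column, so it can be inspected before filtering. The sketch below is an assumption-laden illustration: year_slopes is a hypothetical four-row subset copied from the slides, and p.adjusted is an invented column name. With only four tests the correction is far milder than across all 199 slopes, so which countries survive here need not match the output shown on the slide.

```r
library(dplyr)

# Hypothetical subset: four year-term rows copied from the slide output
year_slopes <- tibble(
  country  = c("Afghanistan", "Australia", "Belarus", "Canada"),
  term     = "year",
  estimate = c(0.006009299, 0.002567161, 0.003907557, 0.001515867),
  p.value  = c(3.064797e-09, 2.417617e-02, 1.284924e-05, 1.223590e-01)
)

# Store the adjusted p-value in its own column (p.adjusted is an
# illustrative name) and name the method explicitly; "holm" is the
# method p.adjust() uses when none is given
adjusted_slopes <- year_slopes %>%
  mutate(p.adjusted = p.adjust(p.value, method = "holm"))

# Keep only the slopes that remain significant after correction
significant_slopes <- adjusted_slopes %>%
  filter(p.adjusted < .05)
```

Keeping the column makes it easy to compare p.value and p.adjusted side by side, or to swap in another method listed in p.adjust.methods.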

Let's practice!
