Using mappers to clean up your data

Intermediate Functional Programming with purrr

Colin Fay

Data Scientist & R Hacker at ThinkR

Setting the name of your objects

set_names(): sets the names of an unnamed list

names(visits2016)
NULL
length(visits2016)
12
month.abb
"Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
visits2016 <- set_names(visits2016, month.abb)
names(visits2016)
"Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"

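A minimal sketch of the same step on made-up data, since visits2016 itself is not shown here. The list visits_demo and its values are hypothetical stand-ins.

```r
library(purrr)

# Hypothetical stand-in for visits2016: 12 unnamed elements,
# each a small vector of made-up visit counts.
visits_demo <- rep(list(c(100, 200, 300)), 12)

# The list starts without names
names(visits_demo)

# set_names() returns a copy of the list with the given names attached
visits_demo <- set_names(visits_demo, month.abb)
names(visits_demo)
```

Because set_names() returns the renamed object instead of modifying it in place, the result has to be assigned back, as on the slide.
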
Setting names with map():

all_visits <- list(visits2015, visits2016, visits2017)
named_all_visits <- map(all_visits, ~ set_names(.x, month.abb))
names(named_all_visits[[1]])
"Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
names(named_all_visits[[2]])
"Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
names(named_all_visits[[3]])
"Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"

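A small sketch of naming several lists in one pass, using hypothetical data (year_a, year_b) in place of the visits lists. It also shows an equivalent call that passes month.abb as an extra argument to map() instead of using a formula.

```r
library(purrr)

# Hypothetical stand-ins for two years of monthly data
year_a <- as.list(1:12)
year_b <- as.list(13:24)
all_demo <- list(year_a, year_b)

# Formula shorthand: .x is the current element of all_demo
named_formula <- map(all_demo, ~ set_names(.x, month.abb))

# Same idea, passing month.abb through map() to set_names()
named_direct <- map(all_demo, set_names, month.abb)

names(named_formula[[1]])
names(named_direct[[2]])
```

Both forms should give each inner list the twelve month abbreviations as names; the formula version makes it more explicit which argument is the list being renamed.
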
keep()

keep(): extract elements that satisfy a condition

# Which months received more than 30000 visits?
over_30000 <- keep(visits2016, ~ sum(.x) > 30000)
names(over_30000)
"Jan" "Mar" "Apr" "May" "Jul" "Aug" "Oct" "Nov"
limit <- as_mapper(~ sum(.x) > 30000)
# Which months received more than 30000 visits?
over_mapper <- keep(visits2016, limit)
names(over_mapper)
"Jan" "Mar" "Apr" "May" "Jul" "Aug" "Oct" "Nov"

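A self-contained sketch of keep() with a reusable mapper. The three-month list visits_demo and its numbers are invented for illustration.

```r
library(purrr)

# Hypothetical monthly visit counts (made-up numbers)
visits_demo <- list(
  Jan = c(20000, 15000),  # total 35000
  Feb = c(10000, 5000),   # total 15000
  Mar = c(30000, 1000)    # total 31000
)

# as_mapper() turns the formula into an ordinary function
limit <- as_mapper(~ sum(.x) > 30000)

# The mapper can be called directly on one element...
limit(visits_demo$Feb)   # FALSE

# ...or handed to keep() as the predicate
over <- keep(visits_demo, limit)
names(over)              # "Jan" "Mar"
```

Storing the predicate in limit means the threshold is written once and can be reused by other calls.
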
discard()

discard(): remove elements that satisfy a condition

# Which months received 30000 visits or fewer?
under_30000 <- discard(visits2016, ~ sum(.x) > 30000)
names(under_30000)
"Feb" "Jun" "Sep" "Dec"
limit <- as_mapper(~ sum(.x) > 30000)
# Which months received 30000 visits or fewer?
under_mapper <- discard(visits2016, limit)
names(under_mapper)
"Feb" "Jun" "Sep" "Dec"

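A sketch showing that discard() is the complement of keep() for the same predicate, on hypothetical data. The Mar entry is chosen to sit exactly on the threshold, to show which side a total of 30000 falls on.

```r
library(purrr)

# Hypothetical monthly visit counts (made-up numbers)
visits_demo <- list(
  Jan = c(20000, 15000),  # total 35000
  Feb = c(10000, 5000),   # total 15000
  Mar = c(15000, 15000)   # total exactly 30000
)

limit <- as_mapper(~ sum(.x) > 30000)

kept    <- keep(visits_demo, limit)     # predicate is TRUE
dropped <- discard(visits_demo, limit)  # predicate is FALSE

names(kept)     # "Jan"
names(dropped)  # "Feb" "Mar"
```

With a strict > in the predicate, a month with exactly 30000 visits ends up in the discard() result, so the two results together cover every element once.
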
keep(), discard(), and map()

Using map() & keep():

df_list <- list(iris, airquality) %>% map(head)
map(df_list, ~ keep(.x, is.factor))
[[1]]
  Species
1  setosa
2  setosa
3  setosa
4  setosa
5  setosa
6  setosa

[[2]]
data frame with 0 columns and 6 rows
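
The same pattern works with other predicates. As a sketch, swapping is.factor for is.numeric keeps the numeric columns of each data frame; numeric_cols is a name introduced here for illustration.

```r
library(purrr)

# Same setup as on the slide: first six rows of two built-in datasets
df_list <- list(iris, airquality) %>% map(head)

# A data frame is a list of columns, so keep() filters columns
numeric_cols <- map(df_list, ~ keep(.x, is.numeric))

# Column names kept from iris
names(numeric_cols[[1]])

# Number of columns kept in each data frame
map_int(numeric_cols, ncol)
```

The inner keep() runs once per data frame, so each result keeps its own set of columns.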

Let's practice!
