Intermediate Importing Data in R
Filip Schouwenaars
Instructor, DataCamp
haven
Hadley Wickham
Goal: consistent, easy, fast
foreign
R Core Team
Support for many data formats
haven
Reads SAS, Stata and SPSS files
Built on ReadStat, a C library by Evan Miller
Extremely simple to use
Single argument: path to file
Result: R data frame
install.packages("haven")
library(haven)
ontime.sas7bdat
read_sas()
ontime <- read_sas("ontime.sas7bdat")
str(ontime)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 10 obs. of 4 variables:
$ Airline : atomic TWA Southwest Northwest ...
..- attr(*, "label")= chr "Airline"
$ March_1999 : atomic 84.4 80.3 80.8 72.7 78.7 ...
..- attr(*, "label")= chr "March 1999"
$ June_1999 : atomic 69.4 77 75.1 65.1 72.2 ...
..- attr(*, "label")= chr "June 1999"
$ August_1999: atomic 85 80.4 81 78.3 77.7 75.1 ...
..- attr(*, "label")= chr "August 1999"
ontime
Airline March_1999 June_1999 August_1999
1 TWA 84.4 69.4 85.0
2 Southwest 80.3 77.0 80.4
3 Northwest 80.8 75.1 81.0
4 American 72.7 65.1 78.3
5 Delta 78.7 72.2 77.7
6 Continental 79.3 68.4 75.1
7 United 78.6 69.2 71.6
8 US Airways 73.6 68.9 70.1
9 Alaska 71.9 75.4 64.4
10 American West 76.5 70.3 62.5
Stata 13 and Stata 14
read_stata(), read_dta()
ontime <- read_stata("ontime.dta")
ontime <- read_dta("ontime.dta")  # equivalent to the read_stata() call above
ontime
Airline March_1999 June_1999 August_1999
1 8 84.4 69.4 85.0
2 7 80.3 77.0 80.4
3 6 80.8 75.1 81.0
4 2 72.7 65.1 78.3
5 5 78.7 72.2 77.7
6 4 79.3 68.4 75.1
7 9 78.6 69.2 71.6
8 10 73.6 68.9 70.1
9 1 71.9 75.4 64.4
10 3 76.5 70.3 62.5
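To check that read_stata() and read_dta() behave the same way without needing ontime.dta on disk, one option is a small round trip through a temporary file. This is only a sketch: the two-row data frame `mini` and the path `dta_path` are hypothetical stand-ins, not part of the course data.

```r
library(haven)

# Hypothetical two-row stand-in for ontime.dta; the values are illustrative.
mini <- tibble::tibble(
  Airline = labelled(c(2L, 1L), labels = c(Alaska = 1L, TWA = 2L)),
  March_1999 = c(84.4, 71.9)
)

# Write to a temporary .dta file so the example needs no file on disk.
dta_path <- tempfile(fileext = ".dta")
write_dta(mini, dta_path)

via_stata <- read_stata(dta_path)
via_dta <- read_dta(dta_path)

# Both calls should return the same data; Airline is still stored as codes.
print(all.equal(via_stata, via_dta))
print(zap_labels(via_dta$Airline))
```

As in the output above, the Airline column comes back as numeric codes rather than airline names.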
# labelled: R version of a data structure common in other statistical software
class(ontime$Airline)
"labelled"
ontime$Airline
<Labelled>
8 7 6 2 5 4 9 10 1 3
attr(,"label")
"Airline"
Labels:
Alaska American American West ... US Airways
1 2 3 ... 10
as_factor(ontime$Airline)
TWA Southwest Northwest American ... American West
Levels: Alaska American American West ... US Airways
as.character(as_factor(ontime$Airline))
"TWA" "Southwest" "Northwest" ... "American West"
ontime$Airline <- as.character(as_factor(ontime$Airline))
ontime
Airline March_1999 June_1999 August_1999
1 TWA 84.4 69.4 85.0
2 Southwest 80.3 77.0 80.4
3 Northwest 80.8 75.1 81.0
4 American 72.7 65.1 78.3
5 Delta 78.7 72.2 77.7
6 Continental 79.3 68.4 75.1
7 United 78.6 69.2 71.6
8 US Airways 73.6 68.9 70.1
9 Alaska 71.9 75.4 64.4
10 American West 76.5 70.3 62.5
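The conversion from labelled codes to names can be tried on a vector built by hand. The sketch below assumes a hypothetical three-value `airline` vector rather than the real column. Recent haven versions report the class as "haven_labelled" instead of "labelled", but as_factor() is used the same way.

```r
library(haven)

# Hypothetical stand-in for the Airline column: integer codes with value labels.
airline <- labelled(
  c(3L, 2L, 1L),
  labels = c(Alaska = 1L, American = 2L, Delta = 3L),
  label = "Airline"
)

airline_factor <- as_factor(airline)         # codes replaced by their labels
airline_chr <- as.character(airline_factor)  # plain character vector

print(levels(airline_factor))
print(airline_chr)
```

The factor levels follow the order of the codes, while the values keep the order of the data.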
read_spss()
.por -> read_por()
.sav -> read_sav()
read_sav(file.path("~","datasets","ontime.sav"))
Airline Mar.99 Jun.99 Aug.99
1 8 84.4 69.4 85.0
2 7 80.3 77.0 80.4
3 6 80.8 75.1 81.0
4 2 72.7 65.1 78.3
5 5 78.7 72.2 77.7
...
10 3 76.5 70.3 62.5
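read_spss() picks the reader from the file extension, so on a .sav file it should match read_sav(). A minimal sketch using a temporary file follows; the data frame `mini` and the path `sav_path` are hypothetical and stand in for the ontime.sav file used above.

```r
library(haven)

# Hypothetical two-row stand-in for ontime.sav; the values are illustrative.
mini <- tibble::tibble(
  Airline = labelled(c(2, 1), labels = c(Alaska = 1, TWA = 2)),
  March_1999 = c(84.4, 71.9)
)

# Write to a temporary .sav file so the example needs no file on disk.
sav_path <- tempfile(fileext = ".sav")
write_sav(mini, sav_path)

via_spss <- read_spss(sav_path)  # chooses read_sav() from the .sav extension
via_sav <- read_sav(sav_path)

print(all.equal(via_spss, via_sav))

# The same as_factor() conversion used for Stata files applies here.
print(as.character(as_factor(via_sav$Airline)))
```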