The importance of flat files in data science

Introduction to Importing Data in Python

Hugo Bowne-Anderson

Data Scientist at DataCamp

Flat files

titanic.csv

PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2.3101282,7.925,,S

titanic.csv as a table (selected columns):

                        Name   Gender  Cabin  Survived
     Braund, Mr. Owen Harris     male    NaN         0
  Cumings, Mrs. John Bradley   female    C85         1
      Heikkinen, Miss. Laina   female    NaN         1
Futrelle, Mrs. Jacques Heath   female   C123         1
    Allen, Mr. William Henry     male    NaN         0

Flat files

  • Text files containing records
  • That is, tabular data
  • Record: row of fields or attributes
  • Column: feature or attribute
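
The record and column vocabulary above can be sketched in code. This is a minimal illustration using Python's standard `csv` module, assuming the first rows of titanic.csv are pasted inline as a string (`TITANIC_CSV` is a stand-in for the file on disk, not part of the course material):

```python
import csv
import io

# Inline stand-in for titanic.csv: the header and first three rows shown on the slides.
TITANIC_CSV = '''PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2.3101282,7.925,,S
'''

# Each row after the header becomes one record: a dict of column name -> field value.
records = list(csv.DictReader(io.StringIO(TITANIC_CSV)))

first_record = records[0]                    # one row (record)
name_column = [r["Name"] for r in records]   # one column (attribute)

print(first_record["Name"])   # Braund, Mr. Owen Harris
print(name_column)
```

Note that the quoted name field contains a comma, yet it is still read as a single field.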


Header

titanic.csv

The first row is the header; it names each column.

PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S   
2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2.3101282,7.925,,S
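
One way to separate the header from the data rows is sketched below with the standard `csv` module. The inline string `TITANIC_CSV` is an assumed stand-in for the file on disk:

```python
import csv
import io

# Inline stand-in for titanic.csv: the header and first three rows shown on the slides.
TITANIC_CSV = '''PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2.3101282,7.925,,S
'''

reader = csv.reader(io.StringIO(TITANIC_CSV))
header = next(reader)   # first row: the column names
rows = list(reader)     # remaining rows: the data records

# Every record should have exactly one field per header entry.
print(len(header), [len(row) for row in rows])
```

Knowing whether a file has a header matters when importing: if the header is treated as data, the column names end up mixed in with the values.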

File extension

  • .csv - Comma-separated values
  • .txt - Text file
  • commas, tabs - Delimiters
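
The role of the delimiter can be sketched as follows. The helper `read_delimited` and the two short inline strings are hypothetical, written only for this illustration. The file extension is just a hint; what matters to the parser is which delimiter it is told to split on:

```python
import csv
import io

def read_delimited(text, delimiter):
    """Parse delimited text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

# Small inline samples standing in for a .csv file and a tab-delimited .txt file.
comma_text = "Survived,Pclass,Fare\n0,3,7.25\n"
tab_text = "pixel149\tpixel150\n86\t250\n"

comma_rows = read_delimited(comma_text, ",")
tab_rows = read_delimited(tab_text, "\t")

# With the wrong delimiter, each line comes back as one unsplit field.
wrong_rows = read_delimited(tab_text, ",")

print(comma_rows)
print(tab_rows)
print(wrong_rows)
```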

Tab-delimited file

MNIST.txt

pixel149    pixel150    pixel151    pixel152    pixel153
0           0           0           0           0    
86          250         254         254         254    
0           0           0           9           254    
0           0           0           0           0    
103         253         253         253         253    
0           0           0           0           0    
0           0           0           0           0        
0           0           0           0           41        
253         253         253         253         253    

[Figure: MNIST image (mnist.png)]
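
A tab-delimited numeric file like this one could be loaded with NumPy roughly as follows. This is a sketch that assumes the file has a single header line and uses an inline string (`MNIST_TXT`, holding only the first few rows shown above) in place of the file on disk:

```python
import io

import numpy as np

# Inline stand-in for MNIST.txt: the header and first three rows, tab-separated.
MNIST_TXT = (
    "pixel149\tpixel150\tpixel151\tpixel152\tpixel153\n"
    "0\t0\t0\t0\t0\n"
    "86\t250\t254\t254\t254\n"
    "0\t0\t0\t9\t254\n"
)

# skiprows=1 skips the header line, because column names cannot be parsed as numbers.
pixels = np.loadtxt(io.StringIO(MNIST_TXT), delimiter="\t", skiprows=1)

print(pixels.shape)   # (3, 5)
print(pixels[1])
```

With a real file, the `io.StringIO(...)` argument would be replaced by the file path.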

How do you import flat files?

  • Two main packages: NumPy, pandas

  • Here, you’ll learn to import:
    • Flat files with numerical data (MNIST)
    • Flat files with numerical data and strings (titanic.csv)
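
For a flat file that mixes numbers and strings, a pandas import might look like the sketch below. It assumes the first rows of titanic.csv are pasted inline (`TITANIC_CSV` is a stand-in for the file on disk) and relies on `pd.read_csv` treating the first row as the header by default:

```python
import io

import pandas as pd

# Inline stand-in for titanic.csv: the header and first three rows shown on the slides.
TITANIC_CSV = '''PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2.3101282,7.925,,S
'''

# Each column gets its own type: numeric columns stay numeric, text columns stay text.
df = pd.read_csv(io.StringIO(TITANIC_CSV))

# Empty fields (such as a missing Cabin) are read as NaN.
print(df[["Name", "Gender", "Cabin", "Survived"]])
print(df.dtypes)
```

NumPy arrays hold a single data type, which suits purely numeric files such as MNIST; a DataFrame keeps a separate type per column, which suits mixed files such as titanic.csv.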

Let's practice!
