Importing flat files using NumPy

Introduction to Importing Data in Python

Hugo Bowne-Anderson

Data Scientist at DataCamp

Why NumPy?

  • NumPy arrays: standard for storing numerical data
  • Essential for other packages: e.g. scikit-learn
  • loadtxt()
  • genfromtxt()

Importing flat files using NumPy

import numpy as np
filename = 'MNIST.txt'
data = np.loadtxt(filename, delimiter=',')
print(data)
[[   0.    0.    0.    0.    0.]
 [  86.  250.  254.  254.  254.]
 [   0.    0.    0.    9.  254.]
 ..., 
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]]
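
A minimal, self-contained sketch of the same call. The file name digits_sample.txt and its three rows are hypothetical stand-ins (not the real MNIST file), written to a temporary directory so the example runs anywhere.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in data: three rows of comma-separated integers.
rows = ["0,0,0,0,0", "86,250,254,254,254", "0,0,0,9,254"]

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "digits_sample.txt")
    with open(filename, "w") as f:
        f.write("\n".join(rows) + "\n")

    # delimiter=',' replaces the default behaviour of splitting on whitespace.
    data = np.loadtxt(filename, delimiter=",")

print(data.shape)  # (3, 5)
print(data.dtype)  # float64
```

Note that loadtxt() returns floats by default, even though the file holds integers.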

Customizing your NumPy import

import numpy as np
filename = 'MNIST_header.txt'
data = np.loadtxt(filename, delimiter=',', skiprows=1)
print(data)
[[   0.    0.    0.    0.    0.]
 [  86.  250.  254.  254.  254.]
 [   0.    0.    0.    9.  254.]
 ..., 
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]
 [   0.    0.    0.    0.    0.]]
  • skiprows: how many rows (not indices) you wish to skip
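
Why skiprows matters, as a small sketch: with the default float dtype, a text header cannot be converted and loadtxt() raises a ValueError. The column names and values below are made up for illustration, and an in-memory io.StringIO stands in for a file on disk.

```python
import io

import numpy as np

# Hypothetical file contents: one header row, then two rows of numbers.
text = "pixel0,pixel1,pixel2\n0,0,0\n86,250,254\n"

# Without skiprows, the header strings cannot be parsed as floats.
try:
    np.loadtxt(io.StringIO(text), delimiter=",")
    header_failed = False
except ValueError:
    header_failed = True

# skiprows=1 skips one row (a count, not an index) before parsing.
data = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1)

print(header_failed)  # True
print(data.shape)     # (2, 3)
```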

Customizing your NumPy import

import numpy as np
filename = 'MNIST_header.txt'
data = np.loadtxt(filename, delimiter=',', skiprows=1, usecols=[0, 2])
print(data)
[[   0.    0.]
 [  86.  254.]
 [   0.    0.]
 ..., 
 [   0.    0.]
 [   0.    0.]
 [   0.    0.]]
  • usecols: list of the indices of the columns you wish to keep
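
A quick way to check which columns usecols keeps, sketched on made-up data where each value encodes its own column position. The header names a to e are hypothetical.

```python
import io

import numpy as np

# Hypothetical contents: the last digit of each value is its column index.
text = "a,b,c,d,e\n0,1,2,3,4\n10,11,12,13,14\n"

# Column indices are zero-based, so [0, 2] keeps the 1st and 3rd columns.
data = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1, usecols=[0, 2])

print(data.shape)     # (2, 2)
print(data.tolist())  # [[0.0, 2.0], [10.0, 12.0]]
```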

Customizing your NumPy import

data = np.loadtxt(filename, delimiter=',', dtype=str)
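
A sketch of what dtype=str does, on made-up data: every entry, including the header row and the numbers, is imported as a string, so no skiprows is needed. Converting a column back to floats afterwards is one possible follow-up step, not something the slide prescribes.

```python
import io

import numpy as np

# Hypothetical contents: a header row plus a text column and a numeric column.
text = "label,value\nA,1.5\nB,2\n"

# dtype=str imports every field as a string, so the header row parses too.
data = np.loadtxt(io.StringIO(text), delimiter=",", dtype=str)

print(data.shape)       # (3, 2)
print(data.dtype.kind)  # U (unicode strings)

# Numeric columns can be converted after the fact if needed.
values = data[1:, 1].astype(float)
print(values.tolist())  # [1.5, 2.0]
```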

Mixed datatypes

titanic.csv

                        Name      Sex  Cabin   Fare
     Braund, Mr. Owen Harris     male    NaN    7.3
  Cumings, Mrs. John Bradley   female    C85   71.3
      Heikkinen, Miss. Laina   female    NaN    8.0
Futrelle, Mrs. Jacques Heath   female   C123   53.1
    Allen, Mr. William Henry     male    NaN   8.05
               ^                                 ^
            strings                           floats
Source: Kaggle
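
One possible way to import mixed datatypes with genfromtxt(), sketched under assumptions: the slides name the function but do not show a call, so the arguments names=True and dtype=None are an illustrative choice. The sample is a hypothetical tab-delimited excerpt (the names contain commas, so a comma delimiter would split them), not the actual titanic.csv.

```python
import io

import numpy as np

# Hypothetical tab-delimited excerpt with a string column and a float column.
text = (
    "Name\tSex\tFare\n"
    "Braund, Mr. Owen Harris\tmale\t7.3\n"
    "Heikkinen, Miss. Laina\tfemale\t8.0\n"
)

# names=True reads field names from the first row;
# dtype=None asks genfromtxt to infer a type for each column.
data = np.genfromtxt(
    io.StringIO(text),
    delimiter="\t",
    names=True,
    dtype=None,
    encoding="utf-8",
)

print(data.dtype.names)    # ('Name', 'Sex', 'Fare')
print(data["Fare"].dtype)  # float64
print(data["Name"][0])     # Braund, Mr. Owen Harris
```

The result is a one-dimensional structured array: each row is a record, and columns are accessed by field name rather than by index.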

Let's practice!
