Streamlined Data Ingestion with pandas
Amany Mahfouz
Instructor
pandas
-specific structure for two-dimensional datapandas
-specific structure for two-dimensional datapandas
-specific structure for two-dimensional datapandas
function to load them all: read_csv()
us_tax_data_2016.csv
STATEFIPS,STATE,zipcode,agi_stub,...,N11901,A11901,N11902,A11902
1,AL,0,1,...,63420,51444,711580,1831661
import pandas as pd
tax_data = pd.read_csv("us_tax_data_2016.csv")
tax_data.head(4)
STATEFIPS STATE zipcode agi_stub ... N11901 A11901 N11902 A11902
0 1 AL 0 1 ... 63420 51444 711580 1831661
1 1 AL 0 2 ... 74090 110889 416090 1173463
2 1 AL 0 3 ... 64000 143060 195130 543284
3 1 AL 0 4 ... 45020 128920 117410 381329
[4 rows x 147 columns]
sep
us_tax_data_2016.tsv
STATEFIPS STATE zipcode agi_stub ... N11901 A11901 N11902 A11902
1 AL 0 1 ... 63420 51444 711580 1831661
import pandas as pd
tax_data = pd.read_csv("us_tax_data_2016.tsv", sep="\t")
tax_data.head(3)
STATEFIPS STATE zipcode agi_stub ... N11901 A11901 N11902 A11902
0 1 AL 0 1 ... 63420 51444 711580 1831661
1 1 AL 0 2 ... 74090 110889 416090 1173463
2 1 AL 0 3 ... 64000 143060 195130 543284
[3 rows x 147 columns]
Streamlined Data Ingestion with pandas