Streamlined Data Ingestion with pandas
Amany Mahfouz
Instructor

DataFrame: the pandas-specific structure for two-dimensional data
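As a rough illustration of a two-dimensional pandas structure, the sketch below builds a small DataFrame by hand. The variable name df is hypothetical, and the values are a few cells copied from the first two tax data rows shown in these slides, not the full file.

```python
import pandas as pd

# Hypothetical two-row sample; column names and values are taken from
# the tax data rows shown in the slides (most columns omitted).
df = pd.DataFrame(
    {
        "STATE": ["AL", "AL"],
        "agi_stub": [1, 2],
        "N11901": [63420, 74090],
    }
)

# Rows and columns are both labeled, so cells can be looked up by
# row label and column name.
print(df.shape)
print(df.columns.tolist())
print(df.loc[1, "N11901"])
```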
pandas function to load them all: read_csv()
us_tax_data_2016.csv
STATEFIPS,STATE,zipcode,agi_stub,...,N11901,A11901,N11902,A11902
1,AL,0,1,...,63420,51444,711580,1831661
import pandas as pd
tax_data = pd.read_csv("us_tax_data_2016.csv")
tax_data.head(4)
   STATEFIPS STATE  zipcode  agi_stub   ...     N11901  A11901  N11902   A11902
0          1    AL        0         1   ...      63420   51444  711580  1831661
1          1    AL        0         2   ...      74090  110889  416090  1173463
2          1    AL        0         3   ...      64000  143060  195130   543284
3          1    AL        0         4   ...      45020  128920  117410   381329
[4 rows x 147 columns]
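To try the same call without the file on disk, here is a minimal sketch that reads comma-separated text from an in-memory buffer. The csv_text variable is an assumption for illustration: it holds only the eight columns visible in the output above (the columns hidden behind "..." are left out), so the resulting shape differs from the real 147-column file.

```python
import io

import pandas as pd

# Trimmed stand-in for us_tax_data_2016.csv: only the columns visible
# in the slide output, with the elided columns omitted.
csv_text = (
    "STATEFIPS,STATE,zipcode,agi_stub,N11901,A11901,N11902,A11902\n"
    "1,AL,0,1,63420,51444,711580,1831661\n"
    "1,AL,0,2,74090,110889,416090,1173463\n"
    "1,AL,0,3,64000,143060,195130,543284\n"
    "1,AL,0,4,45020,128920,117410,381329\n"
)

# read_csv() accepts a file path or any file-like object; the first
# line is used as the column names by default.
tax_data = pd.read_csv(io.StringIO(csv_text))

print(tax_data.head(4))
print(tax_data.shape)
```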
sep
us_tax_data_2016.tsv
STATEFIPS    STATE    zipcode    agi_stub    ...    N11901    A11901    N11902    A11902
1    AL    0    1    ...    63420    51444    711580    1831661
import pandas as pd
tax_data = pd.read_csv("us_tax_data_2016.tsv", sep="\t")
tax_data.head(3)
   STATEFIPS STATE  zipcode  agi_stub   ...     N11901  A11901  N11902   A11902
0          1    AL        0         1   ...      63420   51444  711580  1831661
1          1    AL        0         2   ...      74090  110889  416090  1173463
2          1    AL        0         3   ...      64000  143060  195130   543284
[3 rows x 147 columns]
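The effect of sep can be seen by parsing the same tab-separated text with and without it. This is a sketch under assumptions: tsv_text is a hypothetical in-memory stand-in for us_tax_data_2016.tsv with only four of its columns, and the name no_sep is introduced here purely for comparison.

```python
import io

import pandas as pd

# Hypothetical stand-in for the .tsv file: four columns, tab-delimited.
tsv_text = (
    "STATEFIPS\tSTATE\tzipcode\tagi_stub\n"
    "1\tAL\t0\t1\n"
    "1\tAL\t0\t2\n"
    "1\tAL\t0\t3\n"
)

# With sep="\t", each tab-delimited field becomes its own column.
tax_data = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# With the default comma separator, no delimiter is found, so every
# line is read as a single field and the frame has one column.
no_sep = pd.read_csv(io.StringIO(tsv_text))

print(tax_data.shape)
print(no_sep.shape)
```

A single unexpectedly wide column after loading is often a sign that the delimiter does not match the file.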