Creating a DataFrame

Intermediate Python for Finance

Kennedy Behrman

Data Engineer, Author, Founder

Pandas

import pandas as pd
print(pd)
<module 'pandas' from '.../pandas/__init__.py'>

Pandas DataFrame

pd.DataFrame()

  Col 1 Col 2  Col 3
0    v1     a     00
1    v2     b     01
2    v3     c  13.02

From dict

data = {'Bank Code': ['BA', 'AAD', 'BA'],
        'Account#': ['ajfdk2', '1234nmk', 'mm3d90'],
        'Balance':[1222.00, 390789.11, 13.02]}
df = pd.DataFrame(data=data)
  Bank Code Account#    Balance
0        BA   ajfdk2    1222.00
1       AAD  1234nmk  390789.11
2        BA   mm3d90      13.02
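
A minimal sketch of what the dict constructor produces, assuming pandas is installed: each key becomes a column label and each list supplies that column's values, so the lists need to be the same length. The checks on `shape`, `columns`, and the `Balance` dtype are illustrative additions, not part of the original slides.

```python
import pandas as pd

data = {'Bank Code': ['BA', 'AAD', 'BA'],
        'Account#': ['ajfdk2', '1234nmk', 'mm3d90'],
        'Balance': [1222.00, 390789.11, 13.02]}
df = pd.DataFrame(data=data)

# Three rows (one per list element) and three columns (one per key).
n_rows, n_cols = df.shape
column_labels = list(df.columns)

# The numeric list is stored as a float column, so it can be aggregated.
balance_is_float = pd.api.types.is_float_dtype(df['Balance'])
total_balance = df['Balance'].sum()

print(df)
print(n_rows, n_cols, column_labels, total_balance)
```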

From list of dicts

data = [{'Bank Code': 'BA', 'Account#': 'ajfdk2',  'Balance': 1222.00},
        {'Bank Code': 'AAD', 'Account#': '1234nmk', 'Balance': 390789.11},
        {'Bank Code': 'BA', 'Account#': 'mm3d90', 'Balance': 13.02}]
df = pd.DataFrame(data=data)
  Bank Code Account#    Balance
0        BA   ajfdk2    1222.00
1       AAD  1234nmk  390789.11
2        BA   mm3d90      13.02
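
With a list of dicts, each dict is one row and the keys are matched to columns by name. The sketch below is an illustration rather than part of the original slides: the second record deliberately omits `'Balance'` (a made-up variation on the slide data) to show that pandas fills the gap with NaN instead of raising an error.

```python
import pandas as pd

# Same records as above, except the second one has no 'Balance' key
# (a hypothetical variation used only for this illustration).
records = [{'Bank Code': 'BA', 'Account#': 'ajfdk2', 'Balance': 1222.00},
           {'Bank Code': 'AAD', 'Account#': '1234nmk'},
           {'Bank Code': 'BA', 'Account#': 'mm3d90', 'Balance': 13.02}]
df = pd.DataFrame(data=records)

# Count the rows where 'Balance' was filled in as missing (NaN).
missing_balances = int(df['Balance'].isna().sum())

print(df)
print(missing_balances)
```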

From list of lists

data = [['BA',  'ajfdk2',  1222.00],
        ['AAD', '1234nmk', 390789.11],
        ['BA',  'mm3d90',  13.02]]      
df = pd.DataFrame(data=data)
     0        1          2
0   BA   ajfdk2    1222.00
1  AAD  1234nmk  390789.11
2   BA   mm3d90      13.02
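
When no column names are given, the columns are labelled with integers starting at 0, and those integers (not strings) are what you use to select a column. A short sketch, with the variable names `default_labels` and `first_column` chosen only for illustration:

```python
import pandas as pd

data = [['BA',  'ajfdk2',  1222.00],
        ['AAD', '1234nmk', 390789.11],
        ['BA',  'mm3d90',  13.02]]
df = pd.DataFrame(data=data)

# Column labels default to the integers 0, 1, 2.
default_labels = list(df.columns)

# Select the first column with the integer label 0, not the string '0'.
first_column = df[0].tolist()

print(default_labels)
print(first_column)
```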

From list of lists with column names

data = [['BA',  'ajfdk2',  1222.00],
        ['AAD', '1234nmk', 390789.11],
        ['BA',  'mm3d90',  13.02]]      
columns = ['Bank Code', 'Account#', 'Balance']
df = pd.DataFrame(data=data, columns=columns)
  Bank Code Account#    Balance
0        BA   ajfdk2    1222.00
1       AAD  1234nmk  390789.11
2        BA   mm3d90      13.02
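
As a hedged illustration of what the `columns` argument does, the sketch below builds the frame twice: once with `columns=` in the constructor, and once with default labels that are replaced afterwards by assigning to `df.columns`. The names `named` and `relabeled` are hypothetical; the comparison is only meant to show that both routes give the same result for this data.

```python
import pandas as pd

data = [['BA',  'ajfdk2',  1222.00],
        ['AAD', '1234nmk', 390789.11],
        ['BA',  'mm3d90',  13.02]]
columns = ['Bank Code', 'Account#', 'Balance']

# Route 1: name the columns when the DataFrame is created.
named = pd.DataFrame(data=data, columns=columns)

# Route 2: create with default integer labels, then assign names.
relabeled = pd.DataFrame(data=data)
relabeled.columns = columns

# Both frames hold the same values under the same labels.
same = named.equals(relabeled)

# Named columns can be selected by their label.
largest_balance = named['Balance'].max()

print(same, largest_balance)
```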

Reading data

  • Excel: pd.read_excel
  • JSON: pd.read_json
  • HTML: pd.read_html
  • Pickle: pd.read_pickle
  • SQL: pd.read_sql
  • CSV: pd.read_csv
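
The readers listed above share the same basic pattern: pass a path or file-like object and get a DataFrame back. The sketch below illustrates this for two of them, using in-memory text via `io.StringIO` instead of real files so that it runs anywhere. The sample text and the `orient='records'` setting are assumptions made for this example.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for files on disk.
json_text = ('[{"Bank Code": "BA", "Balance": 1222.0},'
             ' {"Bank Code": "AAD", "Balance": 390789.11}]')
csv_text = "Bank Code,Balance\nBA,1222.0\nAAD,390789.11\n"

# Each reader parses its format and returns a DataFrame.
from_json = pd.read_json(io.StringIO(json_text), orient='records')
from_csv = pd.read_csv(io.StringIO(csv_text))

print(from_json)
print(from_csv)
```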

CSV

Comma separated values

client id,trans type, amount
14343,buy,23.0
0574,sell,2000
7093,dividend,2234

Reading a csv file

df = pd.read_csv('/data/daily/transactions.csv')
client id  trans type  amount
    14343         buy    23.0
     0574        sell    2000
     7093    dividend    2234
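
Two details of this sample file are worth checking when you read it yourself. With pandas' default type inference, `client id` is read as integers, so `0574` comes back as `574`. The header also has a space before `amount`, which ends up in the column name. The sketch below shows one possible way to handle both, using the `dtype` and `skipinitialspace` parameters of `pd.read_csv`. It reads the sample text from memory with `io.StringIO` rather than from the path on the slide, and the variable names are illustrative.

```python
import io
import pandas as pd

# The sample file contents from the CSV slide, held in memory.
csv_text = ("client id,trans type, amount\n"
            "14343,buy,23.0\n"
            "0574,sell,2000\n"
            "7093,dividend,2234\n")

# Default read: ids are inferred as integers, ' amount' keeps its space.
default_df = pd.read_csv(io.StringIO(csv_text))

# Keep ids as text and skip the space that follows each comma.
text_id_df = pd.read_csv(io.StringIO(csv_text),
                         dtype={'client id': str},
                         skipinitialspace=True)

print(default_df['client id'].tolist(), list(default_df.columns))
print(text_id_df['client id'].tolist(), list(text_id_df.columns))
```

Whether ids should be text or numbers depends on how they are used downstream; treating them as text is simply the safer choice when leading zeros are meaningful.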

Non-comma csv

client id|trans type| amount
14343|buy|23.0
0574|sell|2000
7093|dividend|2234

df = pd.read_csv('/data/daily/transactions.csv', sep='|')
client id  trans type  amount
    14343         buy    23.0
     0574        sell    2000
     7093    dividend    2234
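
To see why `sep` matters, the sketch below reads the same pipe-delimited text twice, once with the default comma separator and once with `sep='|'`. It uses `io.StringIO` in place of the file path from the slide, and the names `without_sep` and `with_sep` are illustrative.

```python
import io
import pandas as pd

# The pipe-delimited sample from the slide, held in memory.
pipe_text = ("client id|trans type| amount\n"
             "14343|buy|23.0\n"
             "0574|sell|2000\n"
             "7093|dividend|2234\n")

# Default separator is a comma, so each line is read as a single field.
without_sep = pd.read_csv(io.StringIO(pipe_text))

# Telling pandas about the pipe splits each line into three columns.
with_sep = pd.read_csv(io.StringIO(pipe_text), sep='|')

print(without_sep.shape)
print(with_sep.shape)
print(with_sep['trans type'].tolist())
```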

Let's practice!
