Iterating over a pandas DataFrame

Writing Efficient Python Code

Logan Thomas

Scientific Software Technical Trainer, Enthought

pandas recap

  • See the pandas overview in Intermediate Python.
  • Library used for data analysis.
  • The main data structure is the DataFrame.
    • Tabular data with labeled rows and columns
    • Built on top of the NumPy array structure
  • Chapter objective:
    • Best practices for iterating over a pandas DataFrame.
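
As a quick illustration of the recap above, the sketch below builds a small DataFrame and prints its row labels, its column labels, and the NumPy array behind one numeric column. The values are copied from the baseball sample used later in this chapter; the variable names `df` and `wins` are only illustrative.

```python
import numpy as np
import pandas as pd

# Small illustrative DataFrame (values copied from the baseball sample)
df = pd.DataFrame({
    'Team': ['ARI', 'ATL', 'BAL'],
    'W': [81, 94, 93],
    'G': [162, 162, 162],
})

print(df.index.tolist())    # row labels
print(df.columns.tolist())  # column labels

# A numeric column can be viewed as a NumPy array
wins = df['W'].to_numpy()
print(type(wins))
```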

Baseball stats

import pandas as pd

baseball_df = pd.read_csv('baseball_stats.csv')
print(baseball_df.head())
  Team League  Year   RS   RA   W    G  Playoffs
0  ARI     NL  2012  734  688  81  162         0
1  ATL     NL  2012  700  600  94  162         1
2  BAL     AL  2012  712  705  93  162         1
3  BOS     AL  2012  734  806  69  162         0
4  CHC     NL  2012  613  759  61  162         0

Baseball stats

  Team
0  ARI     
1  ATL     
2  BAL     
3  BOS     
4  CHC

[Image: team logos with their abbreviations underneath: Arizona Diamondbacks (ARI), Atlanta Braves (ATL), Baltimore Orioles (BAL), Boston Red Sox (BOS), and Chicago Cubs (CHC)]


Calculating win percentage

import numpy as np

def calc_win_perc(wins, games_played):
    # Fraction of games won
    win_perc = wins / games_played

    # Round to two decimal places
    return np.round(win_perc, 2)

win_perc = calc_win_perc(50, 100)
print(win_perc)
0.5
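
To make the rounding step concrete, the following sketch applies the same function to a few rows of the sample data (wins and games copied from the table shown earlier). The list name `sample` is only illustrative.

```python
import numpy as np

def calc_win_perc(wins, games_played):
    win_perc = wins / games_played
    return np.round(win_perc, 2)

# (team, wins, games) copied from the first rows of the sample output
sample = [('ARI', 81, 162), ('ATL', 94, 162), ('BOS', 69, 162)]

for team, wins, games in sample:
    # Each result is the win fraction rounded to two decimals
    print(team, calc_win_perc(wins, games))
```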

Adding win percentage to DataFrame

win_perc_list = []

for i in range(len(baseball_df)):
    row = baseball_df.iloc[i]

    wins = row['W']
    games_played = row['G']

    win_perc = calc_win_perc(wins, games_played)

    win_perc_list.append(win_perc)

baseball_df['WP'] = win_perc_list

Adding win percentage to DataFrame

print(baseball_df.head())
  Team League  Year   RS   RA   W    G  Playoffs    WP
0  ARI     NL  2012  734  688  81  162         0  0.50
1  ATL     NL  2012  700  600  94  162         1  0.58
2  BAL     AL  2012  712  705  93  162         1  0.57
3  BOS     AL  2012  734  806  69  162         0  0.43
4  CHC     NL  2012  613  759  61  162         0  0.38

Iterating with .iloc

%%timeit
win_perc_list = []

for i in range(len(baseball_df)):
    row = baseball_df.iloc[i]

    wins = row['W']
    games_played = row['G']

    win_perc = calc_win_perc(wins, games_played)
    win_perc_list.append(win_perc)

baseball_df['WP'] = win_perc_list
183 ms ± 1.73 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
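
`%%timeit` is an IPython/Jupyter cell magic. Outside a notebook, a comparable measurement can be sketched with the standard-library `timeit` module. The example below is only a stand-in under stated assumptions: it uses a five-row DataFrame built from the sample rows instead of `baseball_stats.csv`, and the helper name `add_wp_with_iloc` is hypothetical. Timings depend on the machine and the data size, so they will not match the figure above.

```python
import timeit

import numpy as np
import pandas as pd

def calc_win_perc(wins, games_played):
    win_perc = wins / games_played
    return np.round(win_perc, 2)

# Assumption: stand-in for baseball_stats.csv using the five sample rows
baseball_df = pd.DataFrame({
    'Team': ['ARI', 'ATL', 'BAL', 'BOS', 'CHC'],
    'W': [81, 94, 93, 69, 61],
    'G': [162, 162, 162, 162, 162],
})

def add_wp_with_iloc(df):
    """Hypothetical helper wrapping the .iloc loop from the slide."""
    win_perc_list = []
    for i in range(len(df)):
        row = df.iloc[i]
        win_perc_list.append(calc_win_perc(row['W'], row['G']))
    df['WP'] = win_perc_list

# Average seconds per call over 100 calls
n_calls = 100
seconds = timeit.timeit(lambda: add_wp_with_iloc(baseball_df), number=n_calls) / n_calls
print(f'{seconds * 1e3:.3f} ms per loop')
```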

Iterating with .iterrows()

win_perc_list = []

for i,row in baseball_df.iterrows():

    wins = row['W']
    games_played = row['G']

    win_perc = calc_win_perc(wins, games_played)

    win_perc_list.append(win_perc)

baseball_df['WP'] = win_perc_list

Iterating with .iterrows()

%%timeit
win_perc_list = []

for i,row in baseball_df.iterrows():

    wins = row['W']
    games_played = row['G']

    win_perc = calc_win_perc(wins, games_played)
    win_perc_list.append(win_perc)

baseball_df['WP'] = win_perc_list
95.3 ms ± 3.57 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
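
The loop above unpacks each item that `.iterrows()` yields into `i` and `row`. As a small sketch of what those two names hold, the example below inspects the first item on a stand-in DataFrame built from three of the sample rows (an assumption, not the full CSV). The names `first_index` and `first_row` are only illustrative.

```python
import pandas as pd

# Assumption: stand-in data copied from the sample rows, not the full CSV
baseball_df = pd.DataFrame({
    'Team': ['ARI', 'ATL', 'BAL'],
    'W': [81, 94, 93],
    'G': [162, 162, 162],
})

# .iterrows() yields (index label, row as a pandas Series) pairs
first_index, first_row = next(baseball_df.iterrows())

print(first_index)       # index label of the first row
print(type(first_row))   # the row is a pandas Series
print(first_row['W'], first_row['G'])  # values are looked up by column label
```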

Practice iterating over a DataFrame with .iterrows()
