Using iterators to load large files into memory

Python Toolbox

Hugo Bowne-Anderson

Data Scientist at DataCamp

Loading data in chunks

  • Sometimes the data is too large to fit in memory
  • Solution: load it in chunks!
  • pandas function: read_csv()
    • Specify the chunk size with the chunksize argument
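
To make the chunksize bullet concrete, here is a minimal sketch of what read_csv() hands back when chunksize is set. The in-memory CSV text and the names csv_text, reader, and chunk_shapes are assumptions for illustration; they stand in for a large file on disk.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a large file on disk
csv_text = "x\n1\n2\n3\n4\n5\n"

# With chunksize set, read_csv returns an iterator instead of one DataFrame
reader = pd.read_csv(io.StringIO(csv_text), chunksize=2)

# Each item is a DataFrame holding at most `chunksize` rows
chunk_shapes = [chunk.shape for chunk in reader]
print(chunk_shapes)
```

With five data rows and chunksize=2, the last chunk holds only the single leftover row.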

Iterating over data

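import pandas as pd

result = []

# Read the file 1000 rows at a time; each chunk is a DataFrame
for chunk in pd.read_csv('data.csv', chunksize=1000):
    # Sum column 'x' within this chunk
    result.append(sum(chunk['x']))

# Add up the per-chunk sums
total = sum(result)
print(total)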
4252532

Iterating over data

import pandas as pd

total = 0

# Keep a running total instead of storing every chunk's sum
for chunk in pd.read_csv('data.csv', chunksize=1000):
    total += sum(chunk['x'])
print(total)
4252532
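
The running-total pattern can be checked on a small scale without data.csv. The sketch below assumes a hypothetical one-column dataset built in memory, and swaps the built-in sum() for the Series method .sum(), which is vectorized and usually faster on large chunks. It then compares the chunked total with the total from loading everything at once.

```python
import io
import pandas as pd

# Hypothetical data: a single column 'x' holding the integers 1..10
csv_text = "x\n" + "\n".join(str(i) for i in range(1, 11)) + "\n"

# Chunked pass: only one chunk of 3 rows is held in memory at a time
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=3):
    total += chunk['x'].sum()

# Reference value: load the whole (small) dataset in one go
full_total = pd.read_csv(io.StringIO(csv_text))['x'].sum()

print(total, full_total)
```

Both approaches give the same sum; the chunked version just never needs the full dataset in memory.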

Let's practice!
