Using iterators to load large files into memory

Python Toolbox

Hugo Bowne-Anderson

Data Scientist at DataCamp

Loading data in chunks

  • Sometimes there is too much data to hold in memory
  • Solution: load the data in chunks!
  • pandas function: read_csv()
    • Specify the chunk size with the chunksize argument
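As a quick sketch of what chunksize does (the file name data.csv and column x here are placeholders, with a tiny file generated on the fly):

```python
import pandas as pd

# Build a small example file (stand-in for a large 'data.csv')
pd.DataFrame({'x': range(10)}).to_csv('data.csv', index=False)

# With chunksize, read_csv returns an iterator of DataFrames
# (a TextFileReader) rather than one big DataFrame
reader = pd.read_csv('data.csv', chunksize=4)
first = next(reader)   # first 4 rows as a DataFrame
print(len(first))      # 4
```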

Iterating over data

import pandas as pd
result = []

for chunk in pd.read_csv('data.csv', chunksize=1000):
    result.append(sum(chunk['x']))
total = sum(result)
print(total)
4252532

Iterating over data

import pandas as pd
total = 0

for chunk in pd.read_csv('data.csv', chunksize=1000):
    total += sum(chunk['x'])
print(total)
4252532
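The same running total can also be written as a generator expression, a compact variant not shown on the slides (again using a small generated stand-in for data.csv):

```python
import pandas as pd

# Build a small example file (stand-in for a large 'data.csv')
pd.DataFrame({'x': range(10)}).to_csv('data.csv', index=False)

# Sum the 'x' column chunk by chunk; only one chunk
# is held in memory at any time
total = sum(chunk['x'].sum() for chunk in pd.read_csv('data.csv', chunksize=4))
print(total)   # 45
```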

Let's practice!
