ETL and ELT in Python
Jake Roach
Data Engineer
Loading data to a file: the .to_csv() method
import pandas as pd
# Data extraction and transformation
raw_data = pd.read_csv("raw_stock_data.csv")
stock_data = raw_data.loc[raw_data["open"] > 100, ["timestamps", "open"]]
# Load data to a .csv file
stock_data.to_csv("stock_data.csv")
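.to_csv() is called on the DataFrame and takes the path of the output file, here "stock_data.csv"
stock_data.to_csv("./stock_data.csv", header=True)
header takes True, False, or a list of string values; a list replaces the column names written to the file
stock_data.to_csv("./stock_data.csv", index=True)
index takes True or False and controls whether the index column is written to the file
stock_data.to_csv("./stock_data.csv", sep="|")
sep sets the column separator; the default is a comma, and the | character is a common option
Has counterparts: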
.to_parquet()
.to_json()
.to_sql()
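The options above can be compared side by side without touching the disk, since .to_csv() returns the CSV text when no path is given. The sketch below is illustrative only: the sample rows and the aliases "ts" and "open_price" are invented, and .to_json() stands in for the counterparts because it needs no extra dependencies.

```python
import json

import pandas as pd

# Invented sample rows; the real raw_stock_data.csv is not shown in the slides
stock_data = pd.DataFrame({
    "timestamps": ["2024-01-03", "2024-01-04"],
    "open": [101.25, 104.0],
})

# With no path, .to_csv() returns the CSV text instead of writing a file
default_csv = stock_data.to_csv()              # header and index both written
no_index_csv = stock_data.to_csv(index=False)  # index column left out
# A list of strings replaces the column names in the output (hypothetical aliases)
renamed_csv = stock_data.to_csv(index=False, header=["ts", "open_price"])
piped_csv = stock_data.to_csv(index=False, sep="|")

print(default_csv)
print(piped_csv)

# .to_json() follows the same pattern; orient="records" gives one object per row
records = json.loads(stock_data.to_json(orient="records"))
print(records)
```

Note that the default output starts with an empty field in the header row, because the index column has no name. Passing index=False is a common way to avoid an extra unnamed column when the file is read back.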
Was the DataFrame correctly stored to the CSV file?
import pandas as pd
import os # Import the os module
# Extract, transform and load data
raw_data = pd.read_csv("raw_stock_data.csv")
stock_data = raw_data.loc[raw_data["open"] > 100, ["timestamps", "open"]]
stock_data.to_csv("stock_data.csv")
# Check that the path exists
file_exists = os.path.exists("stock_data.csv")
print(file_exists)
True
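os.path.exists() only confirms that a file is present at the path. A slightly stronger check, sketched here under assumptions, is to read the file back and compare it with the DataFrame that was written. The sample rows are invented, and a temporary directory is used so the example leaves nothing behind.

```python
import os
import tempfile

import pandas as pd

# Invented sample rows standing in for the transformed stock data
stock_data = pd.DataFrame({
    "timestamps": ["2024-01-03", "2024-01-04"],
    "open": [101.25, 104.0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "stock_data.csv")

    # index=False keeps the written columns identical to the DataFrame's columns
    stock_data.to_csv(path, index=False)

    # Check that the path exists, then read the file back for comparison
    file_exists = os.path.exists(path)
    reloaded = pd.read_csv(path)

same_columns = list(reloaded.columns) == list(stock_data.columns)
same_row_count = len(reloaded) == len(stock_data)
same_open_values = reloaded["open"].tolist() == stock_data["open"].tolist()

print(file_exists, same_columns, same_row_count, same_open_values)
```

Comparing columns, row count, and values catches cases that an existence check misses, such as an empty file or an unexpected extra index column.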