ETL and ELT in Python
Jake Roach
Data Engineer
Most data produced and consumed is unstructured data
API (Application Programming Interface)
JSON (JavaScript Object Notation)
Similar to Python dictionaries
{
    "key": "value",
    ...
    "open": 0.121875
}
{
    "timestamps": [863703000, 863789400, ...],
    "open": [0.121875, 0.098438, ...],
    "close": [...],
    "volume": [...]
}
Use the pd.read_json() function
import pandas as pd

# Read in a JSON file in the format above
raw_stock_data = pd.read_json("raw_stock_data.json", orient="columns")
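To try the call above without a file on disk, here is a minimal sketch that parses a small column-oriented JSON string. The two-row sample reuses only the values shown on the slide. Wrapping the string in io.StringIO and naming it sample_json are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row sample in the column-oriented layout shown above
sample_json = """
{
    "timestamps": [863703000, 863789400],
    "open": [0.121875, 0.098438]
}
"""

# orient="columns" treats each top-level key as a column name
raw_stock_data = pd.read_json(io.StringIO(sample_json), orient="columns")

# Inspect the result: one row per list position, one column per key
print(raw_stock_data.shape)
print(list(raw_stock_data.columns))
```

Note that pandas may try to interpret a column whose name starts with "timestamp" as dates. Pass convert_dates=False if the raw integers should be kept.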
Data is not always DataFrame-ready
{
    "863703000": {
        "volume": 1443120000,
        "price": {
            "close": 0.09791,
            "open": 0.12187
        }
    },
    "863789400": {
        ...
    }, ...
}
import json

with open("raw_stock_data.json", "r") as file:
    # Load the file into a dictionary
    raw_stock_data = json.load(file)

# Confirm the type of the raw_stock_data variable
print(type(raw_stock_data))
<class 'dict'>
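Once the nested JSON is loaded as a dictionary, one possible way to make it DataFrame-ready is to loop over the entries and flatten each one into a row. The sketch below is an illustration, not necessarily the approach used later in the course. It uses an inline dictionary holding only the single complete entry shown above. The names rows and flattened, and the column order, are assumptions.

```python
import pandas as pd

# Hypothetical sample mirroring the nested layout above (one complete entry)
raw_stock_data = {
    "863703000": {
        "volume": 1443120000,
        "price": {"close": 0.09791, "open": 0.12187},
    },
}

rows = []
for timestamp, entry in raw_stock_data.items():
    # .get() returns None instead of raising if a key is missing
    price = entry.get("price", {})
    rows.append([
        timestamp,
        price.get("open"),
        price.get("close"),
        entry.get("volume"),
    ])

# Build a flat, tabular DataFrame and index it by timestamp
flattened = pd.DataFrame(rows, columns=["timestamps", "open", "close", "volume"])
flattened = flattened.set_index("timestamps")
print(flattened)
```

Using .get() rather than square-bracket access is a design choice: entries with a missing "price" or "volume" key produce empty values in the DataFrame instead of stopping the loop with a KeyError.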