Python ile ETL ve ELT
Jake Roach
Data Engineer

timestamps volume open close
1997-05-15 13:30:00 1443120000 0.121875 0.097917
1997-05-16 13:30:00 294000000 NaN 0.086458
1997-05-19 13:30:00 122136000 0.088021 NaN
# Tüm NaN değerlerini 0 ile doldurun
clean_stock_data = raw_stock_data.fillna(value=0)
timestamps volume open close
1997-05-15 13:30:00 1443120000 0.121875 0.097917
1997-05-16 13:30:00 294000000 0.000000 0.086458
1997-05-19 13:30:00 122136000 0.088021 0.000000
timestamps volume open close
1997-05-15 13:30:00 1443120000 0.121875 0.097917
1997-05-16 13:30:00 294000000 NaN 0.086458
1997-05-19 13:30:00 122136000 0.088021 NaN
# NaN'leri her sütun için belirli değerle doldurun
clean_stock_data = raw_stock_data.fillna(value={"open": 0, "close": .5}, axis=1)
timestamps volume open close
1997-05-15 13:30:00 1443120000 0.121875 0.097917
1997-05-16 13:30:00 294000000 0.000000 0.086458
1997-05-19 13:30:00 122136000 0.088021 0.500000
timestamps volume open close
1997-05-15 13:30:00 1443120000 0.121875 0.097917
1997-05-16 13:30:00 294000000 NaN 0.086458
1997-05-19 13:30:00 122136000 0.088021 NaN
# NaN değerini diğer sütunları kullanarak doldurun
raw_stock_data["open"].fillna(raw_stock_data["close"], inplace=True)
timestamps volume open close
1997-05-15 13:30:00 1443120000 0.121875 0.097917
1997-05-16 13:30:00 294000000 0.086458 0.086458
1997-05-19 13:30:00 122136000 0.088021 NaN
SELECT
ticker,
AVG(volume),
AVG(open),
AVG(close)
FROM raw_stock_data
GROUP BY ticker;
.groupby() yöntemi yukarıdaki sorguyu pandas ile yeniden oluşturabilir
ticker volume open close
AAPL 1443120000 0.121875 0.097917
AAPL 297000000 0.098146 0.086458
AMZN 124186000 0.247511 0.251290
# Python ile verileri 'ticker'a göre gruplayın, kalan sütunların ortalamasını alın
grouped_stock_data = raw_stock_data.groupby(by=["ticker"], axis=0).mean()
volume open close
ticker
AAPL 1.149287e+08 34.998377 34.986851
AMZN 1.434213e+08 30.844692 30.830233
Verileri özetlemek için .min(), .max() ve .sum() da kullanılabilir
.apply() yöntemi daha gelişmiş dönüşümleri yapabilir
def classify_change(row):
change = row["close"] - row["open"]
if change > 0:
return "Increase"
else:
return "Decrease"
# Dönüşümü DataFrame'e uygula
raw_stock_data["change"] = raw_stock_data.apply(
classify_change,
axis=1
)
Dönüşüm öncesi
ticker ... open close
AAPL 0.121875 0.097917
AAPL 0.098146 0.086458
AMZN 0.247511 0.251290
$$
Dönüşüm sonrası
ticker ... open close change
AAPL 0.121875 0.097917 Decrease
AAPL 0.098146 0.086458 Decrease
AMZN 0.247511 0.251290 Increase
Python ile ETL ve ELT