Handling Missing Data in Python
Suraj Donthi
Deep Learning & Computer Vision Consultant

One-hot encoding:

| Color | Color_Red | Color_Green | Color_Blue |
|---|---|---|---|
| Red | 1 | 0 | 0 |
| Green | 0 | 1 | 0 |
| Blue | 0 | 0 | 1 |
| Red | 1 | 0 | 0 |
| Blue | 0 | 0 | 1 |
| Blue | 0 | 0 | 1 |
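
pandas' `get_dummies` can reproduce the one-hot table above. This is a minimal sketch that assumes the six color values from the table; the `colors` Series is only illustrative:

```python
import pandas as pd

# Illustrative data: the six color observations from the table above
colors = pd.Series(["Red", "Green", "Blue", "Red", "Blue", "Blue"], name="Color")

# One binary column per category; dtype=int yields 0/1 instead of True/False
one_hot = pd.get_dummies(colors, prefix="Color", dtype=int)

# get_dummies sorts the category columns alphabetically, so reorder them to match the table
one_hot = one_hot[["Color_Red", "Color_Green", "Color_Blue"]]
print(pd.concat([colors, one_hot], axis=1))
```

`get_dummies` names each column `<prefix>_<category>`, which gives the `Color_Red`, `Color_Green` and `Color_Blue` headers.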

Ordinal encoding:

| Color | Value |
|---|---|
| Red | 0 |
| Green | 1 |
| Blue | 2 |
| Red | 0 |
| Blue | 2 |
| Blue | 2 |
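
The codes in this table follow the order in which each color first appears (Red, then Green, then Blue). `pd.factorize` assigns codes the same way. scikit-learn's `OrdinalEncoder`, which the code that follows uses, sorts the categories first, so it would map Blue to 0, Green to 1 and Red to 2. The small comparison below runs on the same six illustrative values and uses `pd.Categorical` to show the sorted ordering:

```python
import pandas as pd

# Illustrative data: the same six colors as in the table
colors = pd.Series(["Red", "Green", "Blue", "Red", "Blue", "Blue"])

# Order of first appearance: Red -> 0, Green -> 1, Blue -> 2 (as in the table)
codes, uniques = pd.factorize(colors)

# Sorted categories: Blue -> 0, Green -> 1, Red -> 2
# (the alphabetical ordering OrdinalEncoder uses by default)
sorted_codes = pd.Categorical(colors).codes

print(pd.DataFrame({"Color": colors, "Value": codes, "Sorted value": sorted_codes}))
```

For nominal data such as colors, neither ordering carries meaning. What matters is that the same mapping is used later to decode the values.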

import pandas as pd

users = pd.read_csv('userprofile.csv')
users.head()

| | smoker | drink_level | dress_preference | ambience | hijos | activity | budget |
|---|---|---|---|---|---|---|---|
| 0 | False | abstemious | informal | family | independent | student | medium |
| 1 | False | abstemious | informal | family | independent | student | low |
| 2 | False | social drinker | formal | family | independent | student | low |
| 3 | False | abstemious | informal | family | independent | professional | medium |
| 4 | False | abstemious | no preference | family | independent | student | medium |

import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Single-column example: create an ordinal encoder for ambience
ambience_ord_enc = OrdinalEncoder()

# Select the non-null values of ambience
ambience = users['ambience']
ambience_not_null = ambience[ambience.notnull()]
# The encoder expects a 2-D array of shape (n_samples, 1)
reshaped_vals = ambience_not_null.values.reshape(-1, 1)

# Encode the non-null values of ambience
encoded_vals = ambience_ord_enc.fit_transform(reshaped_vals)

# Replace the non-null entries of ambience with their ordinal codes
users.loc[ambience.notnull(), 'ambience'] = np.squeeze(encoded_vals)
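
The same pattern can be checked on a small, hypothetical `ambience` column. The `toy` DataFrame and its values are made up for illustration and do not come from `userprofile.csv`. Because only the non-null values go through the encoder, the missing entry stays `NaN` and is ready for the imputer. `inverse_transform` then recovers the original labels:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data (not from userprofile.csv) with one missing value
toy = pd.DataFrame({"ambience": ["family", "friends", np.nan, "solitary", "family"]})

enc = OrdinalEncoder()
mask = toy["ambience"].notnull()

# Fit on the non-null values only; the encoder needs a 2-D (n_samples, 1) array
encoded = enc.fit_transform(toy.loc[mask, "ambience"].to_numpy().reshape(-1, 1))

# Write the codes back; the NaN row is left untouched
toy.loc[mask, "ambience"] = np.squeeze(encoded, axis=1)
print(toy)
print(enc.categories_)  # categories are sorted alphabetically

# Decoding the codes returns the original labels
decoded = enc.inverse_transform(encoded).ravel()
```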
# Dictionary holding one ordinal encoder per column
ordinal_enc_dict = {}

# Loop over the columns to encode
for col_name in users:
    # Create an ordinal encoder for the column
    ordinal_enc_dict[col_name] = OrdinalEncoder()
    col = users[col_name]

    # Select the non-null values of the column
    col_not_null = col[col.notnull()]
    reshaped_vals = col_not_null.values.reshape(-1, 1)

    # Encode the non-null values of the column
    encoded_vals = ordinal_enc_dict[col_name].fit_transform(reshaped_vals)

    # Replace the column's non-null values with their ordinal codes
    users.loc[col.notnull(), col_name] = np.squeeze(encoded_vals)
from fancyimpute import KNN

users_KNN_imputed = users.copy(deep=True)

# Create the KNN imputer
KNN_imputer = KNN()
# Impute the encoded data and round to the nearest integer code
users_KNN_imputed.iloc[:, :] = np.round(KNN_imputer.fit_transform(users))

for col_name in users_KNN_imputed:
    # inverse_transform expects a 2-D array of shape (n_samples, 1)
    reshaped = users_KNN_imputed[col_name].values.reshape(-1, 1)
    # Map the codes back to the original labels, flattened to 1-D for the column
    users_KNN_imputed[col_name] = \
        ordinal_enc_dict[col_name].inverse_transform(reshaped).ravel()

Steps for imputing categorical values
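
`fancyimpute` is a separate package from scikit-learn. As an alternative that is swapped in here and is not the approach used above, scikit-learn's `KNNImputer` applies the same nearest-neighbor idea. The sketch below runs all the steps end to end on a hypothetical toy DataFrame: encode each column's non-null values, impute, round to whole codes, then decode. The column values and `n_neighbors=2` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data: two categorical columns with missing values
df = pd.DataFrame({
    "drink_level": ["abstemious", "social drinker", np.nan, "abstemious", "casual drinker"],
    "budget": ["low", "medium", "medium", np.nan, "high"],
})

# Step 1: encode the non-null values of each column; NaN stays NaN
encoders = {}
encoded = pd.DataFrame(index=df.index, columns=df.columns, dtype=float)
for col_name in df:
    encoders[col_name] = OrdinalEncoder()
    mask = df[col_name].notnull()
    values = df.loc[mask, col_name].to_numpy().reshape(-1, 1)
    encoded.loc[mask, col_name] = np.squeeze(
        encoders[col_name].fit_transform(values), axis=1
    )

# Step 2: impute; neighbors' codes are averaged, so round back to whole codes
imputer = KNNImputer(n_neighbors=2)  # arbitrary choice for this tiny example
imputed = np.round(imputer.fit_transform(encoded))

# Step 3: decode each column back to its original labels
result = pd.DataFrame(index=df.index, columns=df.columns)
for i, col_name in enumerate(df):
    result[col_name] = encoders[col_name].inverse_transform(imputed[:, [i]]).ravel()

print(result)
```

`KNNImputer` fills only the missing entries and leaves observed values unchanged. Each imputed value is an average of existing codes, so after rounding it is still a valid code for that column's encoder.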