Mean, median, & mode imputation

Dealing with Missing Data in Python

Suraj Donthi

Deep Learning & Computer Vision Consultant

Basic imputation techniques

  • constant (e.g. 0)
  • mean
  • median
  • mode or most frequent
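
As a rough illustration of what each technique fills in, the sketch below computes the four fill values for a small made-up column using only Python's standard library. The values and the `fill` helper are hypothetical, chosen for illustration; they are not taken from the diabetes data.

```python
from statistics import mean, median, mode

# Hypothetical toy column; None marks a missing value
column = [1.0, 2.0, 2.0, None, 10.0]
observed = [v for v in column if v is not None]

# One fill value per basic technique
fill_values = {
    'constant': 0.0,
    'mean': mean(observed),
    'median': median(observed),
    'mode': mode(observed),
}

def fill(values, fill_value):
    """Return a copy of values with every None replaced by fill_value."""
    return [fill_value if v is None else v for v in values]

imputed = {name: fill(column, value) for name, value in fill_values.items()}
for name, values in imputed.items():
    print(name, values)
```

The outlier 10.0 pulls the mean fill value above the median and mode, which is one reason the strategies can give visibly different results on skewed columns.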

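Mean imputation

from sklearn.impute import SimpleImputer

# Work on a copy so the original DataFrame keeps its missing values
diabetes_mean = diabetes.copy(deep=True)
# Replace each NaN with the mean of its column
mean_imputer = SimpleImputer(strategy='mean')
diabetes_mean.iloc[:, :] = mean_imputer.fit_transform(diabetes_mean)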
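
The slide assumes a `diabetes` DataFrame is already loaded. For a self-contained check of the same pattern, the sketch below uses a small made-up DataFrame (`toy` and its values are hypothetical). It rebuilds the DataFrame from the imputer's output rather than assigning through `.iloc`, which is one alternative way to keep the column names and index.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical stand-in for the diabetes DataFrame
toy = pd.DataFrame({
    'Glucose': [100.0, np.nan, 140.0],
    'Serum_Insulin': [np.nan, 80.0, 120.0],
})

mean_imputer = SimpleImputer(strategy='mean')
# fit_transform returns a NumPy array, so wrap it back into a DataFrame
toy_mean = pd.DataFrame(
    mean_imputer.fit_transform(toy),
    columns=toy.columns,
    index=toy.index,
)

# The column means learned during fitting are stored in statistics_
print(mean_imputer.statistics_)
print(toy_mean)
```

Swapping `strategy='mean'` for `'median'`, `'most_frequent'`, or `'constant'` should exercise the other techniques on the same toy data.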

Median imputation

diabetes_median = diabetes.copy(deep=True)
# Replace each NaN with the median of its column
median_imputer = SimpleImputer(strategy='median')
diabetes_median.iloc[:, :] = median_imputer.fit_transform(diabetes_median)

Mode imputation

diabetes_mode = diabetes.copy(deep=True)
# Replace each NaN with the most frequent value of its column
mode_imputer = SimpleImputer(strategy='most_frequent')
diabetes_mode.iloc[:, :] = mode_imputer.fit_transform(diabetes_mode)

Constant imputation

diabetes_constant = diabetes.copy(deep=True)
# Replace each NaN with the constant 0
constant_imputer = SimpleImputer(strategy='constant', fill_value=0)
diabetes_constant.iloc[:, :] = constant_imputer.fit_transform(diabetes_constant)

Scatterplot of imputation

# Flag rows where either column was missing before imputation
nullity = diabetes['Serum_Insulin'].isnull() + diabetes['Glucose'].isnull()
diabetes_mean.plot(x='Serum_Insulin', y='Glucose', kind='scatter', alpha=0.5,
                   c=nullity, cmap='rainbow', title='Mean Imputation')

Scatterplot of mean imputation on the diabetes DataFrame
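
The `nullity` mask drives the point colors. Adding two boolean Series with `+` behaves as an element-wise OR, so a row is flagged when either column was originally missing. The sketch below checks this on a small made-up DataFrame (`toy` is hypothetical) and shows the more explicit `|` spelling.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in with missing values in both columns
toy = pd.DataFrame({
    'Serum_Insulin': [np.nan, 80.0, 120.0, np.nan],
    'Glucose': [100.0, np.nan, 140.0, np.nan],
})

# As on the slide: '+' applied to two boolean Series
nullity_plus = toy['Serum_Insulin'].isnull() + toy['Glucose'].isnull()
# Equivalent, more explicit logical OR
nullity_or = toy['Serum_Insulin'].isnull() | toy['Glucose'].isnull()

print(nullity_plus.tolist())
print(nullity_or.tolist())
```

Because the mask is computed from the original DataFrame rather than an imputed copy, it can be reused to color every imputed version the same way.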


Visualizing imputations

import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))

nullity = diabetes['Serum_Insulin'].isnull() + diabetes['Glucose'].isnull()
imputations = {'Mean Imputation': diabetes_mean, 'Median Imputation': diabetes_median,
               'Most Frequent Imputation': diabetes_mode, 'Constant Imputation': diabetes_constant}
# Draw one scatterplot per imputation on its own subplot
for ax, df_key in zip(axes.flatten(), imputations):
    imputations[df_key].plot(x='Serum_Insulin', y='Glucose', kind='scatter', alpha=0.5,
                             c=nullity, cmap='rainbow', ax=ax, colorbar=False, title=df_key)

Graphical visualization of the various imputations on the diabetes DataFrame


Summary

You have learned to:

  • Impute with statistical parameters: mean, median, and mode
  • Compare imputations graphically
  • Analyze the results of imputation
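
One simple way to analyze an imputation, sketched here with made-up numbers rather than the diabetes data, is to compare summary statistics before and after filling. Mean imputation leaves the column mean unchanged but shrinks its spread, because every filled value sits exactly at the center.

```python
from statistics import mean, stdev

# Hypothetical column; None marks a missing value
column = [2.0, 4.0, None, 6.0, None, 8.0]
observed = [v for v in column if v is not None]

# Mean imputation: fill every gap with the mean of the observed values
fill_value = mean(observed)
imputed = [fill_value if v is None else v for v in column]

print('mean before:', mean(observed), 'after:', mean(imputed))
print('stdev before:', stdev(observed), 'after:', stdev(imputed))
```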

Let's practice!
