End-to-End Machine Learning
Joshua Stapleton
Machine Learning Engineer
The dataset contains:
Data preparation:
df.drop() for columns
df.dropna(how='all') for rows
# Count missing values
print(heart_disease_df['oldpeak'].isnull().sum())
# Drop empty column(s) and row(s)
columns_dropped = heart_disease_df.drop(['oldpeak'], axis='columns')
rows_and_columns_dropped = columns_dropped.dropna(how='all')
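The steps above can be tried end to end on a tiny made-up DataFrame. This is a minimal sketch: the `toy` table and its values are hypothetical stand-ins, not the real heart disease data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'oldpeak' is mostly missing and row 2 is entirely empty
toy = pd.DataFrame({
    "age": [63, 37, np.nan, 56],
    "chol": [233, 250, np.nan, 236],
    "oldpeak": [np.nan, np.nan, np.nan, 0.8],
})

# Count missing values in every column
missing_counts = toy.isnull().sum()
print(missing_counts)

# Drop the sparse column, then drop rows where every remaining value is missing
columns_dropped = toy.drop(["oldpeak"], axis="columns")
rows_and_columns_dropped = columns_dropped.dropna(how="all")
print(rows_and_columns_dropped)
```

With the default `how='any'`, `dropna()` would instead remove every row containing even a single missing value, which usually discards far more data.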
What if only a few values are missing?
Imputation:
Strategies
# Calculate the mean cholesterol value
mean_value = heart_disease_df['chol'].mean()
# Fill missing cholesterol values with the mean (assign the result back rather than using inplace=True on a column)
heart_disease_df['chol'] = heart_disease_df['chol'].fillna(mean_value)
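Mean filling is one imputation strategy among several. The sketch below compares it with median filling via scikit-learn's `SimpleImputer`; the `toy` cholesterol values are hypothetical and chosen so the two strategies give different answers.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical cholesterol readings with one gap
toy = pd.DataFrame({"chol": [200.0, 220.0, np.nan, 300.0]})

# Option 1: pandas, fill with the column mean; NaN is skipped: (200 + 220 + 300) / 3 = 240
mean_value = toy["chol"].mean()
mean_filled = toy["chol"].fillna(mean_value)

# Option 2: scikit-learn, same idea with a configurable strategy (median of 200, 220, 300 is 220)
median_imputer = SimpleImputer(strategy="median")
median_filled = median_imputer.fit_transform(toy[["chol"]])  # expects 2D input

print(mean_filled.tolist())
print(median_filled.ravel().tolist())
```

The median is less sensitive to extreme values than the mean, so it is often preferred for skewed measurements. `SimpleImputer` also accepts `strategy="most_frequent"` and `strategy="constant"`.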
Advanced techniques:
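from sklearn.impute import KNNImputer
# Initialize KNNImputer
imputer = KNNImputer(n_neighbors=2, weights="uniform")
# Impute 'oldpeak' from its nearest neighbours; the other numeric columns supply the distances
numeric_cols = heart_disease_df.select_dtypes(include='number').columns
df_imputed = heart_disease_df.copy()
# fit_transform expects a 2D input and returns a NumPy array
df_imputed[numeric_cols] = imputer.fit_transform(heart_disease_df[numeric_cols])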
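KNN imputation fills a gap with the average of that column over the most similar rows, where similarity is measured on the columns that are present. The sketch below checks this on a hypothetical four-patient table; the column values are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical data: the last patient's 'oldpeak' is missing
toy = pd.DataFrame({
    "age": [50, 52, 70, 51],
    "oldpeak": [1.0, 2.0, 4.0, np.nan],
})

# Neighbours are found using the other columns (here only 'age')
imputer = KNNImputer(n_neighbors=2, weights="uniform")
toy_imputed = pd.DataFrame(
    imputer.fit_transform(toy), columns=toy.columns, index=toy.index
)

# The patients aged 50 and 52 are the two nearest, so the gap becomes (1.0 + 2.0) / 2
print(toy_imputed.loc[3, "oldpeak"])
```

Because the neighbour search is distance based, columns on very different scales (for example cholesterol versus oldpeak) should usually be standardised before imputation so that one column does not dominate the distance.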
# Drop duplicate rows
heart_disease_duplicates_dropped = rows_and_columns_dropped.drop_duplicates()
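Counting duplicates before dropping them shows how much data the step removes. A minimal sketch on a hypothetical three-row table where one record is repeated exactly:

```python
import pandas as pd

# Hypothetical records where the second row repeats the first exactly
toy = pd.DataFrame({
    "age": [63, 63, 37],
    "chol": [233, 233, 250],
})

# Count exact duplicates, then drop them (the first occurrence is kept by default)
n_duplicates = toy.duplicated().sum()
deduplicated = toy.drop_duplicates()
print(n_duplicates, len(deduplicated))
```

`drop_duplicates()` also accepts `subset=` to compare only selected columns and `keep='last'` or `keep=False` to change which copies survive.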