Increasing successful detections by rebalancing the data

Fraud Detection in Python

Charlotte Werger

Data Scientist

Undersampling
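
As a rough illustration of the idea, random undersampling can be sketched in plain Python: rows are dropped at random from the larger class until every class is as small as the smallest one. The function name `random_undersample` and the toy data are assumptions made for this sketch; it is not how any particular library implements it.

```python
import random
from collections import Counter

def random_undersample(X, y, seed=0):
    """Randomly drop rows from larger classes until all classes match the smallest one."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))  # sample without replacement
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy data (hypothetical): 8 legitimate transactions (0) and 2 fraudulent ones (1)
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_res, y_res = random_undersample(X, y)
print(Counter(y_res))  # both classes now have the same number of rows
```

Note that six of the eight legitimate rows are discarded here, which is the cost of this approach.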

Oversampling

Oversampling in Python

from imblearn.over_sampling import RandomOverSampler

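# Duplicate minority-class rows at random until the classes are balanced
method = RandomOverSampler()
X_resampled, y_resampled = method.fit_resample(X, y)
# compare_plots is a helper assumed to be defined elsewhere; it is not part of imblearn
compare_plots(X_resampled, y_resampled, X, y)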
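
To make the effect of random oversampling concrete without a plot, the following plain-Python sketch duplicates minority rows with replacement and then counts classes and distinct fraud rows. The function `random_oversample` and the toy data are assumptions for illustration only, not the imblearn implementation.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate rows of smaller classes at random until all classes match the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_max = max(len(idx) for idx in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, idx in by_class.items():
        extra = rng.choices(idx, k=n_max - len(idx))  # sample with replacement
        X_out.extend(X[i] for i in extra)
        y_out.extend(label for _ in extra)
    return X_out, y_out

# Toy data (hypothetical): 8 legitimate transactions (0) and 2 fraudulent ones (1)
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_ros, y_ros = random_oversample(X, y)
print(Counter(y_ros))  # classes are balanced after resampling

# The fraud class is larger, but it still contains only the original distinct rows
distinct_fraud = {tuple(x) for x, label in zip(X_ros, y_ros) if label == 1}
print(len(distinct_fraud))
```

The class counts are balanced, yet no new information about fraud has been added: the extra rows are exact copies.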

Synthetic Minority Oversampling Technique (SMOTE)

1 https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets
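
The core idea of SMOTE can be sketched as interpolation: pick a minority sample, pick one of its nearest minority neighbours, and create a new point somewhere on the line segment between them. The function `smote_sketch`, its parameters, and the toy points below are assumptions for illustration; library implementations such as imblearn's `SMOTE` differ in how they search neighbours and decide how many samples to create.

```python
import math
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding the point itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + lam * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy fraud cases with two features each (hypothetical values)
fraud = [[1.0, 1.0], [2.0, 1.5], [1.5, 3.0]]

new_points = smote_sketch(fraud, n_new=5)
for point in new_points:
    print(point)
```

Every synthetic point lies between two real fraud cases, so it is plausible but not an observed transaction.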

Which resampling method should you use?

  • Random Under Sampling (RUS): throws away part of the data, computationally efficient
  • Random Over Sampling (ROS): simple and straightforward, but the model is trained on many duplicates
  • Synthetic Minority Oversampling Technique (SMOTE): more sophisticated and realistic, but the model is trained on "fake" data

When to use resampling methods

from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Define resampling method and split into train and test
method = SMOTE()
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=0)

# Apply resampling to the training data only
X_resampled, y_resampled = method.fit_resample(X_train, y_train)

# Continue fitting the model and obtain predictions
model = LogisticRegression()
model.fit(X_resampled, y_resampled)

# Get your performance metrics
predicted = model.predict(X_test)
print(classification_report(y_test, predicted))
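
One reason to resample the training data only is that resampling before the split can place copies of the same row in both the training and the test set, which makes test metrics look better than they are. The plain-Python sketch below illustrates this with random oversampling; the helper names (`oversample`, `split`, `leaked_ids`) and the toy rows are assumptions made for the example.

```python
import random

def oversample(rows, seed=0):
    """Duplicate fraud rows (label 1) at random until both classes have the same size."""
    rng = random.Random(seed)
    minority = [r for r in rows if r[1] == 1]
    majority = [r for r in rows if r[1] == 0]
    if not minority:
        return rows[:]  # nothing to duplicate
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return rows + extra

def split(rows, train_size=0.8, seed=0):
    """Shuffle the rows and cut them into a training and a test part."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_size)
    return shuffled[:cut], shuffled[cut:]

def leaked_ids(train, test):
    """Row ids that appear in both the training and the test set."""
    return {r[0] for r in train} & {r[0] for r in test}

# Each row is (row_id, label): 95 legitimate rows and 5 fraudulent ones (hypothetical)
rows = [(i, 0) for i in range(95)] + [(i, 1) for i in range(95, 100)]

# Resampling first, then splitting: copies of a row can end up on both sides
train_bad, test_bad = split(oversample(rows))

# Splitting first, then resampling the training part only
train, test = split(rows)
train_good = oversample(train)

print("shared rows when resampling first:", len(leaked_ids(train_bad, test_bad)))
print("shared rows when splitting first:", len(leaked_ids(train_good, test)))
```

When the split comes first, the test set keeps its original class balance and contains no rows the model has already seen.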

Let's practice!
