Histograms and outliers

Credit Risk Modeling in R

Lore Dirick

Manager of Data Science Curriculum at Flatiron School

Using the hist() function

hist(loan_data$int_rate)

Histogram of interest rate

Using the hist() function

hist(loan_data$int_rate, main = "Histogram of interest rate", xlab = "Interest rate")

Histogram of interest rate

Using the hist() function on annual_inc

hist(loan_data$annual_inc, xlab = "Annual Income", main = "Histogram of Annual Income")

Using the hist() function on annual_inc

hist_income <- hist(loan_data$annual_inc,
                    xlab = "Annual Income",
                    main = "Histogram of Annual Income")
hist_income$breaks
0  500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 4500000 ...
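
The value returned by hist() is a list, so its components can be inspected without drawing anything. A minimal sketch, assuming made-up right-skewed incomes (the vector `income` is hypothetical and stands in for loan_data$annual_inc):

```r
# Synthetic incomes with one extreme value (assumption, not the course data)
set.seed(1)
income <- c(rlnorm(1000, meanlog = 11, sdlog = 0.5), 6000000)

# plot = FALSE returns the histogram object without drawing it
h <- hist(income, plot = FALSE)

h$breaks   # bin edges chosen by R
h$counts   # number of observations in each bin

# Every observation lands in exactly one bin
sum(h$counts) == length(income)
```

With a single extreme value, most of the counts pile up in the first bin, which is why the default histogram shows little detail.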

The breaks argument

n_breaks <- sqrt(nrow(loan_data)) # n_breaks = 170.5638
hist_income_n <- hist(loan_data$annual_inc, breaks = n_breaks, 
                      xlab = "Annual Income", main = "Histogram of Annual Income")

annual_inc

plot(loan_data$annual_inc, ylab = "Annual Income")

Outliers

  • When is a value an outlier?

    • Expert judgment
    • Rule of thumb, e.g.:

      • Q1 - 1.5 × IQR
      • Q3 + 1.5 × IQR
    • Mostly: a combination of both
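
The two rule-of-thumb fences listed above can be sketched as follows. This is an illustration on made-up data; `income`, `lower_fence`, and `upper_fence` are hypothetical names:

```r
# Synthetic incomes with one extreme value (assumption, not the course data)
set.seed(1)
income <- c(rlnorm(1000, meanlog = 11, sdlog = 0.5), 6000000)

q1  <- quantile(income, 0.25, names = FALSE)
q3  <- quantile(income, 0.75, names = FALSE)
iqr <- IQR(income)                 # same as q3 - q1

lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Flag values outside either fence
is_outlier <- income < lower_fence | income > upper_fence
sum(is_outlier)
```

For a strictly positive, right-skewed variable such as income, the lower fence is often negative, so in practice only the upper fence flags anything.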

Expert judgment

"Annual salaries > $3 million are outliers"

# Find outliers
index_outlier_expert <- which(loan_data$annual_inc > 3000000)

# Remove outliers from the data
loan_data_expert <- loan_data[-index_outlier_expert, ]
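
One caveat with negative indexing: if no value exceeds the cutoff, which() returns an empty vector, and indexing with its negative drops every row. A small sketch with a made-up data frame (`toy` is hypothetical) shows the pitfall and a logical-condition alternative:

```r
# Made-up data in which no income exceeds the cutoff (assumption)
toy <- data.frame(annual_inc = c(40000, 65000, 120000))

index_outlier <- which(toy$annual_inc > 3000000)   # integer(0)

# Pitfall: -integer(0) is still integer(0), so no rows are selected
n_after_negative_index <- nrow(toy[-index_outlier, , drop = FALSE])
n_after_negative_index   # 0

# Alternative: keep rows with a logical condition instead
toy_clean <- toy[!(toy$annual_inc > 3000000), , drop = FALSE]
nrow(toy_clean)          # 3
```

The logical form assumes the column has no missing values; with NA present, the condition would need explicit handling.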

Rule of thumb

Outlier if greater than Q3 + 1.5 × IQR

# Calculate Q3 + 1.5 * IQR
outlier_cutoff <- quantile(loan_data$annual_inc, 0.75) + 1.5 * IQR(loan_data$annual_inc)

# Identify outliers
index_outlier_ROT <- which(loan_data$annual_inc > outlier_cutoff)

# Remove outliers
loan_data_ROT <- loan_data[-index_outlier_ROT, ]

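# Histogram after removing outliers by expert judgment
hist(loan_data_expert$annual_inc,
     breaks = sqrt(nrow(loan_data_expert)),
     xlab = "Annual income")

# Histogram after removing outliers by the rule of thumb
hist(loan_data_ROT$annual_inc,
     breaks = sqrt(nrow(loan_data_ROT)),
     xlab = "Annual income")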
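
To see how differently the two approaches behave, it can help to count how many rows each cutoff would remove. A rough sketch on made-up data (`toy`, `expert_cutoff`, and `rot_cutoff` are hypothetical names; the real counts depend on loan_data):

```r
# Synthetic incomes with one extreme value (assumption, not the course data)
set.seed(1)
toy <- data.frame(annual_inc = c(rlnorm(1000, meanlog = 11, sdlog = 0.5), 6000000))

expert_cutoff <- 3000000
rot_cutoff <- quantile(toy$annual_inc, 0.75, names = FALSE) +
  1.5 * IQR(toy$annual_inc)

# Number of rows each rule would drop
n_removed_expert <- sum(toy$annual_inc > expert_cutoff)
n_removed_rot    <- sum(toy$annual_inc > rot_cutoff)

c(expert = n_removed_expert, rule_of_thumb = n_removed_rot)
```

On skewed data the rule of thumb usually removes at least as many rows as a high expert cutoff, which is one reason the two are often combined.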

Bivariate plot

plot(loan_data$emp_length, loan_data$annual_inc, 
     xlab= "Employment length", ylab= "Annual income")
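
In a bivariate plot it can help to color the points that a chosen rule flags. A minimal sketch on made-up data (the `toy` data frame and the $3 million cutoff applied here are illustrative assumptions):

```r
# Synthetic employment lengths and incomes (assumption, not the course data)
set.seed(1)
toy <- data.frame(
  emp_length = sample(0:40, 200, replace = TRUE),
  annual_inc = c(rlnorm(199, meanlog = 11, sdlog = 0.5), 6000000)
)

# Flag incomes above the expert cutoff
flag <- toy$annual_inc > 3000000

# Draw flagged points in red, all others in black
plot(toy$emp_length, toy$annual_inc,
     col = ifelse(flag, "red", "black"),
     xlab = "Employment length", ylab = "Annual income")
```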


Let's practice!
