Histograms and outliers

Credit Risk Modeling in R

Lore Dirick

Manager of Data Science Curriculum at Flatiron School

Using the hist() function

hist(loan_data$int_rate)

Histogram of interest rate

Using the hist() function

hist(loan_data$int_rate, main = "Histogram of interest rate", xlab = "Interest rate")

Histogram of interest rate

Using hist() on annual_inc

hist(loan_data$annual_inc, xlab = "Annual Income", main = "Histogram of Annual Income")

Using hist() on annual_inc

hist_income <- hist(loan_data$annual_inc,
                    xlab = "Annual Income",
                    main = "Histogram of Annual Income")
hist_income$breaks
0  500000 1000000 1500000 2000000 2500000 3000000 3500000 4000000 4500000 ...
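
hist() does more than draw a plot: it returns an object of class "histogram" whose components include breaks, counts and mids. The sketch below uses made-up income values (an assumption, not loan_data) and plot = FALSE, which computes the bins without drawing them.

```r
# Hypothetical annual incomes, for illustration only (not loan_data)
income <- c(12000, 25000, 31000, 38000, 45000, 52000, 60000, 75000, 90000, 250000)

# plot = FALSE returns the histogram object without drawing it
h <- hist(income, breaks = seq(0, 300000, by = 50000), plot = FALSE)

h$breaks  # bin edges, from 0 to 300000 in steps of 50000
h$counts  # number of observations falling in each bin
h$mids    # midpoint of each bin
```

By default the bins are right-closed, so a value equal to an edge (here 250000) is counted in the bin that ends at that edge.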

The breaks argument

n_breaks <- sqrt(nrow(loan_data)) # n_breaks = 170.5638
hist_income_n <- hist(loan_data$annual_inc, breaks = n_breaks, 
                      xlab = "Annual Income", main = "Histogram of Annual Income")
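
According to the R documentation for hist(), a single number passed to breaks is only a suggestion: the bin edges are rounded to "pretty" values, so the number of bins actually drawn can differ from sqrt(nrow(loan_data)). Passing a vector of edges gives exactly those bins. The sketch below compares both on simulated right-skewed data (an assumption standing in for annual_inc).

```r
set.seed(1)
# Simulated, right-skewed "incomes"; illustrative only
x <- rlnorm(1000, meanlog = 10.5, sdlog = 0.6)

n_breaks <- sqrt(length(x))  # square-root rule, about 31.6 bins

# A single number is only a suggestion; R picks "pretty" edges itself
h_suggested <- hist(x, breaks = n_breaks, plot = FALSE)
length(h_suggested$counts)   # may differ from n_breaks

# A vector of edges is used as given: 41 edges give 40 equal-width bins
edges <- seq(0, max(x) + 1, length.out = 41)
h_exact <- hist(x, breaks = edges, plot = FALSE)
length(h_exact$counts)       # 40
```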

annual_inc

plot(loan_data$annual_inc, ylab = "Annual Income")

Outliers

  • When is a value an outlier?

    • Expert judgment
    • Rule of thumb, e.g. a value is an outlier if it lies

      • below Q1 - 1.5 × IQR, or
      • above Q3 + 1.5 × IQR
    • Usually: a combination of both
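
The rule of thumb can be sketched on a toy vector (made-up values, not loan_data). quantile() and IQR() both use R's default quantile type, so the fences agree with Q3 - Q1.

```r
# Toy vector with one obviously extreme value (illustrative only)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

q1  <- quantile(x, 0.25)  # first quartile
q3  <- quantile(x, 0.75)  # third quartile
iqr <- IQR(x)             # equals q3 - q1 with the default quantile type

lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Positions of values outside either fence
idx <- which(x < lower_fence | x > upper_fence)
idx  # 10, i.e. the value 100
```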

Expert judgment

"Annual incomes > $3 million are outliers"

# Find outlier
index_outlier_expert <- which(loan_data$annual_inc > 3000000)

# Remove outlier from dataset
loan_data_expert <- loan_data[-index_outlier_expert, ]
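
One caveat with the loan_data[-which(...), ] idiom, which is general R behaviour rather than something specific to this dataset: when which() finds no matches it returns integer(0), and negating an empty index selects zero rows, so every row is dropped. A logical index avoids this. The sketch uses a small hypothetical data frame, toy_loans, that is not part of the course data.

```r
# Hypothetical toy data: no income exceeds the $3 million threshold
toy_loans <- data.frame(annual_inc = c(40000, 55000, 72000))

index_outlier <- which(toy_loans$annual_inc > 3000000)  # integer(0)

# Negative indexing with an empty index keeps no rows at all
dropped_all <- toy_loans[-index_outlier, , drop = FALSE]
nrow(dropped_all)  # 0

# Logical indexing keeps every row that is not above the threshold
kept <- toy_loans[toy_loans$annual_inc <= 3000000, , drop = FALSE]
nrow(kept)         # 3
```

drop = FALSE keeps the result a data frame even though toy_loans has a single column.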

Rule of thumb

Outlier if greater than Q3 + 1.5 × IQR

# Calculate Q3 + 1.5 * IQR
outlier_cutoff <- quantile(loan_data$annual_inc, 0.75) + 1.5 * IQR(loan_data$annual_inc)

# Identify outliers
index_outlier_ROT <- which(loan_data$annual_inc > outlier_cutoff)

# Remove outliers
loan_data_ROT <- loan_data[-index_outlier_ROT, ]
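
The slide applies only the upper fence. A plausible reason, stated here as an assumption rather than the course's own argument: incomes are positive and right-skewed, so Q1 - 1.5 × IQR typically falls below zero and cannot flag anything. The sketch checks this on deterministic log-normal quantiles standing in for incomes (meanlog and sdlog are arbitrary choices).

```r
# Deterministic, right-skewed stand-in for incomes (not loan_data)
income <- qlnorm(ppoints(1000), meanlog = 10, sdlog = 0.5)

q1  <- quantile(income, 0.25)
q3  <- quantile(income, 0.75)
iqr <- IQR(income)

lower_fence <- q1 - 1.5 * iqr  # negative for this distribution
upper_fence <- q3 + 1.5 * iqr

sum(income < lower_fence)  # 0: all incomes are positive, so nothing is flagged
sum(income > upper_fence)  # only values in the long right tail are flagged
```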
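
# Annual income after removing outliers by expert judgment
hist(loan_data_expert$annual_inc,
     breaks = sqrt(nrow(loan_data_expert)),
     xlab = "Annual income")

# Annual income after removing outliers by the rule of thumb
hist(loan_data_ROT$annual_inc,
     breaks = sqrt(nrow(loan_data_ROT)),
     xlab = "Annual income")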

Bivariate plot

plot(loan_data$emp_length, loan_data$annual_inc,
     xlab = "Employment length", ylab = "Annual income")

Let's practice!