Summary and final thoughts

Predictive Analytics using Networked Data in R

Bart Baesens, Ph.D.

Professor of Data Science, KU Leuven and University of Southampton

Labeled networks

edgeList
  from    to
1    1   393
2    1  2573
3    1  4430
4  393   926
5  393  1574
customers
    id churn
1    1     0
2  393     0
3 2573     0
4 4430     0
5  926     1
6 1574     1
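
A minimal sketch of how the two tables above could be combined into one labeled network with igraph. It uses only the rows printed on the slide, and the object name `network` is an assumption introduced here.

```r
library(igraph)

# Rows copied from the printed edgeList and customers output above
edgeList <- data.frame(from = c(1, 1, 1, 393, 393),
                       to   = c(393, 2573, 4430, 926, 1574))
customers <- data.frame(id    = c(1, 393, 2573, 4430, 926, 1574),
                        churn = c(0, 0, 0, 0, 1, 1))

# Undirected network; the first column of `vertices` supplies the vertex
# names, and every other column (here: churn) becomes a vertex attribute
network <- graph_from_data_frame(edgeList, directed = FALSE,
                                 vertices = customers)

# The churn label is now attached to each node
V(network)$churn
```

Passing `customers` as `vertices` keeps each label tied to its node, so later steps can read `V(network)$churn` directly.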

Homophily

Birds of a feather flock together

Dyadicity: connectedness between nodes with same label

Heterophilicity: connectedness between nodes with opposite labels
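
The two measures can be sketched as ratios of observed to expected edge counts. The slide does not give the formulas, so this assumes the usual definitions: same-label (or mixed-label) edges divided by the number expected if labels were placed at random on the same network. The data is the six-customer excerpt shown earlier, and all variable names are illustrative.

```r
library(igraph)

# Six-customer excerpt from the labeled-network slide
edgeList <- data.frame(from = c(1, 1, 1, 393, 393),
                       to   = c(393, 2573, 4430, 926, 1574))
customers <- data.frame(id    = c(1, 393, 2573, 4430, 926, 1574),
                        churn = c(0, 0, 0, 0, 1, 1))
network <- graph_from_data_frame(edgeList, directed = FALSE,
                                 vertices = customers)

# Label of the node at each end of every edge
endpoints <- ends(network, E(network), names = FALSE)
labelFrom <- V(network)$churn[endpoints[, 1]]
labelTo   <- V(network)$churn[endpoints[, 2]]

nChurn     <- sum(V(network)$churn == 1)
nNonChurn  <- sum(V(network)$churn == 0)
churnEdges <- sum(labelFrom == 1 & labelTo == 1)  # churner-churner edges
mixedEdges <- sum(labelFrom != labelTo)           # churner-non-churner edges

# Connectance: fraction of all possible edges that exist, 2M / (N(N-1))
p <- edge_density(network)

# Edge counts expected under random label placement
expectedChurnEdges <- nChurn * (nChurn - 1) / 2 * p
expectedMixedEdges <- nChurn * nNonChurn * p

dyadicity       <- churnEdges / expectedChurnEdges
heterophilicity <- mixedEdges / expectedMixedEdges
```

Values above 1 indicate more such edges than chance would give, and values below 1 indicate fewer. In this excerpt no edge joins two churners, so `dyadicity` is 0, and `heterophilicity` is 0.75. These figures describe only the printed rows, not the full network.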

Network Featurization

g
IGRAPH UN-- 10 19 -- 
 attr: name (v/c), label (e/c)
 edges (vertex names):
 A--B A--C A--D A--E B--C B--D C--D C--G D--E D--F D--G E--F F--G F--I G--I G--H H--I H--J I--J
V(g)$degree <- degree(g)
g
IGRAPH UN-- 10 19 -- 
 attr: name (v/c), degree (v/n), triangles (v/n), transitivity
| (v/n), rNeighbors (v/n), averageAge (v/n), pageRank (v/n),
| pPageRank (v/n), label (e/c)
 edges (vertex names):
 A--B A--C A--D A--E B--C B--D C--D C--G D--E D--F D--G E--F F--G F--I G--I G--H H--I H--J I--J
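
A sketch of how some of the vertex attributes listed above could be computed. The graph is rebuilt from the printed edge list. The slide does not show the calls behind `triangles`, `transitivity` and `pageRank`, so the igraph functions used here are an assumption. `rNeighbors`, `averageAge` and `pPageRank` depend on node labels and ages that are not shown, so they are left out.

```r
library(igraph)

# Edges copied from the printed graph above (undirected)
g <- graph_from_literal(A-B, A-C, A-D, A-E, B-C, B-D, C-D, C-G, D-E, D-F,
                        D-G, E-F, F-G, F-I, G-I, G-H, H-I, H-J, I-J)

# Each feature is stored as a numeric vertex attribute
V(g)$degree       <- degree(g)                        # number of neighbors
V(g)$triangles    <- count_triangles(g)               # triangles through each node
V(g)$transitivity <- transitivity(g, type = "local")  # local clustering coefficient
V(g)$pageRank     <- page_rank(g)$vector              # PageRank score

# One row per node, one column per feature
as_data_frame(g, what = "vertices")
```

Storing the features on the graph keeps them aligned with the vertex names, so the final data frame can be passed straight to a model.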
  1. Extract dataframe:
    dataset <- as_data_frame(g, what='vertices')
    
  2. Preprocess data set:
    • Missing values, outliers, correlated variables, and normalization
  3. Build model:
    logModel <- glm(R~., data=training_set, family='binomial')
    
  4. Make predictions:
    logPredictions <- predict(logModel, newdata=test_set, type="response")
    
  5. Measure performance:
    auc(test_set$label, logPredictions)
    TopDecileLift(test_set$label, logPredictions, plot=TRUE)
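
The five steps above can be sketched end to end in base R. The data is synthetic and stands in for the vertex data frame; it is not the course data. The response column name `label` follows step 5. AUC and top decile lift are computed by hand here so the sketch has no package dependencies; the names `aucValue` and `topDecileLift` are illustrative.

```r
# Synthetic stand-in for the extracted vertex data frame (hypothetical values)
set.seed(1)
n <- 500
dataset <- data.frame(degree = rpois(n, 4), pageRank = runif(n))
dataset$label <- rbinom(n, 1, plogis(-2 + 0.4 * dataset$degree))

# Hold out 30% of the rows for testing
testIdx      <- sample(n, size = 0.3 * n)
training_set <- dataset[-testIdx, ]
test_set     <- dataset[testIdx, ]

# Build the model and score the test set with predicted probabilities
logModel       <- glm(label ~ ., data = training_set, family = "binomial")
logPredictions <- predict(logModel, newdata = test_set, type = "response")

# AUC via the rank-sum (Mann-Whitney) identity
pos      <- test_set$label == 1
r        <- rank(logPredictions)
aucValue <- (sum(r[pos]) - sum(pos) * (sum(pos) + 1) / 2) /
            (sum(pos) * sum(!pos))

# Top decile lift: positive rate among the 10% highest scores,
# divided by the positive rate in the whole test set
top <- order(logPredictions, decreasing = TRUE)[
  seq_len(ceiling(0.1 * nrow(test_set)))]
topDecileLift <- mean(test_set$label[top]) / mean(test_set$label)
```

An AUC near 0.5 or a lift near 1 would mean the features add little beyond random ranking.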
    
Congratulations!
