Predictive Analytics using Networked Data in R
Bart Baesens, Ph.D.
Professor of Data Science, KU Leuven and University of Southampton
edgeList
  from   to
1    1  393
2    1 2573
3    1 4430
4  393  926
5  393 1574
customers
    id churn
1    1     0
2  393     0
3 2573     0
4 4430     0
5  926     1
6 1574     1
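The two tables are linked through the customer id: each edge can be labeled by the churn status of its two endpoints. The base R sketch below does this for the rows printed above only, so it is a toy illustration and not a result on the full data. The edge category names are hypothetical labels chosen for readability.

```r
# Only the first rows of edgeList and customers, as printed above (toy data)
edgeList <- data.frame(from = c(1, 1, 1, 393, 393),
                       to   = c(393, 2573, 4430, 926, 1574))
customers <- data.frame(id    = c(1, 393, 2573, 4430, 926, 1574),
                        churn = c(0, 0, 0, 0, 1, 1))

# Look up the churn label of both endpoints of every edge
fromChurn <- customers$churn[match(edgeList$from, customers$id)]
toChurn   <- customers$churn[match(edgeList$to,   customers$id)]

# Classify each edge by the labels of its two endpoints (hypothetical category names)
edgeType <- ifelse(fromChurn == 1 & toChurn == 1, "churner-churner",
            ifelse(fromChurn == 0 & toChurn == 0, "nonchurner-nonchurner",
                   "mixed"))
table(edgeType)
```

On these five edges, three connect two non-churners and two connect a non-churner to a churner.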
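Homophily: birds of a feather flock together
Dyadicity: connectedness between nodes with the same label, relative to a random assignment of labels
Heterophilicity: connectedness between nodes with opposite labels, relative to a random assignment of labels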
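A minimal sketch of both measures, assuming the common definitions: observed same-label (or cross-label) edges divided by the number expected if labels were assigned at random. With n nodes, n1 of them labeled 1, m undirected edges and connectedness p = 2m / (n(n - 1)), the expected counts are n1(n1 - 1)/2 · p for churner-churner edges and n1(n - n1) · p for churner-nonchurner edges. Values above 1 indicate more such edges than expected; below 1, fewer. The helper name `dyadHetero` is hypothetical, each undirected edge is assumed to appear once, and at least two churners are needed for dyadicity to be defined. The usage reuses only the printed rows, so the numbers are purely illustrative.

```r
# Hypothetical helper: dyadicity and heterophilicity for a 0/1 node label
dyadHetero <- function(from, to, ids, label) {
  n   <- length(ids)                 # number of nodes
  n1  <- sum(label == 1)             # nodes labeled 1 (churners)
  m   <- length(from)                # number of undirected edges
  p   <- 2 * m / (n * (n - 1))       # connectedness (edge density)
  lf  <- label[match(from, ids)]     # label of each edge's first endpoint
  lt  <- label[match(to, ids)]       # label of each edge's second endpoint
  m11 <- sum(lf == 1 & lt == 1)      # observed churner-churner edges
  m10 <- sum(lf != lt)               # observed churner-nonchurner edges
  c(dyadicity       = m11 / (n1 * (n1 - 1) / 2 * p),
    heterophilicity = m10 / (n1 * (n - n1) * p))
}

# Toy usage on the rows printed earlier
res <- dyadHetero(from  = c(1, 1, 1, 393, 393),
                  to    = c(393, 2573, 4430, 926, 1574),
                  ids   = c(1, 393, 2573, 4430, 926, 1574),
                  label = c(0, 0, 0, 0, 1, 1))
res
```

Here p = 1/3, no churner-churner edge is observed (dyadicity 0), and two cross-label edges are observed against 8/3 expected (heterophilicity 0.75).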
g
IGRAPH UN-- 10 19 --
attr: name (v/c), label (e/c)
edges (vertex names):
A--B A--C A--D A--E B--C B--D C--D C--G D--E D--F D--G E--F F--G F--I G--I G--H H--I H--J I--J
V(g)$degree <- degree(g)  # store each node's degree as a vertex attribute
g
IGRAPH UN-- 10 19 --
attr: name (v/c), degree (v/n), triangles (v/n), transitivity
| (v/n), rNeighbors (v/n), averageAge (v/n), pageRank (v/n),
| pPageRank (v/n), label (e/c)
edges (vertex names):
A--B A--C A--D A--E B--C B--D C--D C--G D--E D--F D--G E--F F--G F--I G--I G--H H--I H--J I--J
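The printed graph carries several node features besides degree. The sketch below rebuilds the 10-node, 19-edge graph from the printed edge list and adds degree, triangle count, local transitivity and PageRank with igraph functions. This is an assumption about how such features can be computed, not necessarily the code behind the printed object; rNeighbors, averageAge and pPageRank are left out because they depend on node labels, ages or a personalization vector that are not shown, and the edge attribute label is not rebuilt.

```r
library(igraph)

# Rebuild the example graph from the edge list printed above
edges <- "A--B A--C A--D A--E B--C B--D C--D C--G D--E D--F D--G E--F F--G F--I G--I G--H H--I H--J I--J"
el <- do.call(rbind, strsplit(strsplit(edges, " ")[[1]], "--", fixed = TRUE))
g  <- graph_from_edgelist(el, directed = FALSE)

# Node-level features stored as vertex attributes
V(g)$degree       <- degree(g)                       # number of neighbors
V(g)$triangles    <- count_triangles(g)              # triangles through each node
V(g)$transitivity <- transitivity(g, type = "local") # local clustering coefficient
V(g)$pageRank     <- page_rank(g)$vector             # PageRank scores (sum to 1)

# One row per node, one column per feature
features <- as_data_frame(g, what = "vertices")
features
```

For example, node A has neighbors B, C, D and E, four of whose pairs are linked, giving 4 triangles and a local transitivity of 4/6.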
dataset <- as_data_frame(g, what = 'vertices')
# training_set and test_set are a split of dataset (the split is not shown)
logModel <- glm(R ~ ., data = training_set, family = 'binomial')  # logistic regression
logPredictions <- predict(logModel, newdata = test_set, type = "response")
auc(test_set$label, logPredictions)
TopDecileLift(test_set$label, logPredictions, plot = TRUE)
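To make the two evaluation measures concrete, here is a base R sketch of what they compute, assuming the usual definitions. AUC is computed with the rank (Mann-Whitney) formula, which for a 0/1 label gives the area under the ROC curve. Top-decile lift is the churn rate among the 10% highest-scored customers divided by the overall churn rate. The function names and the 20-customer example are hypothetical, and a package implementation may treat ties or the decile boundary differently.

```r
# Hypothetical helper: AUC via the rank (Mann-Whitney) formula
aucRank <- function(labels, scores) {
  r  <- rank(scores)                      # average ranks handle tied scores
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical helper: churn rate in the top 10% of scores / overall churn rate
topDecileLiftSketch <- function(labels, scores) {
  nTop <- ceiling(length(scores) / 10)
  top  <- order(scores, decreasing = TRUE)[seq_len(nTop)]
  mean(labels[top]) / mean(labels)
}

# Made-up example: 20 customers, 3 churners
scores <- (20:1) / 20                     # 1.00, 0.95, ..., 0.05
labels <- c(1, 1, rep(0, 7), 1, rep(0, 10))
aucRank(labels, scores)                   # 44/51
topDecileLiftSketch(labels, scores)       # 1 / 0.15
```

Both top-decile customers churn against a 15% base rate, so the lift is about 6.7: targeting the top decile finds churners roughly 6.7 times more often than random selection would.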