Motivation: social networks and predictive analytics

Predictive Analytics using Networked Data in R

Bart Baesens, Ph.D.

Professor of Data Science, KU Leuven and University of Southampton

Applications

  • Age
  • Gender
  • Fraud

Fraud Network

  • Churn
    • Customer defection
    • Companies predict who is most likely to churn using
      1. Machine learning techniques
      2. Social networks

Overview

  • Labeled social networks
    • Construct and label networks
    • Network learning
  • Homophily
    • Measure relational dependency
    • Heterophilicity and dyadicity
  • Network featurization
    • Compute node features
  • Predictive modeling with networks
    • Turn a network into a flat dataset
    • Predict churn among customers
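The homophily measures named in the outline compare observed edge counts with those expected in a random network of the same density. The sketch below assumes the usual definitions: with N nodes, n1 labeled nodes and M edges, the density is p = 2M / (N(N - 1)), dyadicity is m11 / (n1(n1 - 1)/2 * p) and heterophilicity is m10 / (n1(N - n1) * p), where m11 counts edges between two labeled nodes and m10 counts edges between a labeled and an unlabeled node. The helper `dyadicity_heterophilicity` and the eight-customer toy network are hypothetical illustrations, not course data.

```r
library(igraph)

# Hypothetical toy network: 8 customers, churn = TRUE marks a churner
customers <- data.frame(name  = 1:8,
                        churn = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE))
edges <- data.frame(from = c(1, 1, 2, 3, 4, 5, 5, 6, 7),
                    to   = c(2, 3, 3, 4, 5, 6, 7, 8, 8))
g <- graph_from_data_frame(edges, directed = FALSE, vertices = customers)

# Hypothetical helper: dyadicity and heterophilicity of a logical node attribute
dyadicity_heterophilicity <- function(graph, attr) {
  label <- vertex_attr(graph, attr)
  n  <- vcount(graph)
  n1 <- sum(label)                              # number of labeled nodes
  p  <- 2 * ecount(graph) / (n * (n - 1))       # density of the network
  ends_mat <- ends(graph, E(graph), names = FALSE)
  l1 <- label[ends_mat[, 1]]
  l2 <- label[ends_mat[, 2]]
  m11 <- sum(l1 & l2)                           # labeled-labeled edges
  m10 <- sum(xor(l1, l2))                       # labeled-unlabeled edges
  c(dyadicity       = m11 / (n1 * (n1 - 1) / 2 * p),
    heterophilicity = m10 / (n1 * (n - n1) * p))
}

res <- dyadicity_heterophilicity(g, "churn")
res
```

Here the three churners are fully connected to each other and share only one edge with non-churners, so dyadicity is about 3.11 (greater than 1: churners link to each other more than expected) and heterophilicity is about 0.21 (less than 1: fewer churner to non-churner links than expected).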

Collaboration Network

library(igraph)

# Undirected collaboration network among ten data scientists (A to J)
DataScienceNetwork <- data.frame(
  from = c('A', 'A', 'A', 'A', 'B', 'B', 'C', 'C', 'D', 'D', 'D', 'E',
           'F', 'F', 'G', 'G', 'H', 'H', 'I'),
  to   = c('B', 'C', 'D', 'E', 'C', 'D', 'D', 'G', 'E', 'F', 'G', 'F',
           'G', 'I', 'I', 'H', 'I', 'J', 'J'))
g <- graph_from_data_frame(DataScienceNetwork, directed = FALSE)

# Fixed (x, y) coordinates for each node, in vertex order A to J
pos <- cbind(c(2, 1, 1.5, 2.5, 4, 4.5, 3, 3.5, 5, 6),
             c(10.5, 9.5, 8, 8.5, 9, 7.5, 6, 4.5, 5.5, 4))
plot.igraph(g, edge.label = NA, edge.color = 'black', layout = pos,
            vertex.label = V(g)$name, vertex.color = 'white',
            vertex.label.color = 'black', vertex.size = 25)

Collaboration Network

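# Preferred technology of each node ('R' or 'P'); '?' marks the unknown label
V(g)$technology <-
  c('R','R','?','R','R',
    'R','P','P','P','P')
# Map labels to colors; fixed = TRUE matches '?' literally, not as a regex quantifier
V(g)$color <- V(g)$technology
V(g)$color <- gsub('R', "blue3", V(g)$color, fixed = TRUE)
V(g)$color <- gsub('P', "green4", V(g)$color, fixed = TRUE)
V(g)$color <- gsub('?', "gray", V(g)$color, fixed = TRUE)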
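Node C is the only node with an unknown label. A simple way to use the network for this prediction is a relational neighbor classifier: estimate the probability that a node uses a technology as the share of its labeled neighbors that use it. The sketch below rebuilds the collaboration network from the slides; the helper name `relational_neighbor` is hypothetical.

```r
library(igraph)

# Rebuild the collaboration network and its technology labels
DataScienceNetwork <- data.frame(
  from = c('A', 'A', 'A', 'A', 'B', 'B', 'C', 'C', 'D', 'D', 'D', 'E',
           'F', 'F', 'G', 'G', 'H', 'H', 'I'),
  to   = c('B', 'C', 'D', 'E', 'C', 'D', 'D', 'G', 'E', 'F', 'G', 'F',
           'G', 'I', 'I', 'H', 'I', 'J', 'J'))
g <- graph_from_data_frame(DataScienceNetwork, directed = FALSE)
V(g)$technology <- c('R', 'R', '?', 'R', 'R', 'R', 'P', 'P', 'P', 'P')

# Hypothetical helper: share of a node's labeled neighbors that carry `label`
relational_neighbor <- function(graph, node, label) {
  nb   <- as.vector(neighbors(graph, node))   # integer ids of adjacent nodes
  tech <- V(graph)$technology[nb]
  tech <- tech[tech != '?']                   # ignore unlabeled neighbors
  mean(tech == label)
}

relational_neighbor(g, 'C', 'R')
relational_neighbor(g, 'C', 'P')
```

C is connected to A, B, D and G; three of these use R and one uses P, so the estimates are 0.75 for R and 0.25 for P.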

Churn Network

edgeList
  from    to
1    1   393
2    1  2573
3    1  4430
4  393   926
5  393  1574 

Churn
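The printed edge list can be loaded into igraph in the same way as the collaboration network and then flattened into one row per customer, which is the kind of dataset a standard churn model expects. The sketch below uses only the five rows shown above (the full edge list is larger) and a single node feature, degree; the names `network` and `features` are hypothetical.

```r
library(igraph)

# The first rows of the churn edge list shown above; each row is an undirected tie
edgeList <- data.frame(from = c(1, 1, 1, 393, 393),
                       to   = c(393, 2573, 4430, 926, 1574))
network <- graph_from_data_frame(edgeList, directed = FALSE)

# Flatten the network: one row per customer with a simple node feature
features <- data.frame(customer = V(network)$name,
                       degree   = unname(degree(network)))
features
```

Customers 1 and 393 each have three ties in this excerpt and the remaining four customers have one; further columns, such as the number of churning neighbors, can be added to the same data frame once churn labels are attached to the nodes.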

Let's practice!
