Predictive Analytics using Networked Data in R
Bart Baesens, Ph.D.
Professor of Data Science, KU Leuven and University of Southampton
Birds of a feather flock together
library(igraph)

# Node list: ten data scientists and the technology each one uses
names <- c('A','B','C','D','E','F','G','H','I','J')
tech <- c(rep('R',6),rep('P',4))
DataScientists <- data.frame(name=names,technology=tech)

# Edge list: 'rr' = R-R edge, 'pp' = Python-Python edge, 'rp' = cross-technology edge
DataScienceNetwork <- data.frame(
  from=c('A','A','A','A','B','B','C','C','D','D',
         'D','E','F','F','G','G','H','H','I'),
  to=c('B','C','D','E','C','D','D','G','E','F',
       'G','F','G','I','I','H','I','J','J'),
  label=c(rep('rr',7),'rp','rr','rr','rp','rr','rp','rp',rep('pp',5)))

# Build the undirected graph from the edge list
g <- graph_from_data_frame(DataScienceNetwork,directed = FALSE)
Add the technology as a node attribute
V(g)$label <- as.character(DataScientists$technology)
V(g)$color <- V(g)$label
V(g)$color <- gsub('R',"blue3",V(g)$color)
V(g)$color <- gsub('P',"green4",V(g)$color)
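As a possible alternative to assigning the attribute after the graph is built, `graph_from_data_frame` accepts a `vertices` data frame: its first column is taken as the node names and the remaining columns become vertex attributes. The following is a minimal sketch, assuming igraph is installed; the object name `g2` is hypothetical and is used only to avoid overwriting `g`.

```r
library(igraph)

# Node and edge lists as defined earlier in this section
DataScientists <- data.frame(
  name = c('A','B','C','D','E','F','G','H','I','J'),
  technology = c(rep('R', 6), rep('P', 4)),
  stringsAsFactors = FALSE
)
DataScienceNetwork <- data.frame(
  from = c('A','A','A','A','B','B','C','C','D','D',
           'D','E','F','F','G','G','H','H','I'),
  to   = c('B','C','D','E','C','D','D','G','E','F',
           'G','F','G','I','I','H','I','J','J'),
  stringsAsFactors = FALSE
)

# Passing the node list via `vertices` attaches `technology` to each node
# by name, so the result does not depend on the order of the edge list.
g2 <- graph_from_data_frame(DataScienceNetwork, directed = FALSE,
                            vertices = DataScientists)

V(g2)$name
V(g2)$technology
```

Matching by name is a little safer than positional assignment, which relies on the vertex order of the graph agreeing with the row order of `DataScientists`.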
Code to color the edges
E(g)$color <- E(g)$label
E(g)$color <- gsub('rp','red',E(g)$color)
E(g)$color <- gsub('rr','blue3',E(g)$color)
E(g)$color <- gsub('pp','green4',E(g)$color)
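The chained `gsub` calls work here because none of the colour names contains a label as a substring. A lookup with named vectors is one way to avoid that dependency. The sketch below uses base R only; `node_palette`, `edge_palette`, `node_label`, `edge_label`, `node_color` and `edge_color` are hypothetical names introduced for illustration.

```r
# Named vectors map each label to a colour (same colours as above)
node_palette <- c(R = "blue3", P = "green4")
edge_palette <- c(rr = "blue3", pp = "green4", rp = "red")

# Node and edge labels as defined earlier in this section
node_label <- c(rep('R', 6), rep('P', 4))
edge_label <- c(rep('rr', 7), 'rp', 'rr', 'rr', 'rp', 'rr', 'rp', 'rp',
                rep('pp', 5))

# Index the palette by label; unname() drops the label names from the result
node_color <- unname(node_palette[node_label])
edge_color <- unname(edge_palette[edge_label])

table(edge_color)
```

With the graph at hand, the same lookup could be written as `V(g)$color <- unname(node_palette[V(g)$label])` and `E(g)$color <- unname(edge_palette[E(g)$label])`.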
Code to visualize the network
pos<-cbind(c(2,1,1.5,2.5,4,4.5,3,3.5,5,6),
c(10.5,9.5,8,8.5,9,7.5,6,4.5,5.5,4))
plot(g,edge.label=NA,vertex.label.color='white',
layout=pos, vertex.size = 25)
# R edges
edge_rr<-sum(E(g)$label=='rr')
# Python edges
edge_pp<-sum(E(g)$label=='pp')
# cross label edges
edge_rp<-sum(E(g)$label=='rp')
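The edge labels above are typed in by hand. As a cross-check, they can be derived from the technologies of the two endpoints. This is a sketch in base R under the assumption that the node and edge lists are exactly those defined earlier; `tech_by_name`, `pair` and `derived` are hypothetical names.

```r
# Technology of each data scientist, indexed by node name
tech_by_name <- c(A = 'R', B = 'R', C = 'R', D = 'R', E = 'R', F = 'R',
                  G = 'P', H = 'P', I = 'P', J = 'P')

from <- c('A','A','A','A','B','B','C','C','D','D',
          'D','E','F','F','G','G','H','H','I')
to   <- c('B','C','D','E','C','D','D','G','E','F',
          'G','F','G','I','I','H','I','J','J')

# Concatenate the endpoint technologies in lower case; an edge listed as
# Python-to-R would give 'pr', which is folded into 'rp'
pair <- unname(tolower(paste0(tech_by_name[from], tech_by_name[to])))
pair[pair == 'pr'] <- 'rp'

# Count the edges of each type
derived <- table(pair)
derived
```

The derived counts can then be compared with `edge_rr`, `edge_pp` and `edge_rp`.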
edge_rr = 10
edge_pp = 5
edge_rp = 4

$p=\frac{2\cdot edges}{nodes(nodes-1)}$
# edges = ecount(g) = 19, nodes = vcount(g) = 10
p <- 2*edges/(nodes*(nodes-1))
p = 0.42
Number of edges in a fully connected network: ${{nodes}\choose{2}}=\frac{nodes(nodes-1)}{2}$
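The two formulas are linked: the connectance is the observed number of edges divided by the number of edges in a fully connected network. A short worked sketch in base R, using the counts of this example network (10 nodes, 19 edges); `max_edges` is a hypothetical name.

```r
nodes <- 10   # data scientists A to J
edges <- 19   # 10 'rr' + 5 'pp' + 4 'rp'

# Edges in a fully connected network: choose(nodes, 2) = nodes*(nodes-1)/2
max_edges <- choose(nodes, 2)

# Connectance: fraction of possible edges that are present
p <- edges / max_edges

# Same value as the closed form; note the parentheses around the denominator
p_closed <- 2 * edges / (nodes * (nodes - 1))

round(p, 2)
```

For a simple undirected graph such as `g`, igraph's `edge_density(g)` should return the same value.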