Homophily

Predictive Analytics using Networked Data in R

Bart Baesens, Ph.D.

Professor of Data Science, KU Leuven and University of Southampton

Homophily explained

Birds of a feather flock together

  • Connected people tend to share a common property: hobbies, interests, origin, etc.
  • Whether a network is homophilic depends on:
    • Connectedness between nodes with the same label
    • Connectedness between nodes with opposite labels

Homophilic Networks

  • Not Homophilic

  • Homophilic

library(igraph)

names <- c('A','B','C','D','E','F','G','H','I','J')
tech <- c(rep('R',6),rep('P',4))
DataScientists <- data.frame(name=names,technology=tech)
DataScienceNetwork <- data.frame(
 from=c('A','A','A','A','B','B','C','C','D','D',
        'D','E','F','F','G','G','H','H','I'),
 to=c('B','C','D','E','C','D','D','G','E','F',
        'G','F','G','I','I','H','I','J','J'),
 label=c(rep('rr',7),'rp','rr','rr','rp','rr','rp','rp',rep('pp',5)))

g <- graph_from_data_frame(DataScienceNetwork,directed = FALSE)

Add the technology as a node attribute

V(g)$label <- as.character(DataScientists$technology)
V(g)$color <- V(g)$label 
V(g)$color <- gsub('R',"blue3",V(g)$color)
V(g)$color <- gsub('P',"green4",V(g)$color)
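
The assignment above relies on the vertices of g being stored in the same order as the rows of DataScientists. That appears to hold here because the names first occur in the edge list in the order A to J, but it is not guaranteed for other edge lists. A minimal sketch of an alternative, assuming igraph is installed: pass the node table through the vertices argument of graph_from_data_frame() so that attributes are matched by vertex name. The names nodes_df, edges_df and g2 are illustrative and not part of the course code.

```r
library(igraph)

# Node table: the first column is the vertex name,
# the remaining columns become vertex attributes
nodes_df <- data.frame(name = LETTERS[1:10],
                       technology = c(rep('R', 6), rep('P', 4)),
                       stringsAsFactors = FALSE)

# Same 19 edges as in the network above
edges_df <- data.frame(
  from = c('A','A','A','A','B','B','C','C','D','D',
           'D','E','F','F','G','G','H','H','I'),
  to   = c('B','C','D','E','C','D','D','G','E','F',
           'G','F','G','I','I','H','I','J','J'),
  stringsAsFactors = FALSE)

g2 <- graph_from_data_frame(edges_df, directed = FALSE, vertices = nodes_df)

# technology is attached by vertex name, independent of edge-list order
V(g2)$label <- V(g2)$technology
V(g2)$color <- ifelse(V(g2)$technology == 'R', 'blue3', 'green4')
```

With this approach, reordering the rows of the edge list should not change which technology ends up on which node.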

Types of edges

Code to color the edges

E(g)$color <- E(g)$label
E(g)$color <- gsub('rp','red',E(g)$color)
E(g)$color <- gsub('rr','blue3',E(g)$color)
E(g)$color <- gsub('pp','green4',E(g)$color)

Code to visualize the network

pos<-cbind(c(2,1,1.5,2.5,4,4.5,3,3.5,5,6),
c(10.5,9.5,8,8.5,9,7.5,6,4.5,5.5,4))

plot(g,edge.label=NA,vertex.label.color='white',
layout=pos, vertex.size = 25)

Counting edge types

# R edges
edge_rr<-sum(E(g)$label=='rr')

# Python edges
edge_pp<-sum(E(g)$label=='pp')

# cross label edges
edge_rp<-sum(E(g)$label=='rp')
  • edge_rr=10
  • edge_pp=5
  • edge_rp=4
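
The edge labels counted above were typed in by hand. As a cross-check, the same counts can be derived from the node technologies. The following is a minimal base-R sketch, not the course's own code; tech_from, tech_to, edge_type and edge_counts are illustrative names.

```r
# Node table and edge endpoints, copied from the slides above
DataScientists <- data.frame(name = LETTERS[1:10],
                             technology = c(rep('R', 6), rep('P', 4)),
                             stringsAsFactors = FALSE)
from <- c('A','A','A','A','B','B','C','C','D','D',
          'D','E','F','F','G','G','H','H','I')
to   <- c('B','C','D','E','C','D','D','G','E','F',
          'G','F','G','I','I','H','I','J','J')

# Look up the technology at each end of every edge
tech_from <- DataScientists$technology[match(from, DataScientists$name)]
tech_to   <- DataScientists$technology[match(to,   DataScientists$name)]

# Classify each edge: cross-label, R-R, or P-P
edge_type <- ifelse(tech_from != tech_to, 'rp',
             ifelse(tech_from == 'R', 'rr', 'pp'))

edge_counts <- table(edge_type)
edge_counts[['rr']]  # 10
edge_counts[['pp']]  # 5
edge_counts[['rp']]  # 4
```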

Network connectance

$p=\frac{2\cdot edges}{nodes (nodes-1)}$

# edges = 19 and nodes = 10 for the network above
p <- 2*edges/(nodes*(nodes-1))
  • p = 0.42

  • Number of edges in a fully connected network: ${{nodes}\choose{2}}=\frac{nodes(nodes-1)}{2}$
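
As a worked example for the 10-node, 19-edge network used in these slides, the connectance can be computed either as the observed edges divided by the number of possible edges or directly from the formula. This is a small base-R sketch; max_edges, p_formula and p_unparenthesized are illustrative names.

```r
nodes <- 10
edges <- 19

# Number of edges in a fully connected network on 10 nodes
max_edges <- choose(nodes, 2)            # 45

# Connectance: observed edges over possible edges
p <- edges / max_edges

# Same value from the formula; the denominator needs its own parentheses
p_formula <- 2 * edges / (nodes * (nodes - 1))

round(p, 2)                              # 0.42

# Without the parentheses R evaluates left to right: (2*edges/nodes)*(nodes-1)
p_unparenthesized <- 2 * edges / nodes * (nodes - 1)   # 34.2, not a valid density
```

If igraph is loaded, edge_density(g) is expected to return the same value for an undirected graph without loops.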


Let's practice!
