Challenges of network-based inference

Predictive Analytics using Networked Data in R

María Óskarsdóttir, Ph.D.

Post-doctoral researcher

First challenge

Splitting the data!

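library(igraph)
# g is assumed to be an existing igraph graph with 10 vertices
set.seed(1001)
# Randomly assign 6 of the 10 vertices to one part; the rest form the other
sampleVertices <- sample(1:10, 6, replace = FALSE)
# Each induced subgraph keeps only edges with both endpoints in that part
plot(induced_subgraph(g, V(g)[sampleVertices]))
plot(induced_subgraph(g, V(g)[-sampleVertices]))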
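
Why splitting is a challenge: every edge that runs between the two parts disappears from both subgraphs. A minimal sketch in base R, assuming a hypothetical 10-node edge list (`edges` is made up for illustration and is not the course network):

```r
# Hypothetical edge list on vertices 1..10 (illustration only)
edges <- data.frame(
  from = c(1, 1, 2, 3, 4, 5, 6, 7, 8, 9),
  to   = c(2, 3, 4, 5, 6, 7, 8, 9, 10, 10)
)

set.seed(1001)
sampleVertices <- sample(1:10, 6, replace = FALSE)

# Flag which endpoint of each edge falls in the sampled part
from_in <- edges$from %in% sampleVertices
to_in   <- edges$to %in% sampleVertices

within_sample <- sum(from_in & to_in)    # edges kept in the first subgraph
within_rest   <- sum(!from_in & !to_in)  # edges kept in the second subgraph
cut_edges     <- sum(from_in != to_in)   # edges lost by the split

c(within_sample = within_sample, within_rest = within_rest, cut = cut_edges)
```

The three counts always add up to the total number of edges, so `cut_edges` measures how much network information a random split throws away.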

Splitting

Second challenge

The observations in a networked dataset are not independent and identically distributed (iid): connected nodes can influence one another, so their labels tend to be correlated.
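
One rough way to see this dependence is to compare how often an edge joins two nodes with the same label against what independent labels would give. The sketch below uses made-up `labels` and `edges` (both hypothetical), and the independence baseline is a simplifying assumption rather than a formal test:

```r
# Hypothetical churn (C) / non-churn (NC) labels for 8 customers
labels <- c("C", "C", "C", "NC", "NC", "NC", "NC", "NC")

# Hypothetical edges between those customers
edges <- data.frame(
  from = c(1, 1, 2, 4, 4, 5, 6, 3),
  to   = c(2, 3, 3, 5, 6, 6, 7, 4)
)

# Share of edges whose two endpoints carry the same label
observed_same <- mean(labels[edges$from] == labels[edges$to])

# Share expected if the two endpoint labels were drawn independently
p_c <- mean(labels == "C")
expected_same <- p_c^2 + (1 - p_c)^2

c(observed = observed_same, expected = expected_same)
```

An observed share well above the expected one suggests that a node's label carries information about its neighbors' labels, which is exactly what the iid assumption rules out.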

IID

Third challenge

Collective Inference!
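
Collective inference means that predictions for unlabeled nodes depend on each other, so they are updated together until they settle. A minimal sketch, assuming a hypothetical 4-node path network where the two end nodes have known labels and each unknown node simply takes the mean of its neighbors' churn probabilities (one possible update rule, not necessarily the one used in the course):

```r
# Hypothetical path network: 1 - 2 - 3 - 4
adj <- matrix(0, 4, 4)
adj[cbind(c(1, 2, 3), c(2, 3, 4))] <- 1
adj <- adj + t(adj)  # make the adjacency matrix symmetric

# Node 1 is a known churner, node 4 a known non-churner
known <- c(TRUE, FALSE, FALSE, TRUE)
p <- c(1, 0.5, 0.5, 0)  # churn probabilities; unknown nodes start at 0.5

# Repeatedly replace each unknown node's probability with its neighbors' mean
for (iter in 1:50) {
  updated <- as.vector(adj %*% p) / rowSums(adj)
  p[!known] <- updated[!known]
}

round(p, 3)
```

Nodes 2 and 3 cannot be scored in one pass because each needs the other's probability; iterating lets the known labels at the ends propagate inward.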


Probabilistic relational neighbor classifier

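# Probability of churn (C): mean of the five neighbors' churn probabilities
(0.9 + 0.2 + 0.1 + 0.4 + 0.8) / 5
0.48
# Probability of non-churn (NC): mean of the neighbors' non-churn probabilities
(0.1 + 0.8 + 0.9 + 0.6 + 0.2) / 5
0.52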
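
The same arithmetic can be wrapped in a small helper. This is a sketch only: `prn_classifier` is a hypothetical name, and it assumes all neighbors are weighted equally, as in the calculation above.

```r
# Hypothetical helper: average the neighbors' churn probabilities
# (equal edge weights assumed)
prn_classifier <- function(neighbor_churn_probs) {
  churn <- mean(neighbor_churn_probs)
  c(C = churn, NC = 1 - churn)
}

# Churn probabilities of the five neighbors from the example above
probs <- prn_classifier(c(0.9, 0.2, 0.1, 0.4, 0.8))
probs
```

Because each neighbor's churn and non-churn probabilities sum to 1, the non-churn score is simply one minus the churn score.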

Let's practice!
