Building a predictive model

Predictive Analytics using Networked Data in R

María Óskarsdóttir, Ph.D.

Post-doctoral researcher

Predictive modeling

dataset$preference<-c(rep('R',2),'?',
rep('R',3),rep('P',4))
dataset[,c(1,9)]
  name preference
A    A          R
B    B          R
C    C          ?
D    D          R
E    E          R
F    F          R
G    G          P
H    H          P
I    I          P
J    J          P

Predictive modeling

dataset$R<-c(1,1,'?',1,1,1,0,0,0,0)
dataset[,c(1,9,10)]
  name preference R
A    A          R 1
B    B          R 1
C    C          ? ?
D    D          R 1
E    E          R 1
F    F          R 1
G    G          P 0
H    H          P 0
I    I          P 0
J    J          P 0
training_set<-dataset[-3,-9]
test_set<-dataset[3,-9]
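
The split above can also be written without hard-coded row and column numbers. The following is a minimal sketch, not the course's own code: the `degree` and `pageRank` values are made up for illustration, and the unknown label is stored as `NA` rather than `'?'` so that `R` stays numeric.

```r
# Hypothetical toy data: the feature values are invented for illustration.
dataset <- data.frame(
  name       = LETTERS[1:10],
  degree     = c(4, 3, 5, 4, 5, 2, 2, 3, 1, 4),
  pageRank   = c(0.12, 0.09, 0.13, 0.11, 0.14, 0.08, 0.07, 0.09, 0.06, 0.10),
  preference = c(rep("R", 2), "?", rep("R", 3), rep("P", 4)),
  stringsAsFactors = FALSE
)

# Encode the target: 1 = prefers R, 0 = prefers P, NA = unknown.
dataset$R <- ifelse(dataset$preference == "?", NA,
                    as.integer(dataset$preference == "R"))

# Rows with a known label form the training set; the rest form the test set.
unknown      <- is.na(dataset$R)
features     <- c("degree", "pageRank", "R")
training_set <- dataset[!unknown, features]
test_set     <- dataset[unknown, features]
```

Selecting columns by name keeps the identifier column `name` and the text label `preference` out of the model inputs.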

Logistic regression

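# Logistic regression on two selected network features
glm(R~degree+pageRank, data=training_set, family='binomial')

# Logistic regression on all remaining columns of the training set
glm(R~., data=training_set, family='binomial')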
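
Once a logistic regression is fitted, it can score the unlabelled node. The sketch below assumes a small training set with made-up `degree` and `pageRank` values and a hypothetical single-row `test_set`; the 0.5 cut-off is also an assumption chosen for illustration.

```r
# Hypothetical feature values, invented for illustration only.
training_set <- data.frame(
  degree   = c(4, 3, 4, 5, 2, 2, 3, 1, 4),
  pageRank = c(0.12, 0.09, 0.11, 0.14, 0.08, 0.07, 0.09, 0.06, 0.10),
  R        = c(1, 1, 1, 1, 1, 0, 0, 0, 0)
)
test_set <- data.frame(degree = 5, pageRank = 0.13)

# Fit the model; note that glm() takes the data frame via the `data` argument.
model <- glm(R ~ degree + pageRank, data = training_set, family = "binomial")

# Predicted probability that the unlabelled node prefers R.
p_R <- predict(model, newdata = test_set, type = "response")

# Turn the probability into a label with an assumed 0.5 threshold.
predicted <- ifelse(p_R > 0.5, "R", "P")
```

The formula `R ~ .` uses every other column as a predictor, so an identifier column such as `name` should be dropped from the training set before using it.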

Random forests

library(randomForest)
rfModel<-randomForest(R~., data=training_set)

varImpPlot(rfModel)
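
A random forest can be fitted and queried in much the same way. This is a minimal sketch under stated assumptions: the `randomForest` package is installed, the feature values are made up, and `R` is stored as a factor so that the forest performs classification rather than regression. The `ntree` value and the seed are arbitrary choices.

```r
library(randomForest)  # assumes the package is installed

# Hypothetical feature values, invented for illustration only.
training_set <- data.frame(
  degree   = c(4, 3, 4, 5, 2, 2, 3, 1, 4),
  pageRank = c(0.12, 0.09, 0.11, 0.14, 0.08, 0.07, 0.09, 0.06, 0.10),
  R        = factor(c(1, 1, 1, 1, 1, 0, 0, 0, 0))  # factor => classification
)
test_set <- data.frame(degree = 5, pageRank = 0.13)

set.seed(1)  # random forests are stochastic; fix the seed for repeatability
rfModel <- randomForest(R ~ ., data = training_set, ntree = 100)

# Numeric variable importance (the values that varImpPlot() draws).
imp <- importance(rfModel)

# Class probabilities for the unlabelled node: one column per class.
probs <- predict(rfModel, newdata = test_set, type = "prob")
```

With only nine training rows the importance scores are unstable, so they are best read as a rough ranking of the features.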

Let's practice!
