Model Validation in Python
Kasey Jones
Data Scientist
Dataset | Definition |
---|---|
Train | The sample of data used when fitting models |
Test (holdout sample) | The sample of data used to assess model performance |
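The two roles in the table can be illustrated with a short holdout sketch. It assumes synthetic data from `make_classification` in place of a real data set and an arbitrary `DecisionTreeClassifier`. The model is fit only on the training sample and then scored on both samples. Accuracy on the training rows is optimistic, which is why performance is assessed on the held-out test rows.

```python
# Minimal holdout sketch on synthetic data (not the tic-tac-toe set used later)
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data stands in for a real data set (assumption)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Train sample: used to fit the model. Test sample: held out for assessment.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1111)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)  # the model never sees the test rows while fitting

train_acc = model.score(X_train, y_train)  # accuracy on data the model was fit to
test_acc = model.score(X_test, y_test)     # accuracy on unseen holdout data
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

An unpruned tree memorizes its training rows, so its training accuracy says little about how it will do on new data. The test accuracy is the number to report.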
Ratio Examples
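A quick way to see what a train:test ratio means in rows is to split a row index and count. This sketch assumes a hypothetical data set of 1,000 rows, and the three ratios are illustrative choices. With a float `test_size`, scikit-learn rounds the test count up and gives the remaining rows to the training set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical 1,000-row data set; only the row count matters here
rows = np.arange(1000)

sizes = {}
# Illustrative train:test ratios expressed as a test_size fraction
for test_size in (0.1, 0.2, 0.3):
    train_rows, test_rows = train_test_split(
        rows, test_size=test_size, random_state=1111)
    sizes[test_size] = (len(train_rows), len(test_rows))
    ratio = f"{round((1 - test_size) * 100)}:{round(test_size * 100)}"
    print(f"{ratio} -> {len(train_rows)} train rows, {len(test_rows)} test rows")
```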
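import pandas as pd
# Load the tic-tac-toe data set
tic_tac_toe = pd.read_csv("tic-tac-toe.csv")
# One-hot encode the first nine (board position) columns to build the features
X = pd.get_dummies(tic_tac_toe.iloc[:, 0:9])
# The tenth column holds the target class
y = tic_tac_toe.iloc[:, 9]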
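`pd.get_dummies` turns each categorical column into one indicator column per value that appears in it. The sketch below uses a hypothetical three-column board and assumes cells are coded "x", "o", and "b" (blank). The column names and codes are illustrative and are not read from tic-tac-toe.csv.

```python
import pandas as pd

# Hypothetical two-game board with three positions (the real data has nine)
board = pd.DataFrame({
    "top_left": ["x", "o"],
    "top_middle": ["o", "b"],
    "top_right": ["x", "x"],
})

# Each column becomes one indicator column per observed value, named <column>_<value>
dummies = pd.get_dummies(board)
print(dummies.columns.tolist())
# ['top_left_o', 'top_left_x', 'top_middle_b', 'top_middle_o', 'top_right_x']
```

A value that never occurs in a column, such as "b" in `top_right` here, gets no indicator column of its own.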
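from sklearn.model_selection import train_test_split
# Hold out 20% of the rows as the test set; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1111)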
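The split can be sanity-checked on a small stand-in for `X` and `y`. This sketch uses a hypothetical 10-row DataFrame and Series. It shows that `train_test_split` shuffles rows but keeps each row's features and label aligned. It also shows that the training and test rows never overlap and together cover the whole data set.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for X and y: 10 rows, one feature, binary target
X = pd.DataFrame({"feature": range(10)})
y = pd.Series([0, 1] * 5, name="target")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1111)

# Rows are shuffled, but features and labels keep matching index labels
print(X_test.index.tolist(), y_test.index.tolist())

# No row is in both samples, and every row lands in one of them
assert set(X_train.index).isdisjoint(X_test.index)
assert set(X_train.index) | set(X_test.index) == set(range(10))
```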
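Parameters:
test_size: fraction of rows (float between 0 and 1) or absolute number of rows (int) to put in the test set
train_size: the same for the training set; if left as None, the training set gets every row not used for testing
random_state: seed for the random shuffle, so the same split can be reproduced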
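The three parameters can be tried side by side on a hypothetical 100-row index. An int `test_size` is an absolute row count. When `train_size` and `test_size` are both set and sum to less than the full data, the leftover rows end up in neither sample. Reusing the same `random_state` reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rows = np.arange(100)  # hypothetical 100-row data set

# An int test_size is a row count rather than a proportion
train_a, test_a = train_test_split(rows, test_size=15, random_state=1111)

# With train_size also set, rows outside both samples are left out
train_b, test_b = train_test_split(
    rows, train_size=0.5, test_size=0.2, random_state=1111)

# The same random_state gives the same shuffle, hence the same split
again, _ = train_test_split(rows, test_size=15, random_state=1111)

print(len(train_a), len(test_a))       # 85 15
print(len(train_b), len(test_b))       # 50 20
print(np.array_equal(train_a, again))  # True
```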
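What do we do when testing different model parameters? Scoring every candidate on the test set would let the test data influence which model we pick, so we split the data three ways: first hold out a test set, then divide the remaining rows into training and validation sets. With the values below, that gives 60% training, 20% validation, and 20% test data.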
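# First split: hold out 20% of all rows as the test set
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1111)
# Second split: 25% of the remaining 80% (20% of the total) becomes the validation set,
# leaving a 60/20/20 train/validation/test split
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=11111)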
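Once the three sets exist, candidate parameter values are compared on the validation set, and the test set is used once, for the chosen model. A minimal sketch, assuming synthetic data from `make_classification` and a `DecisionTreeClassifier` whose `max_depth` candidates are illustrative (neither comes from the course):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real features and target (assumption)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# 60/20/20 split: test set first, then validation carved out of the rest
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1111)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=11111)

# Fit each candidate on the training set and score it on the validation set only
models, val_scores = {}, {}
for depth in (2, 4, 8, None):  # illustrative candidate values
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    models[depth] = model
    val_scores[depth] = model.score(X_val, y_val)

# Pick the candidate with the best validation accuracy
best_depth = max(val_scores, key=val_scores.get)

# Only the chosen model is evaluated on the test set
test_score = models[best_depth].score(X_test, y_test)
print(len(X_train), len(X_val), len(X_test))  # 600 200 200
print(best_depth, round(test_score, 3))
```

Because the test rows play no part in choosing `max_depth`, the final test accuracy remains an honest estimate of performance on new data.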