Data distributions and transformations

Practicing Machine Learning Interview Questions in Python

Lisa Stuart

Data Scientist

Different distributions

Data distributions

1 https://www.researchgate.net/figure/Bias-Training-and-test-data-sets-are-drawn-from-different-distributions_fig22_330485084

Train/test split

train, test = train_test_split(X, test_size=0.3)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
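The four-way split can be tried on synthetic data as a minimal sketch. The arrays `X` and `y`, the seed, and `random_state=42` are assumptions made for illustration; they are not part of the slides. Comparing per-feature means is one quick, rough check that train and test come from similar distributions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data, assumed only for this illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))      # 100 samples, 3 features
y = rng.integers(0, 2, size=100)   # binary target

# Passing two arrays returns four: train/test parts of X, then of y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)

# Rough distribution check: per-feature means of each subset
print(X_train.mean(axis=0))
print(X_test.mean(axis=0))
```

Fixing `random_state` makes the split reproducible, which helps when comparing models on the same partition.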

sns.pairplot() --> matrix of distributions and scatter plots


Data transformation

1 https://www.researchgate.net/figure/Example-of-the-effect-of-a-log-transformation-on-the-distribution-of-the-dataset_fig20_308007227
Box-Cox transformations

scipy.stats.boxcox(data, lmbda= )

| lmbda (p) | $x^p$ | transformation |
|---|---|---|
| -2 | $x^{-2} = 1/x^2$ | reciprocal square |
| -1 | $x^{-1} = 1/x$ | reciprocal |
| -0.5 | $x^{-1/2} = 1/\sqrt{x}$ | reciprocal square root |
| 0.0 | $\log{(x)}$ | log |
| 0.5 | $x^{1/2} = \sqrt{x}$ | square root |
| 1 | $x^1 = x$ | no transformation |
| 2 | $x^2$ | square |
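The table can be checked against `scipy.stats.boxcox` with a short sketch. Two points are worth noting: the input must be strictly positive, and for a nonzero `lmbda` SciPy returns $(x^p - 1)/p$, a shifted and rescaled version of the power in the table, while `lmbda=0` gives $\log(x)$. The log-normal sample and the seed below are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Right-skewed, strictly positive sample (assumed for illustration)
rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# lmbda=0 corresponds to the log transform
log_transformed = stats.boxcox(data, lmbda=0)

# lmbda=0.5 corresponds to the square root, returned as (x**0.5 - 1) / 0.5
sqrt_transformed = stats.boxcox(data, lmbda=0.5)

# Without lmbda, SciPy estimates it by maximum likelihood
# and returns both the transformed data and the fitted value
auto_transformed, fitted_lambda = stats.boxcox(data)

# Skewness before and after the log transform
print(stats.skew(data), stats.skew(log_transformed))
print(fitted_lambda)
```

Letting SciPy fit `lmbda` is a reasonable starting point when it is unclear which row of the table suits the data.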

Let's practice!
