Features with missing values or little variance

Dimensionality Reduction in Python

Jeroen Boeye

Head of Machine Learning, Faktion

Creating a feature selector

print(ansur_df.shape)
(6068, 94)
from sklearn.feature_selection import VarianceThreshold

sel = VarianceThreshold(threshold=1)

sel.fit(ansur_df)

mask = sel.get_support()
print(mask)
array([ True,  True, ..., False,  True])
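To see what the selector compares against threshold, here is a minimal sketch on a tiny hypothetical DataFrame (df and its column names are made up and only stand in for ansur_df). It relies on scikit-learn's documented behaviour that VarianceThreshold keeps features whose training-set variance exceeds threshold, and stores those variances in the fitted variances_ attribute. Note that this variance is the population variance (ddof=0), not pandas' default var() (ddof=1).

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical stand-in for ansur_df: three numeric features
df = pd.DataFrame({
    "height_cm": [170.0, 182.0, 165.0, 190.0],  # large spread
    "shoe_size": [42.0, 42.5, 42.0, 42.5],      # small spread
    "constant": [1.0, 1.0, 1.0, 1.0],           # no spread at all
})

sel = VarianceThreshold(threshold=1)
sel.fit(df)
mask = sel.get_support()  # Boolean array, one entry per column

# Reproduce the mask by hand with the population variance (ddof=0)
manual_mask = (df.var(ddof=0) > 1).to_numpy()

print(sel.variances_)
print(mask)
```

Only height_cm has a variance above 1 here, so it is the only column the mask keeps.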

Applying a feature selector

print(ansur_df.shape)
(6068, 94)
reduced_df = ansur_df.loc[:, mask]
print(reduced_df.shape)
(6068, 93)
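Boolean selection with .loc keeps the DataFrame's column names. The selector's own transform method returns a plain NumPy array (with scikit-learn's default output settings), so the names have to be restored from the mask. A minimal sketch of both routes on a hypothetical toy df:

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical stand-in for ansur_df
df = pd.DataFrame({
    "height_cm": [170.0, 182.0, 165.0, 190.0],
    "shoe_size": [42.0, 42.5, 42.0, 42.5],
    "constant": [1.0, 1.0, 1.0, 1.0],
})

sel = VarianceThreshold(threshold=1).fit(df)
mask = sel.get_support()

# Route 1: Boolean column selection keeps the labels
reduced_df = df.loc[:, mask]

# Route 2: transform() gives a NumPy array; rebuild the labels from the mask
reduced_array = sel.transform(df)
rebuilt_df = pd.DataFrame(reduced_array, columns=df.columns[mask], index=df.index)

print(reduced_df.shape)  # (4, 1)
```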

Caveats of variance-based selection

buttock_df.boxplot()

[Figure: boxplots of the features in buttock_df]
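The caveat the boxplots point to is that variance depends on the unit of measurement: multiplying a feature by c multiplies its variance by c squared, so features on larger scales look more variable. A minimal sketch with one hypothetical measurement stored both in metres and in millimetres (length_m and length_mm are made-up names and values):

```python
import math
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical body measurement, recorded once in metres and once in millimetres
length_m = pd.Series([0.95, 1.02, 0.99, 1.10])
length_mm = length_m * 1000

# Scaling by 1000 scales the variance by 1000**2
ratio = length_mm.var() / length_m.var()
print(ratio)  # approximately 1e6

# Same information, different units: a fixed threshold treats them differently
df = pd.DataFrame({"length_m": length_m, "length_mm": length_mm})
sel = VarianceThreshold(threshold=1).fit(df)
print(sel.get_support())
```

The selector drops the metre version and keeps the millimetre version, even though both carry exactly the same information.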

Normalizing the variance

from sklearn.feature_selection import VarianceThreshold

sel = VarianceThreshold(threshold=0.005)

sel.fit(ansur_df / ansur_df.mean())

mask = sel.get_support()
reduced_df = ansur_df.loc[:, mask]
print(reduced_df.shape)
(6068, 45)
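Dividing each column by its mean makes the variance unit-free: for a feature x with mean mu, Var(x / mu) = Var(x) / mu**2, which is the squared coefficient of variation. A minimal sketch, assuming all column means are positive (true for body measurements, not for arbitrary data); the columns and values below are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical measurement in two units, plus a near-constant feature
length_m = pd.Series([0.90, 1.05, 0.95, 1.10])
df = pd.DataFrame({
    "length_m": length_m,
    "length_mm": length_m * 1000,
    "almost_constant": [50.0, 50.1, 49.9, 50.0],
})

# Divide every column by its own mean (assumes positive means)
normalized = df / df.mean()

sel = VarianceThreshold(threshold=0.005).fit(normalized)

# Normalized variance equals the squared coefficient of variation (ddof=0)
cv_squared = (df.std(ddof=0) / df.mean()) ** 2

print(sel.variances_)
print(sel.get_support())
```

After normalization the metre and millimetre columns get the same variance and are kept or dropped together; only the near-constant column falls below the threshold.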

Missing value selector

[Figure: a sample of the Pokemon dataset]

Missing value selector

[Figure: the Pokemon sample with NaN values]

Identifying missing values

pokemon_df.isna()

[Figure: Boolean NaN mask of the Pokemon sample]
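Because the Pokemon sample is only shown as an image, here is a minimal sketch with a small hand-made stand-in for pokemon_df (a few first-generation Pokemon, not the course's actual DataFrame). isna() returns a Boolean DataFrame with the same shape, True wherever a value is missing:

```python
import numpy as np
import pandas as pd

# Hand-made stand-in for pokemon_df (not the course's actual DataFrame)
pokemon_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Pidgey"],
    "Type 1": ["Grass", "Fire", "Water", "Normal"],
    "Type 2": ["Poison", np.nan, np.nan, "Flying"],
    "Total": [318, 309, 314, 251],
    "HP": [45, 39, 44, 40],
    "Attack": [49, 52, 48, 45],
    "Defense": [49, 43, 65, 40],
})

# Boolean DataFrame of the same shape: True marks a missing value
missing = pokemon_df.isna()
print(missing)
```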

Counting missing values

pokemon_df.isna().sum()
Name         0
Type 1       0
Type 2     386
Total        0
HP           0
Attack       0
Defense      0
dtype: int64
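Summing a Boolean DataFrame counts the True values per column. As a cross-check, count() returns the number of non-missing values per column, so len(df) - df.count() gives the same numbers. A minimal sketch on a hand-made stand-in for pokemon_df:

```python
import numpy as np
import pandas as pd

# Hand-made stand-in for pokemon_df (not the course's actual DataFrame)
pokemon_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Pidgey"],
    "Type 1": ["Grass", "Fire", "Water", "Normal"],
    "Type 2": ["Poison", np.nan, np.nan, "Flying"],
    "Total": [318, 309, 314, 251],
    "HP": [45, 39, 44, 40],
})

# True counts as 1, so the column sum is the number of missing values
missing_counts = pokemon_df.isna().sum()

# count() returns the number of non-missing values per column
alt_counts = len(pokemon_df) - pokemon_df.count()

print(missing_counts)
```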

Counting missing values

pokemon_df.isna().sum() / len(pokemon_df)
Name       0.00
Type 1     0.00
Type 2     0.48
Total      0.00
HP         0.00
Attack     0.00
Defense    0.00
dtype: float64
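The mean of a Boolean column is the fraction of True values, so isna().mean() is a shorter way to write isna().sum() / len(df). A minimal sketch on a hand-made stand-in for pokemon_df, in which half of the Type 2 values are missing:

```python
import numpy as np
import pandas as pd

# Hand-made stand-in for pokemon_df (not the course's actual DataFrame)
pokemon_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Pidgey"],
    "Type 2": ["Poison", np.nan, np.nan, "Flying"],
    "HP": [45, 39, 44, 40],
})

# Fraction of missing values per column, two equivalent ways
missing_frac = pokemon_df.isna().mean()
ratio = pokemon_df.isna().sum() / len(pokemon_df)

print(missing_frac)
```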

Applying a missing value threshold

# True if less than 30% of the values are missing
mask = pokemon_df.isna().sum() / len(pokemon_df) < 0.3
print(mask)
Name        True
Type 1      True
Type 2     False
Total       True
HP          True
Attack      True
Defense     True
dtype: bool
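The threshold step is easy to wrap in a small function so the cut-off becomes a parameter. In this sketch, missing_value_mask and max_missing_frac are hypothetical names (not part of pandas or scikit-learn), and pokemon_df is a hand-made stand-in:

```python
import numpy as np
import pandas as pd


def missing_value_mask(df, max_missing_frac=0.3):
    """Hypothetical helper: True for columns whose fraction of
    missing values is below max_missing_frac."""
    return df.isna().mean() < max_missing_frac


# Hand-made stand-in for pokemon_df (not the course's actual DataFrame)
pokemon_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Pidgey"],
    "Type 2": ["Poison", np.nan, np.nan, "Flying"],
    "HP": [45, 39, 44, 40],
})

mask = missing_value_mask(pokemon_df)  # Type 2 is 50% missing -> False
print(mask)
```

A looser cut-off such as missing_value_mask(pokemon_df, 0.6) would keep every column of this sample.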

Applying a missing value threshold

reduced_df = pokemon_df.loc[:, mask]

reduced_df.head()

[Figure: the Pokemon sample after applying the mask]
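The two selectors in this lesson can be chained. A minimal sketch on a hand-made stand-in for pokemon_df: first drop columns with too many missing values, then apply VarianceThreshold to the numeric columns that remain, because it only accepts numeric input. The constant Generation column is included only to give the variance step something to remove:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hand-made stand-in for pokemon_df (not the course's actual DataFrame)
pokemon_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Pidgey"],
    "Type 1": ["Grass", "Fire", "Water", "Normal"],
    "Type 2": ["Poison", np.nan, np.nan, "Flying"],
    "Total": [318, 309, 314, 251],
    "HP": [45, 39, 44, 40],
    "Attack": [49, 52, 48, 45],
    "Defense": [49, 43, 65, 40],
    "Generation": [1, 1, 1, 1],  # constant: zero variance
})

# Step 1: keep columns with less than 30% missing values
mask = pokemon_df.isna().mean() < 0.3
reduced_df = pokemon_df.loc[:, mask]

# Step 2: variance selection on the numeric columns only
numeric_df = reduced_df.select_dtypes(include="number")
sel = VarianceThreshold(threshold=1).fit(numeric_df)
final_df = numeric_df.loc[:, sel.get_support()]

print(final_df.columns.tolist())
```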

Let's practice!