Developing Machine Learning Models for Production
Sinan Ozdemir
Data Scientist, Entrepreneur, and Author
Schema tests look for expected data formats and data types
Tools like Great Expectations help automate this process
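As a rough illustration of what a schema test checks, here is a minimal hand-rolled sketch in plain Python. It does not use the Great Expectations API; the column names, the `EXPECTED_SCHEMA` mapping, and the `check_schema` helper are all hypothetical and exist only for this example.

```python
# Hypothetical schema: column name -> expected Python type.
EXPECTED_SCHEMA = {
    "user_id": int,
    "age": int,
    "signup_date": str,
    "spend": float,
}


def check_schema(rows, schema):
    """Return a list of schema violations; an empty list means the data passed."""
    problems = []
    for i, row in enumerate(rows):
        # Columns the schema requires but the row lacks.
        for col in sorted(schema.keys() - row.keys()):
            problems.append(f"row {i}: missing column '{col}'")
        # Columns present in the row that the schema does not know about.
        for col in sorted(row.keys() - schema.keys()):
            problems.append(f"row {i}: unexpected column '{col}'")
        # Type check for every expected column that is present.
        for col, expected_type in schema.items():
            if col in row and not isinstance(row[col], expected_type):
                actual = type(row[col]).__name__
                problems.append(
                    f"row {i}: column '{col}' is {actual}, "
                    f"expected {expected_type.__name__}"
                )
    return problems


rows = [
    {"user_id": 1, "age": 30, "signup_date": "2023-01-01", "spend": 9.5},
    {"user_id": "2", "age": 41, "spend": 12.0},  # wrong type, missing column
]
for problem in check_schema(rows, EXPECTED_SCHEMA):
    print(problem)
```

A check like this can run before training or scoring so that malformed data fails fast; dedicated tools cover the same ground with richer rules and reporting.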
Set up:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
# Train a random forest classifier (assuming we have some data)
model = RandomForestClassifier().fit(X_train, y_train)
Running our permutation importance test:
# Calculate feature importances using permutation importance
results = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
# Print the feature importances
feature_names = ['feature_1', 'feature_2', 'feature_3', ...]
importances = results.importances_mean
for name, importance in zip(feature_names, importances):
    print(f'{name}: {importance}')
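To show what the permutation importance test measures, the following is a simplified from-scratch sketch using only the standard library. It is not scikit-learn's implementation: the toy data, the fixed `predict` rule standing in for a trained model, and the `permutation_importance_sketch` helper are assumptions made for illustration. The idea is to shuffle one feature column at a time and record how much accuracy drops.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows where the prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance_sketch(predict, X, y, n_repeats=10, seed=42):
    """Mean accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other column untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data (hypothetical): feature 0 determines the label, feature 1 is noise.
n = 200
data_rng = random.Random(0)
X = [[i / n, data_rng.random()] for i in range(n)]
y = [int(row[0] > 0.5) for row in X]


def predict(row):
    """Stand-in for a trained model: uses feature 0 and ignores feature 1."""
    return int(row[0] > 0.5)


importances = permutation_importance_sketch(predict, X, y)
for name, importance in zip(["feature_0", "feature_1"], importances):
    print(f"{name}: {importance}")
```

A feature the model relies on shows a large drop when shuffled, while a feature the model ignores shows none, which is the signal the test looks for.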