Hyperparameter Tuning in Python
Alex Scriven
Data Scientist
Some hyperparameters are more important than others when you begin tuning.
But which values should you try for each hyperparameter?
Let's look at some top tips!
Be aware of conflicting hyperparameter choices.
LogisticRegression() has parameter options of solver & penalty that conflict. From the documentation:
The 'newton-cg', 'sag' and 'lbfgs' solvers support only l2 penalties.
Some conflicts aren't explicit; the parameter is simply ignored. For example, ElasticNet with the normalize hyperparameter:
This parameter is ignored when fit_intercept is set to False.
Make sure to consult the Scikit Learn documentation!
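As a quick illustration, here is a minimal sketch (the small generated dataset and the try/except are my own additions) of what happens when the 'lbfgs' solver is paired with an l1 penalty; the conflict surfaces as a ValueError when fitting:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small synthetic dataset purely for illustration
X, y = make_classification(random_state=0)

# 'lbfgs' supports only the l2 (or no) penalty, so this combination conflicts
clf = LogisticRegression(solver='lbfgs', penalty='l1')

try:
    clf.fit(X, y)
except ValueError as error:
    print(error)  # message explains the solver/penalty conflict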
Be aware of setting 'silly' values for different algorithms:
Spending time documenting sensible values for hyperparameters is a valuable activity.
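For example (these are my own illustrative picks of technically valid but unhelpful settings, not an official list):
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

# Valid, but a 'forest' of only 2 trees barely benefits from averaging
rf = RandomForestClassifier(n_estimators=2)

# Valid, but a single neighbor tends to overfit to noise in the training data
knn = KNeighborsClassifier(n_neighbors=1)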
In the previous exercise, we built models as:
knn_5 = KNeighborsClassifier(n_neighbors=5)
knn_10 = KNeighborsClassifier(n_neighbors=10)
knn_20 = KNeighborsClassifier(n_neighbors=20)
This is quite inefficient. Can we do better?
Try a for loop to iterate through options:
neighbors_list = [3,5,10,20,50,75]
accuracy_list = []
for test_number in neighbors_list:
    model = KNeighborsClassifier(n_neighbors=test_number)
    predictions = model.fit(X_train, y_train).predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    accuracy_list.append(accuracy)
We can store the results in a DataFrame to view:
results_df = pd.DataFrame({'neighbors':neighbors_list, 'accuracy':accuracy_list})
print(results_df)
Let's create a learning curve graph
We'll test many more values this time
neighbors_list = list(range(5,500, 5))
accuracy_list = []
for test_number in neighbors_list:
    model = KNeighborsClassifier(n_neighbors=test_number)
    predictions = model.fit(X_train, y_train).predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    accuracy_list.append(accuracy)
results_df = pd.DataFrame({'neighbors': neighbors_list, 'accuracy': accuracy_list})
We can plot the larger DataFrame:
plt.plot(results_df['neighbors'], results_df['accuracy'])
# Add the labels and title
plt.gca().set(xlabel='n_neighbors', ylabel='Accuracy',
              title='Accuracy for different n_neighbors')
plt.show()
Our graph:
Python's range function does not work for decimal steps.
A handy trick uses NumPy's np.linspace(start, end, num), which creates a number of values (num) evenly spread within an interval (start, end) that you specify.
print(np.linspace(1, 2, 5))
[1. 1.25 1.5 1.75 2. ]
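Tying this back to our loop, here is a minimal sketch (my own example, assuming the same X_train, X_test, y_train, y_test splits as before) that tests decimal values of LogisticRegression's regularization strength C:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 20 evenly spaced decimal candidates - range() cannot produce these
c_values = np.linspace(0.1, 2.0, 20)

accuracy_list = []
for c in c_values:
    model = LogisticRegression(C=c)
    predictions = model.fit(X_train, y_train).predict(X_test)
    accuracy_list.append(accuracy_score(y_test, predictions))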