Logistic Regression with Jupyter Notebook

3. Hyperparameter Tuning

Understanding Hyperparameters

C: Regularization Strength. The parameter C is the inverse of the regularization strength, so smaller values apply stronger regularization. Regularization helps prevent overfitting by penalizing large coefficients.
solver: Optimization Algorithm. The solver parameter specifies the algorithm used to fit the model. Common choices include:

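liblinear: A good choice for small datasets.
newton-cg, lbfgs: General-purpose solvers that also handle the multinomial loss for multiclass problems; lbfgs is scikit-learn's default.
sag, saga: Faster on large datasets, provided the features are on roughly the same scale.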

max_iter: Maximum Number of Iterations. The max_iter parameter caps the number of iterations the solver may take to converge (the default is 100). If the solver stops before converging, scikit-learn raises a ConvergenceWarning; increasing max_iter or scaling the features usually helps.
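To make the effect of C concrete, here is a minimal sketch that fits the same model at three values of C and prints the size of the learned coefficients. The synthetic dataset from make_classification and the names X_demo, y_demo, and coef_norms are assumptions made for illustration; they are not part of this article's dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data, used only for illustration (not the article's dataset).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

coef_norms = {}
for C in [0.01, 1, 100]:
    # A generous max_iter so the solver has room to converge at every C.
    model = LogisticRegression(C=C, solver="lbfgs", max_iter=1000)
    model.fit(X_demo, y_demo)
    # L2 norm of the learned weights: it shrinks as regularization gets stronger.
    coef_norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C}: coefficient norm = {coef_norms[C]:.3f}")
```

The smallest C should give the smallest coefficient norm, which is the penalty on large coefficients at work.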

Training with Cross-Validation

Cross-validation helps you find good hyperparameters for your logistic regression model. It splits the training data into several folds, trains the model on all but one fold, validates on the held-out fold, and repeats until each fold has served as the validation set once. The averaged score estimates how well a given setting generalizes to unseen data. Here’s how to perform cross-validation with GridSearchCV in scikit-learn:

Setting Up Cross-Validation

Define the Parameter Grid: Create a dictionary with parameters and their respective values to test.

from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

# Define the parameter grid
param_grid = {
    'C': [0.1, 1, 10, 100],
    'solver': ['newton-cg', 'lbfgs', 'liblinear'],
    'max_iter': [50, 100, 200, 300]
}
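It can help to know how much work a grid implies before running it. The sketch below counts the candidate settings with scikit-learn's ParameterGrid; the names n_candidates and n_folds are introduced here for illustration, and the fold count of 5 matches the cv value passed to GridSearchCV in this article.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the article, repeated so the snippet runs on its own.
param_grid = {
    'C': [0.1, 1, 10, 100],
    'solver': ['newton-cg', 'lbfgs', 'liblinear'],
    'max_iter': [50, 100, 200, 300]
}

n_candidates = len(ParameterGrid(param_grid))  # 4 * 3 * 4 combinations
n_folds = 5  # the cv value used for GridSearchCV in this article

print("Candidate settings:", n_candidates)                     # 48
print("Model fits during the search:", n_candidates * n_folds)  # 240
```

With the default refit=True, GridSearchCV also fits the best setting once more on the full training set after the search.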

Initialize GridSearchCV: Set up GridSearchCV with the logistic regression model and the parameter grid.

# Initialize the GridSearchCV
grid_search = GridSearchCV(LogisticRegression(), param_grid, cv=5, scoring='accuracy')

Fit the Model: Perform cross-validation on the training data to find the best parameters.

# Fit the model
grid_search.fit(X_train, y_train)
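After fitting, the search object keeps the score of every candidate in its cv_results_ attribute, which is useful for checking how close the runners-up were. The following is a rough sketch on synthetic data with a deliberately small grid; X_demo, y_demo, small_grid, and demo_search are hypothetical names standing in for the article's training data and grid_search.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for X_train / y_train (illustration only).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

# A small grid so the sketch runs quickly.
small_grid = {'C': [0.1, 1, 10], 'solver': ['lbfgs', 'liblinear']}
demo_search = GridSearchCV(
    LogisticRegression(max_iter=500), small_grid, cv=5, scoring='accuracy'
)
demo_search.fit(X_demo, y_demo)

# cv_results_ is a dict of arrays with one entry per candidate setting.
results = demo_search.cv_results_
for params, mean, std in zip(
    results['params'], results['mean_test_score'], results['std_test_score']
):
    print(f"{params}: mean accuracy = {mean:.3f} (std = {std:.3f})")

# best_score_ is the mean cross-validated score of the top-ranked candidate.
print("Best cross-validated accuracy:", demo_search.best_score_)
```

If several candidates score within one standard deviation of each other, the choice among them is largely arbitrary, and a simpler or faster setting may be preferable.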

Select the Best Parameters: After fitting, retrieve the best parameters and use the best model for predictions.

from sklearn.metrics import accuracy_score

# Best parameters found by the grid search
print("Best Parameters:", grid_search.best_params_)

# Best model, refit on the full training set
best_model = grid_search.best_estimator_

# Evaluate the best model on the held-out test set
y_pred_best = best_model.predict(X_test)
print("Best Model Accuracy:", accuracy_score(y_test, y_pred_best))

Best Parameters: {'C': 1, 'max_iter': 50, 'solver': 'liblinear'}
Best Model Accuracy: 0.9090909090909091

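This process selects the hyperparameters with the best cross-validated accuracy among the candidates in the grid. The accuracy on the held-out test set then estimates how well the tuned model generalizes to new data.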