### Logistic Regression with Jupyter Notebook

This part of the series demonstrates how to implement logistic regression on the Iris dataset using Jupyter Notebook. Jupyter Notebook provides an interactive environment for data analysis and visualization, making it a popular choice for data scientists.

#### 1. Dataset Preparation

##### Loading the Dataset

We will use the Iris dataset, a well-known dataset for classification tasks. It is available in the `sklearn` library.

```
import pandas as pd
from sklearn.datasets import load_iris
# Load the Iris dataset
iris = load_iris()
data = pd.DataFrame(data=iris.data, columns=iris.feature_names)
data['target'] = iris.target
# Display the first few rows of the dataset
data.head()
```

##### Exploring the Data

Use pandas and seaborn for data exploration and visualization.

```
import seaborn as sns
import matplotlib.pyplot as plt
# Pairplot to visualize the data
sns.pairplot(data, hue='target')
plt.show()
```

##### Data Cleaning

Handle any missing values or outliers (the Iris dataset is clean, so this step is often skipped).

*Handling Missing Values:*

In many real-world datasets, missing values are common. You can handle them by either removing rows/columns with missing values or filling them with appropriate values (imputation).

```
# Check for missing values
print(data.isnull().sum())
# Option 1: Drop rows with missing values
data_cleaned = data.dropna()
# Option 2: Fill missing values with mean of the column
data_filled = data.fillna(data.mean())
# Display the cleaned dataset
data_cleaned.head()
```

*Handling Outliers:*

Outliers can skew the results of your model. You can detect outliers using statistical methods or visualization and handle them by removing or transforming them.

```
# Detect outliers using Z-score
from scipy import stats
import numpy as np
z_scores = np.abs(stats.zscore(data.drop(columns=['target'])))
outliers = np.where(z_scores > 3)
# Remove outliers
data_no_outliers = data[(z_scores < 3).all(axis=1)]
# Display the dataset without outliers
data_no_outliers.head()
```

##### Feature Engineering

Feature engineering involves transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy. Here, we will modify or create features for the Iris dataset.

- Creating Interaction Terms: Interaction terms can capture the interaction between features.

```
# Creating interaction terms
data['sepal length * sepal width'] = data['sepal length (cm)'] * data['sepal width (cm)']
data['petal length * petal width'] = data['petal length (cm)'] * data['petal width (cm)']
```

- Feature Transformation: Apply mathematical transformations to features.

```
# Applying logarithmic transformation
import numpy as np  # already imported above; repeated so this cell runs standalone
data['log sepal length'] = np.log(data['sepal length (cm)'])
```

- Binning: Convert continuous features into categorical features by binning.

```
# Binning petal length
data['petal length bin'] = pd.cut(data['petal length (cm)'], bins=[0, 2, 4, 6], labels=['short', 'medium', 'long'])
```

- Normalization: Scale features to a similar range.

```
from sklearn.preprocessing import MinMaxScaler
# Normalizing features
scaler = MinMaxScaler()
feature_cols = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
data[feature_cols] = scaler.fit_transform(data[feature_cols])
```

These steps enhance the features, making them more suitable for machine learning algorithms and improving model performance.

##### Splitting the Data

Use `train_test_split` from `sklearn.model_selection` to divide the dataset.

```
from sklearn.model_selection import train_test_split
# Drop the target and the categorical bin column (logistic regression needs numeric features)
X = data.drop(columns=['target', 'petal length bin'])
X_train, X_test, y_train, y_test = train_test_split(X, data['target'], test_size=0.2, random_state=42)
```

#### 2. Model Implementation

##### Choosing the Algorithm

Use `LogisticRegression` from `sklearn.linear_model`.

```
from sklearn.linear_model import LogisticRegression
# Initialize the Logistic Regression model
model = LogisticRegression(max_iter=200)
```

##### Defining the Model

Specify parameters such as `C`, `solver`, and `max_iter`. (This is optional at first; we will return to these in Hyperparameter Tuning.)
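A minimal sketch of what an explicit definition looks like (the values shown are scikit-learn's defaults plus a higher iteration cap, not tuned choices):

```
from sklearn.linear_model import LogisticRegression

# Explicit hyperparameters; values here are illustrative, not tuned
model = LogisticRegression(C=1.0, solver='lbfgs', max_iter=200)
print(model.get_params()['solver'])
```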

##### Training the Model

Fit the model using the training data.

```
# Train the model
model.fit(X_train, y_train)
```

##### Evaluating the Model

Use metrics like accuracy, precision, and recall.

```
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Make predictions on the test set
y_pred = model.predict(X_test)
# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
```

##### Visualizing Results

Plot decision boundaries and confusion matrices.

```
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions
from sklearn.linear_model import LogisticRegression
# plot_decision_regions needs a 2D feature space, so train a separate model
# on two features (mlxtend is a third-party package: pip install mlxtend)
X_2d = X_train[['petal length (cm)', 'petal width (cm)']].values
model_2d = LogisticRegression(max_iter=200).fit(X_2d, y_train.values)
plot_decision_regions(X_2d, y_train.values, clf=model_2d, legend=2)
plt.xlabel('petal length (cm)')
plt.ylabel('petal width (cm)')
plt.title('Logistic Regression Decision Boundary')
plt.show()
```

#### 3. Hyperparameter Tuning

##### Understanding Hyperparameters

**C: Regularization Strength**

The parameter `C` is the inverse of regularization strength; smaller values specify stronger regularization. Regularization helps prevent overfitting by penalizing large coefficients.

**solver: Optimization Algorithm**

The `solver` parameter specifies the algorithm to use for optimization. Common choices include:

- `newton-cg`, `lbfgs`, `liblinear`: suitable for small datasets.
- `sag`, `saga`: efficient for large datasets.

**max_iter: Maximum Number of Iterations**

The `max_iter` parameter sets the maximum number of iterations the solver is allowed before stopping, whether or not it has converged.
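To see the effect of `C` concretely, the sketch below (using the raw Iris features rather than the engineered ones above) fits the model at several regularization strengths and compares the resulting coefficient magnitudes:

```
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Smaller C means stronger regularization, which shrinks the coefficients
norms = {}
for C in [0.01, 1.0, 100.0]:
    clf = LogisticRegression(C=C, max_iter=500).fit(X, y)
    norms[C] = np.linalg.norm(clf.coef_)
    print(f"C={C}: coefficient norm = {norms[C]:.3f}")
```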

##### Training with Cross-Validation

Performing cross-validation helps in finding the best parameters for your logistic regression model. Cross-validation divides the dataset into multiple folds and trains the model on different combinations of these folds, ensuring the model generalizes well on unseen data. Here’s how to perform cross-validation with GridSearchCV in scikit-learn:
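To illustrate the fold mechanism on its own before adding the parameter grid, `cross_val_score` reports one accuracy per fold (this sketch uses the plain Iris features rather than the engineered ones):

```
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: each fold serves once as the validation set
scores = cross_val_score(LogisticRegression(max_iter=200), X, y, cv=5, scoring='accuracy')
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```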

*Setting Up Cross-Validation*

- Define the Parameter Grid: Create a dictionary with parameters and their respective values to test.

```
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
# Define the parameter grid
param_grid = {
    'C': [0.1, 1, 10, 100],
    'solver': ['newton-cg', 'lbfgs', 'liblinear'],
    'max_iter': [100, 200, 300]
}
```

- Initialize GridSearchCV: Set up GridSearchCV with the logistic regression model and the parameter grid.

```
# Initialize the GridSearchCV
grid_search = GridSearchCV(LogisticRegression(), param_grid, cv=5, scoring='accuracy')
```

- Fit the Model: Perform cross-validation on the training data to find the best parameters.

```
# Fit the model
grid_search.fit(X_train, y_train)
```

- Select the Best Parameters: After fitting, retrieve the best parameters and use the best model for predictions.

```
# Best parameters
print("Best Parameters:", grid_search.best_params_)
# Best model
best_model = grid_search.best_estimator_
# Evaluate the best model
y_pred_best = best_model.predict(X_test)
print("Best Model Accuracy:", accuracy_score(y_test, y_pred_best))
```

This process ensures that the model is tuned for optimal performance and generalizes well to new data.

#### 4. Model Deployment

##### Saving the Model

Save the model using `pickle` or `joblib`.

```
import pickle
# Save the model to a file
with open('logistic_regression_model.pkl', 'wb') as f:
    pickle.dump(best_model, f)
```
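`joblib`, the alternative mentioned above, is often preferred for scikit-learn models because it serializes the large numpy arrays inside estimators efficiently. A self-contained sketch (training a stand-in model rather than reusing `best_model`):

```
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a small model in place of best_model so the snippet runs on its own
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

# Round-trip the model through joblib
joblib.dump(model, 'logistic_regression_model.joblib')
loaded = joblib.load('logistic_regression_model.joblib')
print(loaded.score(X, y))
```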

##### Creating an API

Use Flask to set up a simple web service.

```
import pickle
from flask import Flask, request, jsonify

app = Flask(__name__)

# Load the saved model
with open('logistic_regression_model.pkl', 'rb') as f:
    model = pickle.load(f)

@app.route('/predict', methods=['POST'])
def predict():
    data = request.json
    prediction = model.predict([data['features']])
    return jsonify({'prediction': int(prediction[0])})

if __name__ == '__main__':
    app.run(debug=True)
```
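You can exercise the endpoint without deploying anything by using Flask's built-in test client. This sketch trains a model in-process instead of loading the pickle so it is self-contained; the feature list must match what the model was trained on:

```
from flask import Flask, request, jsonify
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a model in-process (stand-in for loading the pickle)
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    data = request.json
    prediction = model.predict([data['features']])
    return jsonify({'prediction': int(prediction[0])})

# Flask's test client sends requests without starting a server
client = app.test_client()
resp = client.post('/predict', json={'features': [5.1, 3.5, 1.4, 0.2]})
print(resp.get_json())
```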

##### Deploying to a Server

Deploying the API involves setting up a server to host your Flask application. This can be done on cloud services like AWS, Azure, or Google Cloud, or on a local server. The process generally includes configuring the server, installing necessary dependencies, and ensuring the API is accessible over the internet.

##### Monitoring and Updating

Monitoring the API’s performance is crucial for maintaining its reliability and efficiency. This involves tracking metrics like response time and error rates. Regular updates may be needed to improve the model’s accuracy or adapt to new data, which requires retraining the model and redeploying the updated version to the server.
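One lightweight way to track response times is a pair of Flask request hooks. The sketch below logs latency to stdout; a real deployment would ship these numbers to a metrics backend instead:

```
import time
from flask import Flask, g, request

app = Flask(__name__)

@app.before_request
def start_timer():
    # Record the start time of each request
    g.start = time.perf_counter()

@app.after_request
def log_latency(response):
    elapsed_ms = (time.perf_counter() - g.start) * 1000
    # In production, send this to a metrics system instead of printing
    print(f"{request.method} {request.path} -> {response.status_code} in {elapsed_ms:.1f} ms")
    return response

@app.route('/ping')
def ping():
    return 'ok'

# Exercise the hooks with the test client
resp = app.test_client().get('/ping')
```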