Mushroom Naive Bayes Classifier

Overview

The Mushroom Naive Bayes Classifier project uses the Naive Bayes algorithm to classify mushrooms as edible or poisonous based on features such as shape, color, and odor. The data is loaded into a DataFrame and preprocessed before training the classifier. The Categorical Naive Bayes model reaches an accuracy of 93.6%.

Description

This project classifies mushrooms as edible or poisonous based on features such as cap shape, cap color, odor, and gill size. Each sample in the dataset is labeled as edible or poisonous. After the data is loaded and preprocessed, a Naive Bayes model is trained on it and evaluated using accuracy and a confusion matrix.

Key Features

Data Loading and Preprocessing: The mushroom dataset is loaded into a pandas DataFrame and preprocessed to handle missing values.
Exploratory Data Analysis: Basic statistics and visualizations are used to understand the distribution of features and the class balance.
Model Training and Evaluation: The Naive Bayes algorithm is trained on the preprocessed data, and its performance is evaluated using accuracy and confusion matrix metrics.
Comparison with Gaussian Naive Bayes: The project also includes a comparison with Gaussian Naive Bayes to assess the performance difference between categorical and Gaussian Naive Bayes algorithms.
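
The steps above could be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the toy DataFrame, the column names, the compare_naive_bayes helper, the "?" missing-value marker, the 80/20 split, and the choice of "poisonous" as the positive class are all assumptions made for the example.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB, GaussianNB
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data standing in for the real mushroom dataset.
toy = pd.DataFrame({
    "odor": ["none", "foul", "almond", "foul",
             "none", "pungent", "almond", "foul"] * 10,
    "gill_size": ["broad", "narrow", "broad", "narrow",
                  "broad", "narrow", "broad", "broad"] * 10,
    "label": ["edible", "poisonous", "edible", "poisonous",
              "edible", "poisonous", "edible", "poisonous"] * 10,
})


def compare_naive_bayes(df, target="label", test_size=0.2, seed=0):
    """Train CategoricalNB and GaussianNB on the same split and score both."""
    # Assumption: missing values are marked with "?"; drop those rows.
    df = df.replace("?", pd.NA).dropna()

    # Encode each categorical feature as integer codes 0..k-1.
    encoder = OrdinalEncoder()
    X = encoder.fit_transform(df.drop(columns=[target])).astype(int)
    # Assumption: "poisonous" is the positive class (1).
    y = (df[target] == "poisonous").astype(int)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

    # Tell CategoricalNB how many categories each feature has, so a
    # category that only appears in the test split does not cause an error.
    n_categories = [len(cats) for cats in encoder.categories_]
    models = {
        "categorical": CategoricalNB(min_categories=n_categories),
        "gaussian": GaussianNB(),
    }

    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = (
            accuracy_score(y_test, pred),
            confusion_matrix(y_test, pred, labels=[0, 1]),
        )
    return results


for name, (acc, cm) in compare_naive_bayes(toy).items():
    print(name, round(acc, 3))
    print(cm)
```

Scoring both models on the same split keeps the comparison like-for-like. Note that GaussianNB treats the integer category codes as if they were continuous measurements, which is a modeling shortcut rather than a property of the data.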

Results

Categorical Naive Bayes:

Accuracy: 93.6%

Confusion Matrix:

[[827 (True Negative), 16 (False Positive)]
 [88 (False Negative), 694 (True Positive)]]

Gaussian Naive Bayes:

Accuracy: 96.43%

Confusion Matrix:

[[1171 (True Negative), 86 (False Positive)]
 [1 (False Negative), 1180 (True Positive)]]
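
The reported accuracies can be re-derived from the confusion matrix counts, along with precision and recall, which the accuracy figure alone does not show. The sketch below uses only the numbers listed above; the summarize helper is hypothetical, and treating the positive class as "poisonous" is an assumption, since the README does not say which class is positive.

```python
def summarize(tn, fp, fn, tp):
    """Derive basic metrics from the four confusion matrix counts."""
    total = tn + fp + fn + tp
    return {
        "total": total,
        "accuracy": (tn + tp) / total,
        "precision": tp / (tp + fp),  # share of positive predictions that are correct
        "recall": tp / (tp + fn),     # share of actual positives that are found
    }


# Counts copied from the Results section.
categorical = summarize(tn=827, fp=16, fn=88, tp=694)
gaussian = summarize(tn=1171, fp=86, fn=1, tp=1180)

for name, metrics in [("categorical", categorical), ("gaussian", gaussian)]:
    print(name, {k: round(v, 4) for k, v in metrics.items()})
```

If the positive class is indeed "poisonous", recall is the metric to watch, because a false negative is a poisonous mushroom predicted to be edible.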

Conclusion

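The Naive Bayes classification models show promising results in predicting the edibility of mushrooms based on their features. The Gaussian Naive Bayes model reports higher accuracy (96.43%) than the Categorical Naive Bayes model (93.6%), possibly due to its normal distribution assumption on the encoded features. The two confusion matrices total 1,625 and 2,438 samples respectively, so the models were evaluated on test sets of different sizes and the comparison is not strictly like-for-like.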