This repository contains Python code for feature selection using a genetic algorithm, combined with several classification algorithms, applied to the well-known WDBC dataset (Wisconsin Diagnostic Breast Cancer). The dataset is loaded from a CSV file using pandas, and preprocessing includes handling missing values and label encoding.
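The loading and preprocessing step can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the column names (`id`, `diagnosis`, `empty_col`), the median fill strategy, and the helper name `preprocess` are all assumptions, and a tiny in-memory CSV stands in for data.csv.

```python
import io

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Tiny in-memory stand-in for data.csv; the column names are assumptions.
SAMPLE_CSV = """id,diagnosis,radius_mean,texture_mean,empty_col
1,M,17.99,10.38,
2,B,13.54,,
3,M,20.57,17.77,
4,B,12.45,15.70,
"""


def preprocess(df, label_col="diagnosis", drop_cols=("id",)):
    """Hypothetical helper: return (X, y) with gaps filled and labels encoded."""
    # Remove identifier columns that carry no predictive information.
    df = df.drop(columns=[c for c in drop_cols if c in df.columns])
    # Drop columns that are entirely empty.
    df = df.dropna(axis=1, how="all")
    # Fill remaining gaps in numeric columns with the column median.
    df = df.fillna(df.median(numeric_only=True))
    # LabelEncoder sorts classes alphabetically, so B -> 0 and M -> 1.
    y = LabelEncoder().fit_transform(df[label_col])
    X = df.drop(columns=[label_col])
    return X, y


# In practice this would be pd.read_csv("data.csv").
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
X, y = preprocess(df)
print(X)
print(y)
```

Filling with the median is only one option; dropping incomplete rows or using a scikit-learn imputer would work just as well for a sketch like this.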
The genetic algorithm is implemented using the DEAP library to evolve a population of binary strings representing selected features.
The selected features are used to train SVM, Random Forest, Decision Tree, and Artificial Neural Network (ANN) classifiers. Additionally, a Multi-Layer Perceptron (MLP) classifier is included for comparison.
The accuracy, precision, recall, and F1-score are calculated for each classifier. Confusion matrices are generated for the ANN classifier, and precision-recall metrics are computed for the MLP classifier.
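The evaluation step might look roughly like the sketch below. It is an illustration under assumptions: it uses scikit-learn's bundled WDBC copy with all features rather than the GA-selected subset, default hyperparameters, an 80/20 split, and scikit-learn's `MLPClassifier` only (the Keras ANN is left out to keep the example light).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# In scikit-learn's copy, 0 = malignant and 1 = benign, so the metrics
# below treat "benign" as the positive class by default.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Scale-sensitive models (SVM, MLP) get a StandardScaler in front.
classifiers = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "MLP": make_pipeline(StandardScaler(),
                         MLPClassifier(max_iter=1000, random_state=0)),
}

results = {}
for name, clf in classifiers.items():
    y_pred = clf.fit(X_train, y_train).predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }
    scores = {k: round(v, 3) for k, v in results[name].items()
              if k != "confusion_matrix"}
    print(name, scores)

print("MLP confusion matrix:")
print(results["MLP"]["confusion_matrix"])
```

To evaluate only the GA-selected features, the same loop can be run on the column subset of `X_train` and `X_test` picked out by the best individual's bit mask.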
The code utilizes popular Python libraries such as NumPy, scikit-learn, pandas, Matplotlib, Seaborn, DEAP, Keras, and others. The data can be loaded either locally or from Google Colab.
- Environment Setup
  - Make sure you have access to Google Colab.
  - Upload the dataset (data.csv) to your Google Colab environment.
- Install Dependencies
  - Open a new Colab notebook.
  - Install the required libraries listed in the requirements.txt file.
- Run the Code
  - Copy the provided code into one or more cells in your Colab notebook and run them.
- Adjust Parameters (Optional)
  - You can customize parameters such as population size, number of generations, crossover probability, and mutation probability to suit your needs.
- Review Results
  - After the code runs, the results, including classifier accuracies and evaluation metrics, are printed in the Colab output.
- Explore and Experiment
  - Feel free to experiment with different datasets, tweak the genetic algorithm parameters, or try other classifiers to see how the system behaves.
Note that Google Colab comes with many popular libraries pre-installed, so you only need to install the additional dependencies listed in the requirements.txt file.