Dive into feature selection and classification with this Python repository, utilizing a genetic algorithm and various classifiers on a breast cancer dataset. Achieving high accuracy levels, SVM and ANN stand out with 97.37%. Ideal for machine learning enthusiasts and those interested in cancer diagnostics.

NadiaAzri/FeatureSelection-GeneticAlgorithm

Feature Selection Genetic Algorithm and Classification

This repository contains Python code for feature selection with a genetic algorithm, followed by classification with several algorithms, applied to the well-known WDBC dataset (Wisconsin Diagnostic Breast Cancer). The dataset is loaded from a CSV file using pandas, and preprocessing includes handling missing values and label encoding.
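The loading and preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's exact code: the tiny in-memory CSV stands in for data.csv, and the column names (`id`, `diagnosis`, `radius_mean`, `texture_mean`), the mean-imputation strategy, and dropping `id` are all assumptions made for the example.

```python
import io

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Tiny in-memory stand-in for data.csv (illustrative values and column names).
csv_text = """id,diagnosis,radius_mean,texture_mean
1,M,17.99,10.38
2,B,13.54,14.36
3,M,,21.25
4,B,12.45,15.70
"""
df = pd.read_csv(io.StringIO(csv_text))  # with the real file: pd.read_csv("data.csv")

# Assumption: the identifier column carries no predictive signal, so drop it.
df = df.drop(columns=["id"])

# Assumption: missing numeric values are filled with the column mean.
df = df.fillna(df.mean(numeric_only=True))

# Label encoding: LabelEncoder sorts classes, so "B" -> 0 and "M" -> 1.
encoder = LabelEncoder()
df["diagnosis"] = encoder.fit_transform(df["diagnosis"])

X = df.drop(columns=["diagnosis"]).to_numpy()
y = df["diagnosis"].to_numpy()
print(X.shape, y)
```

Other ways of handling missing values (dropping rows, median imputation) would slot into the same place.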

Feature Selection with Genetic Algorithm

The genetic algorithm is implemented using the DEAP library to evolve a population of binary strings representing selected features.

Classification Algorithms

The selected features are used to train SVM, Random Forest, Decision Tree, and Artificial Neural Network (ANN) classifiers. Additionally, a Multi-Layer Perceptron (MLP) classifier is included for comparison.
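A minimal sketch of training several classifiers on a selected feature subset might look like this. It is an illustration under assumptions: scikit-learn's bundled WDBC data replaces data.csv, the feature mask is a hypothetical placeholder rather than a genetic algorithm result, scikit-learn's `MLPClassifier` covers the MLP, and the Keras-based ANN is left out to keep the example light. Hyperparameters are library defaults unless shown.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Hypothetical mask standing in for the genetic algorithm's output:
# keep every second feature.
mask = np.zeros(X.shape[1], dtype=bool)
mask[::2] = True

X_train, X_test, y_train, y_test = train_test_split(
    X[:, mask], y, test_size=0.2, random_state=42, stratify=y
)

# SVM and MLP are scale-sensitive, so they get a StandardScaler step.
classifiers = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Random Forest": RandomForestClassifier(random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "MLP": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=1000, random_state=42)
    ),
}

accuracies = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    accuracies[name] = clf.score(X_test, y_test)  # test-set accuracy
    print(f"{name}: {accuracies[name]:.4f}")
```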

Evaluation Metrics

The accuracy, precision, recall, and F1-score are calculated for each classifier. Confusion matrices are generated for the ANN classifier, and precision-recall metrics are computed for the MLP classifier.
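To make these metrics concrete, here is a small worked example using scikit-learn's metric functions on hand-written labels. The label vectors are invented for illustration (1 = malignant, 0 = benign is an assumed encoding) and are not results from the repository.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Invented labels: 4 positives and 6 negatives in the ground truth.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]  # one false negative, one false positive

acc = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 8 / 10
prec = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 4
rec = recall_score(y_true, y_pred)      # TP / (TP + FN) = 3 / 4
f1 = f1_score(y_true, y_pred)           # harmonic mean of precision and recall

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)

print(acc, prec, rec, f1)  # 0.8 0.75 0.75 0.75
print(cm)
```

The confusion matrix can then be passed to a plotting helper such as Seaborn's `heatmap` for a visual summary.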

Dependencies

The code utilizes popular Python libraries such as NumPy, scikit-learn, pandas, Matplotlib, Seaborn, DEAP, Keras, and others. The data can be loaded either locally or from Google Colab.

How to Use

  1. Environment Setup

    • Make sure you have access to Google Colab.
    • Upload the dataset (data.csv) to your Google Colab environment.
  2. Install Dependencies

    • Open a new Colab notebook.
    • Install the required libraries listed in the requirements.txt file.
  3. Run the Code

    • Copy the provided code into one or more cells in your Colab notebook.
  4. Adjust Parameters (Optional)

    • You can customize parameters such as population size, generations, crossover probability, and mutation probability to suit your preferences.
  5. Review Results

    • After running the code, the results, including classifier accuracies and evaluation metrics, will be printed in the Colab output.
  6. Explore and Experiment

    • Feel free to experiment with different datasets, tweak the genetic algorithm parameters, or try other classifiers to see how the system behaves.

Note that Google Colab comes with many popular libraries pre-installed, so you only need to install the additional dependencies listed in the requirements.txt file.
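One way to see which dependencies still need installing is to check whether each library can be imported in the current environment. The snippet below is a small sketch using only the standard library; the list of import names is an assumption based on the libraries named in the Dependencies section, and the authoritative list remains requirements.txt.

```python
import importlib.util

# Import names assumed from the Dependencies section (scikit-learn imports as "sklearn").
required = ["numpy", "sklearn", "pandas", "matplotlib", "seaborn", "deap", "keras"]

# find_spec returns None when a top-level module cannot be found.
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All listed libraries are available.")
```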

