debnathk/protein_classification_prediction

Protein Family Classification from Raw Protein Sequences using Naïve Bayes

•  A labelled structural protein sequence dataset containing 346,325 string-type data points was imported from Kaggle and then pre-processed

•  Features were extracted from the raw string data using CountVectorizer

•  A Naïve Bayes classifier was used for prediction from the count-vectorized features, and an AdaBoost classifier was trained for comparison

•  The classification task reached an accuracy of 76.38%
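
The steps above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the repository's actual code. The toy sequences, the `FAMILY_A`/`FAMILY_B` labels, the character 3-gram settings, the choice of `MultinomialNB` as the Naïve Bayes variant, and the AdaBoost hyperparameters are all assumptions made for the example. The real dataset is far larger, so this sketch is not expected to reproduce the reported accuracy.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the Kaggle sequences and their family labels.
train_seqs = [
    "MKVLAAGIVGLLLAQ", "MKVLAAGLVGLLLAE", "MKILAAGIVGLLMAQ",
    "GSHMTEYKLVVVGAG", "GSHMTEYKLVVVGAC", "GSHMSEYKLVVVGAG",
]
train_labels = ["FAMILY_A"] * 3 + ["FAMILY_B"] * 3
test_seqs = ["MKVLAAGIVGLLMAE", "GSHMTEYKLVVIGAG"]
test_labels = ["FAMILY_A", "FAMILY_B"]

# Assumption: each sequence is split into overlapping character 3-grams,
# and each 3-gram count becomes one feature column.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(3, 3), lowercase=False)
X_train = vectorizer.fit_transform(train_seqs)  # learn the vocabulary on training data only
X_test = vectorizer.transform(test_seqs)        # reuse the same vocabulary for test data

# Naive Bayes on the count features (multinomial variant assumed).
nb = MultinomialNB()
nb.fit(X_train, train_labels)
nb_pred = nb.predict(X_test)
nb_acc = accuracy_score(test_labels, nb_pred)

# AdaBoost on the same features, for comparison (hyperparameters assumed).
ada = AdaBoostClassifier(n_estimators=50, random_state=0)
ada.fit(X_train, train_labels)
ada_pred = ada.predict(X_test)
ada_acc = accuracy_score(test_labels, ada_pred)

print(f"Naive Bayes accuracy: {nb_acc:.2%}")
print(f"AdaBoost accuracy:    {ada_acc:.2%}")
```

Fitting the vectorizer on the training split only keeps the test sequences from influencing the vocabulary, which gives a fairer accuracy estimate.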