GitHub - debnathk/protein_classification_prediction

Protein Family Classification from Raw Protein Sequences using Naïve Bayes

• A labelled Structural Protein Sequence dataset containing 346,325 string-type data points was imported from Kaggle and pre-processed

• Features were extracted from the raw string data using CountVectorizer

• A Naïve Bayes classifier was trained on the count-vectorized features to make predictions, with an AdaBoost classifier used for comparison

• The classification accuracy achieved was 76.38%
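The steps above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the notebook's code: the character 4-gram tokens (`analyzer="char", ngram_range=(4, 4)`), the choice of `MultinomialNB` as the Naïve Bayes variant, the 80/20 split, and the helper name `compare_classifiers` are all assumptions.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB


def compare_classifiers(sequences, labels, ngram=4, test_size=0.2, seed=1):
    """Hypothetical helper: count-vectorize sequences, then score NB and AdaBoost."""
    x_train, x_test, y_train, y_test = train_test_split(
        sequences, labels, test_size=test_size, random_state=seed
    )
    # Assumption: overlapping character n-grams (e.g. "MKTA") serve as the tokens.
    vectorizer = CountVectorizer(analyzer="char", ngram_range=(ngram, ngram))
    # Fit the vocabulary on the training split only, then reuse it for the test split.
    train_counts = vectorizer.fit_transform(x_train)
    test_counts = vectorizer.transform(x_test)

    scores = {}
    for name, model in [
        ("naive_bayes", MultinomialNB()),
        ("adaboost", AdaBoostClassifier(random_state=seed)),
    ]:
        model.fit(train_counts, y_train)
        scores[name] = accuracy_score(y_test, model.predict(test_counts))
    return scores
```

In practice the inputs would be the sequence strings and their family labels from the pre-processed dataset. Fitting the vectorizer on the training split alone keeps test-set n-grams out of the vocabulary, so the reported accuracy is not inflated by leakage.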

Repository files:

• README.md

• confusion_matrix.png

• pdb_data_no_dups.zip

• pdb_data_seq.zip

• predicting-protein-classification.ipynb
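The two data archives appear to hold the two tables of the Kaggle dataset. A minimal pandas sketch of a merge-and-filter pre-processing step follows. It assumes that both CSVs share a `structureId` key, that the sequence table carries `sequence` and `macromoleculeType` columns, that the structure table carries a `classification` label, and that only protein chains are kept. The function name `merge_protein_tables` is hypothetical, and the notebook's actual pre-processing may differ.

```python
import pandas as pd


def merge_protein_tables(structures: pd.DataFrame, sequences: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pre-processing: attach family labels to protein chain sequences."""
    # Assumed schema: keep protein chains only, keyed by "structureId".
    proteins = sequences[sequences["macromoleculeType"] == "Protein"]
    merged = proteins[["structureId", "sequence"]].merge(
        structures[["structureId", "classification"]], on="structureId", how="inner"
    )
    # Drop rows missing a sequence or a label, then repeated (sequence, label) pairs.
    merged = merged.dropna(subset=["sequence", "classification"])
    merged = merged.drop_duplicates(subset=["sequence", "classification"])
    return merged.reset_index(drop=True)
```

pandas can read a single-file ZIP archive directly, e.g. `pd.read_csv("pdb_data_seq.zip")`, assuming each archive contains exactly one CSV.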