Classify Disease Articles using Deep Neural Networks

Problem Statement

Classify whether an article/text describes a disease. Neural-network variants:

Binary classification of whether an article describes a disease
Identification of the disease the article might be referring to

Source Dataset

Obtained by scraping Wikipedia articles:

Gather articles that pertain to diseases, and articles that do not, using wget
Label each article according to whether it pertains to a disease: isDisease
Use an HTML parser to extract the essential text from each HTML document
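
The parsing step above can be sketched with Python's standard-library html.parser. This is only an illustration, not the repository's parser: the class name ArticleTextExtractor and the helper extract_text are hypothetical, and the choice to skip script and style content is an assumption about what counts as "essential".

```python
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Collects visible text and ignores <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical sample page standing in for a downloaded article.
sample_html = (
    "<html><head><style>p{}</style></head><body>"
    "<h1>Malaria</h1><p>Malaria is a <b>mosquito-borne</b> disease.</p>"
    "<script>var x = 1;</script></body></html>"
)
article_text = extract_text(sample_html)
```

Each extracted text would then be stored together with its isDisease label.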

Logistic Regression (baseline)

Scikit-learn is used for:

Feature extraction and transformation
Training a LogisticRegression classifier on the training dataset and evaluating it on the test dataset
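
A minimal sketch of such a baseline follows. The toy corpus is made up for illustration, and the use of CountVectorizer for the feature-extraction step is an assumption borrowed from the neural-network sections; the repository's baseline may extract features differently.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy stand-in corpus (assumption): 1 = describes a disease, 0 = does not.
texts = [
    "Malaria is a mosquito-borne infectious disease with fever",
    "Influenza is an infectious disease caused by a virus",
    "Asthma is a chronic inflammatory disease of the airways",
    "Cholera is an infection of the small intestine",
    "The bridge spans the river and carries road traffic",
    "The football club won the national league title",
    "The novel was published in the nineteenth century",
    "The mountain range extends across three countries",
]
is_disease = [1, 1, 1, 1, 0, 0, 0, 0]

# Hold out part of the data for testing, keeping both classes in each split.
X_train, X_test, y_train, y_test = train_test_split(
    texts, is_disease, test_size=0.25, random_state=0, stratify=is_disease
)

# Bag-of-words counts feed directly into the logistic regression classifier.
baseline = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
baseline.fit(X_train, y_train)

accuracy = baseline.score(X_test, y_test)
prediction = baseline.predict(["Measles is a contagious viral disease"])
```

With a corpus this small the accuracy value is not meaningful; the point is only the shape of the pipeline.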

Binary Classification using Deep Neural Networks

Keras with TensorFlow as the backend and scikit-learn for feature extraction:

Sentences are extracted from the article and vectorized using CountVectorizer from the scikit-learn library
A Sequential deep neural network model with 10 layers, ReLU activation, and the Adam optimizer is trained on the data
The model is validated by splitting the dataset into training and test datasets
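
One way such a model could be assembled is sketched below. Several details are assumptions rather than facts from this repository: "10 layers" is read as 10 hidden Dense layers, the width of 64 units is arbitrary, and the sigmoid output with binary cross-entropy loss is a common choice for binary classification that the text above does not confirm. The function name build_binary_classifier is hypothetical.

```python
import numpy as np
from tensorflow import keras


def build_binary_classifier(input_dim, hidden_layers=10, units=64):
    """Stack ReLU Dense layers and finish with a single sigmoid unit."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(input_dim,)))
    for _ in range(hidden_layers):
        model.add(keras.layers.Dense(units, activation="relu"))
    # Assumed output layer: one probability for the isDisease label.
    model.add(keras.layers.Dense(1, activation="sigmoid"))
    model.compile(
        optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"]
    )
    return model


# input_dim would be the CountVectorizer vocabulary size; 50 is a placeholder.
model = build_binary_classifier(input_dim=50)
probabilities = model.predict(np.zeros((2, 50), dtype="float32"), verbose=0)
```

Training would then call model.fit on the vectorized training split and model.evaluate on the held-out test split.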

Multi-label Disease classification using Deep Neural Networks

Keras with TensorFlow as the backend and scikit-learn for feature extraction:

Vectorized sentences are paired with their disease labels
Labels are encoded with LabelEncoder and transformed into one-hot vectors to be processed by the DNN model
A Sequential deep neural network model with 10 layers and a multi-class output layer is trained on the data
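
The label transformation described above can be illustrated as follows. The disease names are invented sample data, and building the one-hot matrix with a NumPy identity matrix is just one option (an assumption); the repository may use a different helper for this step.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Invented sample labels, one per vectorized sentence.
disease_labels = ["influenza", "malaria", "influenza", "asthma"]

# LabelEncoder maps each distinct label to an integer id (sorted by name).
encoder = LabelEncoder()
class_ids = encoder.fit_transform(disease_labels)

# Row i of the identity matrix is the one-hot vector for class id i.
one_hot = np.eye(len(encoder.classes_))[class_ids]

# A predicted vector is mapped back to a disease name via argmax.
decoded = encoder.inverse_transform([int(np.argmax(one_hot[1]))])[0]
```

The number of columns in one_hot gives the number of units the output layer needs, one per disease class.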

Runner

The models created for both parts are trained on sample datasets and stored as .h5 files

Using these files, the runner modules provide a script for testing a DNN model on demand:

$ cd {Part}/trained_models
$ python runner.py
$ (enter text to be classified)

Conclusions

The models perform satisfactorily, but with caveats. Future scope includes:

Experimentation with word embeddings (such as GloVe) alongside bag-of-words features
Convolutional neural networks and deep NLP

dishamisal/Text-Classification-DNN
