This repository contains two projects. The first is a crawling project that uses BeautifulSoup to crawl Wikipedia and demonstrate that, starting from any random Wikipedia page, you will end up at the Philosophy page. The second project is about Sentiment Analysis, where I have applied three different approaches.

To run the crawler:

python3 main.py wikipedia_url
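The usual form of this claim works by repeatedly following the first main-text link on each page. Below is a minimal sketch of such a crawl, as an illustration only: the link-selection rules in main.py are an assumption here, and the classic rule also skips links inside parentheses, which this simplification does not handle.

# Illustrative sketch only; main.py's actual link-selection rules are an
# assumption. The classic rule also skips parenthesized links, which this
# simplification does not handle.
import requests
from bs4 import BeautifulSoup

def first_body_link(url):
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    # Take the first article-namespace link in the body paragraphs.
    for p in soup.select("div.mw-parser-output > p"):
        for a in p.find_all("a", href=True):
            if a["href"].startswith("/wiki/") and ":" not in a["href"]:
                return "https://en.wikipedia.org" + a["href"]
    return None

url = "https://en.wikipedia.org/wiki/Special:Random"
for _ in range(100):  # safety bound on the number of hops
    url = first_body_link(url)
    if url is None or url.endswith("/wiki/Philosophy"):
        break
    print(url)
print("reached:", url)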
cd Sentiment_Analysis
ls
--> returns
ckpt Data_Sentiment_analysis.ipynb raw_data
contractions.py lstm_model.zip training_repo
Data_Sentiment_analysis.ipynb --> In this Jupyter notebook, the Sentiment Analysis training data is analysed and three different Sentiment Analysis models are evaluated and compared.
Pretrained models are loaded in this notebook. To use them, download the pretrained models below and put them in the ckpt folder.
This will download the LSTM model together with its vocabulary:
gdown https://drive.google.com/uc?id=1QlO6zWtpZrJDXmEqnK_m66zO8CRQbsnk
gdown https://drive.google.com/uc?id=1sCWHvNBqWP7hHzweckahXdBcn8IZKbpG
This will download the BERT with LSTM model:
gdown https://drive.google.com/uc?id=1PvfNpkULQxH29gcoO6191o0fkvyn5Y-d
This will download the BERT-only model:
gdown https://drive.google.com/uc?id=1mSuVOOFPAMAxsIcERkmRl2gT7Nq3EWzx
After downloading these models, put them in the ckpt folder.
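Once the files are in ckpt, the notebook loads them. As a hedged sketch only (the file name and checkpoint layout below are assumptions; the notebook contains the exact loading code), a PyTorch checkpoint can be inspected like this:

# File name and checkpoint layout are assumptions; see the notebook for the
# exact loading code.
import torch

state = torch.load("ckpt/lstm_model.pt", map_location="cpu")
# If this is a plain state_dict, it maps parameter names to tensors.
for name, tensor in list(state.items())[:5]:
    print(name, tuple(tensor.shape))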
To train the models, create the raw_data folder and put the train.csv file of the Sentiment Analysis Dataset there.
Bidirectional LSTM model --> located in training_repo/train_lstm_model.py. To train:

python train_lstm_model.py --train-csv training_file.csv
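For orientation, here is a minimal sketch of what a bidirectional LSTM sentiment classifier looks like in PyTorch. All names and hyperparameters are illustrative assumptions; the real architecture is defined in training_repo/train_lstm_model.py.

# Illustrative assumption; see training_repo/train_lstm_model.py for the
# actual model.
import torch
import torch.nn as nn

class BiLSTMSentiment(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # Final forward and backward hidden states are concatenated.
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq, embed_dim)
        _, (hidden, _) = self.lstm(embedded)   # hidden: (2, batch, hidden_dim)
        return self.fc(torch.cat([hidden[0], hidden[1]], dim=1))

model = BiLSTMSentiment(vocab_size=20000)
logits = model(torch.randint(1, 20000, (4, 50)))  # 4 sequences of 50 tokens
print(logits.shape)                               # torch.Size([4, 2])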
BERT + Bidirectional LSTM model --> located in training_repo/train_bert_with_lstm_model.py. To train:

python train_bert_with_lstm_model.py --train-csv training_file.csv
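A minimal sketch of the BERT + BiLSTM idea: per-token contextual embeddings from BERT feed a bidirectional LSTM head. Again, everything here is an illustrative assumption; the real model is in training_repo/train_bert_with_lstm_model.py.

# Illustrative assumption; see training_repo/train_bert_with_lstm_model.py
# for the actual model.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertBiLSTM(nn.Module):
    def __init__(self, hidden_dim=256, num_classes=2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.lstm = nn.LSTM(self.bert.config.hidden_size, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, input_ids, attention_mask):
        # Per-token contextual embeddings from BERT, then a BiLSTM on top.
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        _, (hidden, _) = self.lstm(out.last_hidden_state)
        return self.fc(torch.cat([hidden[0], hidden[1]], dim=1))

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["great movie", "terrible plot"], padding=True,
                  return_tensors="pt")
model = BertBiLSTM()
print(model(batch["input_ids"], batch["attention_mask"]).shape)  # [2, 2]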
BERT model --> located in training_repo/train_only_bert_model.py. To train:

python train_only_bert_model.py --train-csv training_file.csv
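A minimal sketch of the BERT-only approach using Hugging Face's BertForSequenceClassification (an assumption about the implementation; the actual training code is in training_repo/train_only_bert_model.py):

# Illustrative assumption; see training_repo/train_only_bert_model.py for
# the actual training code.
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=2)
batch = tokenizer(["great movie", "terrible plot"], padding=True,
                  return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits
print(logits.argmax(dim=1))  # per-sentence class ids (head is untrained here)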