This repository contains a sentiment analysis of YouTube comments on the 2024 Indonesian presidential debate. The workflow runs in stages, from data scraping through data processing to machine learning modeling.
The data was obtained by scraping comments from the YouTube video titled "[LIVE] Debat Capres 2024, Nobar Debat Ronde Ketiga di Musyawarah | Musyawarah". A total of 3597 comments were collected.
The scraped data is loaded into the repository for further processing.
The data is cleaned to remove noise and ensure the quality of the text used in the later stages.
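As a rough illustration of what this kind of cleaning might involve, the sketch below lowercases a comment and strips URLs, mentions, digits, punctuation, and emoji using only the Python standard library. The names `clean_comment` and `clean_all` and the specific rules are assumptions made for illustration; the notebooks in this repository may apply different rules.

```python
import re

# Patterns are illustrative assumptions, not the repository's actual rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@[\w.\-]+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")
SPACE_RE = re.compile(r"\s+")


def clean_comment(text: str) -> str:
    """Lowercase a comment and remove URLs, mentions, and non-letter characters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)  # drops digits, punctuation, and emoji
    return SPACE_RE.sub(" ", text).strip()


def clean_all(comments):
    """Clean every comment, dropping empty results and exact duplicates."""
    seen = set()
    cleaned_comments = []
    for comment in comments:
        cleaned = clean_comment(comment)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            cleaned_comments.append(cleaned)
    return cleaned_comments
```

Dropping duplicates after cleaning, rather than before, also catches comments that differ only in casing or punctuation.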
This stage involves text processing, including tokenization, slang normalization, stopword removal, and stemming using Sastrawi.
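The four steps named above can be sketched as a single function. This is a minimal sketch, not the repository's implementation: the slang map and stopword set are tiny hypothetical samples, tokenization is a plain whitespace split, and the stemmer is passed in as a callable so the example runs without extra packages. `make_sastrawi_stemmer` shows how Sastrawi's stemmer could be plugged in, assuming the `Sastrawi` package is installed.

```python
from typing import Callable, Dict, List, Set

# Tiny illustrative resources (assumptions); a real run would use much larger lists.
SLANG_MAP: Dict[str, str] = {"gk": "tidak", "yg": "yang", "bgt": "banget"}
STOPWORDS: Set[str] = {"yang", "dan", "di", "ini", "itu"}


def preprocess(
    text: str,
    slang_map: Dict[str, str] = SLANG_MAP,
    stopwords: Set[str] = STOPWORDS,
    stem: Callable[[str], str] = lambda word: word,
) -> List[str]:
    """Tokenize, normalize slang, remove stopwords, then stem each token."""
    tokens = text.lower().split()                       # whitespace tokenization
    tokens = [slang_map.get(t, t) for t in tokens]      # slang normalization
    tokens = [t for t in tokens if t not in stopwords]  # stopword removal
    return [stem(t) for t in tokens]                    # stemming


def make_sastrawi_stemmer() -> Callable[[str], str]:
    """Return Sastrawi's stem function (needs `pip install Sastrawi`)."""
    # Imported lazily so the rest of the sketch works without the package.
    from Sastrawi.Stemmer.StemmerFactory import StemmerFactory
    return StemmerFactory().create_stemmer().stem
```

With Sastrawi available, a call would look like `preprocess(comment, stem=make_sastrawi_stemmer())`. Slang normalization runs before stopword removal so that normalized forms such as "yang" are still filtered out.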
In this stage, the collected comments are translated into English so that the labeling step, which relies on an English-language sentiment tool, can assign positive, negative, or neutral labels more accurately.
Sentiment labels are assigned to the comments using the TextBlob library. The labeling yields 1480 positive comments, 1041 neutral comments, and 492 negative comments.
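One common way to turn TextBlob output into three labels is to threshold its polarity score, which ranges from -1 to 1. The sketch below assumes that approach with a cutoff at zero; the source does not state which thresholds were used, and the function names are hypothetical.

```python
def label_from_polarity(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a label (zero cutoff is an assumption)."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def label_comment(english_text: str) -> str:
    """Label one translated comment with TextBlob (needs `pip install textblob`)."""
    # Imported lazily so the thresholding logic can be used without the package.
    from textblob import TextBlob
    return label_from_polarity(TextBlob(english_text).sentiment.polarity)
```

Keeping the threshold in its own function makes it easy to try a wider neutral band, for example treating scores close to zero as neutral.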
Visualization is used to provide a better understanding of the distribution of sentiment in the data.
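A bar chart of label counts is one simple way to show such a distribution. The sketch below uses the counts reported in this README; the function names, output file name, and the choice of matplotlib are assumptions, and the repository's notebooks may use a different chart type.

```python
from typing import Dict

# Label counts as reported in this README.
REPORTED_COUNTS: Dict[str, int] = {"positive": 1480, "neutral": 1041, "negative": 492}


def shares_from_counts(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw label counts into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: n / total for label, n in counts.items()}


def plot_distribution(counts: Dict[str, int], path: str = "sentiment_distribution.png") -> None:
    """Save a bar chart of label counts (needs `pip install matplotlib`)."""
    # Imported lazily so the share calculation works without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(list(counts.keys()), list(counts.values()))
    ax.set_xlabel("Sentiment")
    ax.set_ylabel("Number of comments")
    ax.set_title("Sentiment distribution")
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_distribution(REPORTED_COUNTS)` would write the chart to the hypothetical file `sentiment_distribution.png`.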
Machine learning classification is performed with six algorithms: Logistic Regression (LR), Multinomial Naive Bayes (NB), Linear Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Multi-Layer Perceptron (MLP).
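A comparison across these six algorithms could be set up with scikit-learn roughly as follows. This is a sketch under stated assumptions: the README does not say which features, hyperparameters, or split were used, so the TF-IDF features, the 80/20 stratified split, the fixed seed, and the function names here are all illustrative choices.

```python
from typing import Dict, List


def build_models() -> Dict[str, object]:
    """Create one untrained classifier per algorithm (hyperparameters are assumptions)."""
    # scikit-learn is imported lazily so the module loads even where it is absent.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import LinearSVC
    from sklearn.tree import DecisionTreeClassifier

    return {
        "LR": LogisticRegression(max_iter=1000),
        "NB": MultinomialNB(),
        "SVM": LinearSVC(),
        "DT": DecisionTreeClassifier(random_state=42),
        "RF": RandomForestClassifier(random_state=42),
        "MLP": MLPClassifier(max_iter=500, random_state=42),
    }


def evaluate_models(texts: List[str], labels: List[str],
                    test_size: float = 0.2, seed: int = 42) -> Dict[str, float]:
    """Train each model on TF-IDF features and return held-out accuracy per model."""
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    # Stratify so each sentiment class keeps its proportion in both splits.
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    scores = {}
    for name, classifier in build_models().items():
        # The vectorizer is fitted inside the pipeline, on training data only.
        pipeline = make_pipeline(TfidfVectorizer(), classifier)
        pipeline.fit(x_train, y_train)
        scores[name] = accuracy_score(y_test, pipeline.predict(x_test))
    return scores
```

Wrapping the vectorizer and classifier in one pipeline keeps vocabulary statistics from the test split out of training, which would otherwise inflate the accuracy figures.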
The accuracy of each model is then evaluated and compared.
- Clone this repository.
    git clone https://github.com/bimarakajati/Analisis-Sentimen-Debat-Capres-2024.git
    cd Analisis-Sentimen-Debat-Capres-2024
- Install the required packages.
    pip install -r requirements.txt
- Explore the notebooks and scripts for each stage of the analysis.
Feel free to contribute or use this repository for further analysis and improvements.