This project involves processing Reuters news data, performing sentiment analysis on news titles, and storing the results in a MongoDB database.
Objective: To read news articles from provided files, extract relevant information, and store them in a MongoDB database.
Data Processing:
- Wrote a Java program (ReutRead.java) that scans the text between the <TITLE></TITLE> and <BODY></BODY> tags within each <REUTERS></REUTERS> tag.
- Extracted the title and body of each news article.
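The extraction step could be sketched as follows. This is a minimal illustration, not the actual ReutRead.java: it assumes Reuters-21578 style markup in which each article sits inside a <REUTERS> element with <TITLE> and <BODY> children, and the class name ReutersExtractSketch and the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReutersExtractSketch {
    // DOTALL lets '.' match line breaks, since article bodies span many lines.
    // Tag names are an assumption based on the Reuters-21578 layout.
    private static final Pattern ARTICLE =
            Pattern.compile("<REUTERS[^>]*>(.*?)</REUTERS>", Pattern.DOTALL);
    private static final Pattern TITLE =
            Pattern.compile("<TITLE>(.*?)</TITLE>", Pattern.DOTALL);
    private static final Pattern BODY =
            Pattern.compile("<BODY>(.*?)</BODY>", Pattern.DOTALL);

    /** Returns one {title, body} pair per article; a missing tag yields "". */
    static List<String[]> extract(String sgml) {
        List<String[]> articles = new ArrayList<>();
        Matcher article = ARTICLE.matcher(sgml);
        while (article.find()) {
            String block = article.group(1);
            articles.add(new String[] {
                firstGroup(TITLE, block), firstGroup(BODY, block)
            });
        }
        return articles;
    }

    private static String firstGroup(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        // Hypothetical sample input for illustration only.
        String sample = "<REUTERS NEWID=\"1\">"
                + "<TITLE>ADVANCED MAGNETICS ADMG IN AGREEMENT</TITLE>\n"
                + "<BODY>Advanced Magnetics Inc said it reached a four mln dlrs "
                + "research and development agreement with...</BODY></REUTERS>";
        for (String[] a : extract(sample)) {
            System.out.println(a[0] + " | " + a[1]);
        }
    }
}
```

Regular expressions are used here only because the title and body are simple, non-nested elements; a real SGML parser would be more robust if the files contain irregular markup.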
Storing in MongoDB:
- Created a MongoDB database named ReuterDb.
- Stored each news article as a document in the database.
- Each document contains a title field and a body field, structured as follows:
{
"title": "ADVANCED MAGNETICS ADMG IN AGREEMENT",
"body": "Advanced Magnetics Inc said it reached a four mln dlrs research and development agreement with…"
}
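A document of this shape might be assembled as in the sketch below. To keep the example runnable without a database, it only builds the two fields and renders them as JSON. With the MongoDB Java driver the same fields would typically go into an org.bson.Document and be passed to MongoCollection.insertOne on a collection in ReuterDb. The class name NewsDocumentSketch is hypothetical, and the collection name is not stated in the project description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NewsDocumentSketch {
    /** Builds the two-field document; LinkedHashMap keeps title before body. */
    static Map<String, String> toDocument(String title, String body) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", title);
        doc.put("body", body);
        return doc;
    }

    /** Renders the document as a single-line JSON object. */
    static String toJson(Map<String, String> doc) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : doc.entrySet()) {
            if (!first) {
                sb.append(", ");
            }
            sb.append('"').append(escape(e.getKey())).append("\": \"")
              .append(escape(e.getValue())).append('"');
            first = false;
        }
        return sb.append("}").toString();
    }

    // Simplified escaping: handles only backslash, double quote, and newline,
    // not the full set of JSON control characters.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        System.out.println(toJson(toDocument(
                "ADVANCED MAGNETICS ADMG IN AGREEMENT",
                "Advanced Magnetics Inc said it reached a four mln dlrs "
                + "research and development agreement with...")));
    }
}
```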
Objective: To perform sentiment analysis on news article titles using a Bag-of-Words (BOW) model.
Bag-of-Words Creation: Implemented a Java program to create a bag-of-words for each news title.
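A bag-of-words for one title can be sketched as a map from each word to its count. The tokenisation below (lower-casing, splitting on anything that is not a letter or digit) is an assumption; the project description does not say how titles were split, and the class name BagOfWordsSketch is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class BagOfWordsSketch {
    /** Maps each lower-cased word in the title to the number of times it occurs. */
    static Map<String, Integer> bagOfWords(String title) {
        Map<String, Integer> bag = new LinkedHashMap<>();
        // Assumed tokenisation: split on runs of non-alphanumeric characters.
        for (String token : title.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty token for a leading delimiter
            }
            bag.merge(token, 1, Integer::sum);
        }
        return bag;
    }

    public static void main(String[] args) {
        System.out.println(bagOfWords("ADVANCED MAGNETICS ADMG IN AGREEMENT"));
    }
}
```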
Comparison with Positive and Negative Words:
- Downloaded the lists of positive and negative words from online sources.
- Compared each word in the bag-of-words with the lists of positive and negative words.
- Counted the matches word by word to obtain an overall sentiment score for each title.
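The comparison step might look like the sketch below. The scoring rule is an assumption, since the description only mentions an overall score: each occurrence of a positive word adds 1 and each occurrence of a negative word subtracts 1. The class name SentimentScoreSketch and the tiny word lists in main are hypothetical stand-ins for the downloaded lists.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class SentimentScoreSketch {
    /**
     * Assumed rule: +1 per occurrence of a positive word, -1 per occurrence
     * of a negative word; words in neither list do not affect the score.
     */
    static int score(Map<String, Integer> bag, Set<String> positive, Set<String> negative) {
        int score = 0;
        for (Map.Entry<String, Integer> e : bag.entrySet()) {
            if (positive.contains(e.getKey())) {
                score += e.getValue();
            } else if (negative.contains(e.getKey())) {
                score -= e.getValue();
            }
        }
        return score;
    }

    public static void main(String[] args) {
        // Hypothetical word lists; the real ones were downloaded separately.
        Set<String> positive = new HashSet<>(Arrays.asList("agreement", "gain"));
        Set<String> negative = new HashSet<>(Arrays.asList("loss", "decline"));

        Map<String, Integer> bag = new LinkedHashMap<>();
        bag.put("advanced", 1);
        bag.put("magnetics", 1);
        bag.put("agreement", 1);

        System.out.println(score(bag, positive, negative));
    }
}
```

Holding the word lists in hash sets keeps each lookup constant-time, which matters once the lists contain thousands of entries.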
Tagging News Titles:
- Tagged each news title as "positive", "negative", or "neutral" based on the overall score.
- Ran each title through the program, which matched its words against both lists and computed the score automatically.
- Structured the output in a tabular format for presentation.
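Tagging and tabular output could be sketched as below. The thresholds are an assumption (score above 0 is positive, below 0 is negative, exactly 0 is neutral), as are the column widths and the class name SentimentTagSketch; the description does not specify the actual table layout.

```java
class SentimentTagSketch {
    /** Assumed thresholds: >0 positive, <0 negative, 0 neutral. */
    static String tag(int score) {
        if (score > 0) {
            return "positive";
        }
        return score < 0 ? "negative" : "neutral";
    }

    /** Formats one table row: left-aligned title, right-aligned score, tag. */
    static String row(String title, int score) {
        return String.format("%-40s | %5d | %s", title, score, tag(score));
    }

    public static void main(String[] args) {
        System.out.println(String.format("%-40s | %5s | %s", "Title", "Score", "Polarity"));
        // The score passed here is illustrative, not a computed result.
        System.out.println(row("ADVANCED MAGNETICS ADMG IN AGREEMENT", 1));
    }
}
```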