Reuter-News-Data-Analysis

This project involves processing Reuters news data, performing sentiment analysis on news titles, and storing the results in a MongoDB database.

Problem 1: Reuters News Data Reading & Transformation and Storing in MongoDB

Objective: To read news articles from provided files, extract relevant information, and store them in a MongoDB database.

Data Processing:
- A Java program (ReutRead.java) scans the text between the <TITLE></TITLE> and <BODY></BODY> tags within each <REUTERS></REUTERS> tag.
- It extracts the title and body of each news article.
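
The scanning step can be sketched with regular expressions. This is a minimal illustration, not the project's actual ReutRead.java: the class name, the `Article` holder, and the sample input are hypothetical, and it assumes each article sits inside a <REUTERS>...</REUTERS> block as in the Reuters .sgm files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the project's actual ReutRead.java.
class ReutersExtractorSketch {

    // Simple holder for the two fields extracted from one article.
    static final class Article {
        final String title;
        final String body;

        Article(String title, String body) {
            this.title = title;
            this.body = body;
        }
    }

    // DOTALL lets '.' match line breaks, since an article spans many lines.
    private static final Pattern REUTERS =
            Pattern.compile("<REUTERS[^>]*>(.*?)</REUTERS>", Pattern.DOTALL);
    private static final Pattern TITLE =
            Pattern.compile("<TITLE>(.*?)</TITLE>", Pattern.DOTALL);
    private static final Pattern BODY =
            Pattern.compile("<BODY>(.*?)</BODY>", Pattern.DOTALL);

    // Finds every <REUTERS> block and pulls the title and body out of it.
    static List<Article> extract(String sgm) {
        List<Article> articles = new ArrayList<>();
        Matcher block = REUTERS.matcher(sgm);
        while (block.find()) {
            String content = block.group(1);
            articles.add(new Article(firstGroup(TITLE, content), firstGroup(BODY, content)));
        }
        return articles;
    }

    // Returns the trimmed first match, or an empty string if the tag is absent.
    private static String firstGroup(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        // Made-up sample input, used only to exercise the sketch.
        String sample = "<REUTERS NEWID=\"1\"><TITLE>SAMPLE TITLE</TITLE>"
                + "<BODY>Sample body text.</BODY></REUTERS>";
        for (Article a : extract(sample)) {
            System.out.println(a.title + " | " + a.body);
        }
    }
}
```

Articles without a <BODY> tag come back with an empty body rather than being skipped; whether the real program keeps or drops such articles is not stated here.
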
Storing in MongoDB:
- Create a MongoDB database named ReuterDb.
- Store each news article as a document in the database.
- Each document should contain fields for title and body, structured as follows:

    {
      "title": "ADVANCED MAGNETICS ADMG IN AGREEMENT",
      "body": "Advanced Magnetics Inc said it reached a four mln dlrs research and development agreement with…"
    }
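
As a rough sketch of the document shape above, each article can be turned into a two-field map before it is handed to the database. The class and method names are hypothetical, and the block uses only the standard library; the MongoDB call mentioned in the comment assumes the official MongoDB Java driver is on the classpath and that a collection has already been obtained from ReuterDb (the collection name is not given here).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the project's actual storage code.
class ArticleDocumentSketch {

    // Builds one document with the "title" and "body" fields.
    // LinkedHashMap keeps "title" ahead of "body" when the map is printed.
    static Map<String, Object> toDocument(String title, String body) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", title);
        doc.put("body", body);
        return doc;
    }

    // Converts {title, body} pairs into a list of documents.
    static List<Map<String, Object>> toDocuments(List<String[]> titleBodyPairs) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (String[] pair : titleBodyPairs) {
            docs.add(toDocument(pair[0], pair[1]));
        }
        return docs;
    }

    public static void main(String[] args) {
        // Made-up sample data, used only to exercise the sketch.
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {"SAMPLE TITLE", "Sample body text."});

        for (Map<String, Object> doc : toDocuments(pairs)) {
            System.out.println(doc);
            // Assumption: with the MongoDB Java driver available, each map
            // could be wrapped and stored, for example:
            //   collection.insertOne(new org.bson.Document(doc));
        }
    }
}
```
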

Problem 2: Sentiment Analysis using Bag-of-Words Model on Reuters News Titles

Objective: To perform sentiment analysis on news article titles using a Bag-of-Words (BOW) model.

Bag-of-Words Creation: Implemented a Java program to create a bag-of-words for each news title.
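
A bag-of-words for a single title might look like the sketch below. The class name is hypothetical, and the tokenization rule (lower-casing and splitting on anything that is not a letter) is an assumption; the project's own program may tokenize differently.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; the tokenization rule is an assumption.
class BagOfWordsSketch {

    // Lower-cases the title, splits on non-letter characters,
    // and counts how often each word occurs.
    static Map<String, Integer> bagOfWords(String title) {
        Map<String, Integer> bag = new LinkedHashMap<>();
        for (String token : title.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                bag.merge(token, 1, Integer::sum);
            }
        }
        return bag;
    }

    public static void main(String[] args) {
        System.out.println(bagOfWords("ADVANCED MAGNETICS ADMG IN AGREEMENT"));
        // prints {advanced=1, magnetics=1, admg=1, in=1, agreement=1}
    }
}
```
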
Comparison with Positive and Negative Words:
- Downloaded the lists of positive and negative words from online sources.
- Compared each word in the bag-of-words with the lists of positive and negative words.
- Performed word-by-word comparison to determine sentiment.
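
The word-by-word comparison can be sketched as a simple scoring loop over a word-to-count map. The scoring rule used here (add the count for each positive match, subtract it for each negative match) is an assumption, and the tiny word sets in `main` are illustrative placeholders rather than the downloaded lists.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the scoring rule is an assumption.
class SentimentScoreSketch {

    // Adds a word's count for a positive match and subtracts it for a
    // negative match; words found in neither list do not change the score.
    static int score(Map<String, Integer> bag, Set<String> positive, Set<String> negative) {
        int score = 0;
        for (Map.Entry<String, Integer> entry : bag.entrySet()) {
            if (positive.contains(entry.getKey())) {
                score += entry.getValue();
            } else if (negative.contains(entry.getKey())) {
                score -= entry.getValue();
            }
        }
        return score;
    }

    public static void main(String[] args) {
        // Placeholder word lists, not the real downloaded ones.
        Set<String> positive = Set.of("agreement", "gain");
        Set<String> negative = Set.of("loss", "fall");
        Map<String, Integer> bag = Map.of("agreement", 1, "in", 1, "loss", 2);
        System.out.println(score(bag, positive, negative));
    }
}
```
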
Tagging News Titles:
- Tagged each news title as "positive", "negative", or "neutral" based on the overall score.
- Fed the titles into the program, which performed the word matching and score calculation automatically.
- Structured the output in a tabular format for presentation.
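
Tagging and tabular output could be sketched as follows. The thresholds (a score above zero is positive, below zero is negative, zero is neutral) and the column layout are assumptions, and the class and method names are hypothetical.

```java
// Hypothetical sketch; thresholds and column layout are assumptions.
class TitleTaggerSketch {

    // Maps an overall score to one of the three tags.
    static String tag(int score) {
        if (score > 0) {
            return "positive";
        }
        if (score < 0) {
            return "negative";
        }
        return "neutral";
    }

    // Formats one table row: number, title, score, tag.
    static String row(int number, String title, int score) {
        return String.format("%-3d | %-45s | %5d | %s", number, title, score, tag(score));
    }

    public static void main(String[] args) {
        System.out.println(String.format("%-3s | %-45s | %5s | %s", "No", "Title", "Score", "Polarity"));
        // Made-up score, used only to show the layout.
        System.out.println(row(1, "ADVANCED MAGNETICS ADMG IN AGREEMENT", 1));
    }
}
```
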

Repository contents:
- output
- src/main/java/org/example
- .gitignore
- A2-F23-CSCI5408.pdf
- README.md
- Report.pdf
- pom.xml
- reut2-009.sgm
- reut2-014.sgm
- titles.txt

shreyakapoor08/Reuter-News-Data-Analysis
