
# gokulbalagopal/Comprehensive-Analysis-of-Syrian-Conflict-News-Articles-using-NLP

My task was to help a journalist use data analysis to investigate how certain words are used in newspaper articles. Media analysis relies heavily on text mining, so the data here is text-based. The goal is to run topic modeling on the articles. A detailed understanding of how topic modeling works is not needed, since the model is produced by running a single function.

## Instructions

Set the working directory in the main.R file, and install the following packages before you proceed: tm, stringr, wordcloud, SnowballC, RColorBrewer, ggplot2, tidytext, topicmodels.

## Implementation

• Loaded the data.

• Split the input data into individual articles.

• Built a corpus object containing all the split articles.

• Cleaned up the corpus.

• Created a document-term matrix as part of the data exploration and stored the output.

• Created some plots for exploration.

• Ran the topic model function and stored the model.
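The cleaning, document-term matrix and topic model steps above could look roughly like the following in R, using tm for the corpus and document-term matrix and topicmodels for the topic model (both from the package list in Instructions). This is a minimal sketch under assumptions, not the repository's actual main.R: the helper names build_clean_corpus(), term_frequencies(), write_term_frequencies() and fit_topic_model() are hypothetical, as are the specific cleaning steps, the use of latent Dirichlet allocation (LDA) as the topic model, the default of k = 5 topics and the seed.

```r
# Sketch of the cleaning, document-term matrix and topic model steps.
# ASSUMPTION: helper names, cleaning choices, k and seed are hypothetical.
# Requires the tm, SnowballC and topicmodels packages to be installed.

# Build a tm corpus from a character vector of articles and clean it
build_clean_corpus <- function(articles) {
  corpus <- tm::VCorpus(tm::VectorSource(articles))
  corpus <- tm::tm_map(corpus, tm::content_transformer(tolower))
  corpus <- tm::tm_map(corpus, tm::removeNumbers)
  corpus <- tm::tm_map(corpus, tm::removePunctuation)
  corpus <- tm::tm_map(corpus, tm::removeWords, tm::stopwords("english"))
  corpus <- tm::tm_map(corpus, tm::stemDocument)  # stemming relies on SnowballC
  tm::tm_map(corpus, tm::stripWhitespace)
}

# Total occurrences of each term across all articles, most frequent first
term_frequencies <- function(dtm) {
  sort(colSums(as.matrix(dtm)), decreasing = TRUE)
}

# Store the term frequencies as a CSV file at the given path
write_term_frequencies <- function(freqs, path) {
  utils::write.csv(data.frame(term = names(freqs), frequency = unname(freqs)),
                   path, row.names = FALSE)
}

# Fit an LDA topic model on the document-term matrix
fit_topic_model <- function(dtm, k = 5, seed = 1234) {
  # LDA rejects documents with no terms left after cleaning, so drop them first
  non_empty <- which(rowSums(as.matrix(dtm)) > 0)
  topicmodels::LDA(dtm[non_empty, ], k = k, control = list(seed = seed))
}
```

A typical sequence would be corpus <- build_clean_corpus(articles), then dtm <- tm::DocumentTermMatrix(corpus), then term_frequencies(dtm) for the exploration output and fit_topic_model(dtm) for the model. topicmodels::terms(model, 10) then lists the ten most probable terms of each topic. Converting the matrix with as.matrix() keeps the sketch simple but can use a lot of memory on a large corpus.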

## List of R files

1. main.R - the main script first calls the function in dataLoading.R to load and split the data. After that, it contains all the code for cleaning, creating the document-term matrix, computing summaries and writing the first output file. It then calls the two functions in plots.R before it creates the topic model and writes the last two output files.

2. dataLoading.R - contains a single function that reads the data file, splits it into articles and returns the split articles. The function takes one argument: the path to the data file.

3. plots.R - contains two functions: A) a function that creates a histogram of terms and B) a function that creates a word cloud of terms. Each function takes three arguments: the terms, the number of occurrences of each term (frequencies) and the limit (e.g. the histogram is only created for terms that occur more than 100 times, and the word cloud only for terms that occur more than 70 times).
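The function in dataLoading.R and the two functions in plots.R could be sketched as below. The names loadArticles(), plotTermHistogram() and plotTermWordcloud() are hypothetical, and so is the article separator: purely for illustration, the sketch assumes that articles in the data file are separated by a line of five or more "=" characters, which would need to be adapted to the real file format. Because the plotting functions receive already counted frequencies, the "histogram" is drawn as a ggplot2 bar chart (geom_col) of terms against their counts.

```r
# Sketch of dataLoading.R and plots.R. Function names and the article
# separator are hypothetical; ggplot2, wordcloud and RColorBrewer are only
# needed when the plotting functions are called.

# ASSUMPTION: articles are separated by a line of five or more "=" characters
ARTICLE_SEPARATOR <- "^={5,}\\s*$"

# dataLoading.R: read the file at `path` and return one string per article
loadArticles <- function(path) {
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  is_separator <- grepl(ARTICLE_SEPARATOR, lines)
  # Article number for every line: it increases by one at each separator line
  article_id <- cumsum(is_separator)
  text_lines <- lines[!is_separator]
  text_ids <- article_id[!is_separator]
  # Join the lines of each article into a single string
  articles <- tapply(text_lines, text_ids, paste, collapse = " ")
  articles <- trimws(unname(as.character(articles)))
  articles[nzchar(articles)]  # drop empty articles (e.g. after a trailing separator)
}

# plots.R (A): bar chart of the terms that occur more than `limit` times
plotTermHistogram <- function(terms, frequencies, limit) {
  keep <- frequencies > limit
  df <- data.frame(term = terms[keep], occurrences = as.numeric(frequencies[keep]))
  ggplot2::ggplot(df, ggplot2::aes(x = stats::reorder(term, -occurrences),
                                   y = occurrences)) +
    ggplot2::geom_col() +
    ggplot2::labs(x = "Term", y = "Number of occurrences") +
    ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, hjust = 1))
}

# plots.R (B): word cloud of the terms that occur more than `limit` times
plotTermWordcloud <- function(terms, frequencies, limit) {
  keep <- frequencies > limit
  wordcloud::wordcloud(words = terms[keep], freq = frequencies[keep],
                       min.freq = limit, random.order = FALSE,
                       colors = RColorBrewer::brewer.pal(8, "Dark2"))
}
```

With a named vector of term counts freqs, the limits mentioned above would correspond to print(plotTermHistogram(names(freqs), freqs, 100)) and plotTermWordcloud(names(freqs), freqs, 70). The print() call is needed inside a script because a ggplot object is only drawn when printed.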
