Sentiment analysis of movie scripts from Hollywood

This repository provides code and supplementary materials for the paper titled 'Longitudinal Abuse and Sentiment Analysis of Hollywood Oscar and Blockbuster Movie Dialogues using LLMs'.

Seminar

Publication

Task Description

This project explores the trends in abusive language and sentiment in Hollywood movies from 1950 to 2024, with a focus on Oscar-nominated films and top 10 box-office hits. We utilize modern NLP models such as RoBERTa to conduct multi-label classification on movie subtitles, analyzing shifts in emotions and the use of abusive language across time and genres.

The study examines how sentiment and abusive language in movie dialogues changed over 75 years, with attention to the influence of social and cultural shifts. Using large language models (LLMs) fine-tuned on the datasets described below and applied to movie subtitles, we analyze genres and decades to identify emotional trends in Hollywood.

Datasets

Movie Subtitles: We collected subtitles from over 1,000 films, including Oscar-nominated films and the top 10 box-office hits. The films were categorized into four genres: Action, Comedy, Drama, and Thriller.

SenWave Dataset: This dataset contains sentiment-labeled tweets from the COVID-19 pandemic period. We use it to fine-tune our sentiment model for multi-label classification across emotions such as optimism, anxiety, and anger. The dataset is available on GitHub: SenWave Dataset

RAL-E Dataset: A Reddit-based dataset for abusive language detection, covering offensive, hateful, or violent content. It was crucial for fine-tuning our abuse detection models. The dataset comes from Tommaso Caselli's HateBERT paper: RAL-E Dataset

Models

N-Gram Analysis: We conducted an N-Gram analysis (bigrams, trigrams) to visualize the most frequent word sequences in movie dialogues over time. This helped identify thematic trends and shifts in sentiment.
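The counting step behind an N-Gram analysis can be sketched as follows. This is a minimal illustration rather than the repository's implementation: the function names `ngrams` and `top_ngrams`, the regex tokenizer, and the sample lines are all assumptions made for the example.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Return the n-token tuples of a token list, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(lines, n=2, k=3):
    """Count n-grams within each subtitle line and return the k most common."""
    counts = Counter()
    for line in lines:
        # Assumed tokenizer: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", line.lower())
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)

# Hypothetical subtitle lines, not taken from the dataset.
sample_lines = [
    "I love you.",
    "You know I love you!",
    "I love this town.",
]

print(top_ngrams(sample_lines, n=2, k=2))
# [(('i', 'love'), 3), (('love', 'you'), 2)]
```

Counting within each line, rather than across the whole file, avoids n-grams that span two unrelated subtitle cues. Running the same function per decade would give the over-time comparison described above.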

BERT-based Models: We used pre-trained RoBERTa and HateBERT models for sentiment analysis and abuse detection. RoBERTa was fine-tuned using the SenWave dataset for sentiment analysis, while HateBERT was used to detect abusive language in movie dialogues.
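In multi-label classification, each emotion is scored independently, so a single line of dialogue can receive several labels or none. The sketch below shows only that decision step, applied to raw model scores (logits). The label list, the 0.5 threshold, and the example logits are assumptions for illustration and are not taken from the repository.

```python
import math

# Illustrative label subset; the fine-tuned model's actual label set may differ.
LABELS = ["optimism", "anxiety", "anger", "humor"]

def sigmoid(x):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, labels=LABELS, threshold=0.5):
    """Keep every label whose independent probability reaches the threshold."""
    return [label for label, z in zip(labels, logits) if sigmoid(z) >= threshold]

# Hypothetical logits for one line of dialogue.
print(predict_labels([2.1, -1.3, 0.4, -3.0]))
# ['optimism', 'anger']
```

Unlike a softmax over all classes, the per-label sigmoid does not force the probabilities to sum to one, which is what allows co-occurring emotions to be detected.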

Results

Sentiment Analysis Over Time

We performed sentiment analysis on movie dialogues from 1950 to 2024, identifying significant changes in emotional expression.

Sentiment Polarity Trends (1950-2024): The graph below shows the trend of sentiment polarity in movie dialogues over time. Polarity scores range from -1 to 1; positive values indicate positive emotions and negative values indicate negative emotions.

Sentiment Weights by Decade: The sentiment weights chart highlights the relative contribution of different emotions over the decades. Emotions like optimism, anger, and humor fluctuate in prominence across different time periods.
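One plausible way to compute a relative contribution per emotion is to normalize label counts within each decade so the shares sum to one. The sketch below assumes that definition; the function name `emotion_weights_by_decade` and the sample predictions are hypothetical and do not reproduce the paper's numbers.

```python
from collections import Counter, defaultdict

def emotion_weights_by_decade(predictions):
    """predictions: iterable of (year, [labels]); returns {decade: {label: share}}."""
    counts = defaultdict(Counter)
    for year, labels in predictions:
        counts[(year // 10) * 10].update(labels)
    weights = {}
    for decade, counter in sorted(counts.items()):
        total = sum(counter.values())
        # Share of each label among all labels predicted in that decade.
        weights[decade] = {label: n / total for label, n in counter.items()}
    return weights

# Hypothetical predictions, for illustration only.
sample = [
    (1961, ["optimism"]),
    (1965, ["optimism", "humor"]),
    (1968, ["anger"]),
    (2004, ["anger", "humor"]),
]

weights = emotion_weights_by_decade(sample)
print(weights[1960])
# {'optimism': 0.5, 'humor': 0.25, 'anger': 0.25}
```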

Abusive Language Detection

Abusive Word Frequency by Decade: Abusive language frequency peaked in the 2000s and has since declined, reflecting changing societal norms.

Abusive Content Across Genres: Action films show a low level of abusive content, while thrillers in the 1950s had the highest abusive word count.
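Comparisons by decade and by genre both reduce to the same aggregation: sum the flagged words and the total words per group, then divide. The sketch below assumes each film has already been summarized into such counts; the record fields, the helper names, and every number are placeholders for illustration, not results from the study.

```python
from collections import defaultdict

def decade_of(year):
    """Map a release year to the first year of its decade, e.g. 1954 -> 1950."""
    return (year // 10) * 10

def abuse_rate_by_group(records, key):
    """Return {group: abusive_words / total_words}, grouped by key(record)."""
    totals = defaultdict(lambda: [0, 0])  # group -> [abusive_words, total_words]
    for rec in records:
        group = key(rec)
        totals[group][0] += rec["abusive_words"]
        totals[group][1] += rec["total_words"]
    return {g: a / t for g, (a, t) in sorted(totals.items()) if t > 0}

# Placeholder per-film counts, not real data.
records = [
    {"year": 1954, "genre": "Thriller", "abusive_words": 2, "total_words": 1000},
    {"year": 1958, "genre": "Drama", "abusive_words": 1, "total_words": 500},
    {"year": 2003, "genre": "Action", "abusive_words": 30, "total_words": 1000},
]

by_decade = abuse_rate_by_group(records, key=lambda r: decade_of(r["year"]))
by_genre = abuse_rate_by_group(records, key=lambda r: r["genre"])
print(by_decade)
print(by_genre)
```

Dividing by total words rather than reporting raw counts keeps decades with more or longer films from dominating the comparison.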

Emotional Sentiment Co-occurrence

The heatmap below shows frequent co-occurrences of humor with anger, especially in comedies, reflecting the use of satire and conflict.
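A co-occurrence heatmap of this kind is typically built from pair counts over the multi-label predictions. The following sketch counts how often two emotions are assigned to the same line; the function name `cooccurrence` and the sample predictions are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(label_sets):
    """Count unordered label pairs that appear together in one prediction."""
    pairs = Counter()
    for labels in label_sets:
        # Sort so each pair has one canonical key, e.g. ('anger', 'humor').
        for a, b in combinations(sorted(set(labels)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical per-line predictions.
predictions = [
    ["humor", "anger"],
    ["humor", "anger", "optimism"],
    ["optimism"],
]

counts = cooccurrence(predictions)
print(counts[("anger", "humor")])
# 2
```

The resulting counts can be arranged into a symmetric label-by-label matrix and passed to any plotting library to draw the heatmap.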

