This project examines the challenge of building machine learning models that predict the rise and fall of the stock market from news headlines. For example: Do stock prices rise when the news is more positive? Do certain key words or phrases affect the direction of market values? Are some analytical approaches better than others at finding these relationships?
Our analysis found:
- The stock market is too unpredictable for headlines to reliably predict market gains or losses.
- Of the analysis techniques we tried, sentiment analysis came closest to providing signals about future stock market changes.
- Of the topic modeling approaches we used, Latent Dirichlet Allocation (LDA) produced better input for logistic regression than Non-negative Matrix Factorization (NMF), but both performed very poorly at predicting stock market changes.
- Topic models fitted to fast-changing data like news headlines become outdated quickly, so data pipelines should refit these models regularly.
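One way to act on the refitting recommendation is a walk-forward schedule: fit on a trailing window of headlines, evaluate on the days that follow, then slide the window forward and refit. The sketch below only generates those date windows. The function name `refit_windows` and the 60-day/15-day sizes in the example are hypothetical choices, not values from this project, and the sketch counts calendar days rather than trading days.

```python
from datetime import date, timedelta


def refit_windows(start, end, train_days, test_days):
    """Return walk-forward (train_start, train_end, test_start, test_end) windows.

    All dates are inclusive and counted in calendar days (an assumption; trading
    days would need a market calendar). Each model trains on `train_days` days and
    is evaluated on the `test_days` days that follow; the next window starts
    `test_days` later, so the model is refitted on fresh headlines every cycle.
    Windows whose test period would run past `end` are dropped.
    """
    if train_days < 1 or test_days < 1:
        raise ValueError("train_days and test_days must be >= 1")
    windows = []
    train_start = start
    while True:
        train_end = train_start + timedelta(days=train_days - 1)
        test_start = train_end + timedelta(days=1)
        test_end = test_start + timedelta(days=test_days - 1)
        if test_end > end:
            break
        windows.append((train_start, train_end, test_start, test_end))
        train_start += timedelta(days=test_days)
    return windows


# Hypothetical example: 60-day training windows, refitted every 15 days.
for window in refit_windows(date(2016, 1, 1), date(2016, 3, 31), 60, 15):
    print(window)
```

Sliding by the length of the test period means every day after the first training window is scored exactly once, by a model that never saw it during fitting.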
This analysis follows a specific workflow:
1. Run sentiment analysis and topic modeling on news headlines.
2. Enhance the existing data with the resulting headline sentiments and topics.
3. Run regression and neural network models on the headline topics and sentiments plus market gains/losses to find relationships.
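The "enhance existing data" step amounts to joining per-headline results to market data by date. This is a minimal sketch under assumed inputs: headlines arrive as `(day, score)` pairs where the score is a sentiment value such as the `compound` score from NLTK VADER's `polarity_scores()`, closing prices arrive as a dict keyed by trading day, and the label is whether the next trading day closed higher. The function name `build_daily_features` and the toy numbers are hypothetical; the project's notebooks may aggregate and label the data differently.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def build_daily_features(headlines, closes):
    """Average headline sentiment per day and attach a next-day up/down label.

    headlines: iterable of (day, sentiment_score) pairs.
    closes: dict mapping trading day -> closing price.
    Returns (day, mean_sentiment, label) rows, where label is 1 if the next
    trading day closed higher than `day`, else 0.
    """
    scores_by_day = defaultdict(list)
    for day, score in headlines:
        scores_by_day[day].append(score)

    trading_days = sorted(closes)
    rows = []
    # Pair each trading day with the following one; the last day has no label.
    for today, next_day in zip(trading_days, trading_days[1:]):
        if today not in scores_by_day:
            continue  # no headlines that day, so there is no feature to attach
        label = 1 if closes[next_day] > closes[today] else 0
        rows.append((today, mean(scores_by_day[today]), label))
    return rows


# Toy data for illustration only.
headlines = [
    (date(2016, 7, 1), 0.5),
    (date(2016, 7, 1), -0.1),
    (date(2016, 7, 5), -0.4),
]
closes = {date(2016, 7, 1): 100.0, date(2016, 7, 5): 101.0, date(2016, 7, 6): 99.5}
for row in build_daily_features(headlines, closes):
    print(row)
```

Headlines published on non-trading days are simply dropped here; carrying them forward to the next trading day is another reasonable design choice.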
You'll want to explore the analysis notebooks in this general order:
- Stock and news headline descriptive analysis
- NLTK (sentiment analysis)
- LDA / NMF (topic modeling)
- Logistic regression / recurrent neural networks
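Logistic regression models the probability that the market rises as a sigmoid of a weighted sum of features. Below is a minimal one-feature sketch, fitted by batch gradient descent on the mean log loss, assuming the single feature is something like daily mean headline sentiment. The names `fit_logistic` and `predict_up`, the learning rate, the epoch count, and the toy data are all hypothetical, and the notebooks may use a library implementation such as scikit-learn's `LogisticRegression` rather than hand-written code like this.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(up | x) = sigmoid(w * x + b) by batch gradient descent on mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # gradient of log loss w.r.t. the logit
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_up(w, b, x, threshold=0.5):
    """Return 1 (market up) if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Toy data: feature = daily mean sentiment, label = 1 if the market rose the next day.
xs = [-0.6, -0.4, -0.2, 0.2, 0.4, 0.6]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
print(w, b, predict_up(w, b, 0.5), predict_up(w, b, -0.5))
```

With topic proportions from LDA or NMF added as features, `w` becomes a vector, and a library implementation with regularization is the practical choice.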
The notebooks containing Golden Cross and support vector machine (SVM) analyses served as benchmarks, helping us judge how well market prediction techniques typically perform.
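A Golden Cross is commonly defined as a short-term moving average of price (conventionally 50 days) crossing above a long-term one (conventionally 200 days). The sketch below detects such crossings in a list of closing prices using simple moving averages. The 50/200 defaults follow that common convention and are not taken from the project's notebook, and the function names are hypothetical.

```python
def simple_moving_average(prices, window):
    """Element i is the mean of the `window` prices ending at i, or None until a full window exists."""
    if window < 1:
        raise ValueError("window must be >= 1")
    averages = []
    running = 0.0
    for i, price in enumerate(prices):
        running += price
        if i >= window:
            running -= prices[i - window]  # drop the price that left the window
        averages.append(running / window if i >= window - 1 else None)
    return averages


def golden_crosses(prices, short_window=50, long_window=200):
    """Return indices where the short SMA moves from at-or-below the long SMA to above it."""
    if short_window >= long_window:
        raise ValueError("short_window must be smaller than long_window")
    short_sma = simple_moving_average(prices, short_window)
    long_sma = simple_moving_average(prices, long_window)
    crosses = []
    for i in range(1, len(prices)):
        values = (short_sma[i - 1], long_sma[i - 1], short_sma[i], long_sma[i])
        if any(v is None for v in values):
            continue  # not enough history for both averages yet
        prev_short, prev_long, cur_short, cur_long = values
        if prev_short <= prev_long and cur_short > cur_long:
            crosses.append(i)
    return crosses


# Toy example with short windows so the crossing is easy to trace by hand.
print(golden_crosses([10, 9, 8, 7, 8, 9, 10, 11], short_window=2, long_window=4))
```

Because it uses prices alone, a rule like this is a useful baseline: any headline-based model should at least match it to be worth the extra complexity.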
- Download this repository as a zip file, or clone it locally using your favorite command-line interface (CLI), typically Terminal on Mac or Git Bash on Windows, by running:

```
git clone git@github.com:micahvandersteen/project-3-team-ifrit.git
```
- Install the necessary libraries via your CLI. Note: If you are an Anaconda user, you may have most of these libraries pre-installed.

```
pip install pandas
pip install --user -U nltk
pip install jupyterlab
pip install -U scikit-learn
pip install gensim
python -m pip install -U matplotlib
```
- Start a Jupyter Notebook by running `jupyter notebook` in your CLI, then navigate to your desired notebook within the `Notebooks` folder of this repository.
- Open and run the notebook.
Analysis results and their interpretation are described on the project webpage.

Team members:
- Adam Bilski
- Alan Riveros
- Brandon Uhler
- Julia Revier
- Katrina Koenders
- Micah Vandersteen
- Stacy Konkiel
Copyright for images included on the project webpage belongs to their respective owners.