This folder contains the supplemental files for my Medium post, Predict Stock Price with Time-Series Statistical Learning. The post builds predictive models for Apple (AAPL) and Lockheed Martin (LMT). The code in this folder trains those models with the following algorithms:
- Adaptive model (Facebook Prophet)
- ARIMA model (pmdarima)
- Holt-Winters method (statsmodels)
The goal is to build a prototype predictive model for 2 stocks, as a first step toward a model that predicts the prices of the 505 stocks in the S&P 500 index. The prototype should achieve a high R-square on the testing data between 2019 and May 2020.
The stock prices are downloaded from Yahoo Finance and stored in a relational database (PostgreSQL) on the local machine. The ETL Pipeline folder describes how the data is downloaded, and the Database Table folder describes the tables in the relational database.
There are 7 Python files in this folder:
- Prediction_AAPL_arima.py
- Prediction_AAPL_fph.py
- Prediction_AAPL_hw.py
- Prediction_LMT_arima.py
- Prediction_LMT_fph.py
- Prediction_LMT_hw.py
- Graph.py
All files whose names start with "Prediction" contain the code for the predictive models, one file per stock and algorithm. Graph.py is a helper that generates line charts of the predicted stock prices.
Two files use the adaptive model: Prediction_AAPL_fph.py and Prediction_LMT_fph.py, which contain the predictive models for Apple and Lockheed Martin, respectively. Facebook Prophet is used for model training.
Both files train on 2 different time intervals of stock prices:
- Between 1997 and 2018 (model_max)
- Between 2010 and 2018 (model_8yr)
The code first obtains the data from the local database by querying it with psycopg2. It then renames the date and price columns to ds and y, as Facebook Prophet requires. After the model is trained, the code makes a prediction and evaluates its accuracy against the testing data.
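The steps above can be sketched as follows. This is a minimal sketch, not the code from the Prediction_*_fph.py files: the input column names date and close, the helper names, and the toy values are assumptions made for illustration, and the import assumes the package is installed as prophet (older releases were published as fbprophet).

```python
import pandas as pd


def to_prophet_frame(prices: pd.DataFrame, start: str, end: str) -> pd.DataFrame:
    """Select a training window and rename columns to the ds/y names Prophet expects.

    Assumes `prices` has the hypothetical columns 'date' and 'close'.
    """
    frame = prices.rename(columns={"date": "ds", "close": "y"})[["ds", "y"]].copy()
    frame["ds"] = pd.to_datetime(frame["ds"])
    mask = (frame["ds"] >= pd.Timestamp(start)) & (frame["ds"] <= pd.Timestamp(end))
    return frame.loc[mask].reset_index(drop=True)


def fit_and_predict(train: pd.DataFrame, test_dates) -> pd.DataFrame:
    """Fit Prophet on `train` and predict the price for each testing date."""
    # Imported lazily so the data preparation above runs without Prophet installed.
    from prophet import Prophet

    model = Prophet()
    model.fit(train)
    future = pd.DataFrame({"ds": pd.to_datetime(test_dates)})
    return model.predict(future)[["ds", "yhat"]]


# Toy values, not real prices.
prices = pd.DataFrame(
    {
        "date": ["2009-12-31", "2010-01-04", "2018-12-31", "2019-01-02"],
        "close": [1.0, 2.0, 3.0, 4.0],
    }
)
train_max = to_prophet_frame(prices, "1997-01-01", "2018-12-31")  # model_max window
train_8yr = to_prophet_frame(prices, "2010-01-01", "2018-12-31")  # model_8yr window
```

The yhat column returned by fit_and_predict() is the forecast that would be compared with the testing data.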
The R-square of model_max for Apple stock: Negative R-square
The R-square of model_8yr for Apple stock: 45%
The R-square of model_max for Lockheed Martin: 42%
The R-square of model_8yr for Lockheed Martin: 34%
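A negative R-square is possible on testing data. Assuming the usual definition, R-square = 1 - SS_res / SS_tot (the prediction files may compute it with a library function instead), the score drops below zero whenever the forecast has a larger squared error than simply predicting the mean of the testing data. A minimal sketch with made-up numbers:

```python
def r_square(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [10.0, 12.0, 14.0, 16.0]         # toy values, not real prices
good_forecast = [10.5, 11.5, 14.5, 15.5]  # tracks the upward trend
flat_forecast = [8.0, 8.0, 8.0, 8.0]      # misses the price level entirely

print(r_square(actual, good_forecast))  # about 0.95
print(r_square(actual, flat_forecast))  # -5.0
```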
Two files use the ARIMA model: Prediction_AAPL_arima.py and Prediction_LMT_arima.py, which contain the predictive models for Apple and Lockheed Martin, respectively. pmdarima is used for model training.
Both files train on 2 different time intervals of stock prices:
- Between 1997 and 2018 (model_max)
- Between 2010 and 2018 (model_8yr)
Hyperparameters for the grid search (fixed values and search ranges):
- m = 12
- seasonal = True
- p between 1 and 3
- q between 1 and 3
- P between 1 and 3
- Q between 1 and 3
- d between 1 and 3
- D between 1 and 3
The code first obtains the data from the local database by querying it with psycopg2. It then takes the stock price column and starts the ARIMA hyperparameter grid search by calling pmdarima.auto_arima(). The function finds the best hyperparameters and returns the best model, and predictions can be made directly from the returned object. However, the grid search is very time consuming: each file takes roughly 30 minutes to run.
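A call of this kind might look like the sketch below. It is an illustration under assumptions, not the code from the Prediction_*_arima.py files: the mapping of the listed ranges onto start_* and max_* arguments is a guess, and because auto_arima() has no lower-bound argument for d or D, only max_d and max_D are set. The function name fit_auto_arima is hypothetical.

```python
# Search settings taken from the hyperparameter list above.
# auto_arima() chooses d and D itself up to max_d / max_D, so the lower
# bound of 1 given in the list cannot be expressed directly (assumption).
ARIMA_SEARCH = {
    "m": 12,
    "seasonal": True,
    "start_p": 1, "max_p": 3,
    "start_q": 1, "max_q": 3,
    "start_P": 1, "max_P": 3,
    "start_Q": 1, "max_Q": 3,
    "max_d": 3,
    "max_D": 3,
}


def fit_auto_arima(prices, n_periods):
    """Run the hyperparameter search and forecast `n_periods` steps ahead."""
    # Imported lazily so the settings above can be inspected without pmdarima.
    import pmdarima as pm

    model = pm.auto_arima(prices, **ARIMA_SEARCH)
    forecast = model.predict(n_periods=n_periods)
    return model, forecast
```

Here prices would be the stock price column for one training interval, and n_periods the number of observations in the testing data.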
The R-square of model_max for Apple stock: Negative R-square
The R-square of model_8yr for Apple stock: Negative R-square
The R-square of model_max for Lockheed Martin: Negative R-square
The R-square of model_8yr for Lockheed Martin: Negative R-square
Two files use the Holt-Winters method: Prediction_AAPL_hw.py and Prediction_LMT_hw.py, which contain the predictive models for Apple and Lockheed Martin, respectively. statsmodels is used for model training, through the class statsmodels.tsa.holtwinters.ExponentialSmoothing.
The length of the training interval does not matter much for this algorithm. The 2018 stock prices alone are sufficient to train the predictive model for both stocks.
The code first obtains the data from the local database by querying it with psycopg2. It then takes the stock price column and creates an object by calling ExponentialSmoothing() with the following fixed hyperparameters:
- trend = 'mul'
- seasonal = 'mul'
- seasonal_periods = 4 (Each year has 4 quarters of business cycles)
The following hyperparameters vary between stocks:
- smoothing_level (Alpha in the equation)
- smoothing_slope (Beta in the equation)
Both smoothing_level and smoothing_slope should be tuned through grid search. The best values found were smoothing_level = 0.6 and smoothing_slope = 0.25 for Apple, and smoothing_level = 0.8 and smoothing_slope = 0.25 for Lockheed Martin.
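The grid search and model fit described above could be sketched as follows. The grid step of 0.05 and the helper names are assumptions, not taken from the Prediction_*_hw.py files. Newer statsmodels releases name the beta argument smoothing_trend, so the sketch tries that name first and falls back to smoothing_slope.

```python
from itertools import product


def smoothing_grid(step=0.05):
    """All (smoothing_level, smoothing_slope) pairs strictly between 0 and 1."""
    values = [round(i * step, 2) for i in range(1, int(round(1 / step)))]
    return list(product(values, values))


def holt_winters_forecast(train, horizon, smoothing_level, smoothing_slope):
    """Fit Holt-Winters with the fixed hyperparameters above and forecast ahead."""
    # Imported lazily so the grid above can be built without statsmodels.
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    model = ExponentialSmoothing(
        train, trend="mul", seasonal="mul", seasonal_periods=4
    )
    try:
        fitted = model.fit(
            smoothing_level=smoothing_level, smoothing_trend=smoothing_slope
        )
    except TypeError:
        # Older statsmodels releases only know the smoothing_slope name.
        fitted = model.fit(
            smoothing_level=smoothing_level, smoothing_slope=smoothing_slope
        )
    return fitted.forecast(horizon)


candidates = smoothing_grid()
```

A grid search would call holt_winters_forecast() once per pair in candidates and keep the pair with the highest R-square on the testing data.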
The R-square of the model for Apple stock: 41%
The R-square of the model for Lockheed Martin: Negative R-square
Graph.py has 1 function, generate_line_chart(), which generates a line chart of the training data, the testing data, and the predicted stock prices from any of the models. The function requires the training data, the testing data, and at least 1 set of predicted stock prices; a 2nd set of predicted stock prices is optional. The layout is fixed except for the chart title, and the x-axis is fixed to the range between 2016 and 2020. The file relies on the following packages:
- pandas
- datetime
- Plotly
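A chart of this shape can be sketched as below. The real signature of generate_line_chart() is not shown here, so the function names, argument order, trace names, and exact axis end dates are all hypothetical. The sketch builds a plain figure dictionary first and only hands it to Plotly at the end.

```python
def build_line_chart_spec(title, train, test, prediction, prediction_2=None):
    """Build a Plotly-compatible figure dict; each series is a (dates, prices) pair."""
    series = [
        ("Training data", train),
        ("Testing data", test),
        ("Prediction", prediction),
    ]
    if prediction_2 is not None:  # the 2nd prediction set is optional
        series.append(("Prediction 2", prediction_2))

    traces = [
        {"type": "scatter", "mode": "lines", "name": name,
         "x": list(dates), "y": list(prices)}
        for name, (dates, prices) in series
    ]
    layout = {
        "title": {"text": title},  # the only part of the layout that varies
        # Fixed x-axis window; the exact end dates are an assumption.
        "xaxis": {"range": ["2016-01-01", "2020-12-31"]},
    }
    return {"data": traces, "layout": layout}


def sketch_line_chart(title, train, test, prediction, prediction_2=None):
    """Turn the spec into a Plotly figure (requires plotly to be installed)."""
    import plotly.graph_objects as go

    return go.Figure(
        build_line_chart_spec(title, train, test, prediction, prediction_2)
    )


# Toy values, not real prices.
spec = build_line_chart_spec(
    "Example chart",
    train=(["2018-12-28", "2018-12-31"], [1.0, 2.0]),
    test=(["2019-01-02", "2019-01-03"], [3.0, 4.0]),
    prediction=(["2019-01-02", "2019-01-03"], [2.5, 3.5]),
)
```

Calling sketch_line_chart(...).show() with the same arguments would render the chart in a browser or notebook.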