pynat/mkr_coin

Analysis on MKRUSD, trying different machine learning models to find patterns for percentage change in the closing price

Overview

Project Focus:

  • Analyze MKR price volatility and predict its price changes using machine learning models

MKR Overview:

  • Native governance token of the MakerDAO ecosystem
  • MakerDAO supports DAI, a decentralized stablecoin pegged to the US dollar
  • MKR holders influence decision-making within the ecosystem

DAI Importance:

  • Stability is crucial for decentralized finance (DeFi) applications
  • DAI is created by locking collateral in MakerDAO smart contracts

Value of Predicting MKR Price:

  • MKR impacts the health of DAI and MakerDAO's protocol
  • Accurate predictions provide insights into market sentiment, governance decisions, and DAI stability
  • Beneficial for DeFi participants and investors

Prediction Target (y):

  • y represents the percentage change in the closing price of MKR over consecutive time periods
  • It is calculated as:
y = (close - close_lag_1) / close_lag_1
  • close: Closing price of MKR at the current time step
  • close_lag_1: Closing price of MKR at the previous time step
  • Continuous variable representing the relative change in MKR price
  • Positive values = price increase; negative values = price decrease
  • Values expressed as decimals (e.g., 0.05 = 5% increase, -0.03 = 3% decrease)
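Concretely, the target can be computed with pandas as a one-step percentage change (the prices below are illustrative, not real MKR data):

```python
import pandas as pd

# Hypothetical closing prices; column names follow the definitions above.
df = pd.DataFrame({"close": [1500.0, 1575.0, 1527.75]})
df["close_lag_1"] = df["close"].shift(1)

# y = (close - close_lag_1) / close_lag_1, the one-step percentage change
df["y"] = (df["close"] - df["close_lag_1"]) / df["close_lag_1"]

print(df["y"].tolist())  # first value is NaN (no previous close)
```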

Why Use Percentage Change:

  • Normalizes price movements, reducing sensitivity to absolute price levels
  • Captures relative price movements, essential for understanding volatility and predicting trends in a highly volatile asset like MKR

Importance of Analyzing Percentage Change:

  • Reveals patterns in MKR's volatility and behavior
  • Highlights MKR's impact on DAI stability and the MakerDAO ecosystem
  • Provides actionable insights for DeFi participants and investors

Features

Crypto Data Fetcher:

  • Retrieves OHLC data for selected cryptocurrencies and stablecoins using the Binance and Kraken APIs
  • Includes additional derived metrics and timezone conversion
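A fetcher like this can be sketched against Binance's public klines endpoint; the symbol and interval values here are assumptions, and the repo's actual fetcher may differ in shape:

```python
import json
import urllib.parse
import urllib.request

def parse_kline(k):
    # Binance kline rows look like:
    # [open_time, open, high, low, close, volume, close_time, ...]
    return {
        "open_time": k[0],
        "open": float(k[1]),
        "high": float(k[2]),
        "low": float(k[3]),
        "close": float(k[4]),
        "volume": float(k[5]),
    }

def fetch_ohlc(symbol="MKRUSDT", interval="1h", limit=5):
    # Public endpoint; no API key is needed for historical candles.
    qs = urllib.parse.urlencode({"symbol": symbol, "interval": interval, "limit": limit})
    url = f"https://api.binance.com/api/v3/klines?{qs}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return [parse_kline(k) for k in json.load(resp)]
```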

Stock Data Fetcher:

  • Fetches hourly stock data for predefined tickers using Yahoo Finance
  • Enriches data with calculated metrics

Feature Engineering:

  • Creates various technical features and custom calculations
  • Utilizes the TA-Lib library for advanced technical analysis
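Alongside the TA-Lib indicators, rolling features such as the 7-period moving average and volatility (both of which appear in the data exploration below) can be sketched with pandas; the prices here are placeholders:

```python
import pandas as pd

# Placeholder close prices standing in for the merged dataset.
df = pd.DataFrame({"close": [100.0, 102.0, 101.0, 105.0, 104.0, 108.0, 110.0, 109.0]})

df["price_change"] = df["close"].pct_change()          # one-step percentage change
df["7d_ma"] = df["close"].rolling(7).mean()            # 7-period moving average
df["7d_volatility"] = df["price_change"].rolling(7).std()  # 7-period volatility
```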

MKRUSDT Analysis:

  • Focuses on the governance token MKR
  • Examines factors influencing its price growth
  • Uses machine learning models to predict the target variable (y)

Machine Learning Models:

  • Implements models such as Linear Regression (LR), Decision Trees (DT), Random Forest (RF), and XGBoost
  • Designed to predict MKR price trends

Flask:

  • Included for programmatic interaction with the data
  • Optional, suitable for deployment

Docker Support:

  • A Dockerfile is provided for easy deployment in containerized environments

Datasets

Crypto Data:
Link to Cryptocurrencies Binance Dataset

Link to Cryptocurrencies Kraken Dataset

Stock Data:
Link to Stocks Dataset

Merged Data with Features:
Link to merged Dataset

Structure

stable_coin/  
├── images/                                     # Contains the images that are generated through EDA     
│   ├── boxplot_mkr.png      
│   ├── correlation_matrix_mkr.png   
│   ├── distribution_price_change.png            
│   ├── timeseries_mkrusdt.png      
│   ├── timeseries_eur.png    
│   ├── price_change_correlation_with_volume.png 
├── README.md                      
├── notebook.ipynb       
│   ├── get_coins                               # Fetches and processes cryptocurrency data     
│   ├── get_stocks                              # Fetches and processes stock data      
│   ├── feature engineering                     # Adds derived metrics for ML models      
│   ├── model evaluation and tuning             # Compares models and saves the best as a pickle file      
├── train.py                                    # Trains the best model            
├── predict.py                                  # Flask application for making predictions       
├── requirements.txt                            # List of required Python packages       
├── environment.yml                             # Conda environment file     
├── LICENSE      
├── Dockerfile                                  # For containerized deployment    

Data Exploration:

MKRUSDT (Maker):
Number of data points: 1862
Close Price:
Mean: 1566.29
Min: 1063.00
Max: 2411.00
Standard Deviation: 298.14

Price Change:
Mean: 0.50%
Max: 6.62%
Min: -4.03%

Volume:
Mean: 572.94
Min: 16.25
Max: 8915.15

7-Day Moving Average (7d_ma):
Mean: 1564.15
Min: 201.77
Max: 2374.29

7-Day Volatility:
Mean: 17.30%
Max: 751.05%

DAIUSD (DAI Stablecoin):
Number of data points: 613
Maximum price change: 0.47%

Correlation for MKRUSDT

Correlation Matrix
Key observations:

  • 7d_ma and 30d_ma: Highly correlated with close and open, indicating their importance for identifying price trends
  • atr is moderately correlated with price indicators, emphasizing its role in volatility analysis
  • Indicators like adx and rsi have weak correlations with price-related variables but are useful for providing additional signals
  • volume and volume_change are moderately correlated with certain price metrics, making them valuable for demand-supply analysis
  • growth_future_1h and growth_future_24h have weak correlations with other features, suggesting they may be challenging targets to predict directly
  • Feature Combination: Moving averages, volatility, and volume-based features form a strong foundation for predicting MKR price trends

Boxplot for Closing Prices for MKRUSDT

Key observations:

  • Median price around 1500
  • Outliers visible around 2200, indicating occasional price spikes
  • Spread and Support Level: Moderate spread within the core trading range, lower whisker extends to ~1000, suggesting a historical support level
  • Interquartile Range (IQR): Middle 50% of price activity is relatively concentrated
  • Overall Pattern: Indicates a stable trading range with occasional upside volatility
    Boxplot

Timeseries for MKRUSDT and DAIUSD

Timeseries Key observations for MKRUSDT:

  • Starting Point: Began around $1200 with initial sideways movement until early November
  • Upward Trend: Strong rally from November to early December, peaking at ~$2400 in early December
  • December Volatility: Multiple peaks above $2000, significant price fluctuations
  • Downward Trend: Gradual decline since mid-December, currently trading around $1400 with bearish momentum
  • Overall Range: $1000–2400, with most activity between $1400–2000
  • Market Pattern: Suggests a completed pump-and-distribution phase

Timeseries
Key observations for DAIUSD:

  • Price Stability: Consistent around $1.00, typical for a stablecoin
  • Minimal Volatility: Fluctuations mostly within the $0.999–$1.001 range
  • Notable Spikes: Brief spike to $1.005 on January 9th, small spike to $1.002 on January 1st
  • Peg Stability: Maintains excellent peg stability around $1.00
  • Recent Activity (Jan 9–13): Slightly increased volatility, but remains within acceptable ranges

Distribution of Price Change for MKRUSDT

Distribution of Price Change
Key observations:

  • Distribution Shape: Appears normal (bell-shaped) and centered around 0, indicating balanced price movements
  • Most Frequent Changes: Small fluctuations, typically between -1 and +1
  • Outliers: A few extreme positive outliers, reaching up to +6
  • Tails: Distribution tails extend from roughly -4 to +6
  • Peak Frequency: The smallest price movements are the most common, peaking at roughly 250 occurrences

Machine Learning Models

Target Variable Analysis: y

Mean y: 0.0557
Standard Deviation y: 5.3460
Histogram of y
Key observations:

  • Illustrates the frequency distribution of the target variable y across the Train, Validation, and Test datasets
  • Majority of values concentrated near 0
  • Extreme outliers present, with some values exceeding 5000
  • Distribution is highly skewed, with most values in a small range and a few significantly larger ones
  • Extreme outliers may adversely affect the model by increasing error and reducing prediction accuracy
  • Skewness suggests the model might face difficulty in accurately predicting y

Boxplot of y Key observations:

  • Interquartile Range (IQR) is small, suggesting that most data points are closely clustered
  • Numerous strong outliers exceed 1000
  • This aligns with the histogram: the majority of values are small, with a few extreme values
  • These outliers can significantly distort metrics like MSE and RMSE during training and validation
  • Next Step: further analysis to decide whether to remove or transform the outliers

Transformation of y

  • To address the skewness and extreme outliers, a logarithmic transformation was applied to y:
y_train_log = np.log1p(y_train)
y_val_log = np.log1p(y_val)
y_test_log = np.log1p(y_test)

Histogram of y
Boxplot of y
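For reference, np.expm1 inverts the np.log1p transform, so predictions can be mapped back to the original scale. Note that log1p is only defined for values greater than -1, which holds for a percentage change as long as the price never drops to zero. A sketch with illustrative values:

```python
import numpy as np

# Illustrative targets: small changes plus large positive outliers,
# mimicking the skewed distribution described above.
y = np.array([0.05, -0.03, 12.0, 5000.0])

y_log = np.log1p(y)   # compresses the large outliers
y_back = np.expm1(y_log)  # inverse transform recovers the original scale

assert np.allclose(y_back, y)
```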

Linear Regression (LR)

Accuracy Drop Linear Regression

  • Features ppo, trix, and atr show a positive accuracy drop, meaning removing them decreases model accuracy
  • Features like sma20, cci, and roc show a negative accuracy drop, meaning removing them could improve model accuracy
  • Several features have no influence on accuracy and could be considered for removal
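The accuracy-drop idea amounts to drop-one-feature ablation. A minimal sketch with scikit-learn on synthetic data (the feature names merely echo the ones above and are not the real columns):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=300)  # only feature 0 matters
names = ["ppo", "sma20", "cci"]  # placeholder names for illustration

# Baseline error with all features, then error after dropping each one.
base = mean_squared_error(y, LinearRegression().fit(X, y).predict(X))
drops = {}
for i, name in enumerate(names):
    X_drop = np.delete(X, i, axis=1)
    mse = mean_squared_error(y, LinearRegression().fit(X_drop, y).predict(X_drop))
    drops[name] = mse - base  # positive = removing the feature hurts

print(drops)
```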

Distribution of Predicted Values for Linear Regression
Key Observations:

  • Most predictions are centered around 0, with a sharp peak and minimal spread
  • Indicates that the model is predicting a narrow range of values, which could suggest underfitting or that the target variable has a limited variance

Decision Trees (DT)

Cross-Validation MSE Heatmap for Decision Tree

Key observations:

  • Higher values for min_samples_leaf (13-15) yield better results, with MSE around 12.9
  • max_depth has minimal impact, with stable MSE across depths, except for min_samples_leaf=1, where a significant deterioration occurs at depth=4 (MSE spikes to ~34)
  • Optimal Configuration: max_depth=4, min_samples_leaf=13, achieving an MSE of 12.9
  • Interpretation: The model benefits from higher leaf sample restrictions, preventing overfitting
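A grid search of this kind can be sketched with scikit-learn on synthetic data; the parameter ranges below are assumptions based on the observations above:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=400)  # synthetic nonlinear target

# Cross-validated search over max_depth and min_samples_leaf,
# mirroring the heatmap axes above.
grid = GridSearchCV(
    DecisionTreeRegressor(random_state=1),
    {"max_depth": [2, 3, 4, 5], "min_samples_leaf": [1, 5, 13, 15]},
    scoring="neg_mean_squared_error",
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)
```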

Random Forest (RF)

MSE vs Number of Trees for Different Minimum Sample Leafs Random Forest
Key observations:

  • Best number of trees (n_estimators): 180, achieving an MSE of ~2900.731
  • Best max_depth: 10, balancing between underfitting (depth=5) and overfitting (depth=15)
  • Best min_samples_leaf: 1, enabling the model to capture detailed patterns
  • Final model performance: MSE: 2870.841, RMSE: 53.580, R² Score: 0.085
  • While RMSE is reasonable, the low R² score (8.5%) indicates the model has limited explanatory power

Residual Plot For Random Forest
Key observations:

  • The residuals (in log scale) are mostly clustered around zero, indicating accurate model predictions in this scale
  • A few residuals deviate from zero, suggesting areas where the model struggles with accuracy
  • No clear pattern in the residuals, indicating that the model is well-calibrated in the log scale
  • Despite the well-calibrated residuals in the log scale, the low R² suggests that the model’s explanatory power remains limited

XGBoost

Scatterplot Actual vs Predicted Values XGBOOST
Key observations:

  • Majority of points cluster around the diagonal (pink dashed line), indicating model predictions generally align with actual values
  • Strong linear relationship between predicted and actual values, suggesting the model captures underlying patterns well
  • Most data points are concentrated around the 0 value on both axes
  • Data spans from approximately -2 to 6 on both axes, with sparse points in higher value ranges (4-6)
  • Some outliers visible, particularly around (-2, 0) and (6, 0)
  • Slight tendency to underpredict at extreme values
  • Sparsity at higher values: Indicates less reliable predictions in these ranges
  • Generating more training data for extreme value ranges (4-6) could improve model reliability in these areas

Feature Importance For XGBOOST
Key observations:

  • Dominant Features: Technical indicators like trix, roc, ppo, cmo, cci, and bop dominate, suggesting strong predictive power for 1-hour predictions
  • Time-Based Features: hour, day, month, and year show very low importance, indicating that price movements are more influenced by technical factors than by time (but consider the short time-frame of 3m)
  • Cross-Crypto Correlation: Most cryptocurrency tickers (e.g., BTCUSDT, ETHUSDT) have minimal impact, showing limited cross-crypto correlation in 1-hour predictions
  • Traditional Market Indicators: Indicators like ^SPX and ^VIX show low importance, with limited correlation to traditional markets
  • Fibonacci Levels: Surprisingly low importance across all timeframes, despite their common use in technical analysis
  • Focusing on technical indicators is more valuable than time-based or cross-crypto features, while traditional market indicators and Fibonacci levels have limited predictive power

To simplify the model, I focused on the top 10-15 features and examined SHAP values

SHAP For XGBOOST

Key observations:

  • SHAP Analysis: Provides insight into how individual feature values influence predictions
  • Each dot represents a data point, with color indicating feature value (blue = low, red = high)
  • For "price_change", high values (red dots) positively push predictions, while low values (blue dots) negatively influence them
  • Features like cci and roc exhibit a mix of positive and negative effects, suggesting non-linear relationships with the target variable
  • Features with narrow SHAP value distributions, like day or ln_volume, have a limited effect on predictions across the dataset

Retraining Result:

Surprisingly, retraining the model with the most important features resulted in lower scores

The best model is:

XGBoost

  • Best Hyperparameters:
    • eta: 0.25
    • max_depth: 5
    • min_child_weight: 1
  • Validation Set Metrics:
    • RMSE: 1.4671
    • MSE: 2.1524
    • MAE: 0.0163
    • R² Score: 0.9247

Installation

  • Clone the repository:
  git clone https://github.com/your-repo.git
  cd your-repo
  • Set up the environment using Conda:
  conda env create -f environment.yml
  conda activate your-environment-name
  • Using pip:
  pip install -r requirements.txt

Installation Instructions for TA-Lib

  • The TA-Lib library is required for this project but is not installed automatically via the environment.yml file
  • You need to install it manually due to potential platform-specific compilation requirements

To install TA-Lib, follow these steps:

  • Using Conda (Recommended):
conda install -c conda-forge ta-lib
  • Using pip: If you prefer pip, ensure you have the required dependencies installed and run:
pip install TA-Lib
  • On macOS with Homebrew: First, install the TA-Lib C library:
brew install ta-lib
  • Then install the Python wrapper:
pip install TA-Lib
  • On Linux: Install the required development library (e.g., for Ubuntu):
sudo apt-get install libta-lib-dev
  • Then install the Python wrapper:
pip install TA-Lib
  • On Windows: Download and install the precompiled binaries for your system from the TA-Lib website, then install the Python wrapper:
pip install TA-Lib

Make sure TA-Lib is installed before running the application. If you encounter any issues, refer to the TA-Lib documentation for further assistance

How to Use

  • Jupyter Notebook (notebook.ipynb):

    • Fetch cryptocurrency and stock market data:
      • Fetches data for cryptocurrencies and stablecoins defined in the coins list.
      • Processes the data and adds derived metrics (e.g., price change).
      • Saves the final dataset as stable_coins.csv.
    • Fetch hourly stock data for predefined tickers:
      • Adds derived metrics.
      • Formats timestamps.
      • Combines all stock data into a single DataFrame and saves it as a CSV.
      • Logs missing or delisted stocks/cryptos as warnings or errors.
    • Perform feature engineering and derive metrics.
    • Evaluate multiple machine learning models.
    • Save the best models as .pkl files.
  • Train the Model:

    • Use train.py to train the best-performing model (default: XGBoost) on the processed data.
    • Save the trained model as a .pkl file.
  • Deploy the Model with Flask:

    • Use predict.py to deploy the model and provide predictions via Flask.

Flask

  • The repository includes a Flask application (predict.py) to interact with the trained XGBoost model. The API allows users to predict the 'close' price change within the next hour.

  • Steps to Use:

    • Start the Flask Server
    • Ensure the conda environment is active and run:
    python predict.py --port=<PORT>
  • Replace <PORT> with the desired port number (e.g., 5001). If no port is specified, the server defaults to port 8000

  • Example:

    python predict.py --port=5001

Make Predictions

  • Send an HTTP POST request with the input features as JSON to the /predict endpoint, using the port you specified earlier

  • Example Input:

curl -X POST http://127.0.0.1:5001/predict \
-H "Content-Type: application/json" \
-d '{"ln_volume": -25.422721545090816, "bop": -0.44444, "ppo": 1.0097517730496455}'
  • Example Response:
{
  "predicted_growth_rate": -0.002064734697341919
}
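For illustration, a stripped-down version of such an endpoint might look like the sketch below. The stub model stands in for the real pickled XGBoost model, and the feature handling is simplified; the actual predict.py may differ:

```python
import argparse

from flask import Flask, jsonify, request

app = Flask(__name__)

class _StubModel:
    # Stand-in for the pickled XGBoost model so the sketch is self-contained.
    def predict(self, rows):
        return [sum(rows[0]) * 0.001]

model = _StubModel()

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()
    pred = model.predict([list(features.values())])
    return jsonify({"predicted_growth_rate": float(pred[0])})

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--port", type=int, default=8000)
    app.run(port=parser.parse_args().port)
```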

Run with Docker

  • To simplify deployment, a Dockerfile is provided. To build and run the Docker container:

  • Build the Docker image (the first command pulls the Anaconda base image):

docker pull continuumio/anaconda3
docker build -t mkr-coin-analysis .
  • Run the container:
docker run -p 5001:5001 mkr-coin-analysis

License

This project is open-source and licensed under the MIT License.
