Team 1 - INDENG 290 - 21 Spring at Cal
Cryptocurrency is one of the hottest fields in investment, promising immense opportunity for those willing to take risks. As the price of Bitcoin climbed from $5,000 to $60,000 and daily trading volume reached the billion-dollar level over the past year, interest in trading and swapping crypto assets grew rapidly. However, trading is not as easy as it sounds: investors are likely to lose money due to a lack of market knowledge and fast-changing exchange rates. Under the guidance of AnChain.ai, six students from the University of California, Berkeley set out to build a platform that recommends arbitrage strategies based on users’ preferences, helping cryptocurrency investors grow their portfolios with less risk.
Using the latest transaction data from Uniswap, including token prices and exchange rates between pairs, the team ran behavioral analyses of extreme price fluctuations to identify the scenarios best suited to exploiting price spreads between currencies. Focusing on the most popular currencies, the team then predicts token prices from historical data with time series models and feeds real-time prices into a detection model built on the depth-first search algorithm. The pipeline calculates the optimal investment volume and outputs potential routes with their respective profits. We are still building an interactive platform where results will be displayed clearly and ordered by users’ preferences, helping them make better decisions and enjoy trading. See our demo UI here.
The project consists of three parts - Datasets, API, and Models.
Our data came from two sources. We fetched the USDC token's daily transaction data for the most recent year from the Uniswap V2 subgraph on The Graph using GraphQL. The rest was downloaded from Yahoo Finance. The file USDC-USD.csv (October 2018 to April 2021) comes solely from Yahoo Finance and was used for anomaly detection and outlier identification. The file USDC Price.csv concatenates two sources, Uniswap (May 2020 to April 2021) and the Yahoo Finance file USDC 3year.csv (October 2018 to May 2020), and was used for the time series predictive models. sample_uniswap_record.csv was pulled from Uniswap by our API, which is introduced below. The folder also contains the Jupyter notebooks with our data pre-processing and EDA code.
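To illustrate the outlier identification step on daily prices, here is a minimal sketch using Tukey's interquartile-range fences. The rule itself is an assumption, since the notebooks may use a different method, and the function name `outlier_dates` and the sample rows are hypothetical.

```python
from statistics import quantiles


def outlier_dates(rows, k=1.5):
    """Return the dates whose price falls outside the Tukey fences.

    rows: list of (date, price) tuples, e.g. parsed from a daily price CSV.
    k: fence width in IQR units; 1.5 is the conventional default.
    """
    prices = [price for _, price in rows]
    q1, _, q3 = quantiles(prices, n=4)  # quartiles of the daily prices
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [date for date, price in rows if price < low or price > high]


# Hypothetical daily USDC closing prices, not taken from the real dataset.
sample_rows = [
    ("2021-04-01", 1.000),
    ("2021-04-02", 1.001),
    ("2021-04-03", 0.999),
    ("2021-04-04", 1.002),
    ("2021-04-05", 0.998),
    ("2021-04-06", 1.000),
    ("2021-04-07", 1.050),
    ("2021-04-08", 1.001),
]
flagged = outlier_dates(sample_rows)
```

For a stablecoin such as USDC the price stays close to $1, so even small deviations stand out under this rule; a wider fence (larger `k`) flags fewer dates.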
We built an API that automatically requests and downloads one day of transaction data between a pair of tokens on Uniswap, given the token ID and a Unix timestamp. It outputs transaction IDs, volumes, and equivalent USD prices. We used it to examine the specific dates with abnormal prices and to perform intra-day outlier detection.
The Models folder includes the four time series predictive models (ARIMA, SARIMA, Prophet, and LSTM) that we used to predict the future USDC token price from the past three years of data. The files for the depth-first search (DFS) algorithm can also be found here.
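As a rough illustration of what such a request could look like, the sketch below turns a Unix timestamp into the UTC day that contains it and builds a GraphQL payload for that window. It is not the project's actual API: the helper names are hypothetical, and the entity and field names (`swaps`, `pair`, `timestamp`, `amountUSD`, `transaction { id }`) are assumptions based on the public Uniswap V2 subgraph schema that should be checked before use. No network call is made.

```python
SECONDS_PER_DAY = 86_400

# Assumed query shape; verify entity and field names against the subgraph schema.
SWAPS_QUERY = """
query ($pair: String!, $start: BigInt!, $end: BigInt!, $first: Int!) {
  swaps(first: $first, orderBy: timestamp, orderDirection: asc,
        where: {pair: $pair, timestamp_gte: $start, timestamp_lt: $end}) {
    transaction { id }
    timestamp
    amountUSD
  }
}
"""


def day_window(unix_ts):
    """Return (start, end) Unix timestamps of the UTC day containing unix_ts."""
    start = unix_ts - unix_ts % SECONDS_PER_DAY
    return start, start + SECONDS_PER_DAY


def build_swaps_payload(pair_id, unix_ts, first=1000):
    """Build the JSON body for a GraphQL POST request (hypothetical helper)."""
    start, end = day_window(unix_ts)
    return {
        "query": SWAPS_QUERY,
        "variables": {
            # Subgraph IDs are assumed to be lowercase hex addresses.
            "pair": pair_id.lower(),
            # BigInt values are sent as strings here (an assumption).
            "start": str(start),
            "end": str(end),
            "first": first,
        },
    }


# 1618966800 is 2021-04-21 01:00:00 UTC; "0xABC" is a placeholder pair ID.
payload = build_swaps_payload("0xABC", 1618966800)
```

The resulting dictionary can be sent as the JSON body of a POST request to a subgraph endpoint with any HTTP client.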
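The idea behind DFS-based arbitrage detection can be sketched as follows: starting from one token, follow exchange rates depth-first and record every cycle back to the start whose rates multiply to more than 1. This is a simplified illustration, not the project's implementation. The function name, the `max_tokens` and `fee` parameters, and the sample rates are all hypothetical, and optimal trade volume is not modeled.

```python
def find_arbitrage_routes(rates, start, max_tokens=4, fee=0.0):
    """Depth-first search for profitable cycles start -> ... -> start.

    rates: dict mapping (from_token, to_token) to units of to_token
           received per unit of from_token.
    max_tokens: cap on the number of distinct tokens in a route.
    fee: proportional fee charged on every swap (0.003 means 0.3%).
    Returns (route, multiplier) pairs, most profitable first.
    """
    neighbors = {}
    for (src, dst), rate in rates.items():
        neighbors.setdefault(src, []).append((dst, rate * (1.0 - fee)))

    routes = []

    def dfs(token, path, product):
        for nxt, rate in neighbors.get(token, []):
            value = product * rate
            if nxt == start:
                # Closing the cycle: keep it only if it ends with more than it started.
                if len(path) > 1 and value > 1.0:
                    routes.append((path + [start], value))
            elif nxt not in path and len(path) < max_tokens:
                dfs(nxt, path + [nxt], value)

    dfs(start, [start], 1.0)
    routes.sort(key=lambda item: item[1], reverse=True)
    return routes


# Hypothetical exchange rates, chosen so that one three-token cycle is profitable.
sample_rates = {
    ("USDC", "WETH"): 0.0005,
    ("WETH", "DAI"): 2010.0,
    ("DAI", "USDC"): 1.0,
    ("WETH", "USDC"): 1990.0,
    ("USDC", "DAI"): 0.999,
    ("DAI", "WETH"): 0.000495,
}
routes = find_arbitrage_routes(sample_rates, "USDC")
```

A multiplier above 1 means that one unit of the starting token returns more than one unit after completing the route. Setting `fee` to a realistic per-swap value shows how quickly thin spreads disappear once trading costs are included.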
Follow these steps to reproduce our work:
- Download all files in the three folders to your local environment.
- Open Raw Data Preparation.ipynb and run the GraphQL query in the first cell to fetch the USDC token's daily transaction data for the most recent year, as shown in the second cell. Load the Yahoo Finance data USDC 3years.csv and concatenate the two datasets into USDC Price.csv.
- Run Exploratory Data Analysis.ipynb to perform EDA and identify real-life arbitrage transactions on the outlier dates of your choice. You can then look up transaction details on Etherscan using the transaction IDs listed at the end. For intra-day analysis, use the API we prepared to get the transactions within a specific day.
- With the DFS algorithm and the supporting document that records exchange rates between pairs, you can input a token name to get arbitrage recommendations: arbitrage routes and optimal volumes with their estimated profits. You can also limit the number of tokens involved, which limits the length of the suggested routes. Sample output is shown here.
- Run the USDC ARIMA/SARIMA/Prophet/LSTM files in the Models folder to compare how well the models predict future token prices, using error measures and graphs.
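For the model comparison in the last step, the sketch below computes three common forecast error measures. Which measures the notebooks actually report is not stated here, so the choice of MAE, RMSE, and MAPE is an assumption, and the function name and sample values are hypothetical.

```python
import math


def forecast_errors(actual, predicted):
    """Return MAE, RMSE, and MAPE (in percent) for two equal-length series."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("series must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE divides by the actual value, so actual prices must be non-zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape}


# Hypothetical held-out prices and one model's forecasts for the same days.
actual_prices = [1.0, 1.0, 1.0, 1.0]
model_forecast = [1.01, 0.99, 1.02, 1.0]
scores = forecast_errors(actual_prices, model_forecast)
```

Computing the same dictionary for each model on the same held-out period gives a like-for-like comparison; lower values indicate a closer fit.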
The data collection stage of our project uses the Uniswap API. Details can be found here.
We would like to thank Dr. Victor Fang from AnChain.AI for his guidance and support.
- Zhiyang Han
- Zihao Zhou
- Lei Liang
- Kexin Fang
- Shuyang Yu
- Yuying Chen
© Apache 2.0