Effectiveness Of Machine Learning Algorithms In Cryptocurrency Price Forecasting

Ravi Ranjan

Rowan University

Abstract

Machine learning algorithms have been very effective in conventional stock forecasting, and many organizations use machine learning with various attributes to improve accuracy. According to research published on ResearchGate, the prediction accuracy for conventional stocks can reach up to 80% when political-situation attributes are included.

Predicting cryptocurrency prices with machine learning algorithms such as LSTM or other neural networks is more complex. Neural network or KNN models not only have multiple attributes to include, but their results also depend on the structure of the digital currency. Stablecoins, for instance, have higher prediction accuracy than altcoins.

Nowadays there are numerous machine learning algorithm-based forecasting tools available that provide insights to traders or investors before they dive into crypto trading. The majority of frequent traders' decisions are based on a combination of forecasting results of algorithms and human analysis of the market and world events. Therefore, it’s important to analyze how effective machine learning algorithms are and what external factors could be used to improve price forecasting accuracy.

This paper’s focus is to analyze the price predictions that machine learning algorithms, including neural network-based ones, give for top cryptocurrencies. The primary target attributes in the datasets for each cryptocurrency are closing price, high price, low price, and volume. This paper will apply LSTM and XGBoost models.

Introduction

A Pew Research survey found that 88% of the US general population have at least heard about cryptocurrencies and almost half (49%) are aware of non-fungible tokens (NFTs). A more detailed survey was conducted among the 16% of people who had invested in a cryptocurrency or NFT. About ¾ of these investors said they invested because it’s easy to invest and a good way to make money. This could also be one reason for high expected returns.

However, cryptocurrencies are volatile, and even a slight change in price makes the news headlines. In the past, even investing in crypto for a short period of time could lead to high returns. As of 2022, it’s reported that ¾ of adult investors have profited from cryptocurrency trading. This could be the reason why the majority of traders have started investing in cryptocurrencies along with conventional stocks.

But moving forward, traders and investors can benefit by incorporating advanced mathematics and machine learning to help in their price forecasts. Therefore, this paper will explore how machine learning algorithms can help investors set expectations for trading cryptocurrencies.

Project Structure

The proposed models in this research are evaluated using the mean absolute error (MAE), mean squared error (MSE), and root mean square error (RMSE). The first part covers the fundamentals of cryptocurrencies and machine learning methods. The second part delves into the several types of cryptocurrency and the two most popular alternative blockchain technologies, Litecoin (LTC) and Ethereum (ETH), as well as the goals of each. Lastly, this paper will discuss machine learning algorithms, including Artificial Neural Networks (ANN), Support Vector Machines (SVM), and Deep Learning, which are three of the most extensively used algorithms.
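As a minimal sketch of the three evaluation metrics named above, the functions below compute MAE, MSE, and RMSE in plain Python. The sample values are hypothetical scaled prices chosen for illustration; they are not taken from this paper’s datasets.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, expressed in the same units as the data."""
    return math.sqrt(mean_squared_error(actual, predicted))

# Hypothetical scaled closing prices, for illustration only.
actual = [0.50, 0.52, 0.55, 0.53]
predicted = [0.48, 0.53, 0.50, 0.55]

print("MAE :", mean_absolute_error(actual, predicted))
print("MSE :", mean_squared_error(actual, predicted))
print("RMSE:", root_mean_squared_error(actual, predicted))
```

Because RMSE is the square root of MSE, it is expressed in the same units as the prices themselves, which can make it easier to interpret than MSE.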

Machine Learning Algorithms

Machine learning algorithms can potentially have far-reaching implications in the finance and investment sectors. Moreover, machine learning and artificial intelligence are an important factor behind the tremendous growth of cryptocurrencies and blockchain-based technologies. For example, financial institutions and trading firms are implementing machine learning algorithms to help improve customer experience, which has become more personalized and less time consuming.

Machine learning uses raw datasets, usually a subset of the full dataset, applies statistical models to them, and produces insights and forecasts. Machine learning models learn from previous results and training sets, and do not need to be programmed individually for each task. Machine learning engineers and data scientists train models on current datasets, or by using two different available datasets, and once the model is well trained, they apply it to real-world data for various purposes.

Besides the forecasting of stock and cryptocurrency prices, other machine learning applications in the financial domain include process automation, financial monitoring, securing transactions, risk management, and customer data management.

(Machine learning structure)

How Machine Learning Can Help Investors Make Better Trading Decisions?

To be successful in trading, it’s important to know the best time to invest, and having expertise in asset analysis is essential for that. The two main methods for determining future values are technical and fundamental analysis.

Technical analysis predicts the value based on available trading information like market cap, opening and closing price, and other currencies or stock performance. Fundamental analysis is primarily focused on external factors such as interest rates, geopolitical events, economic conditions, and influencer social media posts.

Investors have their own preferences for doing crypto-related investment research and analysis. For example, some prefer technical analysis over fundamental analysis, while new crypto investors tend to prefer fundamental analysis.

However, now that fundamental analysis triggers are becoming more frequent, a smart investor can combine them with technical analysis to make better investment decisions.

Hence, the purpose of this paper is to improve the accuracy of technical analysis by using machine learning model predictions.

In particular, Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs) are two extensively used methods for anticipating price movement, and each has its own learning patterns. ANNs have been widely used for prediction, though researchers have examined a variety of challenges with them, including parameter selection and the choice of training sets. Let’s take a look at both of their definitions and methodologies.

Support Vector Machine (SVM)

The Support Vector Machine (or SVM) is one of the most widely used methodologies for supervised machine learning. It has been used primarily for outlier detection, regression, and classification, and it provides high accuracy for classification problems. SVM also provides both linear and nonlinear models. In terms of speed and execution time, linear SVMs outperform nonlinear SVMs. However, they fall short when dealing with complicated datasets with numerous training samples but few features. While nonlinear SVMs lose explanatory power, they appear to perform consistently across a wide range of problems, making them the preferred choice over linear SVMs.

SVM takes all data points into consideration and divides them by a line or plane called a hyperplane. Though there could be multiple separating hyperplanes, the one with the largest margin, meaning the greatest distance to the nearest data points of each class, is considered to be the best hyperplane. For data that is not linearly separable, it uses the kernel trick to implicitly project data points into a higher-dimensional space, where the data points become easier to separate.
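The idea of projecting data into a higher-dimensional space can be illustrated with a small sketch. The one-dimensional points, the feature map (x, x²), and the threshold below are all hypothetical choices made for illustration; this is not a full SVM and not the setup used in this project.

```python
def feature_map(x):
    """Explicit projection of a 1-D point into 2-D: x -> (x, x^2)."""
    return (x, x * x)

def kernel(x, y):
    """Gives the same dot product as feature_map, without building the 2-D points."""
    return x * y + (x * y) ** 2

def separable_by_threshold_1d(points, labels):
    """True if a single cut on the number line splits the two classes."""
    ordered = [label for _, label in sorted(zip(points, labels))]
    changes = sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)
    return changes <= 1

def classify_in_feature_space(x, threshold=1.0):
    """In the projected space, the line x^2 = threshold separates the classes."""
    _, squared = feature_map(x)
    return 1 if squared > threshold else -1

# Hypothetical data: class +1 lies far from zero, class -1 lies near zero.
points = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
labels = [1, 1, -1, -1, -1, 1, 1]

print(separable_by_threshold_1d(points, labels))
print([classify_in_feature_space(x) for x in points] == labels)
```

The `kernel` function shows the "trick" itself: the dot product in the projected space is computed directly from the original values, so the higher-dimensional points never have to be constructed.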

In this project, we’ll implement SVM to forecast coin prices in the following way. The training data we’re using for our SVM model is the daily data of the particular coin, namely its open, close, highest, and lowest price. We train SVM classifiers on historical tail sets and then test them on recent data to measure the accuracy. We then change the training sets, repeat the same process, and record the accuracy again. The combination with the highest accuracy becomes the final training set for the SVM model, and based on that we’ll make a very short-term prediction.
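The selection loop described above can be sketched as follows. The helper names (`select_training_window`, `fit_mean`, `negative_mae`) and the sample series are hypothetical, and a simple mean predictor stands in for the SVM classifier so that the example stays self-contained.

```python
def select_training_window(series, candidate_lengths, test_size, fit, score):
    """Try several training-window lengths and keep the best-scoring one."""
    history, recent = series[:-test_size], series[-test_size:]
    scores = {}
    for length in candidate_lengths:
        model = fit(history[-length:])       # train on the latest `length` points only
        scores[length] = score(model, recent)  # evaluate on the most recent data
    best_length = max(scores, key=scores.get)
    return best_length, scores

# Hypothetical stand-ins for a real model: predict the training mean,
# and score by negative mean absolute error (higher is better).
def fit_mean(train):
    return sum(train) / len(train)

def negative_mae(model, test):
    return -sum(abs(value - model) for value in test) / len(test)

# Hypothetical price series whose level changes over time.
series = [1.0] * 10 + [4.0] * 5 + [5.0] * 5 + [5.0] * 3

best_length, scores = select_training_window(
    series, [5, 10, 20], test_size=3, fit=fit_mean, score=negative_mae
)
print(best_length, scores)
```

One design consideration: when the training window is chosen using the same recent data that is later reported as the test result, the reported accuracy tends to be optimistic. Holding back a separate validation slice for the selection step is a common safeguard.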

SVM performs effectively in a variety of applications, produces quick training results, and is simple to use. It has numerous advantages in a variety of domains, including pattern classification.

Artificial Neural Network (ANN)

Artificial Neural Networks (ANN) are a common type of neural network that performs its learning in hidden layers. It’s a system that functions like a human brain, composed of interconnected and interacting processing nodes or neurons. It processes information through the interaction of many simple processing elements. A basic ANN has three kinds of layers in its network architecture: an input layer, a hidden layer, and an output layer. The conventional ANN only accepts numeric and structured data. Therefore, for non-numeric and unstructured data, we’ll use Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN).

In this project, we’re providing a set of input variables like opening value, closing value, and highest value, along with the actual output data, to the ANN classifier. After that, we’re deciding the number of hidden layers and the number of iterations by trial and error to train the model.

Throughout the training process, nodes multiply their inputs by weight values on each iteration, and the resulting weights are stored in the final multilayer system. After training, the ANN provides an output set that is supposed to be close to the actual data. After providing new sets of data to the already trained model, we’ll calculate the root mean square error by comparing the output data with the actual data.
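To make the layer structure concrete, here is a minimal sketch of a forward pass through a network with one hidden layer. The weights, the tanh activation, and the three scaled inputs are hypothetical values chosen for illustration; they are not the trained parameters or settings from this project.

```python
import math

def dense(inputs, weights, biases):
    """One fully connected layer: each node sums its weighted inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(features, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer (tanh activation) -> single output node."""
    hidden = [math.tanh(v) for v in dense(features, hidden_w, hidden_b)]
    return dense(hidden, out_w, out_b)[0]

# Hypothetical scaled inputs: opening, closing, and highest value for one day.
features = [0.5, 0.6, 0.7]

# Hypothetical weights: two hidden nodes, one output node.
hidden_w = [[0.1, 0.2, 0.3], [-0.3, 0.2, 0.1]]
hidden_b = [0.0, 0.0]
out_w = [[0.5, -0.5]]
out_b = [0.1]

prediction = forward(features, hidden_w, hidden_b, out_w, out_b)
print(prediction)
```

Training would repeatedly adjust `hidden_w`, `hidden_b`, `out_w`, and `out_b` so that the prediction moves closer to the actual value; that step is omitted here.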

Data & Preliminary Analysis

The primary purpose of this project is to determine the most accurate predicted price and compare it with the actual output. In this project, daily historical time series data from September 2015 to September 2021 is used as input for the models. However, the approach is general and can be applied to data from other time periods. The data is compiled from the daily open, close, high, and low prices for Bitcoin, Ethereum, and Dogecoin.

Training & Test Datasets

Dividing datasets into training and test sets is an important step that affects the model’s accuracy. We initially tried different ways of dividing the data, such as by date range and at random. We adjust the training and test division based on model accuracy. For example, if model accuracy is particularly low for a given training set, we try another date range to train the model. Following is the sample graph that will show the training and test datasets.

Bitcoin

Bitcoin, abbreviated as BTC, is the first digital currency and the one with the largest market capitalization. Bitcoin training data is available from September 2015 to March 2020, and data from September 2021 on is used as test data. Here we’re using two layers of LSTM to train the models. Similarly, in XGBoost we tried different combinations of date ranges for the training datasets. We used 100 epochs in both LSTM and XGBoost to train the models.

The above graph is the training and test result. The black part is the training data (Bitcoin before May 2021). The red portion indicates the test data, which is after May 2021. Because we want to forecast future prices, we test our model on the most recent dates.

Ethereum

Ethereum is the second largest digital currency after Bitcoin. In this project, we’re using data from September 2015 to September 2022. The division of training and test data is almost the same as for Bitcoin, though we tried various combinations of time duration for training and test data to check the effect on the accuracy of the model. For XGBoost, we tried training and test dataset combinations of 80:20, 74:26, 84:16, and 62:38 for several epoch values.

As indicated in the graph, we are using data before June 2021 as training data and after June 2021 as test data to check the model accuracy. For the XGBoost model, the best accuracy is achieved with the 80:20 split.
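The two ways of dividing the data described above, by ratio and by cutoff date, can be sketched as follows. The function names and the synthetic 100-day series are hypothetical; the sketch only assumes that rows are ordered by date.

```python
from datetime import date, timedelta

def split_by_ratio(rows, train_percent):
    """Split time-ordered rows so every training row precedes every test row."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 0 and 100")
    cut = len(rows) * train_percent // 100
    return rows[:cut], rows[cut:]

def split_by_date(rows, cutoff):
    """Rows are (day, close) pairs; days before `cutoff` go to training."""
    train = [row for row in rows if row[0] < cutoff]
    test = [row for row in rows if row[0] >= cutoff]
    return train, test

# Hypothetical series: one synthetic closing price per day for 100 days.
start = date(2021, 4, 1)
rows = [(start + timedelta(days=i), 100.0 + i) for i in range(100)]

# The ratio combinations mentioned in the text.
sizes = {p: tuple(len(part) for part in split_by_ratio(rows, p))
         for p in (80, 74, 84, 62)}
print(sizes)

# A date-based split using June 2021 as the boundary.
train, test = split_by_date(rows, date(2021, 6, 1))
print(len(train), len(test))
```

Both functions keep the test set strictly later than the training set, which matches the goal of evaluating the model on the most recent dates.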

Dogecoin

Dogecoin is the world’s most famous memecoin. Its price therefore depends more on fundamental analysis than on technical analysis, which is the reason for choosing it for this test despite it not being in the top five cryptocurrencies by volume.

The data we’re using in this project for Dogecoin is from September 2015 to September 2022. Training and test data are divided randomly, and various combinations of time series are used to improve model accuracy. We had to try various combinations of training and test data for Dogecoin because it had the lowest accuracy and the highest RMSE.

As with the other two coins, we decided to stick to the same splitting pattern for Dogecoin (training data before June 2021 and test data after that). Dogecoin’s all-time high of $0.69 occurred in May 2021, and neither of the other coins shows a comparable spike in its data. The accuracy of both the LSTM and XGBoost models is lowest in the case of Dogecoin.

Proposed Methodology Using LSTM & XGBoost

In this paper, we’re examining the forecasting of returns for Bitcoin, Ethereum, and Dogecoin using machine learning techniques. The framework has various models like random forest, stacked linear model, support vector machine, and artificial neural network. We chose LSTM and XGBoost because they’re widely used and considered reliable among machine learning engineers.

Long Short-Term Memory (LSTM)

Recurrent Neural Networks (RNN) are a class of neural network with an internal memory, which is why they’re considered among the most capable algorithms for sequential data. Because this kind of algorithm performs very well with sequential financial data, we’re using it in our cryptocurrency forecasting. The LSTM extends the memory of the recurrent network and enables it to learn from important past experiences. Like a computer’s memory, an LSTM keeps information for a long time and can read, write, and delete it when needed.

In this project, we’re using stacked LSTM. The original or normal LSTM has one hidden LSTM layer followed by a standard output layer. In stacked LSTM, there are several hidden layers and each layer contains multiple memory cells. Usually ML engineers use two layers of LSTM for complex features. It’s often said that “the more layers used the more accurate it will be,” but adding more layers also makes the model harder to train.
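The read, write, and delete behaviour corresponds to the three gates of an LSTM cell. The sketch below implements one time step of the standard LSTM equations for a cell with a single hidden unit, in plain Python. The parameter names and values are hypothetical and untrained; a real model would use vectors and learned weights.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of an LSTM cell with a single hidden unit."""
    forget = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])        # how much memory to keep (delete)
    write = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])         # how much new content to store (write)
    read = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])          # how much memory to expose (read)
    candidate = math.tanh(p["wc"] * x + p["uc"] * h_prev + p["bc"])   # proposed new content
    c = forget * c_prev + write * candidate   # updated cell memory
    h = read * math.tanh(c)                   # output for this time step
    return h, c

def run_sequence(xs, p):
    """Feed a sequence through the cell, carrying memory from step to step."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, p)
    return h, c

# Hypothetical, untrained parameters for illustration only.
names = ("wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wc", "uc", "bc")
params = {name: 0.0 for name in names}
params.update({"wc": 1.0, "bf": 2.0})  # positive forget bias: memory is mostly kept

h, c = run_sequence([0.1, 0.2, 0.3], params)
print(h, c)
```

The cell state `c` is what allows information to persist across many time steps: as long as the forget gate stays close to 1, earlier content is carried forward largely unchanged.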

Extreme Gradient Boosting (XGBoost)

XGBoost is a supervised learning algorithm used for both classification and regression, and it is also commonly used in ranking problems. It handles overfitting well and builds its results from shallow trees that are added sequentially. The two fundamental reasons to use XGBoost are execution speed and model performance. XGBoost handles both structured and unstructured datasets and provides robust classification results. The commonly used classifier in the XGBoost library is XGBClassifier and the commonly used regressor is XGBRegressor (random forest variants, XGBRFClassifier and XGBRFRegressor, are also available), and both are considered very effective with large datasets.
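The idea of sequentially added shallow trees can be illustrated with a simplified gradient boosting sketch. This is not XGBoost itself: it uses one-split trees (stumps) and squared error only, and it omits the regularization and second-order terms that XGBoost adds. All names and data below are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold on x that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:   # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_boosted(xs, ys, rounds=50, learning_rate=0.1):
    """Each round fits a stump to the current residuals and adds a scaled correction."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate

def predict_boosted(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

def training_mse(model, xs, ys):
    return sum((y - predict_boosted(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Hypothetical data with a step change between x = 4 and x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]

model = fit_boosted(xs, ys, rounds=50, learning_rate=0.1)
print(predict_boosted(model, 2), predict_boosted(model, 7))
print(training_mse(model, xs, ys))
```

The small learning rate means each tree makes only a partial correction, which is one of the ways boosting methods limit overfitting.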

Results

After applying the data described above to the LSTM and XGBoost models, we achieve the following results for mean absolute error (MAE) and mean squared error (MSE).

(Table 1 : Results of LSTM and XGBoost models)

The mean absolute error (MAE) and mean squared error (MSE) are among the best metrics to use to evaluate the accuracy of a model. The goal of machine learning algorithms is to minimize loss functions such as MSE and MAE. As a loss function, MSE alone is sufficient to evaluate the accuracy of the model, but it is sometimes considered to be overly sensitive to large errors and outliers. That’s why we also calculated MAE, which is considered less sensitive to outliers.
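The difference in outlier sensitivity can be seen in a small sketch. The two error lists are hypothetical: they are identical except that the second contains one large error, similar to what a sudden price spike would cause.

```python
def mae(errors):
    """Mean absolute error from a list of prediction errors."""
    return sum(abs(e) for e in errors) / len(errors)

def mse(errors):
    """Mean squared error from a list of prediction errors."""
    return sum(e * e for e in errors) / len(errors)

# Hypothetical prediction errors on scaled prices.
steady = [0.02, -0.02, 0.02, -0.02, 0.02]
with_spike = [0.02, -0.02, 0.02, -0.02, 0.50]   # one large miss

mae_growth = mae(with_spike) / mae(steady)
mse_growth = mse(with_spike) / mse(steady)
print("MAE grows by a factor of", mae_growth)
print("MSE grows by a factor of", mse_growth)
```

With these values, a single large error raises MAE by a factor of roughly 6 but raises MSE by a factor of more than 100, because squaring amplifies large errors.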

1 LSTM Bitcoin Prediction

As we can see in the graph above, LSTM predicted pretty accurately. The orange line is the price predicted by the model and the blue line is the actual data. Based on Table 1 and the MSE result for Bitcoin, we can tell that our model accuracy was very high, since the lower the error value, the higher the accuracy. It has already been established that in the case of Bitcoin and LSTM, more than 85% accuracy could be achieved.

2 LSTM Ethereum Prediction

In the graph above we can see the predicted price versus the actual price of Ethereum. The orange line, which is the predicted price, is quite close to the blue line, which is the actual price, until April 2021. After that, there are some differences between the two. From Table 1, the LSTM MSE value for Ethereum is 0.0072, which is not as low as for Bitcoin but is still very accurate and gives around 82% accuracy.

3 LSTM Dogecoin Prediction

As the graph above of predicted price versus actual closing price indicates, Dogecoin has the lowest accuracy, around 52%, compared to Bitcoin and Ethereum. From Table 1, we can see the MSE value for Dogecoin is 0.0251, which is several times higher than the values for Bitcoin and Ethereum. The reason behind that is the sudden spike in Dogecoin’s price in May 2021.

LSTM Analysis

Here we’re comparing the predicted results with the actual results for the above cryptocurrencies to validate the LSTM model’s performance. It’s been observed that SVM is the highest-performing classifier, and its accuracy could go above 90% for some coins. The classifiers were given an identical dataset of 364 days and were predicting for 1 day. The overall performance of LSTM models for cryptocurrency coins is very good, with an average accuracy score of 85% (the highest can go up to 95%). With the enhanced LSTM model and proper use of training and test data, the LSTM model has solid trading results that could be beneficial for investors.
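The setup of 364 days of input followed by a 1-day prediction can be framed as a sliding window over the price series. The sketch below shows one way to build such windows; the function name, parameters, and sample series are hypothetical and are not taken from this project’s code.

```python
def make_windows(series, lookback, horizon=1):
    """Pair each run of `lookback` values with the value `horizon` steps after it."""
    inputs, targets = [], []
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        end = start + lookback
        inputs.append(series[start:end])          # the model's input window
        targets.append(series[end + horizon - 1])  # the value to predict
    return inputs, targets

# Small hypothetical series to show the shape of the result.
small_inputs, small_targets = make_windows(list(range(10)), lookback=3)
print(small_inputs[0], small_targets[0])
print(small_inputs[-1], small_targets[-1])

# With 370 hypothetical daily prices and a 364-day window,
# only a handful of training examples are available.
year_inputs, year_targets = make_windows([float(i) for i in range(370)], lookback=364)
print(len(year_inputs))
```

A longer window gives the model more context per example but leaves fewer examples to train on, so the window length is a trade-off.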

1 XGBoost Bitcoin Prediction

As indicated in the graph above, the green line is the prediction for closing prices of Bitcoin and the blue line is the actual closing prices. From Table 1, the MSE value of XGBoost for Bitcoin is 0.036 and has shown relative accuracy of 63%.

2 XGBoost Ethereum Prediction

The MSE value of the XGBoost model for Ethereum is 0.070, which is a little higher than for Bitcoin. That’s why there are some gaps between the blue and green lines in the graph above. The overall accuracy for Ethereum with the XGBoost model is around 56%.

3 XGBoost Dogecoin Prediction

Dogecoin has an MSE value of 0.215, which is the highest across both the XGBoost and LSTM models. The reason behind this is the sudden spike in Dogecoin closing prices. As we can observe from the graph above, there are differences between the original closing price (blue line) and the predicted closing price (green line). The accuracy of the XGBoost model in the case of Dogecoin can go up to 44%.

XGBoost Analysis

XGBoost is a cutting-edge algorithm, and the proposed model performs very well for all cryptocurrencies when prices are stable. There are some inconsistencies when it comes to surges or jumps in prices, even slight ones. Since cryptocurrency prices are characterized by sudden rises and falls, XGBoost is not that accurate here, with an average accuracy of 54% for cryptocurrency forecasting. When coins perform steadily, XGBoost is a well-suited algorithm and can produce reasonable results.

Comparing The Test Results From LSTM & XGBoost

Both LSTM and XGBoost algorithms have their advantages and disadvantages. For example, XGBoost is faster than LSTM in execution time. In this project, we observed that LSTM accuracy is better than XGBoost accuracy. From Table 1, the average MSE score for LSTM is 0.0126 and the average MSE score for XGBoost is 0.1075, which converts into an average accuracy of 85% and 54% respectively. Hence, we can say that, on the one hand, models like XGBoost are more accurate in cases where we have additional information as an input besides historical data, while on the other hand, LSTM can be used with or without additional input information. Some other methods like ARIMA and Prophet have also shown quite accurate and promising results for cryptocurrency forecasting.

Conclusion

This article focused on three major cryptocurrencies, Bitcoin, Ethereum, and Dogecoin, using classifiers and neural networks to do price forecasting. We have also analyzed previous research in the fields of classification, regression, and machine learning. The results show that among classifiers, SVM outperforms the others, and between LSTM and XGBoost, LSTM delivers more accurate forecasting.

Moreover, cryptocurrencies with fewer spikes in the closing price graph and less dependency on fundamental analysis factors have the highest chance of getting accurate predictions from ML algorithms.

In conclusion, factors like the attribute space, accurate division of training and test data, proper selection and setting of parameters, and better available methods lead to an ML model with high accuracy. In addition, the shorter the prediction horizon, the higher the accuracy of the result.

References

Bontempi, G., Taieb, S. Ben, & Borgne, L. (2013). Machine Learning Strategies for Time Series Forecasting, 62–77.
Ahmed, N. K., Atiya, A. F., El Gayar, N., & El-Shishiny, H. (2010). An empirical comparison of machine learning models for time series forecasting. Econometric Reviews, 29(5), 594–621. https://doi.org/10.1080/07474938.2010.481556
Kongsilp, W., Mateus, C., Huang, M., Ting-ting, Z., Wan-yi, C., Maita, A. R. C., … de Carvalho, A. F. (2015). Prediction of Stock Trading Signal Based on Support Vector Machine. Engineering Computations, 32(1), 445–463. https://doi.org/10.1108/02644401311286099
Long Short-Term Memory (LSTM). (n.d.). In Dive into Deep Learning, Modern Recurrent Neural Networks. https://d2l.ai/chapter_recurrent-modern/lstm.html
Sebastião, H., & Godinho, P. (2021). Forecasting and trading cryptocurrencies with machine learning under changing market conditions. Financial Innovation. https://jfin-swufe.springeropen.com/articles/10.1186/s40854-020-00217-x
