By Ljubica Vujovic, SmartCat.
What will Bitcoin do
When asked what the stock market will do, J.P. Morgan replied “It will fluctuate.” If we could hypothetically ask Mr. Morgan another question, one that is very popular these days, I bet his answer would be “It will fluctuate a lot.”
Of course, the question is about the most hyped thing these days after Deep Learning:
"What will Bitcoin do?"
SmartCat team answers that question with mathematical precision, so by the second paragraph you will start trading and by the end of the post, you will be rich. :)
And by the beginning of this sentence, you’ve probably realized I was joking.
After reading a lot about cryptos and listening to many failure scenarios and rosy forecasts, I was struck by how little we know about the main factors that influence the price and how poorly we can quantify the risks of investing in Bitcoin. Although I truly believe that a decentralized currency can benefit the economy and reallocate global resources, before going “all in” we definitely need to understand better what makes Bitcoin hit the ceiling.
FUD and FBD (Fear, Uncertainty and Doubt and Fundamental Bitcoin Data)
Let’s take a good look at Twitter and imagine possible reactions to these tweets.
"Oh, poor Woz, he was firstly enticed by Jobs, and now deceived by this bad Bitcoin gang. Too fraudulent for me, I'm not gonna have anything with these wanna be currencies! "
"Good job Germany! I knew someone will start noticing the potential and prosperity Cryptos can bring! I should invest anytime soon!"
Although exaggerated, these two examples represent a part of public sentiment that can be an incentive to buy or sell. These actions make the market go up or down, so tracking the sentiment could be a useful price predictor. Our first approach relies on exactly this fact. We tracked many people who have proved to be big shots in the crypto world, a.k.a. influencers on Twitter, collected their tweets and estimated the sentiment. We chose Twitter as the social platform because its content is in a microblog format, which is convenient for processing, but any other network or source of information could be added.
One possible way to estimate the sentiment of a sentence is to train a model on labeled data. In theory, we could read every tweet and label it with a number from -1 to 1 based on our personal impression. This is followed by training, tuning and testing, and voila, the estimator is ready. This process is extremely time- and energy-consuming, so we wanted to do better.
The next idea was to use a pretrained model. The main problem with pretrained models is that each is trained on its own domain-specific corpus, which means that we would be using a model trained on data with a different distribution than ours.
In the end, the solution was to use a rule-based sentiment analysis tool. From its scores we then built features: most of them were inspired by work from trading, but some were the result of our own intuition.
For example, one very interesting insight comes from a feature constructed as the ratio of the mean to the standard deviation of the tweets' positive scores over the previous 30 days, which is shown below.
This feature had a high correlation with the Bitcoin price (0.86 over a year) and preceded some big price leaps. Although we are aware that correlation is not causation, we can offer an interpretation of this result. If the standard deviation is high, the sentiment is highly volatile, so even if the overall score is positive, we are not convinced that the Bitcoin market is doing well. However, if everything seems stable and most people are speaking positively, this is captured by a high value of the feature, and this kind of situation has potential for a bullish scenario. Finally, this may be our way of quantifying FUD :)
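The ratio described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the project: the function name, the choice to include the current day in the window, and the sample scores are all made up for the example.

```python
from statistics import mean, stdev

def rolling_mean_std_ratio(scores, window=30):
    """Return mean/std of the trailing `window` scores for each day.

    Days without a full window, or with zero spread, yield None.
    Assumption: the window ends on (and includes) the current day.
    """
    ratios = []
    for i in range(len(scores)):
        if i + 1 < window:
            ratios.append(None)
            continue
        chunk = scores[i + 1 - window : i + 1]
        spread = stdev(chunk)  # sample standard deviation
        ratios.append(mean(chunk) / spread if spread > 0 else None)
    return ratios

# Hypothetical daily positive-sentiment scores (made-up numbers).
stable = [0.60, 0.62, 0.61, 0.59, 0.60]
volatile = [0.20, 0.95, 0.30, 0.90, 0.65]

stable_ratio = rolling_mean_std_ratio(stable, window=5)[-1]
volatile_ratio = rolling_mean_std_ratio(volatile, window=5)[-1]
print(stable_ratio > volatile_ratio)  # True
```

Both toy series have almost the same mean, but the stable one produces a much higher ratio, which matches the interpretation above: steady positive sentiment scores high, volatile sentiment scores low.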
On the other hand, we also believe that, besides news and speculation, some harder data could bring valuable information to our models. Blockchain data is a set of time series describing various quantities related to the blockchain and Bitcoin. We also used these time series as features in our models.
With all prepared data, the next step is … machine learning.
First, we performed training at a daily level. This was a straightforward decision because one of the data sources had that granularity. In the previous step we created as many features as we could, so it is very likely that some of them are not useful or are highly correlated with each other. In order to have a robust model with good generalization power, we put some effort into reducing the number of features. Feature selection methods were used to choose the set of features that gave the best metrics. Among the fifteen most important features according to these methods, ten were created from the Twitter sentiment.
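The post does not say which feature selection methods were used, so the following is only a minimal sketch of one simple option: a greedy filter that drops any feature too strongly correlated with one already kept. The function names, the 0.95 threshold and the toy feature values are assumptions made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def drop_correlated(features, threshold=0.95):
    """Keep a feature only if |r| with every already-kept feature is <= threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical daily features (made-up numbers).
features = {
    "tweet_count": [10, 12, 15, 11, 18],
    "tweet_count_x2": [20, 24, 30, 22, 36],  # carries the same signal
    "tx_volume": [5, 3, 8, 9, 2],
}
print(drop_correlated(features))  # ['tweet_count', 'tx_volume']
```

A filter like this only removes redundancy. Ranking the remaining features by importance, as described above, would need a separate step tied to the model and its metrics.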
Then, we fed these data into several models, ranging from SVM to ensemble models. In the end, we presented our results with interactive Superset dashboards. You can read more about the Superset setup here.
Compared with several baselines, including one that predicts the same class as the previous day and one that predicts a class at random, our classifier beat them all with an overall accuracy of 72%. The dashboard with results is live with user guest/guest, so you can check it out any time you want, and a preview of the Bitcoin Trade Signals dashboard is shown below.
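To make the two baselines concrete, here is a small sketch of how they could be scored. The up/down class labels, the function names and the sample sequence are assumptions for illustration; the post does not specify the classes or the evaluation code.

```python
import random

def accuracy(predicted, actual):
    """Fraction of positions where the prediction matches the actual class."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

def persistence_baseline(labels):
    """Predict each day's class as the previous day's class.

    The result lines up with labels[1:], the days that have a predecessor.
    """
    return labels[:-1]

def random_baseline(n, classes, seed=0):
    """Predict a uniformly random class for each of n days."""
    rng = random.Random(seed)
    return [rng.choice(classes) for _ in range(n)]

# Hypothetical daily price-direction labels (made-up sequence).
labels = ["up", "up", "down", "up", "up", "up", "down", "down"]
targets = labels[1:]

persistence_acc = accuracy(persistence_baseline(labels), targets)
random_acc = accuracy(random_baseline(len(targets), ["up", "down"]), targets)
print(f"persistence: {persistence_acc:.2f}, random: {random_acc:.2f}")
```

A model's accuracy only means something relative to numbers like these, computed on the same test days.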
Now comes the tricky part, or the big question.
"How can we use outputs from our models to decide whether to buy or sell?"
One of the simplest answers would be to use signals from our dashboard in order to make a better-informed decision. But we are looking for more than just reading numbers and following a gut feeling. We want an automated trader powered by our predictions. The idea was to improve some of the existing trading strategies, for example SMACrossover. SMA Crossover stands for Simple Moving Average Crossover. The main idea behind this strategy is to follow two time series of the price: a short moving average and a long moving average. Let’s assume the short MA is MA26, which is just the mean of the last twenty-six price samples, and the long MA is MA100. As shown in the picture below, if the short MA crosses below the long MA, this is a signal to sell; conversely, if the short MA crosses above the long MA, this is a signal to buy.
We chose this strategy because it’s simple and interpretable, but our approach could be implemented with far more complex strategies. We compared one possible improvement of SMA Crossover, which uses the outputs of our classification models, with the baseline strategy. On average, at the daily level and over several historical test periods, it beat the baseline with a 16.9% higher return.
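The crossover rule can be sketched as follows. This is a bare-bones illustration of the baseline strategy only, without the model outputs; the function names and the toy price series are assumptions, and the tiny windows (2 and 4 instead of 26 and 100) are used just to keep the example short.

```python
def sma(prices, window):
    """Simple moving average; None until a full window is available."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window : i + 1]) / window)
    return out

def crossover_signals(prices, short=26, long=100):
    """Return (index, 'buy' | 'sell') at each crossing of the two averages.

    Assumes short < long, so the short MA exists wherever the long MA does.
    """
    short_ma, long_ma = sma(prices, short), sma(prices, long)
    signals = []
    for i in range(1, len(prices)):
        if long_ma[i - 1] is None:
            continue  # not enough history for the long average yet
        prev = short_ma[i - 1] - long_ma[i - 1]
        curr = short_ma[i] - long_ma[i]
        if prev <= 0 < curr:
            signals.append((i, "buy"))   # short MA crossed above long MA
        elif prev >= 0 > curr:
            signals.append((i, "sell"))  # short MA crossed below long MA
    return signals

# Made-up price series: a dip, a rally, then another decline.
prices = [10, 9, 8, 7, 6, 7, 8, 9, 10, 11, 10, 9, 8, 7, 6]
print(crossover_signals(prices, short=2, long=4))
# [(6, 'buy'), (11, 'sell')]
```

The buy signal appears a couple of steps after the price bottoms out and the sell signal a couple of steps after it peaks, which shows the lag inherent in moving averages.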
In a way, we confirmed and quantified the influence of people’s sentiment on the price. Although convenient, Twitter is neither the only nor the best source for the information we want to analyze, so adding more sources promises improvement in our metrics. Furthermore, developing more sophisticated sentiment estimation, trying new algorithms and adding more complex trading strategies are our next steps.
But the greatest challenge also lies in understanding which feature would be the most useful for people who want to rely on our signals. Is it just pure sentiment, the predictions from our models, or the net return of a strategy? We will put some effort into figuring this out as well. If you have any ideas or want to express your personal opinion, feel free to share them with us. We are looking forward to that.
Bio: Ljubica Vujovic is a Data Scientist at SmartCat.io.
Original. Reposted with permission.