Beyond Word2Vec Usage For Only Words

A good example of how to use word2vec to get recommendations quickly and efficiently.

By Stanko Kuveljic, SmartCat.

Making a machine learning model usually takes a lot of crying, pain, feature engineering, suffering, training, debugging, validation, desperation, testing and a little bit of agony due to the infinite pain. After all that, we deploy the model and use it to make predictions for future data. We can run our little devil on a batch of

Awwww, there are even unicorns in the picture. But, what about us not having a predefined set of

From the data we can see that Darth Vader likes to bet on tennis matches, while his son Luke likes to bet on English Premier League football. So how do we make magic and learn from IDs only? First of all, we need to squash our information about games even further in order to get templates. A template groups games that share similar features. In this case, football games from the same league, with the same game type and a similar quota, fall into the same template. For example:

  • Template 1 - Tennis : US Open : Wins Break : 1 : quota_range (1.5 - 2.1)
  • Template 2 - Tennis : US Open : Final Score : 2 : quota_range (1.8 - 2.6)
  • Template 3 - Football : English Premier : Goals : 0-2 : quota_range (1.5 - 2.0)
  • Template 4 - Football : Spain 1 : Final Score : 1 : quota_range (1.8 - 2.4)
  • ………………..
  • Template 1000 - Basketball : NBA : Total Score : >200 : quota_range (1.5 - 2.0)

These rules for determining templates can be calculated from data statistics, or defined with some domain knowledge. But, most importantly, these rules run very fast when it comes to determining template ID. This is everything we need from a feature engineering task. When we convert the previous data into a template we get:
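As a rough illustration of how such rules might look, here is a minimal sketch that maps a game to a template ID. The field names, the bucket boundaries in `QUOTA_BUCKETS`, and the `TemplateIndex` helper are all assumptions made up for this example; the article does not show the real rules, and its own quota ranges differ per game type.

```python
from typing import Optional

# Hypothetical quota buckets. Real boundaries would come from data statistics
# or domain knowledge, and could differ per sport or game type.
QUOTA_BUCKETS = [(1.0, 1.5), (1.5, 2.0), (2.0, 3.0), (3.0, float("inf"))]


def quota_bucket(quota: float) -> Optional[int]:
    """Return the index of the half-open bucket [low, high) that contains quota."""
    for index, (low, high) in enumerate(QUOTA_BUCKETS):
        if low <= quota < high:
            return index
    return None


def template_key(sport: str, league: str, game_type: str, outcome: str, quota: float) -> str:
    """Build a deterministic key from the categorical fields plus the quota bucket."""
    return f"{sport}:{league}:{game_type}:{outcome}:q{quota_bucket(quota)}"


class TemplateIndex:
    """Assigns a stable integer ID to each distinct template key."""

    def __init__(self):
        self._ids = {}

    def template_id(self, key: str) -> int:
        # The first time a key is seen it gets the next free ID; afterwards
        # the lookup is a plain dictionary access.
        return self._ids.setdefault(key, len(self._ids) + 1)
```

Because the mapping is just a few comparisons and a dictionary lookup, it stays cheap even when the list of available games changes every few seconds.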

Please note that skip-gram uses sentences like “Luke, I am your father” to determine which words are similar to each other. To build something like this, we need to group each user's history into sessions. In this way we obtain sentences that represent user history, in which the user ID sits next to the templates that the user played in the past, like this:
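One possible way to build such sentences is sketched below. The row layout `(user_id, session_id, template_id)`, the sample rows, and the `user_` / `template_` token prefixes are assumptions for illustration only, not the format used in the original project.

```python
from collections import defaultdict

# Hypothetical bet log rows: (user_id, session_id, template_id), in time order.
bets = [
    ("darth_vader", "s1", 1),
    ("darth_vader", "s1", 2),
    ("luke", "s1", 3),
    ("darth_vader", "s2", 1),
    ("luke", "s1", 3),
]


def build_sentences(rows):
    """Group rows by (user, session) and emit one token list per session.

    Each sentence starts with the user token, so the user ID shares a
    context window with the templates played in that session.
    """
    sessions = defaultdict(list)
    for user_id, session_id, template_id in rows:
        sessions[(user_id, session_id)].append(f"template_{template_id}")
    return [
        [f"user_{user_id}"] + templates
        for (user_id, _session_id), templates in sessions.items()
    ]


sentences = build_sentences(bets)
```

Token lists of this shape can then be handed to any skip-gram implementation, for example gensim's `Word2Vec` with `sg=1`, with a window wide enough that the user token and the templates of a session fall into the same context.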


Let’s see how the vector representations are learned during training. When the algorithm sees that a user has played some game template, it pushes the vector representations of the user and the played template closer to each other in vector space. At the same time, using negative sampling, it pushes the user vector and the vectors of templates the user did not play away from each other. In the image below, we can see how our “vectors” lie in space. There is a happy unicorn in the image again because it likes this solution.
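The push and pull described above can be sketched as a single gradient step on a logistic loss. This is a simplified toy, not the actual word2vec code: real implementations keep separate input and output vectors, sample negatives from a frequency-based distribution, and decay the learning rate. The function names, the vectors, and the learning rate here are made up for the example.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sgns_step(user_vec, template_vec, label, lr=0.1):
    """One gradient step on the logistic loss for a (user, template) pair.

    label=1.0 for a template the user played, 0.0 for a sampled negative.
    Returns the updated (user_vec, template_vec) as new lists.
    """
    # Derivative of the loss with respect to the dot product of the two vectors.
    error = sigmoid(dot(user_vec, template_vec)) - label
    new_user = [u - lr * error * t for u, t in zip(user_vec, template_vec)]
    new_template = [t - lr * error * u for u, t in zip(user_vec, template_vec)]
    return new_user, new_template


# Toy 3-D vectors, chosen arbitrarily.
user = [0.5, -0.2, 0.1]
played = [0.3, 0.4, -0.1]
not_played = [0.2, 0.1, 0.6]

before_pos = dot(user, played)
before_neg = dot(user, not_played)
for _ in range(50):
    user, played = sgns_step(user, played, label=1.0)          # pull together
    user, not_played = sgns_step(user, not_played, label=0.0)  # push apart
after_pos = dot(user, played)
after_neg = dot(user, not_played)
```

After the loop, the dot product with the played template has grown and the dot product with the non-played template has shrunk, which is the geometric effect the paragraph describes.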

We see that Darth Vader is close to the templates that represent tennis matches, because he likes to bet on tennis. Luke is far from his father because he doesn’t like tennis; he is into football, so he sits closer to the vectors representing football games. There is also Yoda, who plays football but in a different league. He is still closer to Luke than to Darth, because football templates are more similar to each other than football and tennis templates are.

The output of the algorithm is a vector for every user ID and every template ID, and these vectors can be stored somewhere. Now, at prediction time, when the data changes every few seconds, we can always calculate predictions in real time. We just need to look up the vectors of the currently available templates and compute their cosine similarity with the user vector in order to get similarity scores between the user and each template.

Aside from being fun to try, this technique achieved good results on our data set and delivered very fast predictions for a huge number of users and for templates that change frequently. The most important part is that this technique is not specific to betting: it can be applied wherever users interact with items. For example, if we have a transaction history of the form [user ID - buy - item ID], we can use only the user IDs and item IDs to train vectors. Based on the trained vectors, we can recommend the most similar items to each user. By the most similar, we mean items that often occur together.
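The prediction-time lookup described above could be sketched roughly as follows. The helper names, the template keys, and the 2-D vectors are invented for illustration; a real system would load trained vectors from wherever they are stored.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def recommend(user_vec, template_vecs, available_ids, top_k=3):
    """Rank the currently available templates by similarity to the user vector."""
    scored = [
        (template_id, cosine_similarity(user_vec, template_vecs[template_id]))
        for template_id in available_ids
        if template_id in template_vecs  # skip templates without a trained vector
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up 2-D vectors purely for illustration.
template_vecs = {
    "tennis_us_open": [1.0, 0.1],
    "football_premier": [0.1, 1.0],
    "football_spain": [0.3, 0.9],
}
luke = [0.0, 1.0]
ranking = recommend(
    luke,
    template_vecs,
    ["tennis_us_open", "football_premier", "football_spain"],
    top_k=2,
)
```

Nothing is retrained here: when the set of available games changes, only the `available_ids` list changes, which is why the scores can be recomputed every few seconds.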

Merry Christmas and may the Force Vectors be with ya mon.

Original. Reposted with permission.

