
Brand Affinity: A Bi-directional Recommendation Model

By Scott Brownlie
At FARFETCH, we like to personalise the emails and notifications that we send to our users. To help with this, the Marketing Data Science team developed a machine learning model which is able to predict how much affinity users have for each brand on farfetch.com. The model has two main uses:
  1. Given a user, predict their favourite brands
  2. Given a brand, predict the users with most affinity for the brand
This is essentially a recommendation model, but it has to work from both a user and a brand perspective. 

Recommendation systems are among the most common applications of machine learning in industry, including at FARFETCH (see these great blog posts on the subject: 1, 2, 3, 4). They are mainly used for recommending items to users - be that movies, news articles or even brands, as in use case 1. Recommendations are often made by first predicting a score for each user-item pair and then ordering these scores to rank the items for each user.

What if we want to do the opposite? That is, rank the users for each item, as in use case 2? Can we utilise the same user-item scores? In this blog post, we will show that the answer is often no, and describe the simple solution that we came up with at FARFETCH to train a bi-directional recommendation model. 

Collaborative Filtering

As training data for the model, we use past user-brand interactions, where an interaction is defined as an order, add-to-bag or wishlist event. From these interactions we construct an interaction matrix. It looks a bit like the matrix below, except with millions of users and thousands of brands. In this example, Scott has interacted with Brand1 and Brand3, Guillermo has interacted with Brand2 and Jinjin has interacted with Brand1; these are represented by 1s in the matrix. We call these positive interactions and the rest negative interactions.
Interaction Matrix
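
To make this concrete, here is a toy sketch of how such a matrix could be built as a sparse matrix in Python. The user and brand names follow the example above; everything else (shapes, helper variables) is purely illustrative, since the real matrix has millions of users and thousands of brands.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Toy version of the interaction matrix above: rows are users, columns are brands,
# and a 1 marks a positive interaction (an order, add-to-bag or wishlist event).
users = ["Scott", "Guillermo", "Jinjin"]
brands = ["Brand1", "Brand2", "Brand3"]

rows = [0, 0, 1, 2]   # Scott, Scott, Guillermo, Jinjin
cols = [0, 2, 1, 0]   # Brand1, Brand3, Brand2, Brand1
data = np.ones(len(rows), dtype=np.float32)

interactions = coo_matrix((data, (rows, cols)), shape=(len(users), len(brands)))

print(interactions.toarray())
# [[1. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]]
```
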
A common class of algorithms used for learning from this type of data is collaborative filtering, most often implemented via matrix factorisation. These algorithms factorise the interaction matrix into user and brand embedding matrices. Essentially, each user and brand is represented by a fixed-length numeric vector, where the elements of the vector are learned from the data. After training, the score for any user-brand pair - including pairs without any past interactions - can be predicted by calculating the dot product between the corresponding user and brand embedding vectors.
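
As a tiny illustration of the scoring step (the embedding values below are made up; in practice they are learned during training and have many more dimensions):

```python
import numpy as np

# Hypothetical 4-dimensional embeddings for one user and one brand.
user_embedding = np.array([0.12, -0.45, 0.80, 0.33])
brand_embedding = np.array([0.05, -0.60, 0.70, -0.10])

# Predicted affinity score for this user-brand pair: the dot product of the two vectors.
affinity_score = user_embedding @ brand_embedding
```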

There are many different implementations of matrix factorisation algorithms, but in the Marketing Data Science team at FARFETCH we use the open-source LightFM Python library. This library has the advantage of implementing the WARP (Weighted Approximate-Rank Pairwise) loss for learning-to-rank with implicit feedback, which fits our use case better than loss functions that rely on explicit feedback.
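
Fitting a LightFM model with the WARP loss can be as simple as the snippet below. The hyperparameter values are illustrative rather than our production configuration, and `interactions` is the sparse user-brand matrix from the earlier sketch.

```python
from lightfm import LightFM

# WARP is a learning-to-rank loss designed for implicit feedback.
model = LightFM(loss="warp", no_components=64, learning_rate=0.05, random_state=42)

# Fit on the sparse user-brand interaction matrix.
model.fit(interactions, epochs=30, num_threads=4)
```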

The Challenge

After training the model with our data and the WARP loss function, we observed that its recommendations were performing well from a user perspective, but from a brand perspective they were no better than random guessing on average. Moreover, if we transposed the interaction matrix and re-trained then the opposite was true: the model would perform well from a brand perspective, but not from a user perspective. 

To understand what was going on, let’s consider how the WARP loss works. In each training epoch, we loop over all (user, positive item) pairs. For each pair, we randomly sample a negative item for the same user and compute the model’s score for both the (user, positive item) and (user, negative item) pairs. If the negative item’s score is not at least a margin below the positive item’s score, we perform a gradient step to increase the positive item’s score and decrease the negative item’s score. (The full WARP procedure keeps sampling negatives until it finds such a violating one and weights the update by an estimate of the positive item’s rank, but this simplified view is enough for what follows.)
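
In simplified pseudo-Python, one WARP epoch looks roughly like the sketch below. The real LightFM implementation is written in Cython; `sample_negative_item`, `score` and `gradient_step` here are placeholder helpers standing in for its internal sampling, scoring and update routines.

```python
MARGIN = 1.0

def warp_epoch(positive_pairs, sample_negative_item, score, gradient_step):
    # positive_pairs: iterable of (user, positive_item) taken from the interaction matrix.
    for user, positive_item in positive_pairs:
        negative_item = sample_negative_item(user)

        pos_score = score(user, positive_item)
        neg_score = score(user, negative_item)

        # Violation: the negative item is not scored at least MARGIN below the positive one.
        if neg_score > pos_score - MARGIN:
            # Push the positive item's score up and the negative item's score down.
            gradient_step(user, positive_item, negative_item)
```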

If we think about it, all the WARP loss is doing is trying to order the brands correctly for each individual user. It has no incentive to order users given a brand. The effect of this is illustrated in the diagram below. For each user - Scott, Guillermo and Jinjin - the predicted order of the brands agrees with the interaction matrix above. However, since the order of the users does not contribute to the WARP loss, the range of scores for each user is arbitrary. This leads to Guillermo being ranked highest for Brand1, even though he didn’t interact with Brand1, whereas Scott and Jinjin did.

The Solution

So how can we fix this? One solution would be to train two separate models, one which learns from the user-brand interaction matrix and another which learns from its transpose. This isn’t ideal, since for each user-brand pair we would have to generate and store two separate predictions, one from each perspective. Another option would be to modify LightFM’s internal code, but that would add significant technical debt to the project.

Using the LightFM model’s fit_partial() method, we can devise a simpler solution. This method trains the model for a fixed number of epochs without resetting the user and brand embeddings, which lets us implement the following bi-directional training algorithm:


  1. Initialise a LightFM model with users in rows and brands in columns: model_user_brand
  2. Initialise a LightFM model with brands in rows and users in columns: model_brand_user
  3. For N iterations:
     a. Fit model_user_brand to the interaction matrix for a single epoch
     b. Copy the learned embeddings from model_user_brand to model_brand_user
     c. Fit model_brand_user to the transpose of the interaction matrix for a single epoch
     d. Copy the learned embeddings from model_brand_user to model_user_brand
  4. Return model_user_brand

Essentially, what we are doing is alternately training the LightFM model with the interaction matrix and the transpose of the interaction matrix. The only extra detail is that we need to copy the embeddings from one model to the other after each epoch to avoid the two models diverging. At the end of the training, we can return either model since their embeddings are identical. This approach is very simple and can be implemented without touching the internal LightFM code. 
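
A sketch of this training loop, using LightFM’s public attributes, might look like the code below. The function and variable names (fit_bidirectional, copy_embeddings, n_iterations) are ours, and the snippet glosses over details such as LightFM’s internal optimiser state, so treat it as an illustration of the idea rather than our exact production code.

```python
from lightfm import LightFM

def copy_embeddings(src, dst):
    # In the transposed model the roles are swapped: its "users" are brands and its
    # "items" are users, so the embedding and bias matrices cross over when copying.
    dst.user_embeddings[:] = src.item_embeddings
    dst.item_embeddings[:] = src.user_embeddings
    dst.user_biases[:] = src.item_biases
    dst.item_biases[:] = src.user_biases

def fit_bidirectional(interactions, n_iterations=30, no_components=64, seed=42):
    # interactions: sparse matrix with users in rows and brands in columns.
    model_user_brand = LightFM(loss="warp", no_components=no_components, random_state=seed)
    model_brand_user = LightFM(loss="warp", no_components=no_components, random_state=seed)

    transposed = interactions.T.tocoo()

    # One initial epoch on each model so that their embedding matrices are allocated.
    model_user_brand.fit_partial(interactions, epochs=1)
    model_brand_user.fit_partial(transposed, epochs=1)

    for _ in range(n_iterations):
        # a. Fit the user x brand model for one epoch.
        model_user_brand.fit_partial(interactions, epochs=1)
        # b. Copy its embeddings into the brand x user model.
        copy_embeddings(model_user_brand, model_brand_user)
        # c. Fit the brand x user model on the transposed matrix for one epoch.
        model_brand_user.fit_partial(transposed, epochs=1)
        # d. Copy the embeddings back.
        copy_embeddings(model_brand_user, model_user_brand)

    # Either model could be returned: after the final copy their embeddings match.
    return model_user_brand
```

A production pipeline would wrap this with things like feature handling and evaluation, but the alternation above is all that is needed to make the embeddings work in both directions.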

After adopting the bi-directional training routine we observed that, given a brand, our model was able to correctly predict users with a high affinity for the brand. For this improvement, we had expected to sacrifice performance from a user perspective. After all, the training routine was now pulling the embeddings in both directions instead of only trying to optimise along the user axis. However, to our surprise and delight, the model continued to predict a user’s favourite brands with the same precision and recall as before. 

One Model, Two Use Cases

The trained model is used to output a table that looks a bit like the example below.



These affinity scores are then used to satisfy the two use cases mentioned at the start. The conventional use case is to recommend users their top-scoring brands via marketing communications. The unmodified LightFM model would already allow us to do this. However, as we saw, it did not allow us to use the same affinity scores to reliably get the top users for each brand. Our bi-directional recommendation model makes this possible.
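
For illustration, scoring in both directions with the model returned by the training sketch above could look like this. The integer indices and top-N sizes are arbitrary examples; mapping them back to real user and brand identifiers happens elsewhere in the pipeline.

```python
import numpy as np

n_users, n_brands = interactions.shape  # same matrix used for training

# Use case 1: top brands for a given user.
user_id = 0
brand_scores = model_user_brand.predict(user_id, np.arange(n_brands, dtype=np.int32))
top_brands_for_user = np.argsort(-brand_scores)[:10]

# Use case 2: top users for a given brand.
brand_id = 2
user_scores = model_user_brand.predict(
    np.arange(n_users, dtype=np.int32),
    np.repeat(np.int32(brand_id), n_users),
)
top_users_for_brand = np.argsort(-user_scores)[:1000]
```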

Given a marketing campaign for a specific brand, like the one shown below, the CRM team at FARFETCH uses the affinity scores to select the audience, consisting of those users with high affinity for the targeted brand. This strategy has resulted in significantly higher levels of engagement and a lower unsubscribe rate compared to sending the brand campaign to all users. 

Key Learnings

While developing this model we’ve learned a lot - here are the most important insights:
  • Depending on the loss function, a matrix factorisation model might not be symmetric - if you transpose your user-item interaction matrix it can make a big difference.
  • Don’t assume that your user-item scores will work from both a user and an item perspective - they probably won’t. 
  • By adapting the LightFM training routine to alternately fit the user-item interaction matrix and its transpose we can obtain a model which does work in both directions.
  • This single bi-directional model can be used to fulfil two important use cases for CRM at FARFETCH - get top brands for a user and get top users for a brand. 
  • The single model solution means less complexity, shorter training times and half the space needed to store the affinity scores. 

What’s Next?

We’ve found that our bi-directional recommendation model works well for other problems too, not just predicting user-brand affinity scores. We are already using the same approach to train other models at FARFETCH, including one which predicts user-category affinity. We love the simplicity of the solution and we look forward to applying it to other use cases in the future.
