Similarity metrics in recommender systems

Kirill Bondarenko
5 min read · Feb 27, 2019


A deeper look at the evaluation process

Introduction

This article is a short explanation of the recommender-system technique named KNN (k-nearest neighbors) and of collaborative filtering. It will not be a long explanation of what KNN and CF are; there are plenty of articles on those topics, such as: article 1, article 2, article 3.

Instead, it explains how to choose the metrics used in models based on the KNN principle.

Similarity metrics

What is the similarity between two objects? In essence, total similarity is the absence of differences. For example, we compare two people: the first is male, 25 y.o., the other is male, 25 y.o. By these two features (gender and age) we may state that they are equal.

The next part requires an understanding of the following techniques: one-hot encoding (OHE), TF-IDF, or other feature extraction methods.

Briefly, OHE works like this: we have a user and three features answering three questions: does he like item 1? item 2? item 3? If yes, 1; if not, 0. For example, the user likes items 1 and 3 but dislikes item 2. In this case we get a 3-dimensional vector: user = [1, 0, 1]. The values may be floats too: the user likes item 1 at 60%, item 2 at 20% and item 3 at 85%, giving user = [0.6, 0.2, 0.85].
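As a tiny sketch in Python (NumPy is used only for convenience; the values are the made-up preferences from above):

```python
import numpy as np

# Binary OHE preferences: likes item 1 and item 3, dislikes item 2.
user_binary = np.array([1, 0, 1])

# The same idea with graded (float) preferences:
# likes item 1 at 60%, item 2 at 20%, item 3 at 85%.
user_float = np.array([0.6, 0.2, 0.85])

print(user_binary, user_float)
```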

The same logic applies to comparing vectors. Take two 3-D vectors A = [0,1,0] and B = [0,1,0]. Their coordinates are pairwise equal (0=0, 1=1, 0=0), so they are identical. But what should we do if A = [0,1,1] and B = [1,1,0]? Are they equal? What is the percentage of similarity between them?

There are many techniques to measure the similarity between two vectors. Let’s look at the two most popular of them in the context of recommender systems.

Euclidean distance similarity

It is very simple: the closer two vectors are, the more similar they are, and the distance between vectors p and q is calculated by the standard formula

d(p, q) = √((q₁ − p₁)² + (q₂ − p₂)² + … + (qₙ − pₙ)²)

(see the Wikipedia article on Euclidean distance).

So, if we have two vectors p = [0,0,1,1] and q = [1,0,1,0], the distance between them is d(p, q) = √((0−1)² + (0−0)² + (1−1)² + (1−0)²) = √(1 + 0 + 0 + 1) = √2 ≈ 1.41. Now take a vector q1 = [0,1,1,1]. Is it closer to p than q is? Calculating the distance: d(p, q1) = √((0−0)² + (0−1)² + (1−1)² + (1−1)²) = √(0 + 1 + 0 + 0) = 1. We see that d(p, q1) < d(p, q), so q1 is closer to p than q. If the vectors were user ratings, we would say that user q1 is more similar to user p than user q is. So we may state:

The smaller the Euclidean distance between two user vectors, the more similar those users are.

We can turn this distance into a similarity coefficient Esim:

Esim(p,q) = 1/(1+d(p,q))
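A minimal sketch of this in Python (NumPy), reusing the p, q and q1 vectors from the example above:

```python
import numpy as np

def euclidean_similarity(p, q):
    """Esim(p, q) = 1 / (1 + d(p, q)), where d is the Euclidean distance."""
    d = np.linalg.norm(np.asarray(p) - np.asarray(q))
    return 1.0 / (1.0 + d)

p  = [0, 0, 1, 1]
q  = [1, 0, 1, 0]
q1 = [0, 1, 1, 1]

print(euclidean_similarity(p, q))   # ~0.41  (d = sqrt(2) ~ 1.41)
print(euclidean_similarity(p, q1))  # 0.5    (d = 1, so q1 is more similar)
```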

Cosine similarity

This method uses the cosine of the angle between two vectors A and B, both starting from the origin of the coordinate axes.

Unlike the Euclidean metric, it measures the angle between the vectors rather than the absolute distance:

similarity(A, B) = cos(θ) = (A · B) / (‖A‖ · ‖B‖)

(see the Wikipedia article on cosine similarity).

We take the same vectors p = [0,0,1,1], q = [1,0,1,0] and q1 = [0,1,1,1].

similarity(p, q) = cos(p, q) = (dot product of p and q) / (‖p‖ · ‖q‖) = 1 / (√2 · √2) = 0.5, which we interpret as 50% similarity. And similarity(p, q1) = 2 / (√2 · √3) ≈ 0.82, i.e. about 82% similarity. The ordering is the same as with the Euclidean approach: q1 is closer to p than q.
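The same computation as a short Python sketch (nothing library-specific, just the dot-product formula above):

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (||a|| * ||b||)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

p  = [0, 0, 1, 1]
q  = [1, 0, 1, 0]
q1 = [0, 1, 1, 1]

print(cosine_similarity(p, q))   # 0.5   -> 50% similarity
print(cosine_similarity(p, q1))  # ~0.82 -> ~82% similarity
```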

What is the difference and how to choose?

Of course, I am not here to rewrite existing articles. The aim is to show how to analyze the difference between the approaches and choose the best one for your task. So, first, the difference.

The Euclidean metric compares absolute values and says that the best pair of similar vectors is two identical vectors with zero distance between them.

But what if we are looking for vectors with a similar direction? In recommender systems we match similar users on the basis of their ratings, preferences, etc. Ratings 0.1 and 0.49 are both negative (if we take a threshold of 0.5) and both point in a “negative” direction in our view, yet the absolute distance between them is 0.39. In this case we want a metric that takes the direction of the vectors into account.

The key idea was taken from text classification methods. Using TF-IDF vectorization we transform texts of any size into fixed-dimensional vectors of word frequencies. We do not care about the exact frequencies of the words ‘war’ and ‘suffering’ in two different texts: if both words appear, we understand that both texts are about a war conflict (just an example). We can interpret this for our task with user ratings: if two people like the same product, we assume they have the same interests.
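As a rough illustration of the TF-IDF idea, assuming scikit-learn is available (the two-sentence corpus is made up for this example):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Two made-up texts that share the topical words 'war' and 'suffering'.
corpus = [
    "the war brought suffering to the region",
    "suffering spread as the war continued",
]

X = TfidfVectorizer().fit_transform(corpus)

# TfidfVectorizer L2-normalizes rows by default, so the dot product
# of two rows is already their cosine similarity.
print(X[0].dot(X[1].T).toarray()[0, 0])
```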

At this point we can take a short philosophical pause. What is the direction of a vector of user ratings? A user’s ratings are a set of values that describe their unique preferences, mood, lifestyle, etc. You may call it whatever you like, but the meaning stays the same: we are looking for users with the same “mood”.

Let’s look at one example. We have two items, X and Y, and three users, A, B and C. User A has only a good rating for Y (X = 0), user B has a good rating for Y and a bad one for X, and user C rates Y better than B does while giving X almost the same bad rating.

Example of difference

We see that AB < AC, so if we filter neighbors by a distance threshold we may lose user C’s data. With cosine similarity, users B and C get the same similarity score with A (∠A0B = ∠A0C) because they lie on the same direction. In this case we keep more data for further analysis and recommendations.
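A short sketch of this exact effect, with made-up ratings for items (X, Y) chosen so that C lies on the same direction as B (C = 1.5 · B):

```python
import numpy as np

def cosine(a, b):
    return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

A = np.array([0.0, 1.0])   # only a good rating for Y, X = 0
B = np.array([0.2, 0.8])   # bad X, good Y
C = np.array([0.3, 1.2])   # same direction as B, higher Y rating

print(np.linalg.norm(A - B))  # ~0.28 -> by distance, B looks closer to A
print(np.linalg.norm(A - C))  # ~0.36 -> ... and C might fall past a threshold
print(cosine(A, B))           # ~0.97
print(cosine(A, C))           # ~0.97 -> identical cosine similarity with A
```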

There are many other metrics, such as the Pearson correlation coefficient, Mean Squared Difference similarity, etc. The Surprise library covers them and has well-written documentation.

Sources

  1. Good article. Thanks to Chris Emmery.
  2. Stack Overflow answer.

From author

Choosing the best metric is one of the hardest tasks in machine learning. Try to look deeper and understand the meaning of what you are doing.

Many thanks for reading this article! I hope it helps you solve your problem.

Bondarenko K., machine learning engineer.
