
Sklearn Cosine Similarity: Implementation Step by Step


We can import the cosine_similarity function from sklearn.metrics.pairwise. It computes the cosine similarity between the rows of two NumPy arrays. In this article, we will implement cosine similarity step by step.
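For reference, the quantity being computed for two vectors a and b of length n is their dot product divided by the product of their lengths (Euclidean norms). This is the standard textbook definition; cosine_similarity() applies it to every pair of rows, one taken from each input array.

```latex
\cos(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}
```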

sklearn cosine similarity: Python –

We will go through the implementation in a few small steps. Let's start.

Step 1: Importing package –

First, we import the cosine_similarity function from the sklearn.metrics.pairwise module. We also import NumPy to create the arrays. Here is the code:

from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

Step 2: Vector Creation –

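Second, to demonstrate the cosine similarity function we need vectors, which here are NumPy arrays. Note the double brackets: cosine_similarity() expects 2-D arrays, so each vector is written as a single row with shape (1, 5). Let's create them.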

array_vec_1 = np.array([[12,41,60,11,21]])
array_vec_2 = np.array([[40,11,4,11,14]])
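If your data starts out as flat (1-D) arrays, one simple way to get the single-row 2-D shape that cosine_similarity() expects is reshape(1, -1), since scikit-learn's input validation generally rejects 1-D input here. The names flat_vec_1 and flat_vec_2 below are illustrative only; they are not part of the original steps.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative 1-D vectors, shape (5,)
flat_vec_1 = np.array([12, 41, 60, 11, 21])
flat_vec_2 = np.array([40, 11, 4, 11, 14])

# reshape(1, -1) turns each into a single-row 2-D array, shape (1, 5)
array_vec_1 = flat_vec_1.reshape(1, -1)
array_vec_2 = flat_vec_2.reshape(1, -1)

print(flat_vec_1.shape, array_vec_1.shape)  # (5,) (1, 5)
print(cosine_similarity(array_vec_1, array_vec_2).shape)  # (1, 1)
```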

Step 3: Cosine Similarity –

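Finally, once we have the vectors, we can call cosine_similarity() and pass both of them. It returns a 2-D array whose entry [i, j] is the cosine similarity between row i of the first array and row j of the second; with one row in each input, the result is a 1 x 1 array. In general the value lies between -1 and 1: a value of 1 means the vectors point in the same direction, 0 means they are orthogonal, and -1 means they point in opposite directions. For vectors with no negative entries, such as word counts or TF-IDF weights, the result always falls between 0 and 1, where 0 means the vectors share no non-zero features.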

cosine_similarity(array_vec_1 , array_vec_2)
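To see the full range of values, here is a small sketch with made-up 2-D vectors (base and candidates are illustrative names): one candidate has the same direction as base but twice its length, one is perpendicular, and one points the opposite way. It also shows indexing with [0, 0] to pull a plain number out of the 2-D result.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

base = np.array([[1, 2]])
candidates = np.array([
    [2, 4],    # same direction as base, twice as long
    [2, -1],   # perpendicular to base
    [-1, -2],  # opposite direction
])

# One row in base, three rows in candidates -> result has shape (1, 3)
scores = cosine_similarity(base, candidates)
print(scores.round(6))  # close to [[1, 0, -1]]

# Index into the 2-D result to get a single float
first_score = scores[0, 0]
print(first_score)
```

The first score is 1 even though [2, 4] is twice as long as [1, 2]: cosine similarity depends only on direction, not on magnitude.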

Complete code with output –

 

Let's put the code from all the steps together. Here it is:

from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
array_vec_1 = np.array([[12,41,60,11,21]])
array_vec_2 = np.array([[40,11,4,11,14]])
print(cosine_similarity(array_vec_1, array_vec_2))
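The printed value can be checked by hand with the definition of cosine similarity (dot product divided by the product of the vector lengths), taking a = array_vec_1 and b = array_vec_2. The norms are rounded to three decimals.

```latex
\begin{aligned}
\mathbf{a} \cdot \mathbf{b} &= 12 \cdot 40 + 41 \cdot 11 + 60 \cdot 4 + 11 \cdot 11 + 21 \cdot 14 = 1586 \\
\lVert \mathbf{a} \rVert &= \sqrt{144 + 1681 + 3600 + 121 + 441} = \sqrt{5987} \approx 77.376 \\
\lVert \mathbf{b} \rVert &= \sqrt{1600 + 121 + 16 + 121 + 196} = \sqrt{2054} \approx 45.321 \\
\cos(\mathbf{a}, \mathbf{b}) &= \frac{1586}{77.376 \times 45.321} \approx 0.45227
\end{aligned}
```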


Here we have used two different vectors. The function returns a cosine similarity of about 0.45227, which means the vectors are neither very similar nor very different. In real applications, we usually compare text embeddings stored as NumPy vectors. These embeddings can be generated with TF-IDF, a count vectorizer, FastText, BERT, and so on.
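As a sketch of the text use case, the example below builds TF-IDF vectors with scikit-learn's TfidfVectorizer and compares every document with every other one. The three-document corpus is made up purely for illustration. Passing a single matrix to cosine_similarity() compares its rows with themselves, and the sparse TF-IDF matrix can be passed directly.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus, for illustration only
documents = [
    "the cat sat on the mat",
    "a cat sat on a mat",
    "stock prices fell sharply today",
]

vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(documents)  # sparse, one row per document

# Compare every document with every other document -> shape (3, 3)
similarity_matrix = cosine_similarity(tfidf_matrix)
print(similarity_matrix.round(3))
```

The first two documents share several words and get a clearly positive score, while the third shares no words with either and scores 0. Because TF-IDF weights are never negative, all scores here fall between 0 and 1.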

Conclusion –

Cosine similarity is one of the most widely used ways to measure the similarity between documents. Because it compares only the direction of two vectors and ignores their length, it works well even when the documents differ greatly in size. We can also implement it without the sklearn module, but that is more tedious; sklearn simplifies it. I hope this article has made the implementation clear. If you find any information missing, please let us know in the comments below.
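For comparison, here is a minimal NumPy-only sketch of the same computation. pairwise_cosine is a hypothetical helper name, not part of any library. It divides every row by its length so that a plain matrix product yields all pairwise cosines at once. Unlike the sklearn function, this sketch does not handle all-zero rows (the division would produce nan) and does not accept sparse matrices.

```python
import numpy as np

def pairwise_cosine(X, Y):
    """Hypothetical helper: cosine similarity between every row of X and every row of Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Scale each row to unit length (no guard against all-zero rows)
    X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    Y_unit = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # Dot products of unit vectors are cosines; result has shape (len(X), len(Y))
    return X_unit @ Y_unit.T

array_vec_1 = np.array([[12, 41, 60, 11, 21]])
array_vec_2 = np.array([[40, 11, 4, 11, 14]])

result = pairwise_cosine(array_vec_1, array_vec_2)
print(result)  # a 1 x 1 array, about 0.45227
```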

 

Thanks
Data Science Learner Team
