My task: to carry out a Jaccard index vs. cosine similarity test and record my results.
Review: I am testing how accurately the Jaccard index and cosine similarity measure the similarity between two sentences:
- How can I be a geologist?
- What should I do to be a geologist?
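For reference, the standard definitions of the two measures are given below. Here A and B are the sets of distinct words in the two sentences, and a and b are their word-count vectors. How the words are extracted (case folding, punctuation handling) is not stated in this test, so that part is an assumption.

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}
\qquad
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
```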
With my knowledge of the English language, I know that these two sentences ask essentially the same question. Thus, a measure of how similar their meanings are should return a value close to, if not exactly, 1.
For the two text strings used as our case study, we have three facts:
- The target similarity score is 1, or very close to it.
- Jaccard similarity gives a score of 0.40.
- Cosine similarity gives a score of 0.5774.
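The two scores can be reproduced with a short Python sketch. It assumes the sentences are lowercased and split into alphabetic word tokens with punctuation dropped, and that cosine similarity is computed on raw term counts. The original test does not say how the text was preprocessed, so these choices (and the hypothetical helper names `tokenize`, `jaccard_similarity`, and `cosine_similarity`) are illustrative. Under those assumptions the sentences share 4 of 10 distinct words (4/10 = 0.40), and the cosine is 4/√(6 × 8) ≈ 0.5774.

```python
import math
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    # Assumption: lowercase the text and keep only alphabetic word tokens,
    # which drops the trailing "?".
    return re.findall(r"[a-z]+", sentence.lower())


def jaccard_similarity(s1: str, s2: str) -> float:
    # Jaccard index on the sets of distinct tokens: |A ∩ B| / |A ∪ B|.
    a, b = set(tokenize(s1)), set(tokenize(s2))
    return len(a & b) / len(a | b)


def cosine_similarity(s1: str, s2: str) -> float:
    # Cosine similarity on term-count vectors: (a · b) / (|a| * |b|).
    a, b = Counter(tokenize(s1)), Counter(tokenize(s2))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)


s1 = "How can I be a geologist?"
s2 = "What should I do to be a geologist?"

print(f"Jaccard: {jaccard_similarity(s1, s2):.2f}")  # Jaccard: 0.40
print(f"Cosine:  {cosine_similarity(s1, s2):.4f}")   # Cosine:  0.5774
```

Because no word occurs more than once in either sentence, counting terms and merely marking their presence give the same cosine value here; with repeated words the two choices would differ.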
Comparing the Jaccard and cosine results from our case study, cosine similarity gives the higher score, which is closer to our target of 1, although both remain well below it. Thus, for this pair of sentences, we can conclude that the cosine similarity method works better than the Jaccard method.