Exploring Semantic Search results of Sentence Transformers

November 29, 2023 | AI/ML | Sheik

I recently evaluated the semantic search capability of Sentence Transformers. Their Multi-QA models have been trained on 215M question-answer pairs from various sources and domains, including StackExchange, Yahoo Answers, Google & Bing search queries and many more. I used multi-qa-mpnet-base-cos-v1. The example code is available at https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-cos-v1.

Starting from that sample, I kept the original docs and added one extra item: “Around London there are 25 Million people live”

query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district", "Around London there are 25 Million people live"]

Score results are:

0.8975692391395569 Around London there are 25 Million people live 
0.8814704418182373 Around 9 Million people live in London 
0.505085825920105 London is known for its financial district
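
For reference, here is a minimal sketch of how a ranking like the one above can be produced. It is not the exact sample from the model page: the helper names `cosine`, `rank_docs`, `load_encoder` and `main` are my own, and the encoder is passed in as a plain callable so the ranking logic can be tried without downloading the model. As far as I recall, the model page sample scores with a dot product, which should be equivalent here because this model is documented as producing normalized embeddings; the sketch computes cosine similarity explicitly so it does not rely on that.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical type alias: anything that maps a list of texts to a list of vectors.
Encoder = Callable[[List[str]], Sequence[Sequence[float]]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_docs(query: str, docs: List[str], encode: Encoder) -> List[Tuple[float, str]]:
    """Score every doc against the query and sort by score, highest first."""
    query_vec = encode([query])[0]
    doc_vecs = encode(list(docs))
    scored = [(float(cosine(query_vec, vec)), doc) for vec, doc in zip(doc_vecs, docs)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def load_encoder(model_name: str = "sentence-transformers/multi-qa-mpnet-base-cos-v1") -> Encoder:
    """Load the model lazily; requires the sentence-transformers package and a model download."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode


def main() -> None:
    query = "How many people live in London?"
    docs = [
        "Around 9 Million people live in London",
        "London is known for its financial district",
        "Around London there are 25 Million people live",
    ]
    for score, doc in rank_docs(query, docs, load_encoder()):
        print(score, doc)
```

Calling `main()` downloads the model on first use. Exact scores may differ slightly from the ones listed here depending on library version and hardware.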

When I changed the query to “How many people live in London only?”, the ranking stayed the same, with only minor changes in the cosine similarities.

0.8868905305862427 Around London there are 25 Million people live
0.8658156394958496 Around 9 Million people live in London
0.5148751735687256 London is known for its financial district

I expected “Around 9 Million people live in London” to score higher, but the “Around London” doc still gets the highest score.

When the query is “How many people live in near London?”, the results look more promising: the “Around London” doc ranks first, which is what this query asks about.

0.880798876285553 Around London there are 25 Million people live
0.8229680061340332 Around 9 Million people live in London
0.5206520557403564 London is known for its financial district
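
To make the comparison between the three queries concrete, the gap between the top two scores can be computed directly from the numbers listed in this post. This is just arithmetic on the reported scores; the names `runs` and `margins` are my own.

```python
# Top two scores per query, copied from the results in this post:
# (score of "Around London there are 25 Million people live",
#  score of "Around 9 Million people live in London")
runs = {
    "How many people live in London?": (0.8975692391395569, 0.8814704418182373),
    "How many people live in London only?": (0.8868905305862427, 0.8658156394958496),
    "How many people live in near London?": (0.880798876285553, 0.8229680061340332),
}

# Gap between first and second place, rounded to 4 decimals.
margins = {query: round(top - second, 4) for query, (top, second) in runs.items()}

for query, gap in margins.items():
    print(f"{gap:.4f}  {query}")
```

The gaps come out at roughly 0.0161, 0.0211 and 0.0578. The first two queries separate the two docs by only about 0.02, while the “near London” query separates them by almost 0.06.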

I then added more docs and ran the same query again:

docs = [
"Around 9 Million people live in London", 
"London is known for its financial district", 
"Around London there are 25 Million people live", 
"100,000 people applied Visa for London", 
"London bridge", 
"In 2025, there will 15 million people live in London only and 40 million people around London"
]

Score results are:

0.8807988166809082 Around London there are 25 Million people live
0.8229680061340332 Around 9 Million people live in London
0.76351398229599 In 2025, there will 15 million people live in London only and 40 million people around London
0.5206520557403564 London is known for its financial district
0.5186482667922974 100,000 people applied Visa for London
0.375369668006897 London bridge
Tags: AI, ML, multi-qa-mpnet-base-cos-v1, Semantic Search, Sentence Transformers
