PubMed is the most popular search engine for medical professionals. It serves around 3M queries per day and searches over 34M medical publications, with a new publication added roughly every 30 seconds. In June 2017, PubMed deployed "BestMatch", a relevance-ranking algorithm that drove a 60% increase in the use of relevance-sorted search. But how does it work behind the scenes?
👀 A step back
PubMed references are mapped into fields, namely title, abstract, authors and metadata (among which MeSH terms). Once mapped, a reference is called a "document". Each document gets ingested into a search index, ready to be retrieved. The underlying question now is how to retrieve the most relevant documents given a user's input.
It is known that web search engines receive three kinds of queries: navigational, transactional and informational. When it comes to medical search engines such as PubMed, there are essentially no transactional queries. Looking at the breakdown of queries, the most common categories are author searches, therapies and drugs.
On top of that, 80% of clicks land on the first page of results (the top 10).
Given this data, PubMed's goal is to maximize relevance over the first 10 results and to focus on informational queries, since navigational queries are comparatively easy to handle in a medical search engine.
Before the deployment of BestMatch, PubMed used TF-IDF.
🔎 Previous Search Algorithm: TF-IDF
TF stands for term frequency: the more often a document contains a term, the more likely it is to be about that term. With this signal alone, however, common words such as "the" would rank as the most relevant. This is why the model multiplies by IDF (inverse document frequency), which penalizes terms that appear in too many documents.
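The interaction between the two factors can be seen in a minimal sketch. The corpus and tokenization below are purely illustrative, not PubMed's actual pipeline:

```python
import math
from collections import Counter

# Toy corpus of three hypothetical "documents" (abstracts).
docs = [
    "the heart attack risk in the elderly",
    "aspirin reduces heart attack risk",
    "the trial measured the placebo effect",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)

def tf_idf(term, doc_tokens):
    # TF: raw count of the term in this document.
    tf = doc_tokens.count(term)
    # IDF: penalize terms that occur in many documents.
    df = sum(1 for d in tokenized if term in d)
    idf = math.log(N / df) if df else 0.0
    return tf * idf

# "the" appears twice in doc 0 but occurs in 2 of 3 documents,
# so its low IDF drags its score below the rarer "aspirin".
print(tf_idf("the", tokenized[0]))      # 2 * log(3/2)
print(tf_idf("aspirin", tokenized[1]))  # 1 * log(3/1)
```

Even with a raw count of 2, "the" ends up scoring lower than "aspirin", which occurs once but in only one document.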
PubMed used TF-IDF effectively from 2013 on. But by 2017 the need arose for something smarter: BestMatch.
New Search Algorithm: Best Match
This new search system involves two stages of ranking: BM25, followed by a Learning-to-Rank model called LambdaMART.
BestMatch First Stage: BM25
The main problem with TF-IDF is that if document x1 contains a word twice as often as document x2, then x1 is deemed twice as relevant. BM25 fixes this by putting a bound on the score a term's frequency can contribute, a property called "term saturation": the contribution becomes TF/(TF+k), where k is a tunable parameter.
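A small sketch of term saturation, with a few illustrative k values (the specific values PubMed tunes are not public):

```python
def saturated_tf(tf, k):
    # BM25-style term saturation: the contribution TF/(TF+k)
    # grows with TF but is bounded above by 1.
    return tf / (tf + k)

# Doubling the raw count no longer doubles the score:
for k in (0.5, 1.2, 2.0):
    print(k, saturated_tf(5, k), saturated_tf(10, k))
```

Smaller k saturates faster (extra occurrences add little); larger k keeps the score closer to linear in TF for longer.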
A second problem that BM25 addresses is the interplay between term frequency and document length. Consider a term that appears twice: it should carry more weight in a short document d1 than in a document d2 three times as long, because in proportion the term is less prominent in the longer document.
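BM25 handles this by normalizing the saturation parameter by document length relative to the corpus average. A minimal sketch, using the conventional k and b defaults rather than PubMed's actual tuning:

```python
def bm25_tf(tf, doc_len, avg_len, k=1.2, b=0.75):
    # Length normalization: scale k by how long this document is
    # relative to the average. Longer documents need more
    # occurrences of a term to reach the same score.
    norm = k * (1 - b + b * doc_len / avg_len)
    return tf / (tf + norm)

# Same raw count (tf=2): the shorter document scores higher.
print(bm25_tf(2, doc_len=100, avg_len=300))
print(bm25_tf(2, doc_len=900, avg_len=300))
```

The b parameter controls how strong the length penalty is: b=0 disables it entirely, b=1 makes it fully proportional.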
BM25 is responsible for finding the most relevant documents, and it represents the first stage of ranking for PubMed queries.
BestMatch Second Stage: LambdaMART
Imagine searching for a broad topic that returns hundreds or even thousands of results. BestMatch takes the first 500 results and re-ranks them with a Learning-to-Rank model called LambdaMART, which won Track 1 of the 2010 Yahoo! Learning to Rank Challenge.
By comparing the Discounted Cumulative Gain (DCG) for pairs of documents and swapping them whenever the swap improves relevance, LambdaMART is able to produce extremely relevant orderings. The model is computationally heavy, which is the main reason it re-ranks only the top 500 results instead of the whole result set.
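The DCG metric that drives those swaps can be computed in a few lines. The relevance labels below are made up for illustration:

```python
import math

def dcg(relevances):
    # Discounted Cumulative Gain: each document's relevance gain
    # is discounted by its rank, so relevant documents near the
    # top of the list contribute more to the total.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

# Moving a highly relevant document toward the top raises DCG,
# which is the kind of swap a LambdaMART-style model rewards.
before = [0, 3, 2]   # relevant doc (label 3) buried at rank 2
after  = [3, 0, 2]   # same documents, better ordering
print(dcg(before), dcg(after))
```

The log2(rank+1) discount is what makes the metric, and hence the learned ranking, focus on the top of the list, matching the observation that 80% of clicks happen on the first page.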
🔥 Experiment Conclusions
The experiment ran from the 1st of March to the 8th of June, with 7.5M queries served by TF-IDF and 2.5M by BestMatch. It recorded a 40% improvement in CTR@3 and a 22% improvement in CTR@20.
In just 10 months, PubMed saw a 60% increase in users selecting the relevance sorting option. And even though the model is quite heavy, it is able to scale up to 700 queries per second.