The high-level idea here is to apply a fast topic-based filter to each passage during document retrieval, before the passage is ranked along with all the other passages.
Concretely, our retrieval algorithm is as follows:
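A minimal sketch of this filter-then-rank flow is shown below; the names, data shapes, and helper function are our own illustration based on the parameter descriptions that follow, not the production implementation. Topic vectors and SPLADE vectors are both treated as sparse {id: weight} dictionaries, and the topic vectors are assumed to have already been filtered as described below.

    # Illustrative sketch only: sparse vectors are plain dicts of id -> weight.
    def sparse_dot(a: dict, b: dict) -> float:
        """Dot product of two sparse vectors stored as {id: weight} dicts."""
        if len(a) > len(b):
            a, b = b, a
        return sum(w * b.get(i, 0.0) for i, w in a.items())

    def retrieve(query: dict, passages: list, config: dict) -> list:
        results = []
        for passage in passages:
            # Cheap topic check first: skip passages whose (already filtered)
            # topic vector has no overlap with the query's topic vector.
            overlap = sparse_dot(query["topics"], passage["topics"])
            if overlap == 0.0:
                continue
            # Full SPLADE relevance score only for the surviving passages.
            score = sparse_dot(query["splade"], passage["splade"])
            if config.get("weigh_by_topic_overlap", False):
                score *= overlap
            if score < config.get("final_score_threshold", 0.0):
                continue  # treated as 0: the passage does not enter the ranking
            results.append((passage["id"], score))
        return sorted(results, key=lambda item: item[1], reverse=True)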
As described in the previous section, there are a number of parameters that lead to different
configurations of the retrieval algorithm. Notably:
• Query Topic Filter Threshold: If less than 1, any topic score in the query's topic vector that falls below the threshold is set to 0. If greater than 1, this is the number of top topics to keep in the query topic vector (see the sketch after this list).
• Passage Topic Filter Threshold: If less than 1, any topic score in the passage's topic vector that falls below the threshold is set to 0. If greater than 1, this is the number of top topics to keep in the passage topic vector.
• Final Score Filter Threshold: When retrieving passages for a given query, the pair's relevance score is the dot product of their SPLADE vectors. If the final relevance score is below the threshold, it is set to 0 and the passage does not enter the ranking.
• Weighed by Topic Overlap: A true/false configuration that determines whether the relevance score computed by SPLADE is also weighted by the topic overlap between the passage and the query.
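To make the two threshold semantics above concrete, here is a small sketch of how such a filter could be applied to a sparse topic vector. The function name and vector representation are assumptions on our part; following our reading, values below 1 act as a score cutoff while values of 1 or above act as a top-k cutoff.

    def apply_topic_filter(topic_vector: dict, threshold: float) -> dict:
        """Filter a sparse topic vector of {topic_id: score} entries."""
        if threshold < 1:
            # Score cutoff: drop (i.e. zero out) topics scoring below the threshold.
            return {t: s for t, s in topic_vector.items() if s >= threshold}
        # Top-k cutoff: keep only the `threshold` highest-scoring topics.
        k = int(threshold)
        top = sorted(topic_vector.items(), key=lambda item: item[1], reverse=True)[:k]
        return dict(top)

    # Example: a setting like Run 19 keeps the top 2 query topics and the
    # top 3 passage topics (topic names here are made up).
    query_topics = apply_topic_filter({"sports": 0.7, "finance": 0.4, "health": 0.1}, 2)
    # -> {"sports": 0.7, "finance": 0.4}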
With a large number of possible configurations, we took a manually guided approach and tried a
number of combinations, aiming to optimize retrieval time with minimal impact on accuracy.
Below is the full set of results from all the experiments we conducted, ranked by accuracy (MRR and
Recall@1000) in descending order.
As can be seen, the baseline SPLADE model still has the highest accuracy metrics. However, there is
a spectrum of configurations that achieve different trade-offs between retrieval time and accuracy.
For example, Run 19, which keeps only the top 2 topics of each query and the top 3 topics of each
passage to filter the passages, accelerates retrieval by about 6%, with only a slight loss of MRR
from 0.19 to 0.17.
Overall, there are some positive results and learnings from this approach, for example:
• We’re able to speed up retrieval by as much as 20% (Run 28) while still maintaining fairly
acceptable accuracy
• Weighing SPLADE scores by topic overlap incurs considerably more computation for no gain in
accuracy
• Keeping a broader set of topics for the passages is more important than for the queries, since
passages are longer and can be relevant to a broader set of topics
• Of course, the pre-trained topic classifier makes a major difference in the timing and
accuracy of the overall algorithm; however, we were not able to explore its concrete effects.