Profiting from large-scale training datasets, advances in neural architecture design and efficient inference, joint embeddings have become the dominant approach for tackling cross-modal retrieval. In this work we first show that, despite their effectiveness, state-of-the-art joint embeddings suffer significantly from the longstanding hubness problem, in which a small number of gallery embeddings form the nearest neighbours of many queries. Drawing inspiration from the NLP literature, we formulate a simple but effective framework called Querybank Normalisation (QB-Norm) that re-normalises query similarities to account for hubs in the embedding space. QB-Norm improves retrieval performance without requiring retraining. Unlike prior work, we show that QB-Norm works effectively without concurrent access to any test set queries. Within the QB-Norm framework, we also propose a novel similarity normalisation method, the Dynamic Inverted Softmax, that is significantly more robust than existing approaches. We showcase QB-Norm across a range of cross-modal retrieval models and benchmarks, where it consistently improves strong baselines beyond the state of the art.
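The abstract describes QB-Norm only at a high level, so the following is a minimal sketch of our own reading of it, not the released implementation. It assumes the querybank is represented as a matrix of similarities between querybank queries (drawn from non-test data, such as training queries) and the gallery. It also assumes a softmax temperature `beta` (the value 20.0 is arbitrary) and that the "dynamic" variant applies the inverted softmax only when a test query's top-k retrievals include a gallery item that is a top-k neighbour of some querybank query (an "activation set"). The function names are hypothetical.

```python
import numpy as np


def inverted_softmax(test_sims, bank_sims, beta=20.0):
    """Re-normalise test-query similarities against a querybank.

    test_sims: (num_test, num_gallery) test-query-to-gallery similarities.
    bank_sims: (num_bank, num_gallery) querybank-to-gallery similarities.
    beta: softmax temperature (an assumed value, not taken from the paper).
    """
    # A shared per-gallery shift avoids overflow in exp(); it cancels in the ratio.
    shift = np.maximum(bank_sims.max(axis=0), test_sims.max(axis=0))
    numer = np.exp(beta * (test_sims - shift))
    # Gallery items that score highly for many querybank queries (hubs)
    # get a large denominator, which lowers their normalised score.
    denom = np.exp(beta * (bank_sims - shift)).sum(axis=0)
    return numer / denom


def dynamic_inverted_softmax(test_sims, bank_sims, beta=20.0, k=1):
    """Apply the inverted softmax only to queries that touch the activation set."""
    num_gallery = bank_sims.shape[1]
    # Activation set (our assumption): gallery items in the top-k of at least
    # one querybank query.
    bank_topk = np.argsort(-bank_sims, axis=1)[:, :k]
    activation = np.zeros(num_gallery, dtype=bool)
    activation[np.unique(bank_topk)] = True

    # Test queries whose own top-k retrievals include an activated item.
    test_topk = np.argsort(-test_sims, axis=1)[:, :k]
    use_is = activation[test_topk].any(axis=1)

    normalised = inverted_softmax(test_sims, bank_sims, beta)
    # Ranking is done per query (row), so mixing raw and normalised rows is safe.
    return np.where(use_is[:, None], normalised, test_sims)


if __name__ == "__main__":
    # Toy data: gallery item 1 is the top match for every querybank query.
    bank = np.array([[0.1, 0.9, 0.2], [0.2, 0.8, 0.1], [0.0, 0.85, 0.3]])
    test = np.array([[0.7, 0.75, 0.1], [0.9, 0.2, 0.1]])
    scores = dynamic_inverted_softmax(test, bank)
    print(test.argmax(axis=1), scores.argmax(axis=1))  # prints: [1 0] [0 0]
```

In this toy case the first test query's raw top-1 is the hub (item 1), and after normalisation its top-1 moves to item 0. The second query, whose top-1 lies outside the activation set, keeps its raw scores. Gating on the activation set leaves queries that do not retrieve likely hubs untouched, which is one plausible reason such a variant would be more robust than applying the inverted softmax everywhere.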


We consider the problem of cross-modal retrieval, in which queries q1 and q2 are compared against a gallery of samples, x1 and x2. As we show in this work, the high-dimensional joint embeddings employed by modern cross-modal retrieval methods suffer from the hubness problem. A hub (e.g. x2) is the nearest neighbour of multiple queries (here, both q1 and q2), producing poor-quality retrieval results (bottom left). QB-Norm (right) employs a querybank to normalise similarities, reducing the similarity of the hub x2 to query q1 and thereby improving the retrieval results (bottom right).
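The notion of a hub in the caption can be made concrete by counting, for each gallery item, how many queries have it among their k nearest neighbours (its k-occurrence, N_k) and then measuring the skewness of those counts. A strongly right-skewed distribution means a few gallery items dominate the retrieval results. The sketch below illustrates one common way to measure this. It is an assumption about how one might quantify hubness, not necessarily the paper's exact protocol, and the function names and toy similarity matrix are hypothetical.

```python
import numpy as np


def k_occurrence(sims, k=1):
    """Count how often each gallery item is among a query's k nearest neighbours.

    sims: (num_queries, num_gallery) query-to-gallery similarity matrix.
    """
    topk = np.argsort(-sims, axis=1)[:, :k]
    return np.bincount(topk.ravel(), minlength=sims.shape[1])


def skewness(counts):
    """Population skewness of the k-occurrence counts (large positive => hubs)."""
    c = counts.astype(float)
    std = c.std()
    if std == 0:
        return 0.0  # every item retrieved equally often: no hubs
    return float(((c - c.mean()) ** 3).mean() / std ** 3)


if __name__ == "__main__":
    # Toy example: gallery item 0 is the nearest neighbour of all four queries.
    sims = np.array([
        [0.9, 0.1, 0.2, 0.3],
        [0.8, 0.7, 0.1, 0.2],
        [0.6, 0.5, 0.4, 0.1],
        [0.7, 0.2, 0.3, 0.6],
    ])
    counts = k_occurrence(sims, k=1)
    print(counts, round(skewness(counts), 3))  # prints: [4 0 0 0] 1.155
```

A hub-free gallery, where each query's nearest neighbour is a different item, gives equal counts and zero skewness.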

Paper, code and other resources

  • arXiv

    For more details please consult the full paper.

  • GitHub

    The code for this paper, as well as the pre-trained models, can be found on GitHub.



  title={Cross Modal Retrieval with Querybank Normalisation},
  author={Bogolin, Simion-Vlad and Croitoru, Ioana and Jin, Hailin and Liu, Yang and Albanie, Samuel},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},


This work was supported by Adobe, Google and Zhejiang Lab (NO. 2022NB0AB05). We thank G-Research for a travel grant.