📄 Abstract
Most text retrievers generate one query vector to retrieve relevant
documents. Yet, the conditional distribution of relevant documents for the
query may be multimodal, e.g., representing different interpretations of the
query. We first quantify the limitations of existing retrievers. All retrievers
we evaluate struggle more as the distance between target document embeddings
grows. To address this limitation, we develop a new retriever architecture,
Autoregressive Multi-Embedding Retriever (AMER).
Our model autoregressively generates multiple query vectors, and all the
predicted query vectors are used to retrieve documents from the corpus. We show
that on synthetic vectorized data, the proposed method captures multiple
target distributions perfectly, performing 4x better than a
single-embedding model. We also fine-tune our model on real-world multi-answer
retrieval datasets and evaluate in-domain. AMER achieves 4% and 21% relative
gains over single-embedding baselines on the two datasets we evaluate.
Furthermore, we consistently observe larger gains on the subset of each dataset
where the embeddings of the target documents are less similar to each other. We
demonstrate the potential of using a multi-query vector retriever and open up a
new direction for future work.
Key Contributions
Addresses the limitations of single-query-vector retrievers by proposing the Autoregressive Multi-Embedding Retriever (AMER). AMER autoregressively generates multiple query vectors, allowing it to capture multimodal distributions of relevant documents. The abstract reports 4% and 21% relative gains over single-embedding baselines on two multi-answer retrieval datasets, with larger gains when the target documents' embeddings are less similar to each other.
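To make the single-vector limitation concrete, here is a minimal toy sketch of retrieval with one query vector versus several. It is not the AMER model: the 2-D document embeddings, the hand-picked query vectors (standing in for vectors a trained model would generate), and the round-robin merge rule are all assumptions made for illustration, and the summary above does not say how AMER combines the per-vector results.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def top_k(query, docs, k):
    """Return the ids of the k documents with the highest inner product."""
    ranked = sorted(docs.items(), key=lambda item: dot(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

def multi_query_retrieve(queries, docs, k):
    """Rank the corpus once per query vector, then merge the rankings
    round-robin, skipping duplicates (a hypothetical merge rule)."""
    rankings = [top_k(q, docs, len(docs)) for q in queries]
    merged, seen = [], set()
    for rank in range(len(docs)):
        for ranking in rankings:
            if len(merged) >= k:
                return merged
            doc_id = ranking[rank]
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged[:k]

def recall(retrieved, targets):
    return len(set(retrieved) & targets) / len(targets)

# Toy corpus: two relevant documents that lie far apart in embedding space
# (two interpretations of the query) plus distractors that sit between them.
docs = {
    "a1": normalize([1.0, 0.0]),  # relevant, interpretation A
    "b1": normalize([0.0, 1.0]),  # relevant, interpretation B
    "d1": normalize([1.0, 1.0]),  # distractor
    "d2": normalize([1.0, 0.8]),  # distractor
    "d3": normalize([0.8, 1.0]),  # distractor
}
targets = {"a1", "b1"}

# One query vector: the best single compromise points between the two targets,
# which is exactly where the distractors live.
single_query = normalize([1.0, 1.0])
single_hits = top_k(single_query, docs, k=2)

# Two query vectors (assumed given here), one per interpretation.
multi_queries = [normalize([1.0, 0.0]), normalize([0.0, 1.0])]
multi_hits = multi_query_retrieve(multi_queries, docs, k=2)

print("single-vector recall@2:", recall(single_hits, targets))
print("multi-vector recall@2: ", recall(multi_hits, targets))
```

In this toy setup the single compromise vector retrieves only distractors, while the two-vector variant recovers both relevant documents. This mirrors the observation in the abstract that retrievers struggle more as the distance between target document embeddings grows.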
Business Value
Enhances the effectiveness of search and recommendation systems, leading to better user satisfaction, improved information discovery, and more relevant results in applications like e-commerce, content platforms, and enterprise search.