AIMLSAGA

Gap Discussion

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

What did this paper leave unanswered? Identify research gaps, propose solutions, and connect with others building implementations.

1 Discussion

Jordan Kim

Nov 10, 2024

The original RAG paper evaluates on open-domain QA benchmarks. In production, the hard problem is retrieval quality at scale. With 10M+ documents, recall@5 drops significantly. The paper doesn't address: 1) Multi-vector representations (ColBERT), 2) Hierarchical indexing, 3) Query expansion. What's the state of the art for high-recall retrieval over large corpora?
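
For anyone who wants to compare the approaches listed above on their own corpus, here is a minimal sketch of the recall@5 metric mentioned in the post. It assumes recall@k means the fraction of a query's relevant documents that appear in the top k results, averaged over queries. Some open-domain QA work instead reports whether at least one relevant passage appears in the top k, so check which definition your benchmark uses. The function names, document IDs, and toy data are all hypothetical.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the relevant documents that appear in the top-k retrieved list."""
    if not relevant:
        raise ValueError("relevant set must be non-empty")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 5
) -> float:
    """Average recall@k over every query that has relevance judgments."""
    # A query with no retrieved results counts as recall 0 for that query.
    scores = [recall_at_k(runs.get(qid, []), rel, k) for qid, rel in qrels.items()]
    return sum(scores) / len(scores)


# Hypothetical toy data: ranked document IDs per query, and the relevant set per query.
runs = {
    "q1": ["d3", "d7", "d1", "d9", "d4", "d2"],
    "q2": ["d5", "d6", "d8", "d0", "d11"],
}
qrels = {
    "q1": {"d1", "d2"},
    "q2": {"d5"},
}

print(mean_recall_at_k(runs, qrels, k=5))
```

Running the same function over results from a single-vector baseline and from each candidate (multi-vector, hierarchical index, query expansion) at increasing corpus sizes gives a like-for-like view of how recall changes with scale.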