What is Contextual Compression? - Definition & Meaning
Learn what contextual compression is, how retrieved documents are compressed based on the query, and why it makes RAG more efficient and effective.
Definition
Contextual compression is the reduction of retrieved document chunks by keeping only the parts relevant to the specific search query. This reduces context noise and token usage while preserving relevant information.
Technical explanation
After retrieval, each chunk is passed to an LLM or a dedicated extractor together with the query, and only the sentences or paragraphs relevant to that query are kept. This lowers the token count, which cuts cost and leaves more room in the context window for other material, and it often improves answer quality because the model sees less distracting information. LangChain's ContextualCompressionRetriever supports this pattern by wrapping a base retriever with a document compressor. An alternative is extractive question answering, which keeps only the spans that answer the question. The trade-off is latency: an LLM-based compressor adds an extra call per chunk.
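The filtering step can be illustrated without any LLM. The following is a minimal sketch, assuming a simple keyword-overlap heuristic as a stand-in for the LLM or extractor; the function names, the stopword list, and the `min_overlap` parameter are hypothetical and not part of any library.

```python
import re
from typing import List, Set

# Small illustrative stopword list (an assumption, not exhaustive).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and",
             "or", "for", "what", "how", "does", "do", "on", "with"}


def _terms(text: str) -> Set[str]:
    """Lowercase word set of a text, without stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS}


def compress_chunk(chunk: str, query: str, min_overlap: int = 1) -> str:
    """Keep only sentences sharing at least `min_overlap` terms with the query."""
    query_terms = _terms(query)
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", chunk.strip())
    kept = [s for s in sentences
            if len(_terms(s) & query_terms) >= min_overlap]
    return " ".join(kept)


def compress_chunks(chunks: List[str], query: str) -> List[str]:
    """Compress every chunk and drop chunks with nothing relevant left."""
    compressed = (compress_chunk(c, query) for c in chunks)
    return [c for c in compressed if c]


chunks = [
    "The refund period is 14 days. Our office is in Rotterdam. "
    "Contact support to request a refund.",
    "We were founded in 2015. The team loves coffee.",
]
result = compress_chunks(chunks, "What is the refund period?")
print(result)
```

In a real pipeline the overlap test would be replaced by an LLM prompt or a trained extractor. The surrounding structure (compress per chunk, then discard empty results) stays the same.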
How AVARC Solutions applies this
AVARC Solutions applies contextual compression when retrieved chunks are large or noisy. We use it for long documents where only specific sections are relevant. For low-latency use cases we consider a lighter approach or reranking only.
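A decision like this can be written down as a small routing rule. The sketch below is illustrative only: the strategy names, the `RetrievalProfile` type, and both thresholds are hypothetical values chosen for the example and do not describe an actual AVARC Solutions configuration. Noise is not modelled here; only chunk size and latency budget are.

```python
from dataclasses import dataclass


@dataclass
class RetrievalProfile:
    avg_chunk_words: int    # average size of the retrieved chunks
    latency_budget_ms: int  # time available for post-retrieval processing


def choose_strategy(profile: RetrievalProfile,
                    large_chunk_words: int = 300,
                    llm_call_ms: int = 800) -> str:
    """Pick a post-retrieval step; thresholds are assumed example values."""
    # Not enough time for an extra LLM call: fall back to reranking only.
    if profile.latency_budget_ms < llm_call_ms:
        return "rerank_only"
    # Large chunks are where compression pays off most.
    if profile.avg_chunk_words >= large_chunk_words:
        return "llm_compression"
    # Small chunks with enough budget: pass them through unchanged.
    return "pass_through"


print(choose_strategy(RetrievalProfile(avg_chunk_words=600, latency_budget_ms=3000)))
print(choose_strategy(RetrievalProfile(avg_chunk_words=600, latency_budget_ms=200)))
```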
Practical examples
- A RAG pipeline fetching 5 long chunks and using an LLM to extract only the query-relevant sentences from each chunk.
- A legal RAG compressing long legal texts to only the articles applicable to the question.
- A support knowledge base where compression ensures only answer-relevant FAQ sections go to the LLM.
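To check whether compression pays off in cases like these, it helps to compare the context size before and after. The sketch below assumes whitespace-separated words as a rough proxy for tokens; a real tokenizer will give different counts, and the function names are hypothetical.

```python
from typing import Dict, List


def approx_tokens(text: str) -> int:
    """Rough token estimate: number of whitespace-separated words."""
    return len(text.split())


def compression_report(original: List[str],
                       compressed: List[str]) -> Dict[str, float]:
    """Summarise how much context was removed by compression."""
    before = sum(approx_tokens(c) for c in original)
    after = sum(approx_tokens(c) for c in compressed)
    return {
        "before": before,
        "after": after,
        "saved": before - after,
        # Fraction of the original context that remains (1.0 = no reduction).
        "ratio": after / before if before else 1.0,
    }


original = ["a b c d e f g h", "i j"]
compressed = ["a b c"]
report = compression_report(original, compressed)
print(report)
```

A ratio close to 1.0 suggests the extra LLM calls are buying little, in which case a lighter approach may be the better choice.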