MMR (mmr)
Overview
`mmr` refines generated-text chunks by keeping a small, representative subset and filtering out chunks that are too similar to content already selected.
MMR stands for Maximal Marginal Relevance. The core idea is to balance two forces:
- Keep high-value items.
- Avoid selecting near-duplicates of items already kept.
This is useful in evaluation pipelines because long generations often repeat the same point in slightly different wording. Sending every repeated chunk into claim extraction can waste LLM calls and over-weight duplicated content. MMR reduces that noise before downstream nodes run.
The original MMR work by Carbonell and Goldstein introduced this relevance-versus-novelty tradeoff for retrieval and summarization. nexa-gauge applies the same pattern to chunk refinement: chunks are embedded locally, compared with cosine similarity, and selected using a score that combines item confidence with diversity from already-selected chunks.
In nexa-gauge, `mmr` is not a scoring metric. It is a zero-cost utility transform that consumes `ChunkArtifacts` from `semchunk` and emits a filtered `ChunkArtifacts` object for later nodes such as `claims`, `grounding`, and `relevance`.
Use Case
Use `mmr` when you need to reduce repeated or overly broad chunk sets before expensive downstream evaluation:
- Long generations with repeated claims or repeated explanations
- Claim extraction runs where duplicate chunks would create duplicate claims
- Grounding and relevance pipelines that should judge representative content
- Faster evaluation runs with fewer downstream LLM calls
- Debugging workflows where you want to inspect the chunks that survived refinement
Node Overview (nexa-gauge)
In nexa-gauge, `mmr` is the implementation behind the `refiner` utility node.
What it does:
- Receives `Artifacts` from the upstream node (e.g. `chunk`, `claims`) and outputs an `Artifacts` object of the same type with reduced chunks
- Reads each chunk's `Item` text and confidence
- Uses the configured `refiner` strategy; currently only `mmr` is supported
- Embeds chunk text with the local `SentenceTransformer(config.EMBEDDING_MODEL)` model
- Starts with the highest-confidence chunk
- Iteratively scores remaining candidates with `mmr_score = lambda * relevance - (1 - lambda) * max_similarity` (see the sketch after this list)
- Marks candidates as duplicates when their cosine similarity to a selected chunk is at or above the similarity threshold
- Keeps up to `refiner_top_k` selected chunks
- Projects selected indices back onto the original `Chunk` objects
- Emits the filtered `Artifacts` under `generation_refined_chunks`
- Reports zero model cost because no LLM call is made
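The selection loop can be sketched as follows. This is an illustrative reimplementation, not the nexa-gauge source; the model name `all-MiniLM-L6-v2` is a stand-in for `config.EMBEDDING_MODEL`, and the defaults mirror the constants listed below.

```python
# Illustrative MMR selection loop: a sketch, not the nexa-gauge source.
# Assumes sentence-transformers is installed; "all-MiniLM-L6-v2" is a
# stand-in for config.EMBEDDING_MODEL.
import numpy as np
from sentence_transformers import SentenceTransformer


def mmr_select(texts, confidences, top_k=3, lam=0.5, sim_threshold=0.7):
    model = SentenceTransformer("all-MiniLM-L6-v2")
    emb = model.encode(texts, normalize_embeddings=True)
    sim = emb @ emb.T  # cosine similarity, since embeddings are unit-normalized

    selected = [int(np.argmax(confidences))]  # seed with the highest-confidence chunk
    candidates = [i for i in range(len(texts)) if i != selected[0]]

    while candidates and len(selected) < top_k:
        # mmr_score = lambda * relevance - (1 - lambda) * max_similarity
        def mmr_score(i):
            return lam * confidences[i] - (1 - lam) * max(sim[i][j] for j in selected)

        best = max(candidates, key=mmr_score)
        candidates.remove(best)
        # At or above the threshold, the candidate is a near-duplicate of an
        # already-selected chunk: drop it instead of keeping it.
        if max(sim[best][j] for j in selected) >= sim_threshold:
            continue
        selected.append(best)
    return sorted(selected)


texts = [
    "Paris is the capital of France.",
    "France's capital city is Paris.",
    "The Eiffel Tower is located on the Champ de Mars.",
]
print(mmr_select(texts, confidences=[1.0, 1.0, 1.0]))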
Relevant constants:
- `DEFAULT_REFINER_STRATEGY = "mmr"`
- `MMR_SIMILARITY_THRESHOLD = 0.7`
- `MMR_LAMBDA = 0.5`
- `REFINER_TOP_K = 3`
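To make the interaction between `MMR_LAMBDA` and `MMR_SIMILARITY_THRESHOLD` concrete, here is a worked score for a hypothetical candidate (the relevance and similarity values are made up):

```python
# Worked example using the default constants; candidate values are made up.
MMR_LAMBDA = 0.5
MMR_SIMILARITY_THRESHOLD = 0.7

relevance = 1.0       # candidate chunk confidence
max_similarity = 0.9  # cosine similarity to the closest already-selected chunk

mmr_score = MMR_LAMBDA * relevance - (1 - MMR_LAMBDA) * max_similarity
print(round(mmr_score, 2))                         # 0.05: a weak positive score
print(max_similarity >= MMR_SIMILARITY_THRESHOLD)  # True: flagged as a duplicate
```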
Internal output note:
- `RefinerNode.run(...)` returns `RefinerArtifacts` with selected items, selected indices, dropped count, and `dedup_map`.
- The graph-level `refiner` node then converts those selected indices back into `ChunkArtifacts`, preserving the original chunk records for downstream nodes.
Skip behavior:
- If generation is unavailable, the graph treats `refiner` as ineligible and emits the configured empty utility artifact.
- If no upstream chunks are available in run mode, the graph returns `generation_refined_chunks: null`.
- In estimate mode, the graph emits an empty typed chunk artifact and records zero estimated cost.
- If the selected refiner is not `mmr`, the node raises an unsupported-strategy error.
Execution Flow
Input
Using the following sample input:

```json
{
  "case_id": "paris-duplicate-chunks",
  "question": "Where is the Eiffel Tower?",
  "generation": "Paris is the capital of France. France's capital city is Paris. The Eiffel Tower is located on the Champ de Mars.",
  "context": "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France.",
  "reference": "The Eiffel Tower is on the Champ de Mars in Paris, France."
}
```

Fields used by the refiner branch:
- `generation`: used upstream to create chunks, then refined by `mmr`
- `case_id`: used for case identity/reporting, not refinement logic

Direct refiner node signature in code: `run(items: list[Item]) -> RefinerArtifacts`.

Graph-level input/output contract:
- Input artifact: `ChunkArtifacts` from `generation_chunk`
- Output artifact: `ChunkArtifacts` at `generation_refined_chunks`
Output
Primary graph-level output type: `ChunkArtifacts`
- `chunks: list[Chunk]`
- `cost: CostEstimate`
Example graph-level output after MMR keeps chunks 0 and 2:
```json
{
  "chunks": [
    {
      "index": 0,
      "item": {
        "id": "557be7eca214f188",
        "text": "Paris is the capital of France.",
        "tokens": 7.0,
        "confidence": 1.0,
        "cached": false
      },
      "char_start": 0,
      "char_end": 31,
      "sha256": "557be7eca214f1889cdb6dfa348eb7c937648c9d6be72bfc1b8204adf7552a43"
    },
    {
      "index": 2,
      "item": {
        "id": "1089c52e66c78256",
        "text": "The Eiffel Tower is located on the Champ de Mars.",
        "tokens": 11.0,
        "confidence": 1.0,
        "cached": false
      },
      "char_start": 64,
      "char_end": 113,
      "sha256": "1089c52e66c782561ff8cd766a06349b7aec10d625a9991f55064fc536e97efa"
    }
  ],
  "cost": {
    "cost": 0.0,
    "input_tokens": null,
    "output_tokens": null
  }
}
```

Attribute meaning:
- `chunks`: selected original chunks after MMR refinement
- `chunks[].index`: original chunk index from the upstream chunk artifact
- `chunks[].item.text`: chunk text kept for downstream nodes
- `chunks[].item.tokens`: token count for the chunk text
- `chunks[].item.confidence`: relevance signal used by MMR
- `chunks[].char_start`: start offset in the original generation
- `chunks[].char_end`: end offset in the original generation
- `chunks[].sha256`: full SHA-256 digest of the chunk text
- `cost.cost`: always `0.0` for the graph-level refiner output
- `cost.input_tokens`, `cost.output_tokens`: `null` in the graph-level `ChunkArtifacts` output because refinement is local
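The digest fields are reproducible locally. A quick check, assuming (as the field names suggest) that `sha256` is computed over the exact chunk text; note that in the sample output each `item.id` is the 16-character prefix of its chunk's `sha256`:

```python
# Reproduce the example's digest fields (a sketch based on the field names;
# the id-as-prefix relationship is inferred from the sample output above).
import hashlib

text = "Paris is the capital of France."
digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
print(digest)       # compare with chunks[].sha256 in the example
print(digest[:16])  # compare with the example's item.id "557be7eca214f188"
```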
Internal `RefinerArtifacts` fields:
- `items`: selected `Item` objects before graph-level projection back to chunks
- `indices`: selected source item indexes
- `dropped`: number of input items not kept
- `dedup_map`: duplicate item index mapped to the selected representative index
- `cost`: zero-cost local operation
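Conceptually, the graph-level step that turns `RefinerArtifacts` back into `ChunkArtifacts` is a simple index projection. A minimal sketch, assuming dict-shaped chunks like the example output above (the real nexa-gauge types differ):

```python
# Project selected indices back onto the original chunk records, preserving
# offsets and hashes for downstream nodes. Dict shapes mirror the example
# artifacts above; the actual nexa-gauge types differ.
def project_chunks(upstream_chunks: list[dict], indices: list[int]) -> list[dict]:
    keep = set(indices)
    return [c for c in upstream_chunks if c["index"] in keep]

# A dedup_map of {1: 0} would record that input item 1 was dropped as a
# duplicate of selected representative 0.
```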
Usage
```sh
OUTPUT_DIR=./out/mmr
mkdir -p "$OUTPUT_DIR"
```

Estimate Cost

```sh
nexagauge estimate refiner \
  --input ./sample.json \
  --limit 5 \
  --refiner mmr \
  --refiner-top-k 3 \
  | tee "$OUTPUT_DIR/refiner-estimate.txt"
```

Note: the refiner estimate itself is zero-cost. Its practical effect is reducing the number of chunks that downstream LLM-backed nodes need to process.
Run Utility
```sh
nexagauge run refiner \
  --input ./sample.json \
  --limit 5 \
  --output-dir "$OUTPUT_DIR" \
  --chunker semchunk \
  --refiner mmr \
  --refiner-top-k 3
```

For a full evaluation run that uses semchunk followed by MMR before claim extraction and metrics:
```sh
nexagauge run eval \
  --input ./sample.json \
  --limit 5 \
  --output-dir "$OUTPUT_DIR" \
  --chunker semchunk \
  --refiner mmr \
  --refiner-top-k 3
```
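To inspect which chunks survived refinement (per the debugging use case above), the emitted artifact can be read directly. The file path below is hypothetical; use whichever artifact file the run writes under `$OUTPUT_DIR`:

```python
# Print the surviving chunks from a run's refined-chunk artifact.
# The path is hypothetical; adjust to the file your run actually writes.
import json

with open("out/mmr/generation_refined_chunks.json") as f:
    artifact = json.load(f)

for chunk in artifact["chunks"]:
    print(chunk["index"], repr(chunk["item"]["text"]))
```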