MMR (mmr)

Overview

mmr refines generated-text chunks by keeping a small, representative subset and filtering out chunks that are too similar to content already selected.

MMR stands for Maximal Marginal Relevance. The core idea is to balance two forces:

  1. Keep high-value items.
  2. Avoid selecting near-duplicates of items already kept.

This is useful in evaluation pipelines because long generations often repeat the same point in slightly different wording. Sending every repeated chunk into claim extraction can waste LLM calls and over-weight duplicated content. MMR reduces that noise before downstream nodes run.

The original MMR work by Carbonell and Goldstein introduced this relevance-versus-novelty tradeoff for retrieval and summarization. nexa-gauge applies the same pattern to chunk refinement: chunks are embedded locally, compared with cosine similarity, and selected using a score that combines item confidence with diversity from already-selected chunks.
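
As an illustration of the embedding-and-similarity step, here is a minimal sketch assuming the sentence-transformers package; the model name "all-MiniLM-L6-v2" is a placeholder, since nexa-gauge reads the real name from config.EMBEDDING_MODEL:

python
# Minimal sketch: local embeddings + cosine similarity.
# "all-MiniLM-L6-v2" is a placeholder model name; nexa-gauge reads the
# real one from config.EMBEDDING_MODEL.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "Paris is the capital of France.",
    "France's capital city is Paris.",
    "The Eiffel Tower is located on the Champ de Mars.",
]
embeddings = model.encode(chunks)  # shape: (3, embedding_dim)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# The first two chunks restate the same fact, so their similarity is high;
# either one paired with the Eiffel Tower chunk scores noticeably lower.
print(cosine(embeddings[0], embeddings[1]))  # high: near-duplicates
print(cosine(embeddings[0], embeddings[2]))  # lower: distinct content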

In nexa-gauge, mmr is not a scoring metric. It is a zero-cost utility transform that consumes ChunkArtifacts from semchunk and emits a filtered ChunkArtifacts object for later nodes such as claims, grounding, and relevance.

Use Case

Use mmr when you need to reduce repeated or overly broad chunk sets before expensive downstream evaluation:

  • Long generations with repeated claims or repeated explanations
  • Claim extraction runs where duplicate chunks would create duplicate claims
  • Grounding and relevance pipelines that should judge representative content
  • Faster evaluation runs with fewer downstream LLM calls
  • Debugging workflows where you want to inspect the chunks that survived refinement

Node Overview (nexa-gauge)

In nexa-gauge, mmr is the implementation behind the refiner utility node.

What it does:

  • Receives artifacts from the upstream node (e.g. chunks or claims) and outputs an artifact of the same type with a reduced item set
  • Reads each chunk's Item text and confidence
  • Uses the configured refiner strategy; currently only mmr is supported
  • Embeds chunk text with the local SentenceTransformer(config.EMBEDDING_MODEL) model
  • Starts with the highest-confidence chunk
  • Iteratively scores remaining candidates with:
    • mmr_score = lambda * relevance - (1 - lambda) * max_similarity, where relevance is the candidate's confidence and max_similarity is its highest cosine similarity to any already-selected chunk
  • Marks candidates as duplicates when their cosine similarity to a selected chunk is at or above the similarity threshold (a sketch of this loop follows the list)
  • Keeps up to refiner_top_k selected chunks
  • Projects selected indices back onto the original Chunk objects
  • Emits filtered Artifacts under generation_refined_chunks
  • Reports zero model cost because no LLM call is made
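
Putting those steps together, here is a minimal sketch of the selection loop; it assumes relevance equals chunk confidence (per the formula above) and reuses the embeddings from the previous sketch, but it is illustrative, not the actual nexa-gauge source:

python
# Illustrative MMR selection loop; not the nexa-gauge implementation.
# Assumes `embeddings` (one row per chunk) and per-chunk confidences.
# The constants mirror the defaults listed below.
import numpy as np

MMR_LAMBDA = 0.5
MMR_SIMILARITY_THRESHOLD = 0.7
REFINER_TOP_K = 3

def mmr_select(embeddings: np.ndarray, confidence: np.ndarray,
               top_k: int = REFINER_TOP_K) -> tuple[list[int], dict[int, int]]:
    # Normalize rows so a plain dot product is cosine similarity.
    norms = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = norms @ norms.T

    selected = [int(np.argmax(confidence))]  # start with highest confidence
    dedup_map: dict[int, int] = {}
    remaining = set(range(len(confidence))) - set(selected)

    while remaining and len(selected) < top_k:
        best_idx, best_score = None, -np.inf
        for i in list(remaining):
            max_sim = max(sim[i][j] for j in selected)
            # At or above the threshold: mark as duplicate of the closest pick.
            if max_sim >= MMR_SIMILARITY_THRESHOLD:
                dedup_map[i] = max(selected, key=lambda j: sim[i][j])
                remaining.discard(i)
                continue
            score = MMR_LAMBDA * confidence[i] - (1 - MMR_LAMBDA) * max_sim
            if score > best_score:
                best_idx, best_score = i, score
        if best_idx is None:
            break  # everything left was a duplicate
        selected.append(best_idx)
        remaining.discard(best_idx)
    return selected, dedup_map

To see how the score trades relevance against novelty: with lambda = 0.5, a candidate with confidence 1.0 but max_similarity 0.6 scores 0.5 * 1.0 - 0.5 * 0.6 = 0.2, while a more novel candidate with confidence 0.8 and max_similarity 0.2 scores 0.3, so the novel candidate wins.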

Relevant constants:

  • DEFAULT_REFINER_STRATEGY = "mmr"
  • MMR_SIMILARITY_THRESHOLD = 0.7
  • MMR_LAMBDA = 0.5
  • REFINER_TOP_K = 3

Internal output note:

  • RefinerNode.run(...) returns RefinerArtifacts with selected items, selected indices, dropped count, and dedup_map.
  • The graph-level refiner node then converts those selected indices back into ChunkArtifacts, preserving the original chunk records for downstream nodes.
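
A minimal sketch of that projection step, assuming chunk records shaped like the output example further below (the actual helper is internal to the graph):

python
# Illustrative projection of MMR-selected indices back onto the original
# chunk records; the real logic lives in the graph-level refiner node.
def project_chunks(original_chunks: list[dict], indices: list[int]) -> list[dict]:
    """Keep only chunks whose original index was selected by MMR."""
    keep = set(indices)
    return [chunk for chunk in original_chunks if chunk["index"] in keep]

# e.g. project_chunks(chunks, [0, 2]) preserves the original records
# (char offsets, sha256, item) for downstream nodes.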

Skip behavior:

  • If generation is unavailable, the graph treats refiner as ineligible and emits the configured empty utility artifact.
  • If no upstream chunks are available in run mode, the graph returns generation_refined_chunks: null.
  • In estimate mode, the graph emits an empty typed chunk artifact and records zero estimated cost.
  • If the selected refiner is not mmr, the node raises an unsupported-strategy error.

Execution Flow

Graph
[Execution-flow diagram: generation → generation_chunk (semchunk) → refiner (mmr) → downstream claims, grounding, and relevance nodes]

Input

Using the following sample input (saved as ./sample.json, as referenced by the Usage commands below):

json
{
  "case_id": "paris-duplicate-chunks",
  "question": "Where is the Eiffel Tower?",
  "generation": "Paris is the capital of France. France's capital city is Paris. The Eiffel Tower is located on the Champ de Mars.",
  "context": "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France.",
  "reference": "The Eiffel Tower is on the Champ de Mars in Paris, France."
}

Fields used by the refiner branch:

  • generation: used upstream to create chunks, then refined by mmr
  • case_id: used for case identity/reporting, not refinement logic

Direct refiner node signature in code: run(items: list[Item]) -> RefinerArtifacts.

Graph-level input/output contract:

  • Input artifact: ChunkArtifacts from generation_chunk
  • Output artifact: ChunkArtifacts at generation_refined_chunks
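
A hypothetical direct call might look like the following; the import paths and the Item constructor are assumptions inferred from the output example, and only the run(...) signature above is documented:

python
# Hypothetical direct invocation; import paths and Item fields are
# assumptions based on the output example, not a confirmed API.
from nexagauge.refiner import RefinerNode  # hypothetical import path
from nexagauge.types import Item           # hypothetical import path

items = [
    Item(id="557be7eca214f188", text="Paris is the capital of France.",
         tokens=7.0, confidence=1.0, cached=False),
    Item(id="1089c52e66c78256",
         text="The Eiffel Tower is located on the Champ de Mars.",
         tokens=11.0, confidence=1.0, cached=False),
]

artifacts = RefinerNode().run(items)  # documented: run(items) -> RefinerArtifacts
print(artifacts.indices, artifacts.dropped, artifacts.dedup_map)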

Output

Primary graph-level output type:

  • ChunkArtifacts
    • chunks: list[Chunk]
    • cost: CostEstimate

Example graph-level output after MMR keeps chunks 0 and 2, dropping chunk 1 ("France's capital city is Paris.") as a near-duplicate of chunk 0:

json
{
  "chunks": [
    {
      "index": 0,
      "item": {
        "id": "557be7eca214f188",
        "text": "Paris is the capital of France.",
        "tokens": 7.0,
        "confidence": 1.0,
        "cached": false
      },
      "char_start": 0,
      "char_end": 31,
      "sha256": "557be7eca214f1889cdb6dfa348eb7c937648c9d6be72bfc1b8204adf7552a43"
    },
    {
      "index": 2,
      "item": {
        "id": "1089c52e66c78256",
        "text": "The Eiffel Tower is located on the Champ de Mars.",
        "tokens": 11.0,
        "confidence": 1.0,
        "cached": false
      },
      "char_start": 64,
      "char_end": 113,
      "sha256": "1089c52e66c782561ff8cd766a06349b7aec10d625a9991f55064fc536e97efa"
    }
  ],
  "cost": {
    "cost": 0.0,
    "input_tokens": null,
    "output_tokens": null
  }
}

Attribute meaning:

  • chunks: selected original chunks after MMR refinement
  • chunks[].index: original chunk index from the upstream chunk artifact
  • chunks[].item.text: chunk text kept for downstream nodes
  • chunks[].item.tokens: token count for the chunk text
  • chunks[].item.confidence: relevance signal used by MMR
  • chunks[].char_start: start offset in the original generation
  • chunks[].char_end: end offset in the original generation
  • chunks[].sha256: full SHA-256 digest of the chunk text
  • cost.cost: always 0.0 for the graph-level refiner output
  • cost.input_tokens, cost.output_tokens: null in the graph-level ChunkArtifacts output because refinement is local
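
As a consumption example, a short script that loads a saved graph-level payload shaped like the example above and lists the surviving chunks (the file path is illustrative):

python
# Load a saved graph-level ChunkArtifacts payload (path illustrative)
# and list which original chunks survived MMR refinement.
import json

with open("out/mmr/generation_refined_chunks.json") as f:
    artifacts = json.load(f)

for chunk in artifacts["chunks"]:
    item = chunk["item"]
    print(f'kept index {chunk["index"]}: "{item["text"]}" '
          f'(confidence={item["confidence"]}, tokens={item["tokens"]})')

assert artifacts["cost"]["cost"] == 0.0  # refinement is local and free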

Internal RefinerArtifacts fields:

  • items: selected Item objects before graph-level projection back to chunks
  • indices: selected source item indexes
  • dropped: number of input items not kept
  • dedup_map: maps each dropped duplicate's item index to the index of its selected representative
  • cost: zero-cost local operation
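
Structurally, those fields can be pictured as the following dataclass; this is a sketch implied by the list above, not the actual nexa-gauge definition:

python
# Sketch of the RefinerArtifacts shape implied by the fields above;
# the real class may differ in names and types.
from dataclasses import dataclass, field

@dataclass
class RefinerArtifacts:
    items: list            # selected Item objects
    indices: list[int]     # selected source item indexes
    dropped: int           # number of input items not kept
    dedup_map: dict[int, int] = field(default_factory=dict)  # duplicate -> representative
    cost: float = 0.0      # zero-cost local operation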

Usage

bash
OUTPUT_DIR=./out/mmr
mkdir -p "$OUTPUT_DIR"

Estimate Cost

bash
nexagauge estimate refiner \
  --input ./sample.json \
  --limit 5 \
  --refiner mmr \
  --refiner-top-k 3 \
  | tee "$OUTPUT_DIR/refiner-estimate.txt"

Note: the refiner estimate itself is zero-cost. Its practical effect is reducing the number of chunks that downstream LLM-backed nodes need to process.

Run Utility

bash
nexagauge run refiner \
  --input ./sample.json \
  --limit 5 \
  --output-dir "$OUTPUT_DIR" \
  --chunker semchunk \
  --refiner mmr \
  --refiner-top-k 3

For a full evaluation run that uses Semchunk followed by MMR before claim extraction and metrics:

bash
nexagauge run eval \
  --input ./sample.json \
  --limit 5 \
  --output-dir "$OUTPUT_DIR" \
  --chunker semchunk \
  --refiner mmr \
  --refiner-top-k 3