MMR (mmr)

Overview

mmr refines generated-text chunks by keeping a small, representative subset and filtering out chunks that are too similar to content already selected.

MMR stands for Maximal Marginal Relevance. The core idea is to balance two forces:

  1. Keep high-value items.
  2. Avoid selecting near-duplicates of items already kept.

This is useful in evaluation pipelines because long generations often repeat the same point in slightly different wording. Sending every repeated chunk into claim extraction can waste LLM calls and over-weight duplicated content. MMR reduces that noise before downstream nodes run.

The original MMR work by Carbonell and Goldstein introduced this relevance-versus-novelty tradeoff for retrieval and summarization. nexa-gauge applies the same pattern to chunk refinement: chunks are embedded locally, compared with cosine similarity, and selected using a score that combines item confidence with diversity from already-selected chunks.
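
As an illustration of the embedding-and-similarity step, here is a minimal sketch assuming the sentence-transformers package; the model name "all-MiniLM-L6-v2" is a placeholder, since nexa-gauge reads the real name from config.EMBEDDING_MODEL:

python
# Minimal sketch: local embeddings + cosine similarity.
# "all-MiniLM-L6-v2" is a placeholder model name; nexa-gauge reads the
# real one from config.EMBEDDING_MODEL.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "Paris is the capital of France.",
    "France's capital city is Paris.",
    "The Eiffel Tower is located on the Champ de Mars.",
]
embeddings = model.encode(chunks)  # shape: (3, embedding_dim)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# The first two chunks restate the same fact, so their similarity is high;
# either one paired with the Eiffel Tower chunk scores noticeably lower.
print(cosine(embeddings[0], embeddings[1]))  # high: near-duplicates
print(cosine(embeddings[0], embeddings[2]))  # lower: distinct content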

In nexa-gauge, mmr is not a scoring metric. It is a zero-cost utility transform that consumes ChunkArtifacts from semchunk and emits a filtered ChunkArtifacts object for later nodes such as claims, grounding, and relevance.

Use Case

Use mmr when you need to reduce repeated or overly broad chunk sets before expensive downstream evaluation:

  • Long generations with repeated claims or repeated explanations
  • Claim extraction runs where duplicate chunks would create duplicate claims
  • Grounding and relevance pipelines that should judge representative content
  • Faster evaluation runs with fewer downstream LLM calls
  • Debugging workflows where you want to inspect the chunks that survived refinement

Node Overview (nexa-gauge)

In nexa-gauge, mmr is the implementation behind the refiner utility node.

What it does:

  • Receives artifacts from the upstream node (e.g. chunks or claims) and outputs an artifact of the same type with a reduced item set
  • Reads each chunk's Item text and confidence
  • Uses the configured refiner strategy; currently only mmr is supported
  • Embeds chunk text with the local SentenceTransformer(config.EMBEDDING_MODEL) model
  • Starts with the highest-confidence chunk
  • Iteratively scores remaining candidates with:
    • mmr_score = lambda * relevance - (1 - lambda) * max_similarity, where relevance is the candidate's confidence and max_similarity is its highest cosine similarity to any already-selected chunk
  • Marks candidates as duplicates when their cosine similarity to a selected chunk is at or above the similarity threshold (a sketch of this loop follows the list)
  • Keeps up to refiner_top_k selected chunks
  • Projects selected indices back onto the original Chunk objects
  • Emits filtered Artifacts under generation_refined_chunks
  • Reports zero model cost because no LLM call is made
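
Putting those steps together, here is a minimal sketch of the selection loop; it assumes relevance equals chunk confidence (per the formula above) and reuses the embeddings from the previous sketch, but it is illustrative, not the actual nexa-gauge source:

python
# Illustrative MMR selection loop; not the nexa-gauge implementation.
# Assumes `embeddings` (one row per chunk) and per-chunk confidences.
# The constants mirror the defaults listed below.
import numpy as np

MMR_LAMBDA = 0.5
MMR_SIMILARITY_THRESHOLD = 0.7
REFINER_TOP_K = 3

def mmr_select(embeddings: np.ndarray, confidence: np.ndarray,
               top_k: int = REFINER_TOP_K) -> tuple[list[int], dict[int, int]]:
    # Normalize rows so a plain dot product is cosine similarity.
    norms = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = norms @ norms.T

    selected = [int(np.argmax(confidence))]  # start with highest confidence
    dedup_map: dict[int, int] = {}
    remaining = set(range(len(confidence))) - set(selected)

    while remaining and len(selected) < top_k:
        best_idx, best_score = None, -np.inf
        for i in list(remaining):
            max_sim = max(sim[i][j] for j in selected)
            # At or above the threshold: mark as duplicate of the closest pick.
            if max_sim >= MMR_SIMILARITY_THRESHOLD:
                dedup_map[i] = max(selected, key=lambda j: sim[i][j])
                remaining.discard(i)
                continue
            score = MMR_LAMBDA * confidence[i] - (1 - MMR_LAMBDA) * max_sim
            if score > best_score:
                best_idx, best_score = i, score
        if best_idx is None:
            break  # everything left was a duplicate
        selected.append(best_idx)
        remaining.discard(best_idx)
    return selected, dedup_map

To see how the score trades relevance against novelty: with lambda = 0.5, a candidate with confidence 1.0 but max_similarity 0.6 scores 0.5 * 1.0 - 0.5 * 0.6 = 0.2, while a more novel candidate with confidence 0.8 and max_similarity 0.2 scores 0.3, so the novel candidate wins.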

Relevant constants:

  • DEFAULT_REFINER_STRATEGY = "mmr"
  • MMR_SIMILARITY_THRESHOLD = 0.7
  • MMR_LAMBDA = 0.5
  • REFINER_TOP_K = 3

Internal output note:

  • RefinerNode.run(...) returns RefinerArtifacts with selected items, selected indices, dropped count, and dedup_map.
  • The graph-level refiner node then converts those selected indices back into ChunkArtifacts, preserving the original chunk records for downstream nodes.
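
A minimal sketch of that projection step, assuming chunk records shaped like the output example further below (the actual helper is internal to the graph):

python
# Illustrative projection of MMR-selected indices back onto the original
# chunk records; the real logic lives in the graph-level refiner node.
def project_chunks(original_chunks: list[dict], indices: list[int]) -> list[dict]:
    """Keep only chunks whose original index was selected by MMR."""
    keep = set(indices)
    return [chunk for chunk in original_chunks if chunk["index"] in keep]

# e.g. project_chunks(chunks, [0, 2]) preserves the original records
# (char offsets, sha256, item) for downstream nodes.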

Skip behavior:

  • If generation is unavailable, the graph treats refiner as ineligible and emits the configured empty utility artifact.
  • If no upstream chunks are available in run mode, the graph returns generation_refined_chunks: null.
  • In estimate mode, the graph emits an empty typed chunk artifact and records zero estimated cost.
  • If the selected refiner is not mmr, the node raises an unsupported-strategy error.

Execution Flow

Graph
[Execution-flow diagram: generation → generation_chunk (semchunk) → refiner (mmr) → downstream claims, grounding, and relevance nodes]

Input

Using the following sample input (saved as ./sample.json, as referenced by the Usage commands below):

json
{
  "case_id": "paris-duplicate-chunks",
  "question": "Where is the Eiffel Tower?",
  "generation": "Paris is the capital of France. France's capital city is Paris. The Eiffel Tower is located on the Champ de Mars.",
  "context": "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France.",
  "reference": "The Eiffel Tower is on the Champ de Mars in Paris, France."
}

Fields used by the refiner branch:

  • generation: used upstream to create chunks, then refined by mmr
  • case_id: used for case identity/reporting, not refinement logic

Direct refiner node signature in code: run(items: list[Item]) -> RefinerArtifacts.

Graph-level input/output contract:

  • Input artifact: ChunkArtifacts from generation_chunk
  • Output artifact: ChunkArtifacts at generation_refined_chunks
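
A hypothetical direct call might look like the following; the import paths and the Item constructor are assumptions inferred from the output example, and only the run(...) signature above is documented:

python
# Hypothetical direct invocation; import paths and Item fields are
# assumptions based on the output example, not a confirmed API.
from nexagauge.refiner import RefinerNode  # hypothetical import path
from nexagauge.types import Item           # hypothetical import path

items = [
    Item(id="557be7eca214f188", text="Paris is the capital of France.",
         tokens=7.0, confidence=1.0, cached=False),
    Item(id="1089c52e66c78256",
         text="The Eiffel Tower is located on the Champ de Mars.",
         tokens=11.0, confidence=1.0, cached=False),
]

artifacts = RefinerNode().run(items)  # documented: run(items) -> RefinerArtifacts
print(artifacts.indices, artifacts.dropped, artifacts.dedup_map)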

Output

Primary graph-level output type:

  • ChunkArtifacts
    • chunks: list[Chunk]
    • cost: CostEstimate

Example graph-level output after MMR keeps chunks 0 and 2, dropping chunk 1 ("France's capital city is Paris.") as a near-duplicate of chunk 0:

json
{
  "chunks": [
    {
      "index": 0,
      "item": {
        "id": "557be7eca214f188",
        "text": "Paris is the capital of France.",
        "tokens": 7.0,
        "confidence": 1.0,
        "cached": false
      },
      "char_start": 0,
      "char_end": 31,
      "sha256": "557be7eca214f1889cdb6dfa348eb7c937648c9d6be72bfc1b8204adf7552a43"
    },
    {
      "index": 2,
      "item": {
        "id": "1089c52e66c78256",
        "text": "The Eiffel Tower is located on the Champ de Mars.",
        "tokens": 11.0,
        "confidence": 1.0,
        "cached": false
      },
      "char_start": 64,
      "char_end": 113,
      "sha256": "1089c52e66c782561ff8cd766a06349b7aec10d625a9991f55064fc536e97efa"
    }
  ],
  "cost": {
    "cost": 0.0,
    "input_tokens": null,
    "output_tokens": null
  }
}

Attribute meaning:

  • chunks: selected original chunks after MMR refinement
  • chunks[].index: original chunk index from the upstream chunk artifact
  • chunks[].item.text: chunk text kept for downstream nodes
  • chunks[].item.tokens: token count for the chunk text
  • chunks[].item.confidence: relevance signal used by MMR
  • chunks[].char_start: start offset in the original generation
  • chunks[].char_end: end offset in the original generation
  • chunks[].sha256: full SHA-256 digest of the chunk text
  • cost.cost: always 0.0 for the graph-level refiner output
  • cost.input_tokens, cost.output_tokens: null in the graph-level ChunkArtifacts output because refinement is local
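
As a consumption example, a short script that loads a saved graph-level payload shaped like the example above and lists the surviving chunks (the file path is illustrative):

python
# Load a saved graph-level ChunkArtifacts payload (path illustrative)
# and list which original chunks survived MMR refinement.
import json

with open("out/mmr/generation_refined_chunks.json") as f:
    artifacts = json.load(f)

for chunk in artifacts["chunks"]:
    item = chunk["item"]
    print(f'kept index {chunk["index"]}: "{item["text"]}" '
          f'(confidence={item["confidence"]}, tokens={item["tokens"]})')

assert artifacts["cost"]["cost"] == 0.0  # refinement is local and free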

Internal RefinerArtifacts fields:

  • items: selected Item objects before graph-level projection back to chunks
  • indices: selected source item indexes
  • dropped: number of input items not kept
  • dedup_map: maps each dropped duplicate's item index to the index of its selected representative
  • cost: zero-cost local operation
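
Structurally, those fields can be pictured as the following dataclass; this is a sketch implied by the list above, not the actual nexa-gauge definition:

python
# Sketch of the RefinerArtifacts shape implied by the fields above;
# the real class may differ in names and types.
from dataclasses import dataclass, field

@dataclass
class RefinerArtifacts:
    items: list            # selected Item objects
    indices: list[int]     # selected source item indexes
    dropped: int           # number of input items not kept
    dedup_map: dict[int, int] = field(default_factory=dict)  # duplicate -> representative
    cost: float = 0.0      # zero-cost local operation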

Usage

bash
OUTPUT_DIR=./out/mmr
mkdir -p "$OUTPUT_DIR"

Estimate Cost

bash
nexagauge estimate refiner \
  --input ./sample.json \
  --limit 5 \
  --refiner mmr \
  --refiner-top-k 3 \
  | tee "$OUTPUT_DIR/refiner-estimate.txt"

Note: the refiner estimate itself is zero-cost. Its practical effect is reducing the number of chunks that downstream LLM-backed nodes need to process.

Run Utility

bash
nexagauge run refiner \
  --input ./sample.json \
  --limit 5 \
  --output-dir "$OUTPUT_DIR" \
  --chunker semchunk \
  --refiner mmr \
  --refiner-top-k 3

For a full evaluation run that uses Semchunk followed by MMR before claim extraction and metrics:

bash
nexagauge run eval \
  --input ./sample.json \
  --limit 5 \
  --output-dir "$OUTPUT_DIR" \
  --chunker semchunk \
  --refiner mmr \
  --refiner-top-k 3