Claims (claims)

Overview

The claims utility converts a free-form generation into atomic, verifiable claim units. This is a core evaluation primitive: claim-level decomposition lets you score faithfulness, relevance, and hallucination risk with much higher precision than whole-response scoring.

Why this matters:

  • Long answers often mix correct and incorrect content. A single “good/bad” label hides this mixture.
  • Claim-level units create a stable interface between generation and downstream judges (for example relevance and grounding).
  • Decomposed claims improve traceability: each verdict can point back to one claim and its source chunk.

This design follows the claim-decomposition pattern established in prior evaluation work:

In nexa-gauge, the claims node extracts the single most important atomic claim from each chunk (with an extractor confidence score) and outputs ClaimArtifacts. These artifacts become the direct substrate for downstream metrics such as grounding and relevance.

Use Case

Use claims when you need:

  • Fine-grained hallucination analysis (not just response-level pass/fail)
  • Better signal for relevance and grounding metrics
  • Explainable evaluation outputs tied to specific claim units
  • Stable regression comparisons across prompt/model changes
  • Downstream metric pipelines that require normalized factual units

Node Overview

In nexa-gauge, claims sits in the preprocessing branch:

scan -> chunk -> claims

What it does:

  • Reads chunked generation text (Chunk list)
  • Calls an LLM extractor per chunk with a structured schema
  • Produces Claim objects with:
    • extracted claim text
    • source chunk index
    • extractor confidence
    • token count metadata
  • Aggregates per-chunk token/cost usage into one CostEstimate
  • Returns ClaimArtifacts(claims=[...], cost=...)
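
To make that flow concrete, here is a self-contained toy sketch of the per-chunk loop in Python. Every name here is hypothetical: the real node calls an LLM where fake_extract is stubbed below, and token counts use a crude whitespace proxy rather than nexa-gauge's real accounting.

python
import hashlib
import json

def fake_extract(chunk_text: str) -> dict:
    """Stand-in for the per-chunk LLM call. The real extractor returns
    JSON in the {"claims": [...], "confidences": [...]} shape described
    in the implementation note below."""
    first_sentence = chunk_text.split(".")[0].strip() + "."
    return {"claims": [first_sentence], "confidences": [0.9]}

def run_claims(chunks: list[str]) -> dict:
    """Toy version of the claims node: one extraction per chunk,
    then aggregation into a single artifacts object."""
    claims = []
    input_tokens = output_tokens = 0.0
    for i, text in enumerate(chunks):
        resp = fake_extract(text)
        claim_text = resp["claims"][0]
        claims.append({
            "item": {
                # Hash-based ID derived from the claim text, as in the real node.
                "id": hashlib.sha256(claim_text.encode()).hexdigest()[:16],
                "text": claim_text,
                "tokens": float(len(claim_text.split())),  # crude proxy
            },
            "source_chunk_index": i,
            "confidence": resp["confidences"][0],
            "extraction_failed": False,
        })
        input_tokens += len(text.split())
        output_tokens += len(claim_text.split())
    return {"claims": claims,
            "cost": {"cost": 0.0,  # USD total would be summed from per-call usage
                     "input_tokens": input_tokens,
                     "output_tokens": output_tokens}}

print(json.dumps(run_claims(["Hamlet's central theme is mortality. The play..."]), indent=2))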

Implementation note:

  • The extraction prompt asks for exactly one atomic claim per chunk, returned as JSON (claims[], confidences[]).
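
For illustration, a single-chunk extractor response in that schema might look like the following (the claim text and confidence are copied from the example output further down; the exact formatting is an assumption):

json
{
  "claims": ["A central theme of Hamlet is mortality and its effect on action."],
  "confidences": [0.91]
}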

Execution Flow

Graph

scan -> chunk -> claims

Input

Using the sample input from ./sample.json:

json
{
  "case_id": "shakespeare-hamlet-short",
  "generation": "The central theme of Hamlet is mortality and the paralysis that arises from contemplating it. Through the famous 'To be or not to be' soliloquy and repeated encounters with death — the Ghost, Yorick's skull, Ophelia's drowning — Shakespeare explores how consciousness of death impedes decisive action. Hamlet's indecision stems not from cowardice but from his philosophical nature: he cannot act without questioning the meaning and consequences of every action."
}

Fields used by the claims branch:

  • generation: required; this is chunked and then converted to claims
  • case_id: used for case identity/reporting, not claim extraction logic

Direct node-level input to ClaimExtractorNode.run(...) is:

  • chunks: list[Chunk] (produced by upstream chunk node from generation)

Fields not required for claims:

  • question, context, reference

Output

Primary output type:

  • ClaimArtifacts
    • claims: list[Claim]
    • cost: CostEstimate

Example output:

json
{
  "claims": [
    {
      "item": {
        "id": "9df6db8c5d0c9a41",
        "text": "A central theme of Hamlet is mortality and its effect on action.",
        "tokens": 15.0,
        "confidence": 1.0,
        "cached": false
      },
      "source_chunk_index": 0,
      "confidence": 0.91,
      "extraction_failed": false
    },
    {
      "item": {
        "id": "f0f53c3c0119530d",
        "text": "Hamlet's indecision is tied to philosophical reflection rather than simple cowardice.",
        "tokens": 16.0,
        "confidence": 1.0,
        "cached": false
      },
      "source_chunk_index": 1,
      "confidence": 0.88,
      "extraction_failed": false
    }
  ],
  "cost": {
    "cost": 0.00074,
    "input_tokens": 240.0,
    "output_tokens": 56.0
  }
}

Attribute meaning:

  • claims: extracted claim units across all generation chunks
  • claims[].item.id: auto-generated hash-based ID from claim text
  • claims[].item.text: normalized atomic claim text
  • claims[].item.tokens: token count for that claim text
  • claims[].item.confidence: confidence field inherited from the base Item type (defaults to 1.0; distinct from the extractor confidence below)
  • claims[].item.cached: cache marker field on Item
  • claims[].source_chunk_index: originating chunk index
  • claims[].confidence: extractor confidence score for that claim (0–1)
  • claims[].extraction_failed: extraction failure flag (false for valid claims)
  • cost.cost: total USD cost for all claim extraction calls
  • cost.input_tokens: summed prompt tokens across chunks
  • cost.output_tokens: summed completion tokens across chunks
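
As a consumer-side sketch, the snippet below filters failed or low-confidence claims from saved artifacts and reports the aggregated cost. The file path and the 0.8 threshold are illustrative assumptions, not nexa-gauge defaults:

python
import json

# Hypothetical path: wherever your pipeline wrote the ClaimArtifacts JSON.
with open("out/claims/claim_artifacts.json") as f:
    artifacts = json.load(f)

# Keep claims that extracted successfully with extractor confidence >= 0.8.
kept = [c for c in artifacts["claims"]
        if not c["extraction_failed"] and c["confidence"] >= 0.8]

for c in kept:
    print(f"[chunk {c['source_chunk_index']}] "
          f"({c['confidence']:.2f}) {c['item']['text']}")

cost = artifacts["cost"]
print(f"total: ${cost['cost']:.5f} "
      f"({cost['input_tokens']:.0f} in / {cost['output_tokens']:.0f} out tokens)")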

Usage

bash
OUTPUT_DIR=./out/claims
mkdir -p "$OUTPUT_DIR"

Estimate Cost

bash
nexagauge estimate claims \
  --input ./sample.json \
  --limit 5 \
  | tee "$OUTPUT_DIR/claims-estimate.txt"

Note: estimate supports --input and --limit; it does not provide a native --output-dir flag, so tee is used to write output into your chosen output directory.

Run Evaluation

bash
nexagauge run claims \
  --input ./sample.json \
  --limit 5 \
  --output-dir "$OUTPUT_DIR"
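
To spot-check the result, something like the following works if the run writes the claim artifacts as JSON under $OUTPUT_DIR (the filename is a guess; check what nexagauge run actually produced):

bash
# Filename is illustrative; adjust to the artifact your run wrote.
jq -r '.claims[] | "\(.source_chunk_index)\t\(.confidence)\t\(.item.text)"' \
  "$OUTPUT_DIR/claims.json"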