Grounding (grounding)

Overview

grounding measures factual faithfulness: are the claims in a model answer actually supported by the provided context?

The metric design aligns with two core papers:

  • RAGAS (arXiv:2309.15217) frames faithfulness for RAG as a claim-level support check against retrieved passages, without needing a gold reference answer.
  • FActScore (arXiv:2305.14251) argues that factuality should be evaluated as atomic facts rather than a single binary judgment, because long-form outputs often mix correct and incorrect statements.

In practice, this means:

  1. Break answer content into verifiable claims.
  2. Check each claim against context evidence.
  3. Aggregate claim verdicts into a faithfulness score.

nexa-gauge’s grounding node operationalizes this pattern using deduplicated claims from the generation and an LLM judge that returns boolean support verdicts per claim. The final score is the fraction of supported claims.
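
To make the aggregation concrete, here is a minimal Python sketch of steps 2 and 3. Every name is illustrative: the real node gets claims from the upstream Claims Node and verdicts from an LLM judge, not from hard-coded lists.

python
# Minimal sketch of claim-level faithfulness scoring (steps 2 and 3).
# Names are illustrative; nexa-gauge takes claims from the Claims Node
# and boolean verdicts from an LLM judge rather than hard-coded lists.

def faithfulness_score(claims: list[str], verdicts: list[bool]) -> float:
    """Fraction of claims judged as supported by the context."""
    if not claims:
        return 0.0  # the real node skips instead, returning empty metrics
    return sum(verdicts) / len(claims)

# One supported claim out of two -> 0.5
print(faithfulness_score(
    ["The Eiffel Tower is in Paris, France.",
     "The Eiffel Tower is located in Berlin."],
    [True, False],
))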

This metric is especially useful when you care about hallucination control and evidence-backed answering. It evaluates factual support, not style or completeness, so it should be combined with other metrics (for example relevance) for broader quality coverage.

Use Case

Use grounding when you need confidence that outputs stay tied to supplied evidence:

  • RAG QA systems (docs, knowledge bases, support bots)
  • Compliance/policy workflows where unsupported claims are risky
  • Regression testing after retrieval, prompt, or model changes
  • Benchmarking hallucination rate across model versions
  • Validating claim-level trustworthiness in generated summaries

Node Overview (nexa-gauge)

In nexa-gauge, grounding is an answer-category metric node.

What it does:

  • Receives a list of Claim objects from the upstream Claims Node.
  • Receives context from the normalized scanner inputs.
  • Sends one judge prompt containing:
    • the full context text
    • the numbered claims
  • Expects structured output: {"verdicts": [true/false, ...]} (see the sketch after this list).
  • Maps verdicts to claim-level Faithfulness entries with:
    • verdict = "ACCEPTED" if true
    • verdict = "REJECTED" if false
  • Computes score: supported_claims / evaluated_claims.
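
A rough sketch of that round-trip in Python: the prompt wording and function names are assumptions; only the numbered-claims layout and the {"verdicts": [...]} response shape come from the node description above.

python
import json

# Hypothetical sketch of the judge round-trip. Prompt wording and names
# are assumptions; the {"verdicts": [...]} shape matches the node contract.

def build_judge_prompt(context: str, claims: list[str]) -> str:
    numbered = "\n".join(f"{i + 1}. {claim}" for i, claim in enumerate(claims))
    return (
        f"Context:\n{context}\n\n"
        f"Claims:\n{numbered}\n\n"
        'Respond with JSON: {"verdicts": [true or false, one per claim]}'
    )

def parse_verdicts(raw_response: str, n_claims: int) -> list[bool]:
    verdicts = json.loads(raw_response).get("verdicts", [])
    if len(verdicts) != n_claims:
        # Surfaces in the metric's error field, e.g. "No verdicts returned".
        raise ValueError("No verdicts returned")
    return [bool(v) for v in verdicts]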

Skip behavior:

  • If there are no claims, no context, or grounding is disabled, the node returns empty metrics and zero cost.
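
The guard is simple enough to state directly (function and flag names here are assumptions):

python
# Hypothetical guard mirroring the skip behavior described above.
def should_skip(claims: list, context: str, grounding_enabled: bool) -> bool:
    """True when the node should return empty metrics at zero cost."""
    return not claims or not context or not grounding_enabled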

Execution Flow

Graph
(execution-flow diagram: claims + context → judge verdicts → per-claim faithfulness → score)

Input

Using a sample input:

json
{
  "case_id": "eiffel-tower-basic",
  "question": "What is the Eiffel Tower and where is it located?",
  "generation": "The Eiffel Tower is a wrought-iron lattice tower located in Paris, France. .......",
  "context": "The Eiffel Tower (/ˈaɪfəl/ EYE-fəl; French: Tour Eiffel) is a wrought-iron lattice tower on the Champ de Mars in Paris, France. ......."
}

Fields used by the grounding branch:

  • generation: used (upstream) to create claims
  • context: used (directly) as evidence text for support verification
  • case_id: used for case identity/reporting, not for scoring logic

Fields not used by grounding:

  • question: not used by grounding (consumed by the relevance metric)
  • reference: not used by grounding (consumed by the reference metric)

Output

Primary output type:

  • GroundingMetrics
    • metrics: list[MetricResult]
    • cost: CostEstimate
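
As a rough Python picture of that shape (inferred from the example output below, not nexa-gauge's actual class definitions):

python
from dataclasses import dataclass, field

# Inferred shape of the node output; an assumption sketched from the
# example JSON below, not nexa-gauge's real class definitions.

@dataclass
class CostEstimate:
    cost: float = 0.0
    input_tokens: float | None = None   # null on zero-cost skips
    output_tokens: float | None = None

@dataclass
class MetricResult:
    name: str                  # "grounding"
    category: str              # "answer"
    score: float               # supported / evaluated, in [0, 1]
    result: list = field(default_factory=list)  # per-claim Faithfulness records
    error: str | None = None

@dataclass
class GroundingMetrics:
    metrics: list[MetricResult] = field(default_factory=list)
    cost: CostEstimate = field(default_factory=CostEstimate)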

Example output:

json
{
  "metrics": [
    {
      "name": "grounding",
      "category": "answer",
      "score": 0.5,
      "result": [
        {
          "item": {
            "id": "a1b2c3d4e5f6a7b8",
            "text": "The Eiffel Tower is in Paris, France.",
            "tokens": 10.0,
            "confidence": 1.0,
            "cached": false
          },
          "source_chunk_index": 0,
          "confidence": 0.93,
          "extraction_failed": false,
          "verdict": "ACCEPTED"
        },
        {
          "item": {
            "id": "b2c3d4e5f6a7b8c9",
            "text": "The Eiffel Tower is located in Berlin.",
            "tokens": 10.0,
            "confidence": 1.0,
            "cached": false
          },
          "source_chunk_index": 0,
          "confidence": 0.88,
          "extraction_failed": false,
          "verdict": "REJECTED"
        }
      ],
      "error": null
    }
  ],
  "cost": {
    "cost": 0.00042,
    "input_tokens": 215.0,
    "output_tokens": 18.0
  }
}

Attribute meaning:

  • metrics: one entry for this node (name="grounding"), or empty when skipped
  • name: metric/node identifier
  • category: answer (from MetricCategory.ANSWER)
  • score: supported-claim ratio in [0,1]
  • result: per-claim faithfulness records
  • result[].item: claim text and token metadata
  • result[].source_chunk_index: index of the generation chunk the claim was extracted from
  • result[].confidence: extractor confidence for the claim
  • result[].extraction_failed: extraction failure marker
  • result[].verdict: ACCEPTED or REJECTED
  • error: populated when verdict parsing fails (for example "No verdicts returned")
  • cost.cost: USD cost estimate/actual for this node call
  • cost.input_tokens, cost.output_tokens: model token usage (or null for zero-cost skips)
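
Reading the example above: one of the two claims is ACCEPTED, so score = 1 / 2 = 0.5.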

Usage

bash
OUTPUT_DIR=./out/grounding
mkdir -p "$OUTPUT_DIR"

Estimate Cost

bash
nexagauge estimate grounding \
  --input ./sample.json \
  --limit 5 \
  | tee "$OUTPUT_DIR/estimate.txt"

Note: estimate currently supports --input and --limit, but not --output-dir; use tee to save estimate output.

Run Evaluation

bash
nexagauge run grounding \
  --input ./sample.json \
  --limit 5 \
  --output-dir "$OUTPUT_DIR"

For full per-case report files that include grounding plus other metrics:

bash
nexagauge run eval \
  --input ./sample.json \
  --limit 5 \
  --output-dir "$OUTPUT_DIR"