Grounding (grounding)

Overview

grounding measures factual faithfulness: are the claims in a model answer actually supported by the provided context?

The metric design aligns with two core papers:

  • RAGAS (arXiv:2309.15217) frames faithfulness for RAG as a claim-level support check against retrieved passages, without needing a gold reference answer.
  • FActScore (arXiv:2305.14251) argues that factuality should be evaluated as atomic facts rather than a single binary judgment, because long-form outputs often mix correct and incorrect statements.

In practice, this means:

  1. Break answer content into verifiable claims.
  2. Check each claim against context evidence.
  3. Aggregate claim verdicts into a faithfulness score.

nexa-gauge’s grounding node operationalizes this pattern using deduplicated claims from the generation and an LLM judge that returns boolean support verdicts per claim. The final score is the fraction of supported claims.
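
To make the aggregation concrete, here is a minimal Python sketch of steps 2 and 3. Every name is illustrative: the real node gets claims from the upstream Claims Node and verdicts from an LLM judge, not from hard-coded lists.

python
# Minimal sketch of claim-level faithfulness scoring (steps 2 and 3).
# Names are illustrative; nexa-gauge takes claims from the Claims Node
# and boolean verdicts from an LLM judge rather than hard-coded lists.

def faithfulness_score(claims: list[str], verdicts: list[bool]) -> float:
    """Fraction of claims judged as supported by the context."""
    if not claims:
        return 0.0  # the real node skips instead, returning empty metrics
    return sum(verdicts) / len(claims)

# One supported claim out of two -> 0.5
print(faithfulness_score(
    ["The Eiffel Tower is in Paris, France.",
     "The Eiffel Tower is located in Berlin."],
    [True, False],
))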

This metric is especially useful when you care about hallucination control and evidence-backed answering. It evaluates factual support, not style or completeness, so it should be combined with other metrics (for example relevance) for broader quality coverage.

Use Case

Use grounding when you need confidence that outputs stay tied to supplied evidence:

  • RAG QA systems (docs, knowledge bases, support bots)
  • Compliance/policy workflows where unsupported claims are risky
  • Regression testing after retrieval, prompt, or model changes
  • Benchmarking hallucination rate across model versions
  • Validating claim-level trustworthiness in generated summaries

Node Overview (nexa-gauge)

In nexa-gauge, grounding is an answer-category metric node.

What it does:

  • Receives a list of Claim objects from the upstream Claims Node.
  • Receives context from the normalized scanner inputs.
  • Sends one judge prompt containing:
    • the full context text
    • the numbered claims
  • Expects structured output: {"verdicts": [true/false, ...]} (see the sketch after this list).
  • Maps verdicts to claim-level Faithfulness entries with:
    • verdict = "ACCEPTED" if true
    • verdict = "REJECTED" if false
  • Computes score: supported_claims / evaluated_claims.
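
A rough sketch of that round-trip in Python: the prompt wording and function names are assumptions; only the numbered-claims layout and the {"verdicts": [...]} response shape come from the node description above.

python
import json

# Hypothetical sketch of the judge round-trip. Prompt wording and names
# are assumptions; the {"verdicts": [...]} shape matches the node contract.

def build_judge_prompt(context: str, claims: list[str]) -> str:
    numbered = "\n".join(f"{i + 1}. {claim}" for i, claim in enumerate(claims))
    return (
        f"Context:\n{context}\n\n"
        f"Claims:\n{numbered}\n\n"
        'Respond with JSON: {"verdicts": [true or false, one per claim]}'
    )

def parse_verdicts(raw_response: str, n_claims: int) -> list[bool]:
    verdicts = json.loads(raw_response).get("verdicts", [])
    if len(verdicts) != n_claims:
        # Surfaces in the metric's error field, e.g. "No verdicts returned".
        raise ValueError("No verdicts returned")
    return [bool(v) for v in verdicts]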

Skip behavior:

  • If there are no claims, no context, or grounding is disabled, the node returns empty metrics and zero cost.
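
The guard is simple enough to state directly (function and flag names here are assumptions):

python
# Hypothetical guard mirroring the skip behavior described above.
def should_skip(claims: list, context: str, grounding_enabled: bool) -> bool:
    """True when the node should return empty metrics at zero cost."""
    return not claims or not context or not grounding_enabled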

Execution Flow

Graph
(execution-flow diagram: claims + context → judge verdicts → per-claim faithfulness → score)

Input

Using a sample input:

json
{
  "case_id": "eiffel-tower-basic",
  "question": "What is the Eiffel Tower and where is it located?",
  "generation": "The Eiffel Tower is a wrought-iron lattice tower located in Paris, France. .......",
  "context": "The Eiffel Tower (/ˈaɪfəl/ EYE-fəl; French: Tour Eiffel) is a wrought-iron lattice tower on the Champ de Mars in Paris, France. ......."
}

Fields used by the grounding branch:

  • generation: used (upstream) to create claims
  • context: used (directly) as evidence text for support verification
  • case_id: used for case identity/reporting, not for scoring logic

Fields not used by grounding:

  • question: not used by grounding (consumed by the relevance metric)
  • reference: not used by grounding (consumed by the reference metric)

Output

Primary output type:

  • GroundingMetrics
    • metrics: list[MetricResult]
    • cost: CostEstimate
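
As a rough Python picture of that shape (inferred from the example output below, not nexa-gauge's actual class definitions):

python
from dataclasses import dataclass, field

# Inferred shape of the node output; an assumption sketched from the
# example JSON below, not nexa-gauge's real class definitions.

@dataclass
class CostEstimate:
    cost: float = 0.0
    input_tokens: float | None = None   # null on zero-cost skips
    output_tokens: float | None = None

@dataclass
class MetricResult:
    name: str                  # "grounding"
    category: str              # "answer"
    score: float               # supported / evaluated, in [0, 1]
    result: list = field(default_factory=list)  # per-claim Faithfulness records
    error: str | None = None

@dataclass
class GroundingMetrics:
    metrics: list[MetricResult] = field(default_factory=list)
    cost: CostEstimate = field(default_factory=CostEstimate)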

Example output:

json
{
  "metrics": [
    {
      "name": "grounding",
      "category": "answer",
      "score": 0.5,
      "result": [
        {
          "item": {
            "id": "a1b2c3d4e5f6a7b8",
            "text": "The Eiffel Tower is in Paris, France.",
            "tokens": 10.0,
            "confidence": 1.0,
            "cached": false
          },
          "source_chunk_index": 0,
          "confidence": 0.93,
          "extraction_failed": false,
          "verdict": "ACCEPTED"
        },
        {
          "item": {
            "id": "b2c3d4e5f6a7b8c9",
            "text": "The Eiffel Tower is located in Berlin.",
            "tokens": 10.0,
            "confidence": 1.0,
            "cached": false
          },
          "source_chunk_index": 0,
          "confidence": 0.88,
          "extraction_failed": false,
          "verdict": "REJECTED"
        }
      ],
      "error": null
    }
  ],
  "cost": {
    "cost": 0.00042,
    "input_tokens": 215.0,
    "output_tokens": 18.0
  }
}

Attribute meaning:

  • metrics: one entry for this node (name="grounding"), or empty when skipped
  • name: metric/node identifier
  • category: answer (from MetricCategory.ANSWER)
  • score: supported-claim ratio in [0,1]
  • result: per-claim faithfulness records
  • result[].item: claim text and token metadata
  • result[].source_chunk_index: index of the generation chunk the claim was extracted from
  • result[].confidence: extractor confidence for the claim
  • result[].extraction_failed: extraction failure marker
  • result[].verdict: ACCEPTED or REJECTED
  • error: populated when verdict parsing fails (for example "No verdicts returned")
  • cost.cost: USD cost estimate/actual for this node call
  • cost.input_tokens, cost.output_tokens: model token usage (or null for zero-cost skips)
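
Reading the example above: one of the two claims is ACCEPTED, so score = 1 / 2 = 0.5.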

Usage

bash
OUTPUT_DIR=./out/grounding
mkdir -p "$OUTPUT_DIR"

Estimate Cost

bash
nexagauge estimate grounding \
  --input ./sample.json \
  --limit 5 \
  | tee "$OUTPUT_DIR/estimate.txt"

Note: estimate currently supports --input and --limit, but not --output-dir; use tee to save estimate output.

Run Evaluation

bash
nexagauge run grounding \
  --input ./sample.json \
  --limit 5 \
  --output-dir "$OUTPUT_DIR"

For full per-case report files that include grounding plus other metrics:

bash
nexagauge run eval \
  --input ./sample.json \
  --limit 5 \
  --output-dir "$OUTPUT_DIR"