Grounding (grounding)
Overview
grounding measures factual faithfulness: are the claims in a model answer actually supported by the provided context?
The metric design aligns with two core papers:
- RAGAS (arXiv:2309.15217) frames faithfulness for RAG as a claim-level support check against retrieved passages, without needing a gold reference answer.
- FActScore (arXiv:2305.14251) argues that factuality should be evaluated at the level of atomic facts, not as a single binary judgment, because long-form outputs often mix correct and incorrect statements.
In practice, this means:
- Break answer content into verifiable claims.
- Check each claim against context evidence.
- Aggregate claim verdicts into a faithfulness score.
nexa-gauge’s grounding node operationalizes this pattern using deduplicated claims from the generation and an LLM judge that returns boolean support verdicts per claim. The final score is the fraction of supported claims.
This metric is especially useful when you care about hallucination control and evidence-backed answering. It evaluates factual support, not style or completeness, so it should be combined with other metrics (for example relevance) for broader quality coverage.
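The break/check/aggregate pattern above can be sketched in a few lines. This is a minimal illustration with hypothetical names, not nexa-gauge's implementation; in the real node the claim extraction and the support verdicts both come from LLM calls.

```python
from dataclasses import dataclass

@dataclass
class ClaimVerdict:
    text: str
    supported: bool  # the judge's boolean support verdict for this claim

def faithfulness_score(verdicts: list[ClaimVerdict]) -> float:
    """Fraction of claims supported by the context (RAGAS-style faithfulness)."""
    if not verdicts:
        return 0.0
    return sum(v.supported for v in verdicts) / len(verdicts)

verdicts = [
    ClaimVerdict("The Eiffel Tower is in Paris, France.", True),
    ClaimVerdict("The Eiffel Tower is located in Berlin.", False),
]
print(faithfulness_score(verdicts))  # 0.5
```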
Use Case
Use grounding when you need confidence that outputs stay tied to supplied evidence:
- RAG QA systems (docs, knowledge bases, support bots)
- Compliance/policy workflows where unsupported claims are risky
- Regression testing after retrieval, prompt, or model changes
- Benchmarking hallucination rate across model versions
- Validating claim-level trustworthiness in generated summaries
Node Overview (nexa-gauge)
In nexa-gauge, grounding is an answer-category metric node.
What it does:
- Receives a list of Claim objects from the upstream ClaimsNode.
- Receives context from the normalized scanner inputs.
- Sends one judge prompt with:
- full context text
- numbered claims
- Expects structured output: {"verdicts": [true/false, ...]}
- Maps verdicts to claim-level Faithfulness entries:
- verdict = "ACCEPTED" if true
- verdict = "REJECTED" if false
- Computes score: supported_claims / evaluated_claims
Skip behavior:
- If no claims, no context, or grounding disabled, returns empty metrics and zero cost.
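The control flow above, including the skip path, can be sketched as follows. This is a hedged outline with illustrative names (run_grounding, judge); the judge is stubbed as a callable, whereas the real node issues an LLM call and reports its token cost.

```python
from typing import Callable

def run_grounding(
    claims: list[str],
    context: str,
    judge: Callable[[str, list[str]], list[bool]],
    enabled: bool = True,
) -> dict:
    """Illustrative sketch of the grounding node's flow, not the real implementation."""
    # Skip behavior: no claims, no context, or grounding disabled
    # -> empty metrics and zero cost.
    if not enabled or not claims or not context:
        return {"metrics": [], "cost": {"cost": 0.0}}

    # One judge call over the full context and the numbered claims;
    # in nexa-gauge the judge returns {"verdicts": [true/false, ...]}.
    verdicts = judge(context, claims)

    result = [
        {"item": {"text": claim}, "verdict": "ACCEPTED" if ok else "REJECTED"}
        for claim, ok in zip(claims, verdicts)
    ]
    score = sum(verdicts) / len(verdicts)  # supported_claims / evaluated_claims
    return {
        "metrics": [
            {"name": "grounding", "category": "answer", "score": score, "result": result}
        ],
        "cost": {"cost": 0.0},  # the real node reports the judge call's cost here
    }
```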
Execution Flow
Input
Using your sample input:
{
"case_id": "eiffel-tower-basic",
"question": "What is the Eiffel Tower and where is it located?",
"generation": "The Eiffel Tower is a wrought-iron lattice tower located in Paris, France. .......",
"context": "The Eiffel Tower (/ˈaɪfəl/ EYE-fəl; French: Tour Eiffel) is a wrought-iron lattice tower on the Champ de Mars in Paris, France. ......."
}

Fields used by the grounding branch:
- generation: used (upstream) to create claims
- context: used (directly) as evidence text for support verification
- case_id: used for case identity/reporting, not for scoring logic
Fields not used by grounding:
- question: not used by grounding (used by relevance)
- reference: not used by grounding (used by reference)
Output
Primary output type:
GroundingMetrics
- metrics: list[MetricResult]
- cost: CostEstimate
Example output:
{
"metrics": [
{
"name": "grounding",
"category": "answer",
"score": 0.5,
"result": [
{
"item": {
"id": "a1b2c3d4e5f6a7b8",
"text": "The Eiffel Tower is in Paris, France.",
"tokens": 10.0,
"confidence": 1.0,
"cached": false
},
"source_chunk_index": 0,
"confidence": 0.93,
"extraction_failed": false,
"verdict": "ACCEPTED"
},
{
"item": {
"id": "b2c3d4e5f6a7b8c9",
"text": "The Eiffel Tower is located in Berlin.",
"tokens": 10.0,
"confidence": 1.0,
"cached": false
},
"source_chunk_index": 0,
"confidence": 0.88,
"extraction_failed": false,
"verdict": "REJECTED"
}
],
"error": null
}
],
"cost": {
"cost": 0.00042,
"input_tokens": 215.0,
"output_tokens": 18.0
}
}

Attribute meaning:
- metrics: one entry for this node (name="grounding"), or empty when skipped
- name: metric/node identifier
- category: answer (from MetricCategory.ANSWER)
- score: supported-claim ratio in [0, 1]
- result: per-claim faithfulness records
- result[].item: claim text and token metadata
- result[].source_chunk_index: generation chunk the claim came from
- result[].confidence: extractor confidence for the claim
- result[].extraction_failed: extraction failure marker
- result[].verdict: ACCEPTED or REJECTED
- error: populated when verdict parsing fails (for example "No verdicts returned")
- cost.cost: USD cost estimate/actual for this node call
- cost.input_tokens, cost.output_tokens: model token usage (or null for zero-cost skips)
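Given a payload shaped like the example above, downstream tooling can pull out the unsupported claims directly, for example to log hallucinations. A minimal sketch (the helper name rejected_claims is hypothetical, not part of nexa-gauge):

```python
def rejected_claims(payload: dict) -> list[str]:
    """Collect claim texts the judge marked REJECTED (unsupported by context)."""
    out = []
    for metric in payload.get("metrics", []):
        if metric.get("name") != "grounding":
            continue
        for record in metric.get("result") or []:
            if record.get("verdict") == "REJECTED":
                out.append(record["item"]["text"])
    return out

payload = {
    "metrics": [
        {
            "name": "grounding",
            "score": 0.5,
            "result": [
                {"item": {"text": "The Eiffel Tower is in Paris, France."},
                 "verdict": "ACCEPTED"},
                {"item": {"text": "The Eiffel Tower is located in Berlin."},
                 "verdict": "REJECTED"},
            ],
        }
    ]
}
print(rejected_claims(payload))  # ['The Eiffel Tower is located in Berlin.']
```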
Usage
OUTPUT_DIR=./out/grounding
mkdir -p "$OUTPUT_DIR"

Estimate Cost
nexagauge estimate grounding \
--input ./sample.json \
--limit 5 \
| tee "$OUTPUT_DIR/estimate.txt"

Note: estimate currently supports --input and --limit, but not --output-dir; use tee to save the estimate output.
Run Evaluation
nexagauge run grounding \
--input ./sample.json \
--limit 5 \
--output-dir "$OUTPUT_DIR"

For full per-case report files that include grounding plus other metrics:
nexagauge run eval \
--input ./sample.json \
--limit 5 \
--output-dir "$OUTPUT_DIR"