MEASURE Function
The MEASURE function translates identified AI risks into quantitative and qualitative metrics that can be tracked over time. It answers the question: how much risk is present, and is it increasing or decreasing?
MEASURE is where VeriProof’s governance scoring and session analytics capabilities are most directly applicable. The metrics you configure define the production risk signal; VeriProof’s infrastructure captures and stores it in a form that’s both queryable and auditable.
Relevant MEASURE Categories
MEASURE 1 — Metrics and Methods
MEASURE 1.1 Approaches to evaluate AI risks are in place.
Your governance scoring configuration is the operationalisation of this practice. Each scoring dimension represents a measurable risk metric. To review your current governance scoring setup, go to Settings → Governance Policies in the Customer Portal. Each policy entry shows the policy type, current enforcement mode (Audit, Warn, or Block), and any configured thresholds.
Document each dimension with:
- The risk it measures (from your MAP risk identification)
- Why the threshold value was chosen (from benchmark data, regulation, or expert judgement)
- The measurement granularity (per-session, rolling average, etc.)
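One way to keep this documentation auditable is to store it in a structured form next to the policy itself and check it for completeness. The sketch below is illustrative only: the field names and the validation helper are assumptions, not part of the VeriProof schema.

```python
# Hypothetical structure for documenting one governance scoring dimension.
# Field names are illustrative, not part of the VeriProof data model.
dimension_doc = {
    "dimension": "toxicity",
    "risk_measured": "Harmful or abusive model output (from MAP risk identification)",
    "threshold": 0.85,
    "threshold_rationale": "Chosen from pilot-traffic benchmark data",
    "granularity": "per-session",
}

def validate_dimension_doc(doc):
    """Return the sorted list of required documentation fields that are
    missing or empty, so incomplete entries can be flagged in review."""
    required = {"dimension", "risk_measured", "threshold",
                "threshold_rationale", "granularity"}
    present = {k for k, v in doc.items() if v not in (None, "")}
    return sorted(required - present)
```

A complete entry validates to an empty list; anything else names the gaps to fill before the dimension goes into production.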
MEASURE 1.3 Internal experts and affected communities are involved in risk evaluation.
When your governance dimensions include signals from user feedback or escalation paths,
VeriProof captures these alongside automated signals. Custom metadata fields allow
you to capture human review outcomes. Use the SDK’s session.add_metadata() method
to attach reviewer ID, outcome, and notes to a flagged session after human review.
Aggregate human review outcomes as a calibration signal against your automated governance scores.
MEASURE 2 — Risk Metrics in Practice
MEASURE 2.1 System performance metrics are captured.
VeriProof captures the following metrics for every session:
| Metric | Field | Notes |
|---|---|---|
| Governance score | governance_score | Composite score from all active dimensions |
| Dimension scores | governance_dimension_scores.* | Per-dimension scores |
| Input token count | metadata.input_token_count | If emitted by adapter |
| Output token count | metadata.output_token_count | If emitted by adapter |
| Latency | metadata.latency_ms | End-to-end processing time |
| Model confidence | metadata.confidence_score | If emitted by model/adapter |
| Safety classification | metadata.safety_score | If emitted by safety classifier |
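When consuming exported session records, the table above maps to a flat extraction like the sketch below. The record shape mirrors the field names in the table; note that the metadata fields are optional ("if emitted"), so absent values should be handled rather than assumed.

```python
# Sketch: pulling the MEASURE 2.1 metrics out of an exported session
# record. Field names follow the table above; optional metadata fields
# may be absent if the adapter or model did not emit them.
def extract_metrics(record):
    meta = record.get("metadata", {})
    return {
        "governance_score": record.get("governance_score"),
        "dimension_scores": record.get("governance_dimension_scores", {}),
        "input_tokens": meta.get("input_token_count"),
        "output_tokens": meta.get("output_token_count"),
        "latency_ms": meta.get("latency_ms"),
        "confidence": meta.get("confidence_score"),
        "safety": meta.get("safety_score"),
    }
```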
MEASURE 2.5 Privacy risks are evaluated.
VeriProof’s GDPR data subject management directly supports this practice:
- Sessions are linked to data subjects when personal data processing is involved
- Erasure workflow tracks the privacy risk lifecycle from subject creation to erasure
- Legal holds prevent premature erasure when regulatory retention applies
For a privacy risk report, open Compliance → Evidence Exports in the Customer Portal. Select NIST AI RMF as the framework, check the MEASURE function, and click Download Evidence Pack (PDF). The privacy & data rights section of the package includes data subject counts, erasure completion rates, and legal hold inventory.
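The erasure-gating rule described above (legal holds block erasure until released) can be expressed as a simple decision function. This is an illustrative sketch of the rule, not the VeriProof implementation; the field names are hypothetical.

```python
# Illustrative encoding of the erasure lifecycle rule described above.
# Field names (erasure_requested, legal_holds) are hypothetical.
def erasure_action(subject):
    """Decide the workflow step for a data subject record:
    retain (no request), hold (legal hold applies), or erase."""
    if not subject.get("erasure_requested"):
        return "retain"
    if subject.get("legal_holds"):
        return "hold"  # regulatory retention prevents premature erasure
    return "erase"
```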
MEASURE 2.8 AI system outputs are evaluated for trustworthiness.
Blockchain anchoring is the technical mechanism behind output trustworthiness measurement. Every session record has a Merkle root anchored on Solana, so any tampering with the record after anchoring breaks the cryptographic chain and is detectable on verification.
To verify a specific session’s integrity, open the session detail view in the Customer Portal and click Verify Blockchain Proof. The portal checks the stored Merkle root against the current on-chain state and returns a pass or fail result. For a trustworthiness summary across all sessions, the MEASURE section of the evidence package (generated via Compliance → Evidence Exports) includes the total anchored session count and the verification pass rate for the period.
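Conceptually, the integrity check behind Verify Blockchain Proof recomputes a Merkle root over the session's contents and compares it to the anchored root. The sketch below illustrates that general technique; VeriProof's actual leaf encoding and tree construction are not specified here and will differ.

```python
import hashlib

# Illustrative Merkle-root check: recompute the root over the session's
# field hashes and compare it to the anchored root. The leaf encoding
# and tree construction are assumptions, not VeriProof's actual scheme.
def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def verify_session(leaves, anchored_root: bytes) -> bool:
    """Pass iff the recomputed root matches the anchored on-chain root."""
    return merkle_root(leaves) == anchored_root
```

Because every leaf hash feeds into the root, changing any field after anchoring produces a different root and the verification fails.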
MEASURE 2.11 Fairness indicators are tracked.
Define fairness policies in Settings → Governance Policies in the Customer Portal.
For example, you can add a min_guardrail_pass_rate policy to enforce a minimum pass
rate, or a requires_grounding policy for outputs that must be grounded in verified
sources. Each policy type has an enforcement mode (Audit, Warn, or Block) that controls
whether violations are logged only or actively prevented.
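The relationship between policy type, violation, and enforcement mode can be sketched as follows. The policy type names (min_guardrail_pass_rate, requires_grounding) and mode names come from the text above; the evaluation logic, thresholds, and field names are illustrative assumptions, not the VeriProof implementation.

```python
# Illustrative evaluation of the two example policy types above.
# Thresholds and session field names are assumptions for this sketch.
def evaluate_policy(policy, session):
    if policy["type"] == "min_guardrail_pass_rate":
        violated = session["guardrail_pass_rate"] < policy["threshold"]
    elif policy["type"] == "requires_grounding":
        violated = not session.get("grounded", False)
    else:
        raise ValueError(f"unknown policy type: {policy['type']}")
    if not violated:
        return "pass"
    # Enforcement mode controls whether a violation is logged only
    # (Audit), surfaced as a warning (Warn), or actively prevented (Block).
    return {"Audit": "log", "Warn": "warn", "Block": "block"}[policy["mode"]]
```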
Capture a user_group metadata field in your adapter to enable group-level consistency
analysis — differential quality or refusal rates across groups are the primary
fairness signals at inference time.
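With the user_group field in place, group-level consistency analysis over exported sessions reduces to a per-group rate computation like the sketch below. The user_group field comes from the text; the refused flag and record shape are illustrative assumptions.

```python
from collections import defaultdict

# Sketch of group-level refusal-rate analysis over exported sessions.
# Assumes each record carries the user_group metadata field described
# above plus a boolean 'refused' flag (an illustrative assumption).
def refusal_rates_by_group(sessions):
    counts = defaultdict(lambda: [0, 0])  # group -> [refusals, total]
    for s in sessions:
        group = s.get("metadata", {}).get("user_group", "unknown")
        counts[group][0] += 1 if s.get("refused") else 0
        counts[group][1] += 1
    return {g: refusals / total for g, (refusals, total) in counts.items()}
```

Large gaps between groups in the resulting rates are the differential-refusal signal worth investigating.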
MEASURE 3 — Impact Assessment Metrics
MEASURE 3.3 Metrics are available for AI impact assessment.
VeriProof’s periodic evidence export generates the session-level and aggregate metrics used in impact assessments. Open Compliance → Evidence Exports in the Customer Portal, select NIST AI RMF as the framework, check the MEASURE function, set your report period, and click Download Evidence Pack (PDF).
The MEASURE section includes: governance score distribution (mean, p10, p50, p90, p99), fairness dimension summaries, trustworthiness verification rate, alert trigger counts by dimension, and statistical comparison to the prior period.
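The governance score distribution reported in that section can be reproduced from raw per-session scores with a simple summary like the one below, here using the nearest-rank percentile method (the evidence package's exact percentile method is not specified, so treat this as a sketch).

```python
# Sketch: mean and percentile summary of per-session governance scores,
# matching the distribution fields in the MEASURE evidence section.
# Uses nearest-rank percentiles; the package's exact method may differ.
def score_distribution(scores):
    s = sorted(scores)
    n = len(s)
    def pct(p):
        k = max(0, min(n - 1, round(p / 100 * (n - 1))))
        return s[k]
    return {
        "mean": sum(s) / n,
        "p10": pct(10), "p50": pct(50), "p90": pct(90), "p99": pct(99),
    }
```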
Next Steps
- MANAGE function — responding to what MEASURE finds
- GOVERN function — policies that define MEASURE thresholds
- Governance Scoring guide — complete scoring configuration reference