AUROC delta demo (buyer-facing, offline-verifiable)

Helix computes AUROC for a scoring function against a frozen in silico evaluation pack, and emits offline-verifiable evidence bundles that include the metric, its provenance, and integrity checks.
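As a rough illustration of what the metric measures, the sketch below computes AUROC as the fraction of (positive, negative) pairs that a scoring function ranks correctly. The scores and labels are hypothetical toy data, chosen only so the arithmetic is easy to follow; they are not the contents of any Helix evaluation pack. Counting ties as half a pair is an assumed tie policy, not necessarily the one Helix records in its report metadata.

```python
from itertools import product


def auroc(scores, labels, positive_label=1):
    """Pairwise AUROC: share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (assumed tie policy for this sketch).
    """
    pos = [s for s, y in zip(scores, labels) if y == positive_label]
    neg = [s for s, y in zip(scores, labels) if y != positive_label]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 positives followed by 4 negatives.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
baseline_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
candidate_scores = [0.9, 0.8, 0.7, 0.65, 0.6, 0.4, 0.2, 0.1]

baseline = auroc(baseline_scores, labels)
candidate = auroc(candidate_scores, labels)
print(f"Baseline:  {baseline:.4f}")
print(f"Candidate: {candidate:.4f}")
print(f"Delta:     {candidate - baseline:+.4f}")
```

In the toy baseline, one positive (score 0.3) falls below two negatives, so 14 of 16 pairs are ordered correctly; the toy candidate orders all 16 correctly.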

What this demo proves

One thing only:

Given the same frozen evaluation pack, Candidate A is measurably better than Baseline B, and the proof is offline-verifiable.

Run (2 minutes)

From the repo root:

make demo-auroc-delta

What you should verify (no hand-waving):

  • Container digest (record the exact image digest you ran).
  • Evaluation pack id + pack digest:
    • Pack manifest SHA256 (preferred when present), and
    • Labels SHA256 (fallback digest and cross-check).
  • Rerun stability: run the same command twice and confirm the printed Baseline bundle SHA256 and Candidate bundle SHA256 lines match exactly.
  • Cross-machine stability: run on two machines using the same container digest and confirm the same SHA256 lines match exactly.
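The rerun and cross-machine checks can be automated once the output of each run has been saved. The following is a minimal sketch, assuming the output contains the two bundle SHA256 lines in the form shown in this document; the helper names are hypothetical and are not part of Helix.

```python
import re

# Matches the two printed bundle digest lines, e.g.
# "Baseline bundle SHA256:  <64 hex chars>"
SHA_LINE = re.compile(
    r"^(Baseline|Candidate) bundle SHA256:\s*([0-9a-f]{64})\s*$",
    re.MULTILINE,
)


def bundle_digests(output: str) -> dict:
    """Extract {"Baseline": digest, "Candidate": digest} from captured output."""
    return {name: digest for name, digest in SHA_LINE.findall(output)}


def runs_match(output_a: str, output_b: str) -> bool:
    """True only if both runs printed both digests and they are identical."""
    a, b = bundle_digests(output_a), bundle_digests(output_b)
    return set(a) == {"Baseline", "Candidate"} and a == b
```

For example, redirect each run to its own file, read both files as text, and pass the two strings to `runs_match`. A missing digest line counts as a failure rather than a match.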

Alias (same target):

make demo_auroc_delta

The demo emits:

  • artifacts/demos/auroc_delta/baseline_helix_batch_report.zip
  • artifacts/demos/auroc_delta/candidate_helix_batch_report.zip
  • artifacts/demos/auroc_delta/compare.txt

What the buyer sees (comparison output)

artifacts/demos/auroc_delta/compare.txt includes:

Evaluation Pack: helix_eval_v1_batch_demo
AUROC (demo_top_k_guides, in silico)

Baseline:  0.8750
Candidate: 1.0000
Δ AUROC:  +0.1250

Pack manifest SHA256: <...>
Labels SHA256: <...>
Metric schema: helix.metrics.auroc/v1

Baseline bundle SHA256:  <...>
Candidate bundle SHA256: <...>

Comparability rule:

  • Δ AUROC is shown only when the evaluation pack id matches and the pack digest matches (pack manifest SHA256 if present, otherwise labels SHA256).
  • If the pack differs, output must explain why (for example: Not comparable: evaluation pack mismatch).
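The comparability rule can be sketched as follows. This is an illustration of the rule as stated above, not Helix's implementation; the `EvalResult` type and its field names are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical container for the fields the rule depends on."""
    pack_id: str
    labels_sha256: str
    auroc: float
    manifest_sha256: Optional[str] = None


def pack_digest(result: EvalResult) -> str:
    # Prefer the pack manifest digest; fall back to the labels digest.
    return result.manifest_sha256 or result.labels_sha256


def compare(baseline: EvalResult, candidate: EvalResult) -> str:
    same_pack = (
        baseline.pack_id == candidate.pack_id
        and pack_digest(baseline) == pack_digest(candidate)
    )
    if not same_pack:
        return "Not comparable: evaluation pack mismatch"
    return f"Δ AUROC: {candidate.auroc - baseline.auroc:+.4f}"
```

In this sketch, a result that carries a manifest digest is never comparable with one that only carries a labels digest, because the two digests cover different files. That is a conservative choice made for the example.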

Example transcript (recorded)

$ make demo-auroc-delta
Evaluation Pack: helix_eval_v1_batch_demo
AUROC (demo_top_k_guides, in silico)

Baseline:  0.8750
Candidate: 1.0000
Δ AUROC:  +0.1250

Pack manifest SHA256: aa64c7c8026d2eec1c85cbefbe231bd8a08c4afa6572adf7e7fa83c2be0fa3d5
Labels SHA256: e84818f33a81973abf2c93bb816b724323c33816352ab6741c6c1d65a51c0796
Metric schema: helix.metrics.auroc/v1

Baseline bundle SHA256:  6a83ccce5e54ec27f7486f00f506b0a3ff16de923b66dded8a5f6ab5b7628a79
Candidate bundle SHA256: 3ebebdd30c8d6843567b39c5d8e4ffc49b9ab4f9ed2856a6c8b8f5c0b74b9d9c

Important: AUROC reported by Helix is computed only against a declared in silico evaluation pack. It is not a claim of biological validation, real-world performance, or real-world accuracy. The metric is meaningful only within the scope, labels, and versioned provenance of the evaluation pack.

Evidence bundle statement

Each evidence bundle includes:

  1. helix_batch_report/metrics/auroc.json and helix_batch_report/metrics/auroc.sha256
  2. Evaluation pack provenance (full, per pack manifest):
    • helix_batch_report/evaluation_pack/evaluation_pack.json
    • helix_batch_report/evaluation_pack/labels.v1.json
    • helix_batch_report/evaluation_pack/README.md
    • helix_batch_report/evaluation_pack/manifest.json
  3. Batch report metadata with the evaluation fields (pack id + digests, metric schema id/version, tie policy, positive label)
  4. Receipts and hashes needed to verify integrity offline

Statement template:

AUROC, in silico evaluation
Pack: <pack id>
Pack manifest SHA256: <sha256>
Labels SHA256: <sha256>
Metric schema: <schema id>
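One offline integrity check is to recompute the digest of the metric file and compare it with the digest stored beside it. The sketch below rests on two assumptions that this page does not confirm: that the member paths inside the zip match the paths listed above exactly, and that `auroc.sha256` holds the hex SHA256 of the raw `auroc.json` bytes as its first whitespace-separated token (the `sha256sum` convention). Adjust both if the bundle layout differs.

```python
import hashlib
import zipfile

# Assumed member paths inside the bundle zip (taken from the list above).
METRIC = "helix_batch_report/metrics/auroc.json"
DIGEST = "helix_batch_report/metrics/auroc.sha256"


def metric_hash_ok(bundle) -> bool:
    """Check auroc.json against the digest recorded in auroc.sha256.

    `bundle` is a path or a binary file-like object for the bundle zip.
    Raises KeyError if either member is missing from the archive.
    """
    with zipfile.ZipFile(bundle) as archive:
        actual = hashlib.sha256(archive.read(METRIC)).hexdigest()
        recorded = archive.read(DIGEST).decode("utf-8").split()
    return bool(recorded) and recorded[0].lower() == actual
```

For example, `metric_hash_ok("artifacts/demos/auroc_delta/baseline_helix_batch_report.zip")` would check the baseline bundle emitted by the demo. This covers only the metric file, not the other receipts and hashes in the bundle.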

Cross-machine proof line

To prove cross-machine bit identity, run make demo-auroc-delta on machine B using the same container digest and confirm the printed Baseline bundle SHA256 and Candidate bundle SHA256 lines match machine A.