Azimuth public benchmark (in silico, pack-bound)

Helix supports deterministic, offline-verifiable AUROC evaluation against frozen public benchmarks via evaluation packs.

This page records the current, externally defensible Azimuth replication on the frozen evaluation pack:

evaluation_pack_id: helix_eval_v1_azimuth_public
Pack manifest SHA256: 4fc8e5c82ff2e0e4d6bb9ec000de3a5f873c1f5b00e7be46f3c0ba2de5bb2848
Labels SHA256: 2d4231b24fda7a429cb21cf5fce35db8c90f227d3d66d8650abb62aa95228a3d

!!! important
    AUROC reported by Helix is computed only against a declared in silico evaluation pack. It is not a claim of biological validation, real‑world performance, or real‑world accuracy.

One-command run (evidence bundle emitted)

If the OnTargetX bundle is missing, train it first:

make train-ontargetx-azimuth

Then run the evaluation, which emits the evidence bundle:

make eval-azimuth-auroc-helix-ontarget

Outputs (deterministic paths):

artifacts/evaluations/azimuth_public/physics_helix_batch_report.zip
artifacts/evaluations/azimuth_public/ontargetx_helix_batch_report.zip
artifacts/evaluations/azimuth_public/physics_transcript.txt
artifacts/evaluations/azimuth_public/ontargetx_transcript.txt
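
A quick way to confirm that a run produced all four artifacts is to check the paths and list what each zip contains. The sketch below is not part of Helix: the helper name `summarize_outputs` is hypothetical, and it assumes nothing about the internal layout of the reports beyond their being valid zip archives.

```python
import zipfile
from pathlib import Path

# Paths copied from the "Outputs" list above.
EXPECTED_OUTPUTS = [
    "artifacts/evaluations/azimuth_public/physics_helix_batch_report.zip",
    "artifacts/evaluations/azimuth_public/ontargetx_helix_batch_report.zip",
    "artifacts/evaluations/azimuth_public/physics_transcript.txt",
    "artifacts/evaluations/azimuth_public/ontargetx_transcript.txt",
]


def summarize_outputs(root=".", expected=EXPECTED_OUTPUTS):
    """Hypothetical helper: map each expected path to a status.

    The status is "missing", a sorted list of member names (zip files),
    or the file size in bytes (everything else).
    """
    summary = {}
    for rel in expected:
        path = Path(root) / rel
        if not path.is_file():
            summary[rel] = "missing"
        elif path.suffix == ".zip":
            with zipfile.ZipFile(path) as zf:
                summary[rel] = sorted(zf.namelist())
        else:
            summary[rel] = path.stat().st_size
    return summary
```

Calling `summarize_outputs()` from the repository root after the evaluation target finishes should show no `"missing"` entries.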

Current scorer configuration (default)

OnTargetX (Helix) is evaluated as a source of activity-like scores for on-target guides:

embedding_id: seq30_v3_kmer2
model_kind: logit_residual_v2 (affine baseline + residual logit, novelty-gated)
Learner: logit_residual_logistic_l2_activity_v2 (trained against the pack’s continuous activity field; evaluated via pack labels)
OnTargetX bundle dir: models/ontargetx/helix_eval_v1_azimuth_public/v3_kmer2_logistic_activity_v2
bundle.json SHA256: a240b735c09f8cafec4a737fe60825cd14223aad8c5e2b3072375dfe3180e57b

Physics is included as the fixed baseline comparator:

score_source: helix_crispr_on_target_physics

Recorded results (pack-bound)

From make eval-azimuth-auroc-helix-ontarget:

Physics AUROC: macro 0.5277, pooled 0.5167
OnTargetX AUROC: macro 0.7489, pooled 0.7383
Δ macro AUROC: +0.2212 (OnTargetX minus Physics; computed only when the pack id and digest match)
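
The macro and pooled figures above differ in how guides are aggregated. The following is a minimal sketch of one common reading, not Helix's scorer: it assumes that "pooled" means a single AUROC over all guides together and that "macro" means the unweighted mean of per-group AUROCs. The grouping key (for example, gene) and the names `auroc` and `macro_and_pooled` are assumptions made for illustration; the pack defines the actual grouping and labels.

```python
from collections import defaultdict


def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U); tied scores share an average rank.

    Returns None when only one class is present, since AUROC is undefined.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # Items i..j-1 are tied and occupy 1-based ranks i+1..j.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def macro_and_pooled(records):
    """records: iterable of (group, score, label) with label in {0, 1}."""
    by_group = defaultdict(lambda: ([], []))
    all_scores, all_labels = [], []
    for group, score, label in records:
        by_group[group][0].append(score)
        by_group[group][1].append(label)
        all_scores.append(score)
        all_labels.append(label)
    per_group = [a for a in (auroc(s, y) for s, y in by_group.values()) if a is not None]
    macro = sum(per_group) / len(per_group) if per_group else None
    return macro, auroc(all_scores, all_labels)


# Toy data (not from the pack): each group ranks its own guides perfectly,
# but the groups sit on different score scales, so pooling lowers the AUROC.
toy = [("A", 0.2, 0), ("A", 0.3, 1), ("B", 0.8, 0), ("B", 0.9, 1)]
macro, pooled = macro_and_pooled(toy)
print(f"macro={macro:.4f} pooled={pooled:.4f}")  # macro=1.0000 pooled=0.7500

# The recorded delta is plain subtraction of the two macro values.
print(f"delta={0.7489 - 0.5277:+.4f}")  # delta=+0.2212
```

Under this reading, a gap between macro and pooled values indicates that score scales differ between groups, which is why both are reported.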

From make train-ontargetx-azimuth (holdout is the primary generalization gate):

Holdout macro AUROC: 0.7170
Full-pack macro AUROC: 0.7489
Full-pack pooled AUROC: 0.7383

Training determinism check (recommended once per release)

Train twice with the same container digest and confirm that both runs print identical hashes:

make train-ontargetx-azimuth
sha256sum \
  models/ontargetx/helix_eval_v1_azimuth_public/v3_kmer2_logistic_activity_v2/tensors.npz \
  models/ontargetx/helix_eval_v1_azimuth_public/v3_kmer2_logistic_activity_v2/training_receipt.sha256
make train-ontargetx-azimuth
sha256sum \
  models/ontargetx/helix_eval_v1_azimuth_public/v3_kmer2_logistic_activity_v2/tensors.npz \
  models/ontargetx/helix_eval_v1_azimuth_public/v3_kmer2_logistic_activity_v2/training_receipt.sha256
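
Comparing the two sets of hashes by eye is error-prone, so the same check can be scripted. This is a sketch under stated assumptions rather than a Helix tool: `check_determinism` and `run_make_train` are hypothetical names, the script is assumed to run from the repository root, and the file list simply mirrors the two paths hashed above.

```python
import hashlib
import subprocess

BUNDLE_DIR = "models/ontargetx/helix_eval_v1_azimuth_public/v3_kmer2_logistic_activity_v2"
# Same two files as in the sha256sum commands above.
CHECKED_FILES = [
    f"{BUNDLE_DIR}/tensors.npz",
    f"{BUNDLE_DIR}/training_receipt.sha256",
]


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def run_make_train():
    """Invoke the training target from this page; raises if make fails."""
    subprocess.run(["make", "train-ontargetx-azimuth"], check=True)


def check_determinism(train=run_make_train, files=CHECKED_FILES):
    """Train twice and return the files whose digests changed.

    An empty list means both runs produced byte-identical outputs.
    """
    train()
    first = {f: sha256_file(f) for f in files}
    train()
    second = {f: sha256_file(f) for f in files}
    return [f for f in files if first[f] != second[f]]


# Usage (runs training twice, so it is left commented out here):
# changed = check_determinism()
# print("deterministic" if not changed else f"digests differ: {changed}")
```

Passing `train` as a parameter keeps the comparison logic separate from the `make` invocation, so it can be exercised with a stand-in function.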

How to reproduce this transcript

make build-eval-pack-azimuth
make train-ontargetx-azimuth
make eval-azimuth-auroc-helix-ontarget

Expected digests (pack-bound):

Pack manifest SHA256: 4fc8e5c82ff2e0e4d6bb9ec000de3a5f873c1f5b00e7be46f3c0ba2de5bb2848
Labels SHA256: 2d4231b24fda7a429cb21cf5fce35db8c90f227d3d66d8650abb62aa95228a3d
bundle.json SHA256: a240b735c09f8cafec4a737fe60825cd14223aad8c5e2b3072375dfe3180e57b
tensors.npz SHA256: f862022e7517b0916a568ae5fc62f359d505fb6f8aaef7590139ca1c99ecd047
training_receipt.json SHA256: ed0c158639ed1c98e40da9898e1f6a7172d47cbdf065d9a011c82415676b0599
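
The bundle digests above can be checked programmatically after a reproduction run. The sketch below is illustrative only: `find_mismatches` is a hypothetical name, and it assumes that `bundle.json` and `training_receipt.json` sit next to `tensors.npz` in the OnTargetX bundle directory. The pack manifest and labels files are omitted because their on-disk locations are not stated on this page.

```python
import hashlib
from pathlib import Path

BUNDLE_DIR = "models/ontargetx/helix_eval_v1_azimuth_public/v3_kmer2_logistic_activity_v2"

# Digests copied from the "Expected digests" list above.
EXPECTED = {
    "bundle.json": "a240b735c09f8cafec4a737fe60825cd14223aad8c5e2b3072375dfe3180e57b",
    "tensors.npz": "f862022e7517b0916a568ae5fc62f359d505fb6f8aaef7590139ca1c99ecd047",
    "training_receipt.json": "ed0c158639ed1c98e40da9898e1f6a7172d47cbdf065d9a011c82415676b0599",
}


def find_mismatches(bundle_dir=BUNDLE_DIR, expected=EXPECTED):
    """Return {file_name: actual_digest} for files that are missing or differ.

    A missing file maps to None. An empty dict means every digest matched.
    Files are read fully into memory, which is fine for a sketch.
    """
    bad = {}
    for name, want in expected.items():
        path = Path(bundle_dir) / name
        got = hashlib.sha256(path.read_bytes()).hexdigest() if path.is_file() else None
        if got != want:
            bad[name] = got
    return bad
```

An empty result from `find_mismatches()` indicates that the local bundle matches the recorded one; any entry in the result names the file to investigate.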