How To Run In CI
Baseline CI pipeline
python -m pytest packages/evaluator/tests -q
python -m pytest packages/runner/tests -q
python scripts/generate_docs_from_json.py
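The three commands above can be chained so that CI stops at the first failure. The following is a minimal sketch, not part of the project: `BASELINE_STEPS` and `run_steps` are hypothetical names, and `sys.executable` is used in place of `python` on the assumption that the wrapper runs under the same interpreter as the tests.

```python
import subprocess
import sys
from typing import Callable, Sequence

# Commands copied from the baseline pipeline; sys.executable stands in
# for "python" (an assumption about how the wrapper is invoked).
BASELINE_STEPS: list[list[str]] = [
    [sys.executable, "-m", "pytest", "packages/evaluator/tests", "-q"],
    [sys.executable, "-m", "pytest", "packages/runner/tests", "-q"],
    [sys.executable, "scripts/generate_docs_from_json.py"],
]


def _default_run(cmd: Sequence[str]) -> int:
    """Execute one command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_steps(
    steps: Sequence[Sequence[str]],
    run: Callable[[Sequence[str]], int] = _default_run,
) -> int:
    """Run each step in order; return the first non-zero exit code, else 0."""
    for cmd in steps:
        code = run(cmd)
        if code != 0:
            # Report the failing step and stop without running later steps.
            print(f"step failed ({code}): {' '.join(cmd)}", file=sys.stderr)
            return code
    return 0
```

A CI job could call `sys.exit(run_steps(BASELINE_STEPS))` from the repository root so that the job's exit status reflects the first failing step.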
Deterministic checks in CI
Run the submission review with reruns on at least one representative bundle:
python -m automechinterp_evaluator.cli submission-review \
  --bundle main/output/real_multi_task/ioi_v0_gpt2-small \
  --reruns 3
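To illustrate what a rerun mismatch means, the sketch below compares several rerun results by hashing a canonical JSON encoding of each. It is not the evaluator's own logic: it assumes each rerun yields a JSON-serializable payload, and `canonical_digest` and `reruns_match` are hypothetical helper names.

```python
import hashlib
import json
from typing import Any, Iterable


def canonical_digest(payload: Any) -> str:
    """Hash a payload using a key-order-independent JSON encoding."""
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def reruns_match(payloads: Iterable[Any]) -> bool:
    """Return True when every rerun payload has the same digest.

    An empty input is treated as matching, since there is nothing to compare.
    """
    digests = {canonical_digest(p) for p in payloads}
    return len(digests) <= 1
```

Sorting keys before hashing keeps dictionary ordering from registering as a mismatch; any difference in values still changes the digest.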
Recommended CI artifacts
- result.json
- stage_gate_report.md
- submission_review.json
- submission_review.md
- reproducibility audit outputs
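Before uploading, a job can confirm that the named artifact files were actually produced. The sketch below assumes they all sit in a single output directory, which the list above does not state; `EXPECTED_ARTIFACTS` and `missing_artifacts` are hypothetical names, and the reproducibility audit outputs are left out because their file names are not given.

```python
from pathlib import Path
from typing import Sequence, Union

# File names taken from the recommended artifact list.
EXPECTED_ARTIFACTS = (
    "result.json",
    "stage_gate_report.md",
    "submission_review.json",
    "submission_review.md",
)


def missing_artifacts(
    directory: Union[str, Path],
    expected: Sequence[str] = EXPECTED_ARTIFACTS,
) -> list[str]:
    """Return the expected file names that are absent from the directory."""
    root = Path(directory)
    return [name for name in expected if not (root / name).is_file()]
```

An empty return value means every listed file is present; otherwise the job can print the missing names and fail.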
Release-gating recommendation
Block the release if any of the following occurs:
- any test fails
- a deterministic rerun produces a mismatch
- required docs generation fails
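The gating rule above can be expressed as a small check that lists every reason a release should be blocked. This is a sketch under assumptions: `GateInputs` and `blocking_reasons` are hypothetical names, and the three booleans are expected to come from earlier CI steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateInputs:
    """Outcomes of the earlier CI steps (hypothetical structure)."""

    tests_passed: bool
    reruns_matched: bool
    docs_generated: bool


def blocking_reasons(inputs: GateInputs) -> list[str]:
    """Return the reasons to block a release; an empty list means proceed."""
    reasons: list[str] = []
    if not inputs.tests_passed:
        reasons.append("tests failed")
    if not inputs.reruns_matched:
        reasons.append("deterministic rerun mismatch")
    if not inputs.docs_generated:
        reasons.append("required docs generation failed")
    return reasons
```

Returning all reasons rather than stopping at the first one lets a single CI run report everything that needs fixing.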