
How To Run In CI

Baseline CI pipeline

python -m pytest packages/evaluator/tests -q
python -m pytest packages/runner/tests -q
python scripts/generate_docs_from_json.py
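
One way to wire these steps into a single CI entry point is a small fail-fast wrapper. The sketch below is illustrative and not part of the project: the `run_steps` helper and the `BASELINE_STEPS` name are hypothetical, and only the three commands themselves come from the pipeline above. It assumes the script is run from the repository root.

```python
import subprocess
import sys

# The three baseline commands listed above, using the current interpreter
# in place of "python". BASELINE_STEPS is a hypothetical name.
BASELINE_STEPS = [
    [sys.executable, "-m", "pytest", "packages/evaluator/tests", "-q"],
    [sys.executable, "-m", "pytest", "packages/runner/tests", "-q"],
    [sys.executable, "scripts/generate_docs_from_json.py"],
]


def run_steps(steps):
    """Run each command in order; return the first non-zero exit code, or 0."""
    for cmd in steps:
        print("running:", " ".join(cmd), flush=True)
        completed = subprocess.run(cmd)
        if completed.returncode != 0:
            # Fail fast: later steps are skipped once one step fails.
            return completed.returncode
    return 0
```

A CI job could then call `sys.exit(run_steps(BASELINE_STEPS))` so that the job's exit status reflects the first failing step.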

Deterministic checks in CI

For at least one representative bundle, run the submission review with reruns enabled:

python -m automechinterp_evaluator.cli submission-review \
  --bundle main/output/real_multi_task/ioi_v0_gpt2-small \
  --reruns 3

Artifacts relevant to this check:

  • result.json
  • stage_gate_report.md
  • submission_review.json
  • submission_review.md
  • reproducibility audit outputs
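
If rerun outputs ever need to be compared outside the CLI, a content hash over canonicalized JSON is one option. The sketch below is illustrative only. It assumes each rerun's JSON artifacts have been copied into a separate directory, which the source does not specify, and the helper names `canonical_digest` and `find_mismatches` are hypothetical. The `submission-review` command may already perform this comparison itself.

```python
import hashlib
import json
from pathlib import Path


def canonical_digest(path):
    """SHA-256 of a JSON file after re-serializing it with sorted keys."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Sorted keys and fixed separators make the hash independent of
    # key order and whitespace in the file on disk.
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_mismatches(run_dirs, filenames):
    """Return the filenames whose digest differs across the run directories."""
    mismatched = []
    for name in filenames:
        digests = {canonical_digest(Path(d) / name) for d in run_dirs}
        if len(digests) > 1:
            mismatched.append(name)
    return mismatched
```

An empty return value means every listed file had identical content in all run directories. Fields that legitimately vary between runs, such as timestamps, would need to be removed before hashing.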

Release-gating recommendation

Block release if:

  • any test fails
  • a deterministic rerun produces a mismatch
  • required docs generation fails
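
The three blocking conditions above can be collapsed into a single gate at the end of a CI job. This is a minimal sketch, assuming the job has already recorded each outcome as a boolean. The function name `release_blockers` and its parameters are hypothetical, not part of the project.

```python
def release_blockers(tests_passed, reruns_match, docs_generated):
    """Return the reasons to block a release; an empty list means no blocker."""
    reasons = []
    if not tests_passed:
        reasons.append("tests fail")
    if not reruns_match:
        reasons.append("deterministic rerun mismatch")
    if not docs_generated:
        reasons.append("required docs generation fails")
    return reasons
```

A job could print the returned reasons and exit non-zero whenever the list is non-empty.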