CLI Reference

This page summarizes the primary evaluator CLI commands and how they are typically used together.

Command map

| Command | Purpose | Typical output |
| --- | --- | --- |
| `init-template` | Scaffold a minimal valid bundle | bundle directory |
| `evaluate` | Compute machine-readable gate/tier outputs | `result.json` |
| `report` | Render a Markdown summary report | `stage_gate_report.md` |
| `submission-review` | Run deterministic reruns plus workflow guidance | `submission_review.json`, `submission_review.md` |
| `reference-vectors` | Emit compatibility vectors for cross-implementation checks | reference vector files |

evaluate

python -m automechinterp_evaluator.cli evaluate \
  --bundle /path/to/bundle \
  --output /path/to/bundle/result.json

Use this as the canonical machine-readable record for downstream tooling.
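Downstream tooling would typically parse `result.json` and branch on its contents. The exact schema is defined by the evaluator; the sketch below assumes a hypothetical top-level `"gate"` object with a boolean `"passed"` field purely for illustration, so adjust the keys to the real output.

```python
import json


def passed_gate(result_path):
    """Return True if the result file reports a passing gate.

    Assumes a hypothetical schema: {"gate": {"passed": bool, ...}, ...}.
    Missing keys are treated as a failing gate, which is the safe
    default for CI-style consumers.
    """
    with open(result_path) as f:
        result = json.load(f)
    return bool(result.get("gate", {}).get("passed", False))
```

A CI step could call this after `evaluate` and fail the build when it returns `False`.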

report

python -m automechinterp_evaluator.cli report \
  --bundle /path/to/bundle \
  --output /path/to/bundle/stage_gate_report.md

Use this for human review, writeups, and triage meetings.

submission-review

python -m automechinterp_evaluator.cli submission-review \
  --bundle /path/to/bundle \
  --reruns 3 \
  --output-json /path/to/bundle/submission_review.json \
  --output-md /path/to/bundle/submission_review.md

Recommended for any external submission or reproducibility claim.
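The point of deterministic reruns is that repeated evaluations of the same bundle should produce byte-identical machine-readable output. The command performs its own comparison; the sketch below only illustrates the underlying idea, using hypothetical rerun output paths.

```python
import hashlib


def outputs_identical(paths):
    """Return True if every file in `paths` has identical contents.

    Illustrates the determinism check behind repeated reruns by
    comparing SHA-256 digests byte-for-byte; the real
    submission-review command does its own comparison.
    """
    digests = set()
    for path in paths:
        with open(path, "rb") as f:
            digests.add(hashlib.sha256(f.read()).hexdigest())
    return len(digests) <= 1
```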

reference-vectors

python -m automechinterp_evaluator.cli reference-vectors

Use this to test behavioral compatibility when implementing third-party evaluators.
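A third-party evaluator can be checked by replaying each vector's input through its own implementation and comparing against the recorded expected output. The file format in this sketch is an assumption (a JSON array of objects with `"input"` and `"expected"` keys); consult the emitted vector files for the actual schema.

```python
import json


def check_vectors(vector_path, evaluate_fn):
    """Return the indices of vectors where evaluate_fn disagrees.

    Assumes a hypothetical format: a JSON array of
    {"input": ..., "expected": ...} objects. An empty return value
    means the implementation matched every vector.
    """
    with open(vector_path) as f:
        vectors = json.load(f)
    return [
        i for i, v in enumerate(vectors)
        if evaluate_fn(v["input"]) != v["expected"]
    ]
```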

Command discovery

For authoritative flag details:

python -m automechinterp_evaluator.cli --help
python -m automechinterp_evaluator.cli <command> --help
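When scripting several commands together (for example, `evaluate` followed by `report` and `submission-review`), it can help to build argv lists in one place rather than formatting shell strings. The helper below is a small sketch of that pattern; the subcommands and flags themselves come from `--help`, and the keyword-to-flag mapping is this sketch's own convention.

```python
def cli_command(subcommand, **flags):
    """Build an argv list for the evaluator CLI.

    Keyword arguments become long flags, with underscores translated
    to dashes (output_json -> --output-json). The resulting list can
    be passed to subprocess.run(). Flag names are not validated here;
    check them against the CLI's --help output.
    """
    argv = ["python", "-m", "automechinterp_evaluator.cli", subcommand]
    for name, value in flags.items():
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv
```

For example, `cli_command("evaluate", bundle="/path/to/bundle", output="/path/to/bundle/result.json")` reproduces the `evaluate` invocation shown above.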