10-Minute Setup

Quickstart

From clean clone to first verified bundle with deterministic workflow outputs.

Step 1: Verify Environment

python -m pytest packages/evaluator/tests -q
python -m pytest packages/runner/tests -q

Both test suites should pass before continuing.

Step 2: Create a Bundle

python -m automechinterp_evaluator.cli init-template \
  --output-dir /tmp/my_bundle

This writes protocol.yaml, hypothesis.jsonl, evaluation_result.json, and manifest.json.
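A quick way to sanity-check a freshly initialized bundle is to confirm those four files exist. This helper is an illustrative sketch, not part of the CLI; it only assumes the file names listed above.

```python
# Hypothetical bundle check (not a CLI command): report which of the
# files written by init-template are missing from a bundle directory.
import tempfile
from pathlib import Path

EXPECTED_FILES = {
    "protocol.yaml",
    "hypothesis.jsonl",
    "evaluation_result.json",
    "manifest.json",
}

def missing_bundle_files(bundle_dir):
    """Return the set of expected files absent from bundle_dir."""
    present = {p.name for p in Path(bundle_dir).iterdir()}
    return EXPECTED_FILES - present

# Example: an empty directory is missing all four files.
with tempfile.TemporaryDirectory() as d:
    missing = missing_bundle_files(d)
```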

Step 3: Generate Candidate Hypotheses (Optional)

python -m automechinterp_evaluator.cli generate-agent-output \
  --bundle /tmp/my_bundle --count 3 --overwrite

python -m automechinterp_evaluator.cli generate-hypotheses \
  --bundle /tmp/my_bundle \
  --agent-output /tmp/my_bundle/agent_output.json \
  --overwrite
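
The two commands above produce candidate hypotheses from an agent output file. As a rough sketch of that handoff, the snippet below converts a JSON list of candidates into one JSON object per line (the JSONL shape of hypothesis.jsonl). The "candidates" and "claim" field names are assumptions for illustration, not the tool's actual schema.

```python
# Hypothetical agent_output.json -> hypothesis.jsonl conversion sketch.
import json
import tempfile
from pathlib import Path

def write_hypotheses(agent_output_path, hypothesis_path):
    """Write each candidate as one JSON line; return the count."""
    candidates = json.loads(Path(agent_output_path).read_text())["candidates"]
    Path(hypothesis_path).write_text(
        "".join(json.dumps(c) + "\n" for c in candidates))
    return len(candidates)

with tempfile.TemporaryDirectory() as d:
    agent_out = Path(d) / "agent_output.json"
    agent_out.write_text(json.dumps({"candidates": [
        {"claim": "head 3.2 moves name information"},
        {"claim": "MLP 5 boosts the answer token"},
    ]}))
    n = write_hypotheses(agent_out, Path(d) / "hypothesis.jsonl")
```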

Step 4: Run Stage-2 and Evaluate

python -m automechinterp_runner.cli run \
  --bundle /tmp/my_bundle --mode mock --device cpu --examples-per-cell 20
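
The mock run is deterministic, which the later review step relies on. A minimal sketch of that property, with invented cell names and a seeded stand-in for the runner's per-cell example draw (not the runner's actual internals):

```python
# Hedged sketch: a fixed seed makes the per-cell example draw
# reproducible, so two runs with the same arguments agree exactly.
import random

def mock_run(cells, examples_per_cell, seed=0):
    rng = random.Random(seed)
    return {cell: [rng.random() for _ in range(examples_per_cell)]
            for cell in cells}

run_a = mock_run(["cell_0", "cell_1"], examples_per_cell=20)
run_b = mock_run(["cell_0", "cell_1"], examples_per_cell=20)
# Same seed and arguments -> identical outputs across reruns.
```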

python -m automechinterp_evaluator.cli evaluate \
  --bundle /tmp/my_bundle --output /tmp/my_bundle/result.json

python -m automechinterp_evaluator.cli report \
  --bundle /tmp/my_bundle --output /tmp/my_bundle/stage_gate_report.md
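
The evaluate and report commands above turn a result JSON into a human-readable gate summary. The snippet below sketches that handoff; the "gates" schema is an assumption for illustration, not the evaluator's actual output format.

```python
# Illustrative evaluate -> report handoff: render per-gate pass/fail
# lines from a (hypothetical) result structure.
def render_report(result):
    lines = ["Stage Gate Report", ""]
    for gate, passed in result["gates"].items():
        lines.append(f"- {gate}: {'PASS' if passed else 'FAIL'}")
    return "\n".join(lines)

result = {"gates": {"schema_valid": True, "tier_threshold": False}}
report = render_report(result)
```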

Step 5: Generate Workflow Review

python -m automechinterp_evaluator.cli submission-review \
  --bundle /tmp/my_bundle --reruns 3 \
  --output-json /tmp/my_bundle/submission_review.json \
  --output-md /tmp/my_bundle/submission_review.md

Maps failed gates to concrete next actions and verifies deterministic decision stability.
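
The stability check can be sketched as follows: rerun a deterministic evaluation several times and confirm the decision never changes. `evaluate_once` is a stand-in for the real evaluator, and the decision labels are invented for the example.

```python
# Hedged sketch of decision-stability verification across reruns.
def evaluate_once(bundle):
    # Deterministic stand-in decision rule (illustrative only).
    return "PROMOTE" if bundle["score"] >= 0.8 else "HOLD"

def decision_stable(bundle, reruns=3):
    """Return (stable?, set of distinct decisions) over N reruns."""
    decisions = {evaluate_once(bundle) for _ in range(reruns)}
    return len(decisions) == 1, decisions

stable, decisions = decision_stable({"score": 0.9}, reruns=3)
```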

Step 6: Compatibility Check

python -m automechinterp_evaluator.cli reference-vectors

Validates canonical tiering behavior against shared reference vectors.
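
The idea behind a reference-vector check can be sketched as comparing the current tiering function's outputs on canonical inputs against stored expected tiers. The tier boundaries and vector format below are invented for the example, not the project's actual reference data.

```python
# Illustrative reference-vector check: any mismatch signals a
# compatibility break in tiering behavior.
REFERENCE_VECTORS = [
    {"input": 0.95, "expected_tier": "A"},
    {"input": 0.60, "expected_tier": "B"},
    {"input": 0.10, "expected_tier": "C"},
]

def tier(score):
    # Hypothetical tier boundaries for the sketch.
    return "A" if score >= 0.9 else "B" if score >= 0.5 else "C"

mismatches = [v for v in REFERENCE_VECTORS
              if tier(v["input"]) != v["expected_tier"]]
```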