Practical Workflows

Tutorials

End-to-end workflows for coverage analysis, stress testing, standards checks, and reproducible reporting.

Tutorial A: Coverage + Failure Analysis

python main/summarize_real_multi_task.py
python main/analyze_real_bundle_failures.py
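A minimal Python sketch for running the two Tutorial A commands from the repository root in order. It stops at the first non-zero exit, so a partial run is not mistaken for a complete one. `run_in_order` and `TUTORIAL_A` are illustrative names and are not part of the repository.

```python
import shlex
import subprocess
import sys

# Tutorial A commands, in the order the tutorial lists them.
TUTORIAL_A = [
    ["python", "main/summarize_real_multi_task.py"],
    ["python", "main/analyze_real_bundle_failures.py"],
]


def run_in_order(commands, dry_run=False):
    """Run each command in sequence, stopping at the first non-zero exit.

    Returns the commands that were run (or would be run, when dry_run=True).
    """
    executed = []
    for argv in commands:
        print("$", shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit,
            # so later steps never run after a failed one.
            subprocess.run(argv, check=True)
        executed.append(argv)
    return executed
```

Call `run_in_order(TUTORIAL_A, dry_run=True)` to print the commands without executing them, then drop `dry_run` to run them.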

Outputs:

Tutorial B: Multi-Lane Bundle Build

python main/build_multilane_real_bundles.py
python main/summarize_benchmark_breadth.py

Builds evaluated multi-lane bundles from real artifacts and summarizes breadth across task/model/lane/provider strata.
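To make "breadth across strata" concrete, here is a sketch that counts distinct values per stratum over a list of bundle records. It assumes each record is a dict with `task`, `model`, `lane`, and `provider` keys; that record shape is hypothetical and is not read from `summarize_benchmark_breadth.py`.

```python
from collections import Counter

# Stratum names from the tutorial text; used here as hypothetical dict keys.
STRATA = ("task", "model", "lane", "provider")


def breadth(bundles):
    """Count distinct values per stratum and how many bundles fall under each value.

    `bundles` is a list of dicts; records missing a stratum key are skipped
    for that stratum only.
    """
    report = {}
    for key in STRATA:
        counts = Counter(b[key] for b in bundles if key in b)
        report[key] = {"distinct": len(counts), "counts": dict(counts)}
    return report
```

A stratum with `distinct == 1` shows at a glance where the benchmark has no spread, for example a single model or a single provider.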

Tutorial C: Community Workflow Impact

python main/run_community_submission_demo.py

Shows deterministic review outputs that alter practical decisions (publish, hold, or reject).
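One way to confirm the determinism claim on your own machine is to run the demo twice, keep a copy of each run's output directory, and compare a single digest of each copy. The sketch below assumes you know where the demo writes its outputs (the location is not stated here). `digest_tree` is a hypothetical helper built only on the standard library.

```python
import hashlib
from pathlib import Path


def _feed(h, data):
    # Length-prefix each chunk so path/content boundaries cannot be confused.
    h.update(len(data).to_bytes(8, "big"))
    h.update(data)


def digest_tree(root):
    """Return one SHA-256 hex digest covering every file under `root`.

    Files are visited in sorted relative-path order, and each relative path is
    hashed along with its contents, so both renames and edits change the digest.
    """
    root = Path(root)
    files = sorted(
        (p.relative_to(root).as_posix(), p) for p in root.rglob("*") if p.is_file()
    )
    h = hashlib.sha256()
    for rel, path in files:
        _feed(h, rel.encode("utf-8"))
        _feed(h, path.read_bytes())
    return h.hexdigest()
```

If `digest_tree(first_run_copy) == digest_tree(second_run_copy)`, the two runs produced byte-identical outputs. A mismatch means some file differs, which may be an embedded timestamp rather than a different publish, hold, or reject decision.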

Tutorial D: Stress and Ablation Diagnostics

python main/stress_test_ablation.py --bundle-dir main/output/real_multi_task/ioi_v0_gpt2-small
python main/stress_test_agnostic.py --bundle-dir main/output/real_multi_task/ioi_v0_gpt2-small
python main/stress_test_red_team.py --bundle-dir main/output/real_multi_task/ioi_v0_gpt2-small
python main/gate_ablation_study.py --bundle-dir main/output/real_multi_task/ioi_v0_gpt2-small
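All four diagnostics take the same `--bundle-dir`, so pointing them at another bundle means editing the path four times. A small sketch that generates the same four commands from one variable follows. Other bundle directory names are not listed here, so substitute whichever bundle you built; `diagnostic_commands` is an illustrative name.

```python
import shlex

# Scripts from Tutorial D; each takes the same --bundle-dir argument.
DIAGNOSTICS = (
    "main/stress_test_ablation.py",
    "main/stress_test_agnostic.py",
    "main/stress_test_red_team.py",
    "main/gate_ablation_study.py",
)


def diagnostic_commands(bundle_dir):
    """Return one argv list per diagnostic script for a single bundle directory."""
    return [["python", script, "--bundle-dir", str(bundle_dir)] for script in DIAGNOSTICS]


# Print the commands for the bundle used in this tutorial.
for argv in diagnostic_commands("main/output/real_multi_task/ioi_v0_gpt2-small"):
    print(shlex.join(argv))
```

To execute rather than print, pass each argv list to `subprocess.run(argv, check=True)`.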

Tutorial E: Reporting and Summaries

python main/field_level_findings.py
python main/runtime_cost_report.py --reruns 3

Produces stratified findings, policy sensitivity tables, and runtime/cost summaries for sharing results.
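How `runtime_cost_report.py` aggregates its `--reruns` is not described here. For an independent sanity check on wall-clock spread, a minimal sketch times any callable several times with `time.perf_counter` and reports the median, minimum, and maximum. This is an illustration, not necessarily how the report computes its figures.

```python
import statistics
import time


def time_reruns(fn, reruns=3):
    """Call fn() `reruns` times and summarize wall-clock seconds per call."""
    if reruns < 1:
        raise ValueError("reruns must be >= 1")
    samples = []
    for _ in range(reruns):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "reruns": reruns,
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }
```

For example, wrap a script invocation as `time_reruns(lambda: subprocess.run([...], check=True), reruns=3)`. The median is less sensitive than the mean to a single slow, cold-cache run.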

Tutorial F: Standards Interop Check

python -m automechinterp_evaluator.cli reference-vectors

Use these reference vectors to check that a third-party evaluator implementation produces compatible results.
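The output format of `reference-vectors` is not documented here. The sketch below assumes each vector can be loaded as a record with `id`, `input`, and `expected` fields, and that the implementation under test is exposed as a Python callable. Both assumptions are hypothetical, so adapt the keys to whatever the command actually emits.

```python
from typing import Any, Callable, Iterable, Mapping


def check_vectors(vectors: Iterable[Mapping[str, Any]],
                  evaluate: Callable[[Any], Any]) -> list:
    """Return (id, expected, got) for every vector the implementation gets wrong.

    The "id", "input", and "expected" keys are assumed for illustration.
    """
    failures = []
    for vec in vectors:
        got = evaluate(vec["input"])
        if got != vec["expected"]:
            failures.append((vec["id"], vec["expected"], got))
    return failures
```

An empty list means every vector matched. Exact equality may be too strict for floating-point fields, where `math.isclose` is a reasonable substitute.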