Tutorial A: Coverage + Failure Analysis
python main/summarize_real_multi_task.py
python main/analyze_real_bundle_failures.py
Outputs:
main/output/real_multi_task/coverage_summary.json
main/output/real_multi_task/failure_mode_summary.json
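To confirm that both summary files were written, a short script can load them and report their top-level structure. This is a minimal sketch: the paths come from the output list above, but the JSON schema is not documented here, so the code only inspects the top level and assumes no field names. The helpers `describe_json` and `describe_file` are hypothetical and not part of the repository.

```python
import json
from pathlib import Path

# Paths taken from the "Outputs" list above.
OUTPUT_DIR = Path("main/output/real_multi_task")
SUMMARY_FILES = ("coverage_summary.json", "failure_mode_summary.json")


def describe_json(data):
    """Describe the top-level shape of parsed JSON without assuming a schema."""
    if isinstance(data, dict):
        return {"type": "object", "keys": sorted(data)}
    if isinstance(data, list):
        return {"type": "array", "length": len(data)}
    return {"type": type(data).__name__}


def describe_file(path):
    """Load a JSON file and describe it; return None if the file does not exist."""
    path = Path(path)
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as fh:
        return describe_json(json.load(fh))


if __name__ == "__main__":
    for name in SUMMARY_FILES:
        info = describe_file(OUTPUT_DIR / name)
        if info is None:
            print(f"{name}: not found (run the two scripts above first)")
        else:
            print(f"{name}: {info}")
```

Run it from the repository root so the relative paths resolve.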
Tutorial B: Multi-Lane Bundle Build
python main/build_multilane_real_bundles.py
python main/summarize_benchmark_breadth.py
The first command builds evaluated multi-lane bundles from real artifacts; the second summarizes benchmark breadth across task, model, lane, and provider strata.
Tutorial C: Community Workflow Impact
python main/run_community_submission_demo.py
Shows how deterministic review outputs change the practical decision for a submission: publish, hold, or reject.
Tutorial D: Stress and Ablation Diagnostics
python main/stress_test_ablation.py --bundle-dir main/output/real_multi_task/ioi_v0_gpt2-small
python main/stress_test_agnostic.py --bundle-dir main/output/real_multi_task/ioi_v0_gpt2-small
python main/stress_test_red_team.py --bundle-dir main/output/real_multi_task/ioi_v0_gpt2-small
python main/gate_ablation_study.py --bundle-dir main/output/real_multi_task/ioi_v0_gpt2-small
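All four diagnostics take the same `--bundle-dir` argument, so they can be run in sequence against one bundle. The sketch below is a hypothetical wrapper, not part of the repository: the script paths and the flag are copied from the commands above, while `build_commands` and `run_diagnostics` are illustrative names. It reports each script's exit code without interpreting it, since the scripts' exit-code conventions are not documented here.

```python
import subprocess
import sys
from pathlib import Path

# Script paths and the --bundle-dir flag are copied from the commands above.
DIAGNOSTIC_SCRIPTS = (
    "main/stress_test_ablation.py",
    "main/stress_test_agnostic.py",
    "main/stress_test_red_team.py",
    "main/gate_ablation_study.py",
)


def build_commands(bundle_dir, python=sys.executable):
    """Return one argv list per diagnostic script for the given bundle directory."""
    return [
        [python, script, "--bundle-dir", str(bundle_dir)]
        for script in DIAGNOSTIC_SCRIPTS
    ]


def run_diagnostics(bundle_dir):
    """Run each diagnostic in order.

    Returns a dict mapping script path to its exit code, or None when the
    script file is not found relative to the current directory.
    """
    results = {}
    for cmd in build_commands(bundle_dir):
        script = cmd[1]
        if not Path(script).is_file():
            results[script] = None
            continue
        results[script] = subprocess.run(cmd, check=False).returncode
    return results


if __name__ == "__main__":
    default_bundle = "main/output/real_multi_task/ioi_v0_gpt2-small"
    bundle = sys.argv[1] if len(sys.argv) > 1 else default_bundle
    for script, code in run_diagnostics(bundle).items():
        status = "script not found" if code is None else f"exit code {code}"
        print(f"{script}: {status}")
```

Passing a different bundle directory as the first argument repeats the same four diagnostics for another task or model.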
Tutorial E: Reporting and Summaries
python main/field_level_findings.py
python main/runtime_cost_report.py --reruns 3
Produces stratified findings, policy sensitivity tables, and runtime/cost summaries for sharing results.
Tutorial F: Standards Interop Check
python -m automechinterp_evaluator.cli reference-vectors
Use these vectors to validate compatible third-party evaluator implementations.
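One way to use reference vectors is a small comparison harness that feeds each vector to the implementation under test and collects disagreements. The sketch below is only an illustration: the output format of `reference-vectors` is not described here, so the `input` and `expected` field names, the toy vectors, and the `check_evaluator` helper are all assumptions to be adapted to the real vector format.

```python
def check_evaluator(vectors, evaluate):
    """Compare an evaluator against reference vectors.

    `vectors` is a list of dicts with hypothetical "input" and "expected"
    fields; `evaluate` is the third-party function under test.
    Returns a list of (index, expected, actual) tuples, one per mismatch.
    """
    mismatches = []
    for index, vector in enumerate(vectors):
        actual = evaluate(vector["input"])
        if actual != vector["expected"]:
            mismatches.append((index, vector["expected"], actual))
    return mismatches


if __name__ == "__main__":
    # Toy vectors for illustration only; real vectors come from the CLI above.
    toy_vectors = [
        {"input": 1, "expected": 2},
        {"input": 2, "expected": 5},
    ]
    failures = check_evaluator(toy_vectors, lambda x: x * 2)
    for index, expected, actual in failures:
        print(f"vector {index}: expected {expected!r}, got {actual!r}")
```

An empty result means the implementation agrees with every vector it was given; it says nothing about inputs the vectors do not cover.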