Tutorials · AutoMechInterp

Tutorial A: Coverage + Failure Analysis

python main/summarize_real_multi_task.py
python main/analyze_real_bundle_failures.py

Outputs:

main/output/real_multi_task/coverage_summary.json
main/output/real_multi_task/failure_mode_summary.json
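
To inspect the two summary files once the scripts finish, a minimal sketch along these lines may help. It assumes only that both files are valid JSON; their schema is not documented here, so the helper describe_summary (a hypothetical name, not part of the project) reports just the top-level structure.

```python
import json
from pathlib import Path

# Paths taken from the "Outputs" list above; run from the repository root.
OUTPUT_DIR = Path("main/output/real_multi_task")
SUMMARY_FILES = ["coverage_summary.json", "failure_mode_summary.json"]


def describe_summary(path):
    """Return a one-line description of a JSON file's top-level structure.

    Hypothetical helper: it makes no assumption about the summary schema.
    """
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    name = Path(path).name
    if isinstance(data, dict):
        return f"{name}: object with keys {sorted(data)}"
    if isinstance(data, list):
        return f"{name}: array with {len(data)} items"
    return f"{name}: {type(data).__name__} value"


if __name__ == "__main__":
    for file_name in SUMMARY_FILES:
        summary_path = OUTPUT_DIR / file_name
        if summary_path.exists():
            print(describe_summary(summary_path))
        else:
            print(f"{file_name}: not found (run the two scripts first)")
```

Listing the top-level keys first is a cheap way to learn the actual layout before writing any analysis that depends on specific fields.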

Tutorial B: Multi-Lane Bundle Build

python main/build_multilane_real_bundles.py
python main/summarize_benchmark_breadth.py

The first script builds evaluated multi-lane bundles from real artifacts; the second summarizes benchmark breadth across task, model, lane, and provider strata.

Tutorial C: Community Workflow Impact

python main/run_community_submission_demo.py

Shows how deterministic review outputs change the practical decision for a submission: publish, hold, or reject.

Tutorial D: Stress and Ablation Diagnostics

python main/stress_test_ablation.py --bundle-dir main/output/real_multi_task/ioi_v0_gpt2-small
python main/stress_test_agnostic.py --bundle-dir main/output/real_multi_task/ioi_v0_gpt2-small
python main/stress_test_red_team.py --bundle-dir main/output/real_multi_task/ioi_v0_gpt2-small
python main/gate_ablation_study.py --bundle-dir main/output/real_multi_task/ioi_v0_gpt2-small
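
All four commands above take the same --bundle-dir argument, so they can be driven from one small wrapper. The sketch below is illustrative only: build_commands and run_diagnostics are hypothetical helper names, and nothing is assumed about what the scripts print or what their exit codes mean beyond what the operating system reports.

```python
import subprocess
import sys

# Script paths copied from the commands above.
DIAGNOSTIC_SCRIPTS = [
    "main/stress_test_ablation.py",
    "main/stress_test_agnostic.py",
    "main/stress_test_red_team.py",
    "main/gate_ablation_study.py",
]


def build_commands(bundle_dir, python=sys.executable):
    """Return one argv list per diagnostic script for the given bundle."""
    return [
        [python, script, "--bundle-dir", bundle_dir]
        for script in DIAGNOSTIC_SCRIPTS
    ]


def run_diagnostics(bundle_dir):
    """Run each diagnostic in order and return {script path: exit code}.

    check=False lets the remaining scripts run even if one fails.
    """
    exit_codes = {}
    for command in build_commands(bundle_dir):
        completed = subprocess.run(command, check=False)
        exit_codes[command[1]] = completed.returncode
    return exit_codes
```

Calling run_diagnostics("main/output/real_multi_task/ioi_v0_gpt2-small") from the repository root runs the same four commands in sequence; passing a different bundle directory repeats the diagnostics for another bundle.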

Tutorial E: Reporting and Summaries

python main/field_level_findings.py
python main/runtime_cost_report.py --reruns 3

Produces stratified findings, policy sensitivity tables, and runtime/cost summaries for sharing results.

Tutorial F: Standards Interop Check

python -m automechinterp_evaluator.cli reference-vectors

Use these vectors to check that third-party evaluator implementations are compatible.