Understand Real Failure Patterns
python main/field_level_findings.py
Produces outcomes stratified by task, model, claim family, lane, and provider, plus the top failure gates and policy-sensitivity outputs.
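As a rough illustration of what "stratified outcomes" means here, the sketch below groups per-run verdicts by (task, model, lane) and computes pass rates. The record fields and verdict labels are assumptions for illustration, not the actual output schema of field_level_findings.py.

```python
from collections import defaultdict

# Hypothetical per-run records; the real script's schema may differ --
# the field names and verdict labels here are illustrative assumptions.
runs = [
    {"task": "ioi_v0", "model": "gpt2-small", "lane": "ablation", "verdict": "pass"},
    {"task": "ioi_v0", "model": "gpt2-small", "lane": "ablation", "verdict": "fail"},
    {"task": "ioi_v0", "model": "gpt2-small", "lane": "agnostic", "verdict": "pass"},
]

def stratify(records):
    """Group records by (task, model, lane) and compute pass rates."""
    buckets = defaultdict(lambda: {"pass": 0, "total": 0})
    for r in records:
        key = (r["task"], r["model"], r["lane"])
        buckets[key]["total"] += 1
        if r["verdict"] == "pass":
            buckets[key]["pass"] += 1
    return {k: v["pass"] / v["total"] for k, v in buckets.items()}

print(stratify(runs))
```

The same grouping idea extends to any stratum the report supports (claim family, provider) by adding the field to the key tuple.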
AutoMechInterp helps researchers and developers evaluate interpretability claims with a consistent, reproducible verification contract.
When do mechanistic-interpretability claims fail under principled controls, robustness checks, and preregistered statistical policy?
AutoMechInterp is the instrument that lets this question be answered reproducibly across tasks, models, and lanes.
python main/stress_test_ablation.py --bundle-dir main/output/real_multi_task/ioi_v0_gpt2-small
python main/stress_test_agnostic.py --bundle-dir main/output/real_multi_task/ioi_v0_gpt2-small
python main/stress_test_red_team.py --bundle-dir main/output/real_multi_task/ioi_v0_gpt2-small
Runs three complementary stress tests (ablation, circuit-agnostic, and red-team) to show how robust your claim-evaluation setup is under different failure pressures.
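To apply all three stress tests across several bundle directories, a small driver like the sketch below can build the command lines; the `stress_commands` helper and the print-then-run pattern are illustrative additions, not part of the repository.

```python
import subprocess  # only needed if you uncomment the execution line below
import sys

# The three stress-test entry points from the section above.
STRESS_SCRIPTS = [
    "main/stress_test_ablation.py",
    "main/stress_test_agnostic.py",
    "main/stress_test_red_team.py",
]

def stress_commands(bundle_dirs):
    """Build one command per (bundle, stress-test script) pair."""
    return [
        [sys.executable, script, "--bundle-dir", bundle]
        for bundle in bundle_dirs
        for script in STRESS_SCRIPTS
    ]

cmds = stress_commands(["main/output/real_multi_task/ioi_v0_gpt2-small"])
for cmd in cmds:
    print(" ".join(cmd))
    # Uncomment to actually execute each stress test:
    # subprocess.run(cmd, check=True)
```

Running the tests sequentially per bundle keeps results easy to compare, since each script sees the same bundle state.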
Start with the Quickstart, then move to Community Submissions and Standards for interoperability details.