AutoMechInterp
AutoMechInterp is an open-source benchmark for mechanistic-interpretability claim verification.
It gives teams a deterministic acceptance contract: the same bundle, protocol, and environment should produce the same gate outcomes and evidence tier every time.
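One way to check this kind of determinism is to fingerprint the result of each run and compare digests. The sketch below is illustrative only: the `fingerprint` helper, the result fields, and the tier label are assumptions, not part of the AutoMechInterp schema.

```python
import hashlib
import json


def fingerprint(result: dict) -> str:
    """Return a SHA-256 digest of a result in canonical JSON form."""
    # Sorted keys and fixed separators make the digest independent of key order.
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two hypothetical runs over the same bundle (field names are assumed).
run_a = {"gates": {"causal": "pass", "controls": "fail"}, "tier": "example-tier"}
run_b = {"tier": "example-tier", "gates": {"controls": "fail", "causal": "pass"}}

# Equal digests mean both runs reported the same gate outcomes and tier.
same_outcome = fingerprint(run_a) == fingerprint(run_b)
print(same_outcome)
```

Comparing digests rather than raw files avoids false mismatches caused by key ordering or whitespace.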
Why this benchmark matters: a mechanistic claim can look convincing under one intervention setup and still fail under controls, robustness tests, or a stricter statistical policy. AutoMechInterp standardizes those checks so that evidence quality is comparable across tasks, models, and discovery pipelines.
What AutoMechInterp is and is not
AutoMechInterp is a verifier contract, not a mechanism-discovery algorithm.
- It accepts or rejects claim evidence under a frozen policy.
- It emits gate-level diagnostics so failures are actionable.
- It can evaluate bundles from many upstream discovery methods.
It does not, by itself, claim to establish universal ground truth about circuits.
What you get
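The contract above can be sketched as a pure function of the evidence and an immutable policy. This is a minimal sketch under stated assumptions: the `Policy` fields, the thresholds, the gate names, and the evidence keys are hypothetical and do not reflect the actual AutoMechInterp policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical frozen policy; fields and thresholds are assumptions."""
    min_effect: float = 0.5    # assumed minimum causal-effect score
    max_p_value: float = 0.05  # assumed maximum p-value


def verify(evidence: dict, policy: Policy) -> dict:
    """Accept or reject claim evidence and report each failed gate by name."""
    gates = {
        "causal": evidence["effect"] >= policy.min_effect,
        "statistical": evidence["p_value"] <= policy.max_p_value,
    }
    failed = [name for name, ok in gates.items() if not ok]
    return {"accepted": not failed, "failed_gates": failed}


policy = Policy()
report = verify({"effect": 0.8, "p_value": 0.20}, policy)
print(report)  # {'accepted': False, 'failed_gates': ['statistical']}
```

Declaring the dataclass with `frozen=True` means any attempt to change a threshold after construction raises an error, which mirrors the idea of a policy that cannot drift between runs.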
- :material-shield-check: **Deterministic verification contract**
A fixed gate policy with explicit outcomes (`pass`, `fail`, `not_evaluated`) and stable tier mapping.
- :material-file-document-outline: **Portable bundle standard**
Common artifacts (`protocol.yaml`, `hypothesis.jsonl`, `evaluation_result.json`, `manifest.json`) that any lab can produce.
- :material-chart-box-outline: **Actionable failure reports**
Claim-level failure decomposition and remediation guidance, not just a single summary label.
- :material-lan-connect: **Lane-agnostic acceptance logic**
Discovery methods can vary widely while acceptance standards stay fixed and comparable.
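The three gate outcomes and a stable tier mapping could be modelled roughly as follows. Only the outcome strings `pass`, `fail`, and `not_evaluated` come from the contract described here. The tier labels and the mapping rule are placeholders invented for illustration.

```python
from enum import Enum


class GateOutcome(str, Enum):
    """The three explicit gate outcomes named in the contract."""
    PASS = "pass"
    FAIL = "fail"
    NOT_EVALUATED = "not_evaluated"


def evidence_tier(outcomes: dict) -> str:
    """Map gate outcomes to a tier label (labels and rule are placeholders)."""
    values = list(outcomes.values())
    if any(v is GateOutcome.FAIL for v in values):
        return "rejected"
    if any(v is GateOutcome.NOT_EVALUATED for v in values):
        return "provisional"
    return "accepted"


# Gate names follow the stage-gate flow; the outcomes are made up.
outcomes = {
    "causal": GateOutcome.PASS,
    "controls": GateOutcome.PASS,
    "robustness": GateOutcome.NOT_EVALUATED,
    "statistical": GateOutcome.PASS,
}
print(evidence_tier(outcomes))  # provisional
```

Keeping `not_evaluated` distinct from `fail` lets a report separate gates that were skipped from gates that were run and failed.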
Who this is for
- Researchers publishing mechanistic claims.
- Labs validating claims from multiple discovery lanes.
- Tool builders implementing compatible evaluators.
- External collaborators who need reproducible acceptance criteria.
Stage-gate flow
- A discovery lane proposes candidate mechanistic hypotheses.
- Hypotheses and intervention outputs are serialized into a bundle.
- The evaluator applies causal, controls, robustness, and statistical gates.
- Evidence tiers and diagnostics drive publication decisions and next experiments.
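A simple pre-check on a serialized bundle is to confirm that the expected artifacts are present. The sketch below uses the four file names listed under "What you get". The `missing_artifacts` helper is hypothetical, and treating all four files as required at this stage is an assumption.

```python
import tempfile
from pathlib import Path

# File names taken from the bundle standard described above.
REQUIRED_ARTIFACTS = (
    "protocol.yaml",
    "hypothesis.jsonl",
    "evaluation_result.json",
    "manifest.json",
)


def missing_artifacts(bundle_dir: Path) -> list:
    """Return the required artifact names that are absent from bundle_dir."""
    return [name for name in REQUIRED_ARTIFACTS if not (bundle_dir / name).is_file()]


# Build an incomplete bundle in a temporary directory to exercise the check.
with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp)
    (bundle / "protocol.yaml").write_text("gates: []\n")
    (bundle / "manifest.json").write_text("{}\n")
    missing = missing_artifacts(bundle)

print(missing)  # ['hypothesis.jsonl', 'evaluation_result.json']
```

Reporting the missing names, rather than returning a single boolean, keeps the failure actionable in the same spirit as the gate-level diagnostics.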
Documentation map
Start quickly
Understand the contract
Build and submit bundles
Compare with adjacent work
Legacy website access
The previous static site is preserved and still accessible:
https://fcistud.github.io/mechanistic-interpretability/legacy/index.html