Verifier-First Benchmark

AutoMechInterp Documentation

Deterministic verification for mechanistic-interpretability claims. Discovery systems propose; the evaluator checks evidence quality and returns clear outcomes for follow-up.


What This Benchmark Contributes

AutoMechInterp is a scientific reliability framework: it measures when mechanistic claims survive causal controls, robustness checks, and statistical policy, not just whether a pipeline can produce candidate explanations.

Core output: Typed gate outcomes, evidence tiers, and workflow actions that are stable under deterministic reruns.
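
As a minimal sketch, the core output can be pictured as one small typed record per gate. The names below are illustrative placeholders, not the project's actual schema; the tier values are taken from the Evidence Tiers table further down.

from dataclasses import dataclass
from enum import Enum


class EvidenceTier(Enum):
    """Tier values as listed in the Evidence Tiers table below."""
    CROSS_MODEL_CONFIRMED = "cross_model_confirmed"
    SINGLE_MODEL_CONFIRMED = "single_model_confirmed"
    CAUSAL_TESTED_UNSTABLE = "causal_tested_unstable"
    SUGGESTIVE = "suggestive"
    REJECTED = "rejected"


@dataclass(frozen=True)  # frozen: an outcome should not change between reruns
class GateOutcome:
    gate: str            # e.g. "causal_control" (hypothetical gate name)
    passed: bool
    tier: EvidenceTier
    action: str          # workflow action, e.g. "run confirmatory split"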

Independent Submission Workflow

  1. Initialize bundle contract from template.
  2. Run Stage-2 interventions and populate artifacts.
  3. Evaluate and generate workflow-oriented submission review.
  4. Use failed-gate remediations to plan the next round of experiments, as sketched below.
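
The loop can be sketched end to end in a few lines. Every function, path, and gate name here is a hypothetical placeholder rather than the project's actual API; the stubs only mark where real intervention and evaluation code would go.

import json
from pathlib import Path


def init_bundle(workdir: Path) -> Path:
    """Step 1: initialize a bundle contract from a minimal template."""
    workdir.mkdir(parents=True, exist_ok=True)
    bundle = workdir / "bundle.json"
    bundle.write_text(json.dumps({"claims": [], "artifacts": {}}, indent=2))
    return bundle


def run_stage2(bundle: Path) -> None:
    """Step 2: run interventions and record artifact paths (stubbed here)."""
    data = json.loads(bundle.read_text())
    data["artifacts"]["stage2"] = "results/stage2.json"  # placeholder path
    bundle.write_text(json.dumps(data, indent=2))


def evaluate(bundle: Path) -> list[dict]:
    """Steps 3-4: evaluate the bundle and return failed-gate remediations."""
    # A real evaluator would inspect the artifacts; this stub returns one item.
    return [{"gate": "causal_control", "action": "add an ablation baseline"}]


if __name__ == "__main__":
    bundle = init_bundle(Path("submission"))
    run_stage2(bundle)
    for item in evaluate(bundle):
        print(f"{item['gate']}: {item['action']}")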

See the full submission guide for details on each step.

Evidence Tiers

Tier                      Decision
cross_model_confirmed     Ready to share with transfer evidence
single_model_confirmed    Ready to share in single-model setting
causal_tested_unstable    Hold and run robustness follow-up
suggestive                Exploratory only; run confirmatory split
rejected                  Needs revision before claiming mechanism
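
One way a downstream tool might consume these tiers is a direct lookup that mirrors the table; decision_for is a hypothetical helper, and the strings are copied verbatim from the rows above.

# Maps each evaluator tier to the corresponding workflow decision.
TIER_DECISIONS = {
    "cross_model_confirmed": "Ready to share with transfer evidence",
    "single_model_confirmed": "Ready to share in single-model setting",
    "causal_tested_unstable": "Hold and run robustness follow-up",
    "suggestive": "Exploratory only; run confirmatory split",
    "rejected": "Needs revision before claiming mechanism",
}


def decision_for(tier: str) -> str:
    """Look up the workflow decision for a tier string from a gate outcome."""
    return TIER_DECISIONS.get(tier, "Unknown tier; check the bundle schema")


print(decision_for("suggestive"))  # Exploratory only; run confirmatory split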

Open-Source Project Assets

  • Stratified field-level findings and failure concentration analysis
  • Evaluator-agnostic and adaptive adversarial stress diagnostics
  • Runtime/cost envelope and exact reproducibility runbook
  • Claim bundle specification + compatibility vectors


One-Command Rebuild

python main/reproducibility_audit.py

This regenerates environment manifests, breadth summaries, field-level findings, stress outputs, runtime/cost diagnostics, and community-value reports under main/output/repro.
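
To see what "stable under deterministic reruns" buys, one can hash the output tree across two runs of the command above. This is a sketch that assumes the script is invoked from the repository root; tree_digest is an illustrative helper, not part of the project.

import hashlib
import subprocess
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Hash every file under root, visiting paths in a stable order."""
    h = hashlib.sha256()
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h.update(path.relative_to(root).as_posix().encode())
            h.update(path.read_bytes())
    return h.hexdigest()


digests = []
for _ in range(2):
    subprocess.run(["python", "main/reproducibility_audit.py"], check=True)
    digests.append(tree_digest(Path("main/output/repro")))

print("deterministic rerun" if digests[0] == digests[1] else "outputs differ")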