Compared To Discovery Tools¶
AutoMechInterp is not a replacement for discovery or intervention frameworks. It is a verifier layer that sits downstream of them.
Tool landscape¶
| Tool / framework | Primary purpose | Typical role in pipeline |
|---|---|---|
| TransformerLens | activation inspection, patching, circuit analysis | generate candidate mechanistic hypotheses |
| nnsight | model internals inspection/intervention at scale | collect intervention traces and candidate explanations |
| pyvene | declarative intervention experiments | run causal tests and produce structured intervention outputs |
| SAELens | sparse autoencoder training/analysis | derive feature-centric hypotheses |
| Captum | attribution/explainability toolkit | produce attribution evidence for hypothesis generation |
Layer boundaries¶
Discovery layer (tools above)¶
- prioritize hypothesis generation speed and exploratory coverage
- allow diverse intervention and analysis methods
- optimize for finding potentially interesting mechanisms
Verification layer (AutoMechInterp)¶
- enforce fixed acceptance policy
- evaluate claim evidence under controls/robustness/statistics/integrity
- optimize for reliability and comparability
Why strict separation helps¶
If discovery and acceptance are merged in one tool, acceptance standards tend to drift whenever methods change.
A separate verifier layer avoids this by keeping acceptance criteria fixed while discovery methods continue to evolve.
Integration pattern¶
- Run your preferred discovery/intervention tool.
- Convert outputs into claim bundle artifacts.
- Evaluate with AutoMechInterp.
- Use gate diagnostics to design the next discovery experiments.
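The first two steps can be sketched as follows. This is a minimal illustration, not AutoMechInterp's actual API: the candidate schema (`claim_id`, `component`, `effect`) and the CLI invocation in the trailing comment are assumptions for the sake of the example.

```python
import json
from pathlib import Path

def write_hypothesis_file(candidates, path):
    """Serialize discovery candidates as one JSON object per line (JSONL)."""
    with open(path, "w") as f:
        for c in candidates:
            f.write(json.dumps(c) + "\n")

# Hypothetical discovery output: ranked attention heads from a patching sweep.
candidates = [
    {"claim_id": "c1", "component": "L5H3", "effect": 0.42},
    {"claim_id": "c2", "component": "L7H1", "effect": 0.31},
]

bundle_dir = Path("claim_bundle")
bundle_dir.mkdir(exist_ok=True)
write_hypothesis_file(candidates, bundle_dir / "hypothesis.jsonl")

# The bundle directory would then be handed to the verifier, e.g.:
#   automechinterp evaluate claim_bundle/   (hypothetical invocation)
```

The key point is that the conversion step is a pure serialization concern: discovery tools can keep whatever internal representation they like, as long as the bundle on disk matches the contract.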
Mapping tool outputs to bundle artifacts¶
| Bundle artifact | Upstream source examples |
|---|---|
| hypothesis.jsonl | ranked components, neuron explanations, feature candidates |
| evaluation_result.json | intervention cells, patching traces, causal test outputs |
| protocol.yaml | frozen evaluation policy chosen for this run |
| manifest.json | hash binding over finalized artifacts |
Typical failure loop¶
- Discovery proposes a promising claim.
- The verifier rejects it due to method sensitivity or failed controls.
- Discovery reruns with targeted interventions and stronger controls.
- The bundle is resubmitted and reevaluated.
This loop is exactly where verification benchmarks add value: they convert “interesting idea” into “evidence that survives scrutiny.”
Practical takeaway¶
Discovery tools answer “what might be happening?”
AutoMechInterp answers “is the current evidence strong enough under a fixed contract?”
Both are necessary for robust mechanistic-interpretability science.