AutoMechInterp

fcistud/mechanistic-interpretability

Interpret Results

Recommended reading order

1. failed_checks
2. not_evaluated_checks
3. gate_outcomes
4. evidence_tier

Reading in this order avoids over-indexing on the final label (evidence_tier) before you understand which checks and gates produced it.
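As a minimal sketch of this reading order, the snippet below walks a review result field by field. The dictionary layout, the value shapes, and the names `READING_ORDER`, `summarize_in_reading_order`, and `example_result` are assumptions made for illustration; only the four field names come from this page.

```python
# Field names are taken from the list above; everything else is hypothetical.
READING_ORDER = (
    "failed_checks",
    "not_evaluated_checks",
    "gate_outcomes",
    "evidence_tier",
)


def summarize_in_reading_order(result):
    """Return (field, value) pairs in the recommended reading order.

    Fields missing from the result are reported as None rather than skipped,
    so a gap in the data stays visible to the reader.
    """
    return [(field, result.get(field)) for field in READING_ORDER]


# Hypothetical review result. The value shapes (lists, dict, string) are assumed.
example_result = {
    "evidence_tier": "suggestive",
    "gate_outcomes": {"causal": "pass", "robustness": "fail"},
    "failed_checks": ["robustness_check"],
    "not_evaluated_checks": ["transfer_check"],
}

for field, value in summarize_in_reading_order(example_result):
    print(f"{field}: {value}")
```

The final label is printed last regardless of where it appears in the stored result.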

Evidence tiers in practice

cross_model_confirmed: strongest cross-model evidence pathway
single_model_confirmed: accepted within one model context
causal_plus_robustness: meaningful signal, but incomplete acceptance evidence
causal_tested_unstable: causal evidence exists but instability remains
suggestive: weak/incomplete support
rejected: evidence did not satisfy the contract
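When triaging many claims it can help to sort them by tier. The sketch below assumes the tiers are listed above from strongest to weakest; the page only states explicitly that cross_model_confirmed is the strongest pathway, so treat the rest of the ordering as an assumption. The claim dictionaries and the helper `sort_claims_by_tier` are hypothetical.

```python
# Assumed ordering: strongest first, following the order of the list above.
TIERS_STRONGEST_FIRST = [
    "cross_model_confirmed",
    "single_model_confirmed",
    "causal_plus_robustness",
    "causal_tested_unstable",
    "suggestive",
    "rejected",
]
TIER_RANK = {tier: rank for rank, tier in enumerate(TIERS_STRONGEST_FIRST)}


def sort_claims_by_tier(claims):
    """Sort claims from strongest to weakest tier.

    sorted() is stable, so claims sharing a tier keep their input order.
    An unknown tier raises KeyError instead of being silently ranked.
    """
    return sorted(claims, key=lambda claim: TIER_RANK[claim["evidence_tier"]])


# Hypothetical claims, for illustration only.
claims = [
    {"id": "claim_a", "evidence_tier": "suggestive"},
    {"id": "claim_b", "evidence_tier": "single_model_confirmed"},
    {"id": "claim_c", "evidence_tier": "rejected"},
]
ordered_ids = [claim["id"] for claim in sort_claims_by_tier(claims)]
print(ordered_ids)
```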

Failure decomposition workflow

For a bundle set:

1. Count failed gates across all claims.
2. Group failures by task/model/lane/provider.
3. Separate missing-evidence failures from contradiction failures.
4. Prioritize experiments that target the highest-concentration failure modes.
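The workflow above can be sketched with `collections.Counter`. The claim layout here is an assumption: each claim is a dictionary with `task` and `model` keys and a `failed_gates` list whose entries carry a `gate` name and a `kind` of either `missing_evidence` or `contradiction`. None of these keys are confirmed by this page, and lane/provider grouping is omitted for brevity.

```python
from collections import Counter


def decompose_failures(claims):
    """Tally failed gates three ways: by gate, by (task, model), and by kind."""
    by_gate = Counter()
    by_group = Counter()
    by_kind = Counter()
    for claim in claims:
        group = (claim["task"], claim["model"])
        for failure in claim["failed_gates"]:
            by_gate[failure["gate"]] += 1
            by_group[group] += 1
            by_kind[failure["kind"]] += 1
    return by_gate, by_group, by_kind


# Hypothetical bundle set with placeholder task, model, and gate names.
claims = [
    {
        "task": "task_a",
        "model": "model_x",
        "failed_gates": [
            {"gate": "robustness", "kind": "contradiction"},
            {"gate": "transfer", "kind": "missing_evidence"},
        ],
    },
    {
        "task": "task_a",
        "model": "model_y",
        "failed_gates": [{"gate": "robustness", "kind": "contradiction"}],
    },
    {"task": "task_b", "model": "model_x", "failed_gates": []},
]

by_gate, by_group, by_kind = decompose_failures(claims)

# The most frequent failure mode is the first candidate for follow-up experiments.
top_gate, top_count = by_gate.most_common(1)[0]
print(top_gate, top_count)
print(dict(by_kind))
```

Keeping the three tallies separate makes it easy to see whether failures concentrate in one gate, in one task/model combination, or in missing evidence rather than contradictions.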

Interpreting `not_evaluated`

not_evaluated often indicates a missing slice or unavailable transfer evidence. It is not necessarily negative evidence.

Treat it as a data-completeness signal.
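One way to keep the two signals apart is to report them as separate flags instead of folding them into a single pass/fail value. This is a sketch only: the helper `completeness_flags` and the assumption that both fields hold lists of check names are hypothetical.

```python
def completeness_flags(result):
    """Report negative evidence and incompleteness as independent flags."""
    failed = result.get("failed_checks") or []
    not_evaluated = result.get("not_evaluated_checks") or []
    return {
        # A failed check is evidence against the claim.
        "has_negative_evidence": bool(failed),
        # A not-evaluated check only says that data is missing.
        "is_incomplete": bool(not_evaluated),
    }


# Hypothetical result: nothing failed, but one check could not be evaluated.
flags = completeness_flags(
    {"failed_checks": [], "not_evaluated_checks": ["transfer_check"]}
)
print(flags)
```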

Resubmission strategy

1. Fix structural incompleteness first (slices, schema, manifest integrity).
2. Then target robustness and method-sensitivity failure clusters.
3. Re-run deterministic review before resubmitting.