Skip to content

Evaluator-Agnostic Latent Stress

Goal

Test verifier robustness using perturbations that are not directly defined by named gate templates.

Command

python main/stress_test_agnostic.py \
  --bundle-dir main/output/real_multi_task/ioi_v0_gpt2-small

Why this regime exists

If stress generation is fully evaluator-aware, measured robustness can be inflated by construction.

Latent stress reduces that coupling by perturbing underlying factors instead of directly targeting declared gate families.

Output artifact

  • stress_test_agnostic.json

What to inspect

  • false-accept behavior under latent perturbation families
  • whether failures cluster differently than suite-targeted ablations
  • whether observed weaknesses generalize across bundles