Is low acceptance automatically bad?
No. In a verifier-first benchmark, a low acceptance rate can indicate strict false-accept control. What matters is that rejections are broken down transparently by failure cause and that each cause points to an actionable fix.
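As a rough illustration of what a failure decomposition might look like, the sketch below tallies rejections by reason next to the headline acceptance rate. The record fields (`accepted`, `failure_reason`) and the reason labels are assumptions made for this example, not the project's actual schema.

```python
from collections import Counter

# Hypothetical verifier output, one record per submitted bundle.
# Field names and reason labels are illustrative assumptions only.
results = [
    {"bundle_id": "b1", "accepted": True, "failure_reason": None},
    {"bundle_id": "b2", "accepted": False, "failure_reason": "schema_invalid"},
    {"bundle_id": "b3", "accepted": False, "failure_reason": "stress_test_failed"},
    {"bundle_id": "b4", "accepted": False, "failure_reason": "stress_test_failed"},
]


def decompose_failures(records):
    """Return the acceptance rate plus a count of rejections per reason."""
    total = len(records)
    accepted = sum(1 for r in records if r["accepted"])
    reasons = Counter(r["failure_reason"] for r in records if not r["accepted"])
    return {
        "acceptance_rate": accepted / total if total else 0.0,
        "rejections_by_reason": dict(reasons),
    }


report = decompose_failures(results)
print(report)
```

A report shaped like this lets a reader see at a glance whether a low acceptance rate comes from one dominant, fixable cause or from many unrelated ones.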
Are synthetic stress tests vulnerable to Goodharting?
Potentially, yes. The project now includes suite-targeted, evaluator-agnostic, and adaptive red-team stress regimes, plus a holdout governance roadmap for hidden stress suites.
Can external users submit bundles from any discovery method?
Yes. Discovery is decoupled from verification, so the verifier does not depend on how a bundle was produced. Submit a bundle that satisfies the bundle contract, and optionally include lane/provider metadata for stratified reporting.
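The decoupling can be pictured with a minimal sketch: a contract check that looks only at the bundle's contents, and a grouping key built from the optional metadata. The required field names (`bundle_id`, `payload`) are hypothetical placeholders; the real contract is defined by the project, not by this example.

```python
# Hypothetical contract: these required field names are assumptions
# made for illustration, not the project's actual bundle schema.
REQUIRED_FIELDS = {"bundle_id", "payload"}


def validate_bundle(bundle):
    """Return a list of contract violations (empty if the bundle conforms).

    Nothing here inspects how the bundle was discovered; only its
    contents are checked.
    """
    missing = sorted(REQUIRED_FIELDS - set(bundle))
    return ["missing required field: " + name for name in missing]


def stratum(bundle):
    """Return the (lane, provider) key used to group results in reports.

    Both values are optional and fall back to "unspecified".
    """
    return (
        bundle.get("lane", "unspecified"),
        bundle.get("provider", "unspecified"),
    )


submitted = {"bundle_id": "b1", "payload": {}, "lane": "external"}
print(validate_bundle(submitted))  # no violations for this bundle
print(stratum(submitted))
```

Keeping the metadata optional means a bundle without it still verifies; it simply lands in an "unspecified" stratum in the report.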
What if model downloads are unavailable?
Real Stage-2 runs require model availability. You can still validate schema, evaluator behavior, and reproducibility flows in mock mode.
How is GitHub Pages deployment handled?
The repo includes .github/workflows/pages.yml to deploy docs/website on push to main when docs files change.