Community FAQ

Frequently Asked Questions

Concise answers to common community and contributor questions.

Is low acceptance automatically bad?

No. In a verifier-first benchmark, a low acceptance rate can reflect strict control of false accepts rather than weak submissions. What matters is that rejections are broken down transparently by failure type and paired with actionable remediation.
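As an illustration of what a failure decomposition could look like, here is a minimal Python sketch. It is not the project's reporting code: the record keys ("accepted", "reason") and the reason labels are hypothetical and chosen only for this example.

```python
from collections import Counter


def decompose_failures(results):
    """Summarize acceptance rate and count rejections per failure reason.

    `results` is a list of dicts with HYPOTHETICAL keys:
      "accepted" (bool) and "reason" (str, present only when rejected).
    """
    total = len(results)
    accepted = sum(1 for r in results if r["accepted"])
    # Tally rejections by reason; fall back to "unspecified" if none is given.
    reasons = Counter(
        r.get("reason", "unspecified") for r in results if not r["accepted"]
    )
    return {
        "acceptance_rate": accepted / total if total else 0.0,
        "rejections_by_reason": dict(reasons),
    }


# Hypothetical verification outcomes for four submitted bundles.
example = [
    {"accepted": True},
    {"accepted": False, "reason": "schema_invalid"},
    {"accepted": False, "reason": "evidence_insufficient"},
    {"accepted": False, "reason": "schema_invalid"},
]
report = decompose_failures(example)
print(report)
```

A report like this makes a low acceptance rate interpretable: a reader can see whether rejections cluster around one fixable cause or are spread across many.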

Are synthetic stress tests vulnerable to Goodharting?

Potentially, yes. To reduce that risk, the project now includes suite-targeted, evaluator-agnostic, and adaptive red-team stress regimes, plus a holdout governance roadmap for hidden stress suites.

Can external users submit bundles from any discovery method?

Yes. Discovery is decoupled from verification, so the method used to produce a bundle does not matter. Submit a bundle that satisfies the bundle contract, and optionally include lane/provider metadata for stratified reporting.
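To show why the optional metadata is useful, the following Python sketch groups verification outcomes by lane and provider. It is an assumption-laden illustration, not the project's bundle schema: the field names ("metadata", "lane", "provider", "accepted") and the sample values are hypothetical.

```python
from collections import defaultdict


def stratified_acceptance(bundles):
    """Compute acceptance rate per (lane, provider) stratum.

    Each bundle is a dict with HYPOTHETICAL fields:
      "accepted" (bool) and an optional "metadata" dict
      containing "lane" and "provider" strings.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [accepted, total]
    for bundle in bundles:
        meta = bundle.get("metadata", {})
        # Bundles without metadata are still counted, under "unknown".
        key = (meta.get("lane", "unknown"), meta.get("provider", "unknown"))
        counts[key][1] += 1
        if bundle["accepted"]:
            counts[key][0] += 1
    return {key: acc / total for key, (acc, total) in counts.items()}


# Hypothetical outcomes: two bundles carry metadata, one does not.
sample = [
    {"accepted": True, "metadata": {"lane": "llm", "provider": "provider_a"}},
    {"accepted": False, "metadata": {"lane": "llm", "provider": "provider_a"}},
    {"accepted": True},
]
rates = stratified_acceptance(sample)
print(rates)
```

Bundles submitted without metadata still verify under this scheme; they simply land in a catch-all stratum instead of a named one.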

What if model downloads are unavailable?

Real Stage-2 runs require the models to be available. Without them, you can still validate the schema, evaluator behavior, and reproducibility flows in mock mode.

How is GitHub Pages deployment handled?

The repo includes .github/workflows/pages.yml to deploy docs/website on push to main when docs files change.