Validation and Benchmarks

Validation Suite

Cross-runtime validation script:

bash scripts/cv-validation-suite.sh --runtime both

Coverage cases include:

bash scripts/trace-slo-check.sh --max-gates 512 --max-frame-ms 50

Smoke benchmark:

bash scripts/compute-benchmark.sh --backend gaussian --modes 2 --layers 16 --iterations 2 --compute-backends auto,cpu,metal --max-p95-ms 500

Scaling benchmark:

bash scripts/compute-benchmark-scaling.sh ...

Use the benchmark scripts as gates, not just telemetry.

For compute-benchmark.sh:

For compute-benchmark-scaling.sh:

Latency regression envelope (current implementation):

A row fails only when it exceeds the larger of:

This avoids noisy failures for very small baselines while still catching large real regressions.

Recommended gate order:

bash scripts/cv-validation-suite.sh --runtime both
bash scripts/trace-slo-check.sh --max-gates 512 --max-frame-ms 50
bash scripts/compute-benchmark.sh ... --max-p95-ms <budget>
bash scripts/compute-benchmark-scaling.sh --baseline benchmarks/compute-scaling-baseline.json --max-regression-pct 180

If step 4 fails on routing mismatch, treat it as a policy/routing regression, not a raw latency fluctuation.