Validation and Benchmarks
Validation Suite
Cross-runtime validation script:
bash scripts/cv-validation-suite.sh --runtime both
Coverage cases include:
- single-mode squeezing,
- two-mode EPR + loss,
- thermal + heterodyne,
- single-mode Fock response.
Trace SLO Gate
bash scripts/trace-slo-check.sh --max-gates 512 --max-frame-ms 50
Compute Benchmarking
Smoke benchmark:
bash scripts/compute-benchmark.sh --backend gaussian --modes 2 --layers 16 --iterations 2 --compute-backends auto,cpu,metal --max-p95-ms 500
Scaling benchmark:
bash scripts/compute-benchmark-scaling.sh ...
Baseline Artifacts
benchmarks/schrosim-core-baseline.jsonbenchmarks/schrosim-core-scaling-baseline.jsonbenchmarks/compute-scaling-baseline.json
Benchmark Interpretation Guide
Use the benchmark scripts as gates, not just telemetry.
For compute-benchmark.sh:
- Primary latency gate is
--max-p95-ms(if provided). - Fail means the requested compute backend exceeded that p95 budget.
For compute-benchmark-scaling.sh:
- Baseline comparison is enabled with
--baseline <file>. - Baseline identity must match current run dimensions:
- same simulation backend,
- same compute backend list,
- same modes list,
- same layers list.
- Backend routing consistency is strict:
compute_backend_candidatemust match baseline,compute_backend_usedmust match baseline.
Latency regression envelope (current implementation):
- relative allowance:
max-regression-pct(default180), - absolute floors:
p95: baseline +25 ms,avg: baseline +15 ms.
A row fails only when it exceeds the larger of:
- baseline * (1 +
max-regression-pct/100), or - baseline + absolute floor.
This avoids noisy failures for very small baselines while still catching large real regressions.
Practical CI Usage
Recommended gate order:
bash scripts/cv-validation-suite.sh --runtime bothbash scripts/trace-slo-check.sh --max-gates 512 --max-frame-ms 50bash scripts/compute-benchmark.sh ... --max-p95-ms <budget>bash scripts/compute-benchmark-scaling.sh --baseline benchmarks/compute-scaling-baseline.json --max-regression-pct 180
If step 4 fails on routing mismatch, treat it as a policy/routing regression, not a raw latency fluctuation.