Results, Risks, and Limits
Result Surfaces Produced by Current Workflows
- Decoder replay matrix summaries.
- Source-vs-reference deltas.
- Bootstrap confidence intervals.
- Request-equivalence divergence metrics.
- Scaling and parametric trend plots (optional stages).
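As an illustration of how a bootstrap confidence interval over source-vs-reference deltas could be computed, here is a minimal sketch using only the standard library. The function name `bootstrap_ci`, its parameters, and the percentile method are assumptions made for illustration; the workflow's actual resampling scheme may differ.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Hypothetical helper: `values` would be per-run deltas between a source
    decoder result and the matching reference result.
    """
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo_idx = max(0, int((alpha / 2) * n_resamples))
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    hi_idx = min(n_resamples - 1, max(lo_idx, hi_idx))
    return means[lo_idx], means[hi_idx]
```

A delta whose interval comfortably contains zero is a weak basis for claiming that the source and reference differ.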
Trust Risks to Monitor
- Dependency drift across PennyLane/Qiskit/Cirq versions.
- Schema drift in request/response NDJSON payloads.
- Config mismatch between CLI and App runs.
- Silent fallback behavior when optional dependencies are absent.
- Over-interpretation of finite-shot differences that may lie within sampling noise.
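To make the last risk concrete, the sketch below checks whether two finite-shot failure rates are distinguishable from sampling noise using a pooled two-proportion z-test. This is one standard choice rather than the repository's method, and the function name and arguments are hypothetical.

```python
import math


def two_proportion_z(fail_a, shots_a, fail_b, shots_b):
    """Return (delta, z, two_sided_p) for two observed failure rates.

    Hypothetical helper: fail_* are failure counts, shots_* are shot counts.
    """
    if shots_a <= 0 or shots_b <= 0:
        raise ValueError("shot counts must be positive")
    p_a = fail_a / shots_a
    p_b = fail_b / shots_b
    delta = p_a - p_b
    # Pooled rate under the null hypothesis that both rates are equal.
    pooled = (fail_a + fail_b) / (shots_a + shots_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / shots_a + 1 / shots_b))
    if se == 0.0:
        # Both samples are all-success or all-failure: nothing to test.
        return delta, 0.0, 1.0
    z = delta / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return delta, z, p_value
```

With 1,000 shots per arm, 52 versus 48 failures gives a large p-value, so a difference of that size should not be read as a decoder effect without more shots.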
Mitigations
- Pin and record environment versions.
- Keep schema checks in the replay and analysis stages.
- Persist run manifests and seed/config snapshots.
- Compare each decoder against the LiDMaS+ reference dataset.
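A run manifest that captures versions, seed, and config might look like the following sketch. The field names, the `build_manifest` helper, and the default distribution names are assumptions for illustration, not the repository's actual manifest format. Absent packages are recorded as `None`, so a silent fallback leaves a visible trace.

```python
import hashlib
import json
import platform
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # recorded explicitly instead of skipped
    return versions


def build_manifest(config, seed, packages=("pennylane", "qiskit", "cirq")):
    """Build a JSON-serializable manifest (hypothetical layout).

    The default distribution names are assumptions; adjust them to match
    the names actually used in the pinned environment.
    """
    # Canonical JSON so the hash does not depend on key order.
    config_json = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return {
        "python": platform.python_version(),
        "packages": package_versions(packages),
        "seed": seed,
        "config": config,
        "config_sha256": hashlib.sha256(config_json.encode("utf-8")).hexdigest(),
    }
```

Writing this dictionary with `json.dump` next to each run's outputs lets two runs be compared by `config_sha256` and package versions before their results are compared.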
Current Boundaries
- Reproducibility is engineering-grade: it rests on pinned environments, run manifests, and seed/config snapshots.
- Formal proof-assistant verification is not yet integrated.
- Hardware calibration validity is external to this repository.
Open Problems
- Formalizing proof obligations into machine-checkable contracts.
- Stronger automated parity checks across CLI and App routes.
- Wider stress tests for cross-family normalization assumptions.
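One possible starting point for automated CLI/App parity checks is a structural diff of the resolved configurations from both routes. The sketch below assumes each route can export its effective config as a nested dictionary; the helper names and the `"<missing>"` marker are hypothetical.

```python
MISSING = "<missing>"  # marker for a key present on only one route


def flatten(config, prefix=""):
    """Flatten a nested dict into {"a.b.c": value} form."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def config_diff(cli_config, app_config):
    """Return {path: (cli_value, app_value)} for every mismatching key."""
    cli_flat = flatten(cli_config)
    app_flat = flatten(app_config)
    diffs = {}
    for path in sorted(cli_flat.keys() | app_flat.keys()):
        cli_value = cli_flat.get(path, MISSING)
        app_value = app_flat.get(path, MISSING)
        if cli_value != app_value:
            diffs[path] = (cli_value, app_value)
    return diffs
```

An empty result means the two routes resolved to the same settings; a non-empty one lists which keys to reconcile before comparing outputs.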