Variational Eigensolver Benchmarks on Current Hardware
A summary of our internal benchmarking of VQE and its variants across three quantum hardware platforms, with observations on noise, shot efficiency, and practical accuracy ceilings.
The variational quantum eigensolver (VQE) remains the dominant paradigm for near-term quantum chemistry. Its resilience to hardware noise, relative to phase estimation and other methods that require fault tolerance, makes it the natural starting point for any laboratory doing quantum chemistry on today's devices.
This note summarises observations from an internal benchmarking exercise conducted across three superconducting qubit platforms over the past quarter. The molecules studied were H₂, LiH, and the first-row transition metal complex FeCO, chosen to probe progressively harder electronic structure problems.
Shot efficiency and measurement overhead
The dominant cost in VQE on real hardware is not gate execution — it is measurement. Computing the expectation value of a molecular Hamiltonian requires decomposing it into Pauli operators and measuring each term separately. For FeCO in a minimal basis, this amounts to several thousand distinct Pauli strings.
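As a minimal sketch of what measuring each term separately involves, the energy estimate is a weighted sum of per-term Pauli expectation values, each estimated from counts of +1 and -1 outcomes. The function names, coefficients, and counts below are hypothetical placeholders, not benchmark data, and the sketch assumes every term gets its own independent batch of shots.

```python
from math import sqrt

def pauli_expectation(n_plus, n_minus):
    """Estimate <P> from counts of +1 and -1 eigenvalue outcomes."""
    return (n_plus - n_minus) / (n_plus + n_minus)

def energy_estimate(terms):
    """Estimate <H> = sum_i c_i <P_i> and its standard error.

    terms: list of (coefficient, n_plus, n_minus), one entry per Pauli string.
    Assumes each term is measured independently in its own batch of shots.
    """
    energy = 0.0
    variance = 0.0
    for coeff, n_plus, n_minus in terms:
        shots = n_plus + n_minus
        mean = pauli_expectation(n_plus, n_minus)
        energy += coeff * mean
        # A +/-1-valued outcome has single-shot variance 1 - <P>^2.
        variance += coeff ** 2 * (1.0 - mean ** 2) / shots
    return energy, sqrt(variance)

# Hypothetical two-term Hamiltonian with made-up counts (1000 shots per term).
terms = [(0.5, 900, 100), (-0.25, 400, 600)]
energy, stderr = energy_estimate(terms)
print(f"E = {energy:.4f} +/- {stderr:.4f}")
```

With several thousand Pauli strings, the per-term shot budget multiplies quickly, which is why measurement rather than gate execution dominates the cost.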
Across all platforms, we found that naively grouping commuting operators and allocating shots uniformly needed two to four times the wall-clock time of more adaptive schemes to reach the same estimator variance. Classical shadow techniques offered the best shot efficiency, at the cost of increased classical post-processing.
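The two ingredients named above, grouping and shot allocation, can be sketched as follows. This is an illustration under stated assumptions rather than the scheme used in the benchmark: grouping is a greedy pass over qubit-wise commuting Pauli strings, and a static allocation proportional to |c_i|·σ_i stands in for the "more adaptive schemes". The Pauli strings and weights are made up.

```python
def qubitwise_commute(p, q):
    """True if Pauli strings p and q agree, or one is 'I', on every qubit."""
    return all(a == b or a == "I" or b == "I" for a, b in zip(p, q))

def greedy_group(paulis):
    """Greedily pack Pauli strings into qubit-wise commuting groups."""
    groups = []
    for p in paulis:
        for group in groups:
            if all(qubitwise_commute(p, q) for q in group):
                group.append(p)
                break
        else:
            # No existing group accepts p, so it starts a new one.
            groups.append([p])
    return groups

def variance_uniform(weights, total_shots):
    """Energy variance when shots are split evenly across terms.

    weights[i] = |c_i| * sigma_i for term (or group) i.
    """
    shots_per_term = total_shots / len(weights)
    return sum(w ** 2 for w in weights) / shots_per_term

def variance_weighted(weights, total_shots):
    """Energy variance when term i receives shots in proportion to weights[i].

    Ignores rounding of shot counts to integers.
    """
    return sum(weights) ** 2 / total_shots

# Hypothetical 4-qubit example.
groups = greedy_group(["ZZII", "ZIII", "IZII", "XXII", "IIXX", "IIZZ"])
weights = [1.0, 0.1, 0.1, 0.1]  # made-up |c_i| * sigma_i values
var_u = variance_uniform(weights, total_shots=1000)
var_w = variance_weighted(weights, total_shots=1000)
print(len(groups), var_u, var_w)
```

By the Cauchy-Schwarz inequality the weighted variance never exceeds the uniform one, and the gap grows as the coefficient magnitudes become more uneven. The numbers here are illustrative only and are not meant to reproduce the measured factor of two to four.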
Noise and effective circuit depth
All three platforms imposed a practical ceiling on circuit depth, beyond which decoherence dominated the signal. This ceiling ranged from 40 to 120 CNOT-equivalent gates, depending on qubit connectivity and the error mitigation applied.
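For intuition about why such a ceiling appears, a crude back-of-envelope model treats gate errors as independent, so circuit fidelity decays as (1 - ε)^n for n two-qubit gates with error rate ε. The sketch below is only that simplified model, with hypothetical error rates and a hypothetical fidelity threshold. It is not how the ceilings above were determined, and it ignores idle decoherence, crosstalk, and error mitigation.

```python
from math import floor, log

def max_gates(error_per_gate, min_fidelity):
    """Largest gate count n with (1 - error_per_gate)**n >= min_fidelity.

    Assumes independent, identical gate errors (a strong simplification).
    """
    return floor(log(min_fidelity) / log(1.0 - error_per_gate))

# Hypothetical two-qubit error rates and a hypothetical fidelity floor of 0.5.
for eps in (0.005, 0.01, 0.02):
    print(eps, max_gates(eps, min_fidelity=0.5))
```

Under this model, halving the two-qubit error rate roughly doubles the usable gate count.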
This depth constraint is binding for ansätze designed without hardware awareness. Chemically motivated ansätze (k-UpCCGSD, ADAPT-VQE) outperformed hardware-agnostic designs at equivalent depth, consistent with prior literature. However, even the best hardware-native ansätze showed systematic underestimation of correlation energy for FeCO — a reminder that the gap between "runs on hardware" and "chemically accurate" remains wide.
Practical accuracy
For H₂ and LiH, chemical accuracy (errors within 1 kcal/mol, roughly 1.6 mHa) was achievable with error mitigation. For FeCO, the best results showed errors of 5–12 kcal/mol, depending on active space choice and mitigation strategy. That is useful for qualitative comparison but insufficient for predictive catalysis.
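Since quantum chemistry codes usually report energies in Hartree, a small helper makes the 1 kcal/mol threshold concrete. The function names and the energies in the example are hypothetical; only the conversion factor (1 Hartree ≈ 627.5095 kcal/mol) is standard.

```python
KCAL_PER_MOL_PER_HARTREE = 627.5095  # standard unit conversion

def error_kcal_per_mol(e_computed_ha, e_reference_ha):
    """Absolute energy error in kcal/mol, given energies in Hartree."""
    return abs(e_computed_ha - e_reference_ha) * KCAL_PER_MOL_PER_HARTREE

def is_chemically_accurate(e_computed_ha, e_reference_ha, threshold=1.0):
    """True if the error is within `threshold` kcal/mol (default 1.0)."""
    return error_kcal_per_mol(e_computed_ha, e_reference_ha) <= threshold

# Made-up energies in Hartree, for illustration only.
small_error = error_kcal_per_mol(-1.1360, -1.1373)  # 1.3 mHa off
large_error = error_kcal_per_mol(-1.1273, -1.1373)  # 10 mHa off
print(small_error, is_chemically_accurate(-1.1360, -1.1373))
print(large_error, is_chemically_accurate(-1.1273, -1.1373))
```

An error of 10 mHa already corresponds to more than 6 kcal/mol, which puts it in the range reported for FeCO.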
Our working conclusion
Current quantum hardware is most productively used not as a standalone computation engine but as a generator of physically constrained training data for ML surrogates. We use VQE outputs as high-fidelity data points that anchor a classical model, rather than as the primary computational path. This hybrid strategy is the current centre of gravity of our Molecular Quantum Simulation program.
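One way to picture VQE outputs anchoring a classical model is a correction fit: a cheap classical estimate is adjusted by a simple model fitted to its residuals at the few points where VQE energies exist. The sketch below is purely illustrative. The linear correction, the function names, the one-dimensional input, and all numbers are assumptions, and none of it describes the program's actual surrogate.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def make_surrogate(cheap_model, anchor_xs, anchor_energies):
    """Return cheap_model plus a linear correction fitted at the anchors.

    anchor_energies play the role of the high-fidelity (e.g. VQE) values.
    """
    residuals = [e - cheap_model(x) for x, e in zip(anchor_xs, anchor_energies)]
    slope, intercept = fit_linear(anchor_xs, residuals)
    return lambda x: cheap_model(x) + slope * x + intercept

# Toy data: a made-up cheap model and three made-up anchor energies.
cheap = lambda x: x ** 2
surrogate = make_surrogate(cheap, [0.0, 1.0, 2.0], [-1.0, 0.5, 4.0])
print(surrogate(3.0))
```

The design point is that the hardware results only need to be available at a handful of anchor points, while the classical model supplies coverage everywhere else.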
We expect the accuracy ceiling to rise as error correction matures. The infrastructure we are building now is designed to absorb that improvement without architectural change.