For research and educational purposes only. Not medical advice.

Wearable sleep tracking vs. polysomnography: how accurate stage detection actually is, and what your sleep score really measures

Wrist wearables show you a sleep architecture chart that looks a lot like polysomnography. The validation literature is more sober: total sleep time is decen…

Category: Sleep. 7 min read. Published 2026-04-27.

What wearables actually measure

Wrist wearables estimate sleep from a small set of inputs: a 3-axis accelerometer, photoplethysmography (PPG) for heart rate and heart-rate variability (HRV), skin temperature in newer devices, and sometimes blood-oxygen estimates. They do not measure brain activity. The clinical reference standard for sleep staging is polysomnography (PSG), which combines electroencephalography (EEG), electrooculography, electromyography, respiratory effort, and oxygen saturation [1].

The wearable's sleep stage chart is therefore a model output that estimates EEG-derived stages from non-EEG signals. The model is trained against PSG-labeled sleep, but the inputs are fundamentally indirect.

What validation studies actually show

Total sleep time and sleep onset latency: wearables agree with PSG within roughly 15-30 minutes on average in adult populations.
Wake-after-sleep-onset (WASO): biased low; wearables tend to under-detect brief awakenings.
Light vs. deep sleep classification: agreement with PSG is much weaker; epoch-by-epoch accuracy commonly sits in the 50-70 percent range across published validations [2].
REM detection: more accurate than deep sleep classification but still substantially noisier than PSG.
Performance is worse in people with sleep disorders or atypical heart-rate patterns, and in shift workers.

The sleep score construct

The single 'sleep score' on a wearable is a vendor-defined composite. It blends total sleep time, stage estimates, heart rate, HRV, respiratory rate, and sometimes consistency of timing. Because the input signals are noisy and the weighting is proprietary, the same night of sleep can produce different scores across devices. The score correlates with subjective recovery in many users, but it is not directly comparable across brands or even across firmware versions of the same device.
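
A toy Python sketch shows why the weighting matters. Real vendor formulas are proprietary, so the component names, the 0-100 sub-scores, and both sets of weights below are invented purely for illustration.

```python
def composite_score(metrics, weights):
    """Weighted average of sub-scores that are already scaled to 0-100."""
    total_weight = sum(weights.values())
    return sum(metrics[name] * w for name, w in weights.items()) / total_weight

# One hypothetical night, expressed as sub-scores.
night = {"duration": 80, "stages": 60, "heart_rate": 90, "timing": 70}

# Two hypothetical vendors that weight the same inputs differently.
vendor_a = {"duration": 50, "stages": 20, "heart_rate": 20, "timing": 10}
vendor_b = {"duration": 25, "stages": 45, "heart_rate": 10, "timing": 20}

score_a = composite_score(night, vendor_a)
score_b = composite_score(night, vendor_b)
print(score_a, score_b)
```

The same night comes out seven points apart under the two weightings, and vendor B's score leans hardest on the stage estimate, which is the noisiest input.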

What this changes for readers

If you are using a wearable to track sleep, the most defensible questions to ask of the data are whether total sleep time went up or down and whether bedtime variability went up or down. Stage-level day-to-day comparisons are noisier than the chart suggests. Persistent symptoms (loud snoring, choking awakenings, daytime sleepiness despite adequate hours, insomnia, witnessed apneas) are clinical questions that warrant evaluation regardless of what the wearable shows [3].
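
For readers who export their own data, the two trend measures above can be sketched in a few lines of Python. The field names (`tst_min`, `bedtime`), the two example weeks, and the choice of standard deviation as the variability measure are assumptions made for illustration; actual export formats differ by vendor.

```python
from statistics import mean, stdev

def minutes_after_noon(hhmm):
    """Convert 'HH:MM' to minutes since noon so bedtimes either side of midnight stay in order."""
    hours, minutes = map(int, hhmm.split(":"))
    return ((hours - 12) % 24) * 60 + minutes

def summarize(nights):
    """Mean total sleep time and bedtime spread (sample standard deviation), both in minutes."""
    tst = [n["tst_min"] for n in nights]
    bedtimes = [minutes_after_noon(n["bedtime"]) for n in nights]
    return {"mean_tst_min": mean(tst), "bedtime_sd_min": stdev(bedtimes)}

# Hypothetical data: an irregular week followed by a more regular one.
week1 = [
    {"bedtime": "23:00", "tst_min": 400},
    {"bedtime": "00:30", "tst_min": 380},
    {"bedtime": "22:30", "tst_min": 420},
    {"bedtime": "01:00", "tst_min": 360},
]
week2 = [
    {"bedtime": "23:00", "tst_min": 420},
    {"bedtime": "23:15", "tst_min": 430},
    {"bedtime": "22:45", "tst_min": 410},
    {"bedtime": "23:00", "tst_min": 420},
]

for label, week in (("week 1", week1), ("week 2", week2)):
    summary = summarize(week)
    print(label, summary["mean_tst_min"], round(summary["bedtime_sd_min"], 1))
```

Counting minutes from noon rather than midnight keeps a 23:00 bedtime and a 00:30 bedtime ninety minutes apart instead of nearly a full day apart.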

References

[1] PubMed search: polysomnography sleep staging reference standard (PubMed)
[2] PubMed search: wearable sleep tracker polysomnography validation accuracy (PubMed)
[3] CDC sleep and sleep disorders resource (CDC)