Evidence ladder · L0–L5

Evidence ladder

Each tier adds a kind of evidence a system is allowed to use. The ladder turns a vague question — does more evidence help? — into a measurable one.

The evidence ladder is not an implementation detail. It is one of PWM-Bench's core scientific contributions, because it lets the field measure which kinds of evidence actually improve person-specific forecasting skill.

Tiers are cumulative in access: a system at tier Lk may use all evidence from L0…Lk. Hypothesis H2 asks whether realized skill is non-decreasing in tier — an empirical question, not an assumption.
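The cumulative-access rule and the H2 check can be sketched in a few lines. This is a minimal illustration, not PWM-Bench code: the names `Tier`, `accessible_tiers`, and `is_non_decreasing` are hypothetical, as is the `tolerance` parameter, and the skill values a caller passes in would come from whatever metric the benchmark actually reports.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Evidence tiers L0 to L5, ordered so that comparisons follow the ladder."""
    L0 = 0
    L1 = 1
    L2 = 2
    L3 = 3
    L4 = 4
    L5 = 5


def accessible_tiers(system_tier: Tier) -> list[Tier]:
    """Cumulative access: a system at tier Lk may use evidence from L0 through Lk."""
    return [t for t in Tier if t <= system_tier]


def is_non_decreasing(skill_by_tier: dict[Tier, float], tolerance: float = 0.0) -> bool:
    """H2-style check: realized skill never drops when moving up one measured tier.

    `tolerance` (an assumption of this sketch) allows small drops to be ignored.
    """
    ordered = [skill_by_tier[t] for t in sorted(skill_by_tier)]
    return all(later >= earlier - tolerance for earlier, later in zip(ordered, ordered[1:]))
```

Because H2 is an empirical question, a real analysis would presumably replace the plain comparison with a statistical test over participants; the function above only expresses what "non-decreasing in tier" means.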

L0 · Calendar + communications metadata
Structured, low-content signals: calendar entries, and the timing/direction/counterparties of communications, without message content.
Examples: Calendar events and changes · Message/call timestamps, direction, counterparties · Coarse app/usage timing
Powers: Defines the L0 baselines (R1, R2) and the Digital Exhaust Model.
⚠ Sensitivity: Lower content sensitivity, but still reveals social graph and rhythms.

L1 · Text evidence
The content of the participant's text: messages, notes, documents the participant authors or receives.
Examples: Chat / message history · Personal notes and documents · Email body text
Powers: The Chat History Model and any content-aware text system.
⚠ Sensitivity: High. Contains third-party content and private subject matter.

L2 · Text + audio transcript
Adds transcribed speech to text evidence: conversations and spoken context rendered as text.
Examples: Transcribed conversations · Voice notes · Meeting transcripts
Powers: Systems testing whether spoken context adds skill over written text alone.
⚠ Sensitivity: Very high. Captures bystanders and ambient speech.

L3 · Multimodal passive evidence
A passive observational stream combining visual, audio, and screen context captured during consented observation windows.
Examples: Egocentric / scene video · Ambient audio · Screen context
Powers: The Passive Observation Model and the Combined Evidence Model.
⚠ Sensitivity: Extreme. Video, audio, and screens capture bystanders, children, addresses, and on-screen private information.

L4 · Location / behavioral traces
Spatial and behavioral traces: location, movement, and device-level behavioral signals over time.
Examples: Location traces · Movement / mobility patterns · Device behavioral logs
Powers: Systems testing whether spatial/behavioral context improves forecasting of deviations and transitions.
⚠ Sensitivity: Extreme. Location is re-identifying and reveals home, work, and routines.

L5 · Physiological signals
Body-level signals: heart rate, sleep, activity, and other physiological measurements.
Examples: Heart-rate / HRV · Sleep and activity · Other wearable physiology
Powers: Systems testing whether internal-state signals anticipate goal-state transitions (PWM-X).
⚠ Sensitivity: Extreme. Potentially health-revealing; may implicate medical information.
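
Each tier above carries the same fields: an identifier, a name, examples, what it powers, and a sensitivity note. One way to represent such a record is sketched here. The schema is an assumption: the field names, the `EvidenceTier` class, and the `load_tiers` helper are hypothetical, and the sample JSON simply restates two tiers from the list rather than reproducing any actual data file.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceTier:
    """One rung of the ladder. Field names are illustrative, not a published schema."""
    id: str
    name: str
    examples: tuple[str, ...]
    powers: str
    sensitivity: str


# Sample records copied from the tier descriptions; the JSON layout is assumed.
SAMPLE_JSON = """
[
  {
    "id": "L0",
    "name": "Calendar + communications metadata",
    "examples": ["Calendar events and changes", "Coarse app/usage timing"],
    "powers": "Defines the L0 baselines (R1, R2) and the Digital Exhaust Model.",
    "sensitivity": "Lower content sensitivity, but still reveals social graph and rhythms."
  },
  {
    "id": "L1",
    "name": "Text evidence",
    "examples": ["Chat / message history", "Email body text"],
    "powers": "The Chat History Model and any content-aware text system.",
    "sensitivity": "High. Contains third-party content and private subject matter."
  }
]
"""


def load_tiers(raw: str) -> list[EvidenceTier]:
    """Parse a JSON array of tier records into immutable EvidenceTier objects."""
    return [
        EvidenceTier(
            id=record["id"],
            name=record["name"],
            examples=tuple(record["examples"]),
            powers=record["powers"],
            sensitivity=record["sensitivity"],
        )
        for record in json.loads(raw)
    ]


tiers = load_tiers(SAMPLE_JSON)
```

Keeping the records immutable (`frozen=True`) is a design choice of this sketch: tier definitions are reference data, so nothing downstream should be able to alter them during a run.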

The questions the ladder lets us ask

Are self-reports enough?

Is digital exhaust enough?

Does passive observation matter?

Does richer multimodal evidence improve predictive skill?

How much evidence is required to maintain an accurate estimate of a person's evolving state?

Higher tiers are progressively more sensitive. The evidence ladder is inseparable from the governance and ethics commitments — richer evidence is only admissible under federated execution, consent, and aggregate-only reporting. Tiers render from data/evidenceTiers.json.
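
The admissibility condition can be read as a simple gate: evidence is usable only when all three safeguards hold. The sketch below assumes exactly that reading. `RunConditions`, `missing_safeguards`, and `is_admissible` are hypothetical names, and the text does not say which tiers count as "richer", so the gate is written without a tier threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConditions:
    """Safeguards named in the text; the field names are assumptions of this sketch."""
    federated_execution: bool
    participant_consent: bool
    aggregate_only_reporting: bool


def missing_safeguards(conditions: RunConditions) -> list[str]:
    """Return the names of the safeguards that are not satisfied."""
    checks = {
        "federated execution": conditions.federated_execution,
        "consent": conditions.participant_consent,
        "aggregate-only reporting": conditions.aggregate_only_reporting,
    }
    return [name for name, satisfied in checks.items() if not satisfied]


def is_admissible(conditions: RunConditions) -> bool:
    """Evidence is admissible only when every safeguard holds."""
    return not missing_safeguards(conditions)
```

Returning the list of missing safeguards, rather than only a boolean, lets a caller report why a run was rejected.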
