Evidence ladder · L0–L5

Each tier adds a kind of evidence a system is allowed to use. The ladder turns a vague question — does more evidence help? — into a measurable one.

The evidence ladder is not an implementation detail. It is one of PWM-Bench's core scientific contributions, because it lets the field measure which kinds of evidence actually improve person-specific forecasting skill.

Tiers are cumulative in access: a system at tier Lk may use all evidence from L0…Lk. Hypothesis H2 asks whether realized skill is non-decreasing in tier — an empirical question, not an assumption.
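The cumulative-access rule and the H2 check can be sketched in a few lines of Python. This is a minimal illustration, not PWM-Bench code: the function names, the `skill_by_tier` mapping, and the `tol` parameter are assumptions, and the source does not specify here which skill metric or statistical test H2 uses.

```python
from __future__ import annotations

from typing import Mapping, Sequence

# Tier labels in ladder order, taken from the page (L0 through L5).
TIERS: Sequence[str] = ("L0", "L1", "L2", "L3", "L4", "L5")


def accessible_tiers(tier: str) -> tuple[str, ...]:
    """Return every tier a system registered at `tier` may draw evidence from."""
    k = TIERS.index(tier)  # raises ValueError for an unknown tier label
    return tuple(TIERS[: k + 1])


def h2_violations(
    skill_by_tier: Mapping[str, float], tol: float = 0.0
) -> list[tuple[str, str]]:
    """List adjacent measured tier pairs where skill drops by more than `tol`.

    `skill_by_tier` is a hypothetical mapping from tier label to a
    higher-is-better skill score. `tol` is a crude stand-in for a proper
    significance test, which a real analysis would need.
    """
    measured = [t for t in TIERS if t in skill_by_tier]
    return [
        (lo, hi)
        for lo, hi in zip(measured, measured[1:])
        if skill_by_tier[hi] < skill_by_tier[lo] - tol
    ]
```

Under these assumptions, `accessible_tiers("L2")` yields `("L0", "L1", "L2")`, and an empty list from `h2_violations` means the measured scores are consistent with H2.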

  1. L0

    Calendar + communications metadata

    Structured, low-content signals: calendar entries, and the timing/direction/counterparties of communications — without message content.

    Examples: Calendar events and changes · Message/call timestamps, direction, counterparties · Coarse app/usage timing

    Powers: The L0 baselines (R1, R2) and the Digital Exhaust Model.

    ⚠ Sensitivity: Lower content sensitivity, but still reveals social graph and rhythms.

  2. L1

    Text evidence

    The content of the participant's text: messages, notes, and documents that the participant authors or receives.

    Examples: Chat / message history · Personal notes and documents · Email body text

    Powers: The Chat History Model and any content-aware text system.

    ⚠ Sensitivity: High. Contains third-party content and private subject matter.

  3. L2

    Text + audio transcript

    Adds transcribed speech to text evidence — conversations and spoken context rendered as text.

    Examples: Transcribed conversations · Voice notes · Meeting transcripts

    Powers: Systems testing whether spoken context adds skill over written text alone.

    ⚠ Sensitivity: Very high. Captures bystanders and ambient speech.

  4. L3

    Multimodal passive evidence

    A passive observational stream combining visual, audio, and screen context captured during consented observation windows.

    Examples: Egocentric / scene video · Ambient audio · Screen context

    Powers: The Passive Observation Model and the Combined Evidence Model.

    ⚠ Sensitivity: Extreme. Video, audio, and screens capture bystanders, children, addresses, and on-screen private information.

  5. L4

    Location / behavioral traces

    Spatial and behavioral traces: location, movement, and device-level behavioral signals over time.

    Examples: Location traces · Movement / mobility patterns · Device behavioral logs

    Powers: Systems testing whether spatial/behavioral context improves forecasting of deviations and transitions.

    ⚠ Sensitivity: Extreme. Location is re-identifying and reveals home, work, and routines.

  6. L5

    Physiological signals

    Body-level signals: heart rate, sleep, activity, and other physiological measurements.

    Examples: Heart-rate / HRV · Sleep and activity · Other wearable physiology

    Powers: Systems testing whether internal-state signals anticipate goal-state transitions (PWM-X).

    ⚠ Sensitivity: Extreme. Potentially health-revealing; may implicate medical information.
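One way to hold the six tiers in code is as an ordered tuple of small records. The sketch below is illustrative only: the class and field names are hypothetical, and the numeric ordering of the sensitivity labels is an assumption derived from the wording on this page ("Lower", "High", "Very high", "Extreme").

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    # Assumed ordering of the sensitivity labels used on this page.
    LOWER = 1
    HIGH = 2
    VERY_HIGH = 3
    EXTREME = 4


@dataclass(frozen=True)
class EvidenceTier:
    level: int
    title: str
    sensitivity: Sensitivity

    @property
    def label(self) -> str:
        return f"L{self.level}"


# Titles and sensitivity labels copied from the tier descriptions above.
LADDER = (
    EvidenceTier(0, "Calendar + communications metadata", Sensitivity.LOWER),
    EvidenceTier(1, "Text evidence", Sensitivity.HIGH),
    EvidenceTier(2, "Text + audio transcript", Sensitivity.VERY_HIGH),
    EvidenceTier(3, "Multimodal passive evidence", Sensitivity.EXTREME),
    EvidenceTier(4, "Location / behavioral traces", Sensitivity.EXTREME),
    EvidenceTier(5, "Physiological signals", Sensitivity.EXTREME),
)


def max_sensitivity(tier_level: int) -> Sensitivity:
    """Highest sensitivity a system at `tier_level` is exposed to.

    Because access is cumulative, a system at tier k inherits the
    sensitivity of every tier from L0 up to Lk.
    """
    return max(t.sensitivity for t in LADDER if t.level <= tier_level)
```

Because access is cumulative, the exposure of a system is governed by the most sensitive tier it can reach, which is what `max_sensitivity` computes.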

The questions the ladder lets us ask

Are self-reports enough?

Is digital exhaust enough?

Does passive observation matter?

Does richer multimodal evidence improve predictive skill?

How much evidence is required to maintain an accurate estimate of a person's evolving state?

Higher tiers are progressively more sensitive. The evidence ladder is inseparable from the governance and ethics commitments — richer evidence is only admissible under federated execution, consent, and aggregate-only reporting. Tiers render from data/evidenceTiers.json.
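The admissibility rule stated above (federated execution, consent, and aggregate-only reporting) could be enforced as a simple precondition check before any evidence is read. The sketch below is an assumption-laden illustration: `RunConditions` and both function names are hypothetical, and the page does not say which tiers count as "richer", so the check is deliberately not tied to a tier threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConditions:
    # Hypothetical flags describing how an evaluation run is set up.
    federated_execution: bool
    participant_consent: bool
    aggregate_only_reporting: bool


def missing_safeguards(conditions: RunConditions) -> list[str]:
    """Name each of the three stated safeguards that the run does not meet."""
    required = {
        "federated execution": conditions.federated_execution,
        "consent": conditions.participant_consent,
        "aggregate-only reporting": conditions.aggregate_only_reporting,
    }
    return [name for name, met in required.items() if not met]


def is_admissible(conditions: RunConditions) -> bool:
    """Evidence is admissible only when all three safeguards are in place."""
    return not missing_safeguards(conditions)
```

Returning the list of unmet safeguards, rather than a bare boolean, lets a caller report exactly why a run was refused.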