T1 Next contact (PWM-S, T1_next_contact)
Forecast which person or entity the participant will contact next, or whether they will contact a given entity within a fixed window.
- Example question
- Who will the participant initiate contact with first tomorrow morning (08:00–12:00)?
- Answer space
- A short, person-specific list of frequent contacts plus an explicit “other” / “no contact” option.
["contact_A","contact_B","contact_C","other","no_contact"]
- Resolution
- Resolved from communications metadata (first outbound message/call to a contact within the window). Adjudicated by the federated client; only the outcome label is reported.
- Baseline
- R1: population contact base rates; R2: the participant's own recent contact frequency/recency.
- Difficulty
- Heavy-tailed contact distributions make R2 strong. Skill must come from context that shifts the next contact away from routine.
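As a rough illustration of what an R2-style baseline for this task could look like, the sketch below turns the participant's past first-contact labels into a recency-weighted, smoothed distribution over the answer space. The function name, the 7-day half-life, and the smoothing constant are all hypothetical choices made for illustration; the source does not specify how R2 is computed.

```python
from collections import defaultdict
from datetime import datetime, timedelta

ANSWER_SPACE = ["contact_A", "contact_B", "contact_C", "other", "no_contact"]

def r2_contact_forecast(history, now, half_life_days=7.0, smoothing=0.5):
    """Recency-weighted frequency baseline (illustrative, not the protocol's R2).

    history: list of (timestamp, label) pairs, one per past window.
    Labels outside ANSWER_SPACE are folded into "other".
    """
    weights = defaultdict(float)
    for ts, label in history:
        age_days = (now - ts).total_seconds() / 86400.0
        key = label if label in ANSWER_SPACE else "other"
        # Exponential decay: an observation loses half its weight every half_life_days.
        weights[key] += 0.5 ** (age_days / half_life_days)
    # Additive smoothing keeps every option at a non-zero probability.
    total = sum(weights.values()) + smoothing * len(ANSWER_SPACE)
    return {a: (weights[a] + smoothing) / total for a in ANSWER_SPACE}

now = datetime(2024, 1, 10, 8, 0)
history = [
    (now - timedelta(days=1), "contact_A"),
    (now - timedelta(days=2), "contact_A"),
    (now - timedelta(days=20), "contact_B"),
]
probs = r2_contact_forecast(history, now)
```

Because contact distributions are heavy-tailed, a baseline of this kind concentrates most of its probability on one or two frequent contacts, which is why beating it requires context that moves the next contact away from routine.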
T2 Event realization (PWM-S, T2_event_realization)
Forecast whether a calendar or planned event actually occurs, is cancelled, or moves.
- Example question
- Will the 14:00 meeting on the participant's calendar tomorrow occur as scheduled, move, or cancel?
- Answer space
- Categorical: occurs as scheduled / moves / cancels.
["occurs","moves","cancels"]
- Resolution
- Resolved from calendar state and communications metadata at the resolution time. Status is adjudicated against an explicit rubric (e.g., a >30 min shift counts as “moves”).
- Baseline
- R1: population cancellation/move rates; R2: the participant's historical event-realization rates.
- Difficulty
- Base rates are informative; genuine skill requires reading signals that a specific event is at risk.
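The resolution rubric above can be sketched as a small function. Only the 30-minute threshold comes from the rubric example; the function name, its arguments, and the choice to treat an event missing from the calendar as cancelled are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

# From the rubric example: a shift of more than 30 minutes counts as "moves".
MOVE_THRESHOLD = timedelta(minutes=30)

def resolve_event(scheduled_start: datetime,
                  resolved_start: Optional[datetime],
                  cancelled: bool) -> str:
    """Map calendar state at resolution time to one of the three labels.

    resolved_start is the event's start time as it stands at resolution time.
    Assumption: an event no longer on the calendar (None) resolves as "cancels".
    """
    if cancelled or resolved_start is None:
        return "cancels"
    if abs(resolved_start - scheduled_start) > MOVE_THRESHOLD:
        return "moves"
    return "occurs"

scheduled = datetime(2024, 1, 10, 14, 0)
label = resolve_event(scheduled, scheduled + timedelta(minutes=45), cancelled=False)
```

A shift of exactly 30 minutes resolves as "occurs" here, because the rubric example uses a strict inequality; a real protocol would need to state such boundary cases explicitly.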
T3 Response behavior (PWM-D, T3_response_behavior)
Forecast whether the participant replies to a given message and, if so, the response-latency band.
- Example question
- Given this received message, will the participant reply within 1h, within 24h, or not within 24h?
- Answer space
- Ordered bands: reply within 1h / reply within 1–24h / no reply within 24h.
["reply_lt_1h","reply_1_24h","no_reply_24h"]
- Resolution
- Resolved from outbound reply timestamps relative to the triggering message. Bands are fixed in advance in the protocol.
- Baseline
- R1: population reply-latency distributions; R2: the participant's per-contact reply history.
- Difficulty
- Strong per-contact priors (R2) set a high bar; skill comes from situational context (workload, location, time of day).
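A minimal sketch of how the latency bands could be resolved from timestamps follows. The band labels come from the answer space; the function name and the handling of the exact 1h and 24h boundaries are assumptions, since the source says only that the bands are fixed in advance in the protocol.

```python
from datetime import datetime, timedelta
from typing import Optional

def resolve_reply_band(received_at: datetime,
                       reply_at: Optional[datetime]) -> str:
    """Assign a reply-latency band relative to the triggering message.

    reply_at is the first outbound reply timestamp, or None if there was none.
    Assumed boundaries: exactly 1h falls in "reply_1_24h"; exactly 24h still
    counts as a reply within 24h.
    """
    if reply_at is None:
        return "no_reply_24h"
    latency = reply_at - received_at
    if latency < timedelta(0):
        raise ValueError("reply timestamp precedes the triggering message")
    if latency < timedelta(hours=1):
        return "reply_lt_1h"
    if latency <= timedelta(hours=24):
        return "reply_1_24h"
    return "no_reply_24h"

received = datetime(2024, 1, 10, 9, 0)
band = resolve_reply_band(received, received + timedelta(hours=3))
```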
T4 Attention allocation (PWM-A, T4_attention_allocation)
Forecast which active project or topic will receive the most attention in the next window.
- Example question
- Which active project will receive the most of the participant's working attention tomorrow?
- Answer space
- A person-specific set of active projects/topics plus “other.”
["project_1","project_2","family","admin","other"]
- Resolution
- Resolved from the participant's own end-of-window labelling and/or activity evidence, against a pre-registered attribution rule. Only the resolved label leaves the client.
- Baseline
- R1: population topic priors (weak); R2: the participant's recent attention distribution.
- Difficulty
- Attention is volatile and partly intention-driven. This is where richer evidence (L2–L3) is hypothesised to help most.
T5 Routine deviation (PWM-S, T5_routine_deviation)
Forecast deviations from the participant's established routine.
- Example question
- Will the participant's day tomorrow deviate from their typical weekday routine on a pre-specified dimension (e.g., start time, commute, core block)?
- Answer space
- Binary or categorical deviation on a pre-registered dimension.
["no_deviation","minor_deviation","major_deviation"]
- Resolution
- Resolved by comparing the realized day to a routine model on the pre-specified dimension, against a fixed threshold rubric.
- Baseline
- R1: population deviation rates; R2: the participant's own deviation base rate (a deliberately strong baseline).
- Difficulty
- By construction R2 is hard to beat. Skill requires anticipating the specific causes of deviation, not its average frequency.
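To make the threshold rubric concrete, the sketch below resolves a deviation label on a single dimension (start time, in minutes after midnight). The routine model (a median over recent weekdays), the 30- and 90-minute thresholds, and the function name are hypothetical; the source states only that a routine model and a fixed threshold rubric are used.

```python
from statistics import median

def resolve_deviation(history_minutes, realized_minutes,
                      minor_threshold=30, major_threshold=90):
    """Compare the realized value with a simple routine model on one dimension.

    history_minutes: past values of the dimension, e.g. start times in
    minutes after midnight. The routine is modelled as their median
    (an assumption made for illustration).
    """
    typical = median(history_minutes)
    gap = abs(realized_minutes - typical)
    if gap >= major_threshold:
        return "major_deviation"
    if gap >= minor_threshold:
        return "minor_deviation"
    return "no_deviation"

# Start times around 09:00 on recent weekdays; tomorrow's start is 09:40.
recent_starts = [540, 545, 535, 550, 540]
label = resolve_deviation(recent_starts, 580)
```

Fixing the dimension and thresholds before the forecast is sealed is what keeps the label from depending on a post hoc reading of the day.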
Future task families
The same forecast-unit, sealing, and scoring machinery extends to higher-stakes, longer-horizon questions:
Decisions
Forecast the outcome of an upcoming decision the participant faces.
Commitments
Forecast whether a stated commitment is kept, deferred, or dropped.
Long-horizon planning
Forecast multi-week plan realization and re-planning.
Drift
Forecast gradual shifts in priorities and attention over weeks.
Goal-state transitions
Anticipate discrete transitions between goal states (PWM-X).
Tasks render from data/tasks.json.