Frequently asked questions
Is PWM-Bench a model?
No. It is a benchmark framework — a task definition, scoring methodology, baselines, leakage controls, and governance protocol. It does not ship a model.
Is PWM-Bench a dataset?
No. Many datasets may instantiate it. PWM-Bench specifies how forecasts are made, sealed, resolved, and scored; the underlying participant data stays under participant control.
Does PWM-Bench claim to solve person understanding?
No. It proposes a way to measure progress toward it. The benchmark makes the claim falsifiable, not settled.
Why forecasting?
Understanding is latent and cannot be observed directly. Forecasting is observable: if a system understands an individual, it should predict that individual's future better than baselines built from population knowledge and personal routine alone.
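As an illustration of what "better than population knowledge and personal routine" could mean in practice, here is a minimal sketch that compares a system's forecasts against two baselines using a Brier skill score. The choice of the Brier score, the binary-event framing, and every name below (brier, skill, the example arrays) are assumptions made for illustration; they are not taken from the PWM-Bench scoring methodology.

```python
from statistics import mean


def brier(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return mean((p - y) ** 2 for p, y in zip(probs, outcomes))


def skill(model_probs, baseline_probs, outcomes):
    """Brier skill score: positive means the model beats the baseline."""
    base = brier(baseline_probs, outcomes)
    if base == 0:
        raise ValueError("baseline is perfect; skill score is undefined")
    return 1.0 - brier(model_probs, outcomes) / base


# Hypothetical data: four resolved binary events for one individual.
outcomes = [1, 0, 1, 1]
population = [0.5, 0.5, 0.5, 0.5]   # assumed population-level base rate
routine = [0.7, 0.4, 0.6, 0.6]      # assumed personal-routine baseline
model = [0.9, 0.2, 0.8, 0.7]        # system under test

beats_population = skill(model, population, outcomes) > 0
beats_routine = skill(model, routine, outcomes) > 0
```

Under this reading, a system would only count as showing person-level understanding if both comparisons come out positive.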
Why not self-report?
Self-report is valuable but retrospective and incomplete. PWM-Bench tests whether additional evidence improves future predictive accountability — a property self-report alone cannot establish.
Why is this not just personalization?
Personalization predicts outputs. PWM-Bench tests whether a system can forecast the evolving state that generates those outputs — attention, goals, and goal-state transitions — under sealed, prospective conditions.
Are there results yet?
No. The current release is pre-pilot, so the leaderboard holds no empirical results. Scores will be reported in PWM-Pilot.
Will raw participant data be public?
No. PWM-Bench is designed for federated execution and aggregate-only reporting. Raw data stays under participant control; only resolved outcomes and aggregate metrics leave the client.
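One way to picture aggregate-only reporting is a client that scores forecasts locally and sends back nothing but a count and a sum. The sketch below assumes a squared-error metric and a hypothetical ClientReport type; the actual PWM-Bench federation protocol and report format are not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientReport:
    """Hypothetical payload: the only thing that leaves the client."""
    n: int
    sum_sq_error: float


def local_report(probs, outcomes):
    """Runs on the client; raw forecasts and outcomes never leave this function."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have equal length")
    total = sum((p - y) ** 2 for p, y in zip(probs, outcomes))
    return ClientReport(n=len(probs), sum_sq_error=total)


def pooled_brier(reports):
    """Runs on the coordinator; sees only per-client aggregates."""
    n = sum(r.n for r in reports)
    if n == 0:
        raise ValueError("no resolved forecasts to aggregate")
    return sum(r.sum_sq_error for r in reports) / n


# Two hypothetical clients report independently.
reports = [
    local_report([0.9, 0.1], [1, 0]),
    local_report([0.5], [1]),
]
overall = pooled_brier(reports)
```

Sending sums and counts rather than per-client means lets the coordinator weight clients by the number of resolved forecasts without seeing any individual record.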
What stops a model from just memorizing a person?
The identity-permutation test. If a system's apparent skill survives when forecasts are scored against the wrong individual, that skill was not person-specific. PWM-Bench requires skill to collapse under permutation.
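A minimal sketch of the idea: score each individual's forecasts once against their own outcomes and once against another individual's outcomes, then compare. The cyclic shift used to reassign identities, the Brier score, and the function names are assumptions for illustration; the source does not say how PWM-Bench draws its permutations or which metric it uses.

```python
from statistics import mean


def brier(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return mean((p - y) ** 2 for p, y in zip(probs, outcomes))


def permutation_gap(forecasts, outcomes, shift=1):
    """Return (matched, permuted) mean Brier scores across individuals.

    forecasts and outcomes map person id -> equal-length lists.
    Permuted scoring pairs each person's forecasts with the outcomes of
    the person `shift` places later in sorted id order, so nobody is
    scored against themselves.
    """
    ids = sorted(forecasts)
    if len(ids) < 2 or shift % len(ids) == 0:
        raise ValueError("need at least two people and a non-trivial shift")
    matched = mean(brier(forecasts[i], outcomes[i]) for i in ids)
    permuted = mean(
        brier(forecasts[i], outcomes[ids[(k + shift) % len(ids)]])
        for k, i in enumerate(ids)
    )
    return matched, permuted


# Hypothetical forecasts and resolved outcomes for three individuals.
forecasts = {"a": [0.9, 0.1, 0.8], "b": [0.2, 0.9, 0.1], "c": [0.8, 0.8, 0.2]}
outcomes = {"a": [1, 0, 1], "b": [0, 1, 0], "c": [1, 1, 0]}

matched, permuted = permutation_gap(forecasts, outcomes)
# Lower Brier is better, so person-specific skill shows up as matched < permuted.
person_specific = matched < permuted
```

If the two scores were close, the apparent skill would be explained by something shared across individuals rather than by knowledge of the specific person.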
How do you prevent leakage from the future?
Forecasts are sealed and timestamped before outcomes occur, evaluation is strict walk-forward with no random cross-validation, and no system may access evidence dated after its forecast time.
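The three controls above can be sketched as follows, assuming a SHA-256 digest as the seal, a "t" field as the evidence timestamp, and helper names (seal, verify, visible_evidence, walk_forward) invented for this example. None of this is the PWM-Bench implementation. Note also that a digest only shows a forecast was not altered; proving when it was sealed would need a trusted timestamp source, which this sketch does not model.

```python
import hashlib
import json
from datetime import datetime, timezone


def seal(forecast, sealed_at):
    """Commit to a forecast by hashing it together with its seal time."""
    payload = json.dumps(
        {"forecast": forecast, "sealed_at": sealed_at.isoformat()},
        sort_keys=True,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"payload": payload, "digest": digest}


def verify(record):
    """Check that a sealed record has not been modified since sealing."""
    expected = hashlib.sha256(record["payload"].encode("utf-8")).hexdigest()
    return expected == record["digest"]


def visible_evidence(evidence, forecast_time):
    """Drop every evidence item dated after the forecast time."""
    return [e for e in evidence if e["t"] <= forecast_time]


def walk_forward(times, min_history=1):
    """Yield (history, target) pairs in time order, never shuffled."""
    ordered = sorted(times)
    for k in range(min_history, len(ordered)):
        yield ordered[:k], ordered[k]


# Hypothetical usage.
t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
record = seal({"event": "x", "p": 0.7}, t0)
splits = list(walk_forward([3, 1, 2]))
```

Each split's history contains only times earlier than its target, which is the property that random cross-validation would break.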