Governance & ethics
Privacy and human-subjects commitments
PWM-Bench involves observing and forecasting individuals. That makes privacy, consent, and human-subjects governance load-bearing parts of the design, not compliance text bolted on afterward.
Principles
Informed consent
Participation requires explicit, informed consent covering what is observed, how it is used, and what leaves the participant's control.
Revocable participation
Participants can withdraw at any time. Withdrawal stops observation of, and forecasting about, that participant.
Federated execution
Evaluation runs where the data lives. Models are brought to the evidence; raw evidence is not centralized.
Raw data stays with the participant
Raw evidence remains under participant control. Only resolved outcomes and aggregate metrics leave the participant's client.
Aggregate-only reporting
Results are reported in aggregate. Individual forecasts, raw streams, and per-participant detail are not published.
Third-party consent
Evidence frequently captures other people. Third-party consent and minimization are first-class concerns, not afterthoughts.
No manipulation during evaluation
Systems may observe and forecast, but must not intervene on the participant during the scoring window.
Institutional review
Institutional review (e.g., by an IRB or ethics board) is recommended before any human-subjects deployment.
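The data-flow principles above (revocable participation, raw data staying on the client, aggregate-only reporting) can be sketched as a client-side store. This is a minimal Python sketch under stated assumptions, not PWM-Bench's implementation: the class `ParticipantClient`, its methods, the `Forecast` record, and the use of mean Brier score as the exported metric are all hypothetical. The sketch also assumes that nothing is exported after withdrawal.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Forecast:
    """One probabilistic forecast, stored only on the participant's client."""
    question_id: str
    probability: float              # forecast probability that the event occurs
    outcome: Optional[bool] = None  # None until the question resolves


@dataclass
class ParticipantClient:
    """Hypothetical client-side store: raw evidence and forecasts never leave it."""
    consented: bool = True
    raw_evidence: List[Any] = field(default_factory=list)
    forecasts: List[Forecast] = field(default_factory=list)

    def _require_consent(self) -> None:
        # Refuse any new observation or forecast once consent is revoked.
        if not self.consented:
            raise PermissionError("participant has withdrawn")

    def withdraw(self) -> None:
        """Revoke participation; later observation and forecasting are refused."""
        self.consented = False

    def observe(self, item: Any) -> None:
        self._require_consent()
        self.raw_evidence.append(item)

    def add_forecast(self, question_id: str, probability: float) -> None:
        self._require_consent()
        self.forecasts.append(Forecast(question_id, probability))

    def resolve(self, question_id: str, outcome: bool) -> None:
        """Record the resolved outcome for every forecast on this question."""
        for f in self.forecasts:
            if f.question_id == question_id:
                f.outcome = outcome

    def export_aggregates(self) -> Dict[str, float]:
        """Return aggregate metrics only; nothing per-forecast is included.

        Mean Brier score is an assumed example metric. Returning an empty
        payload after withdrawal is also an assumption of this sketch.
        """
        if not self.consented:
            return {}
        resolved = [f for f in self.forecasts if f.outcome is not None]
        if not resolved:
            return {"n_resolved": 0}
        brier = sum((f.probability - float(f.outcome)) ** 2 for f in resolved)
        return {"n_resolved": len(resolved), "mean_brier": brier / len(resolved)}
```

In this sketch only the dictionary returned by `export_aggregates` would cross the client boundary; raw evidence and individual forecasts stay in the participant's process. Computing the metric on the client is what lets aggregate-only reporting hold without centralizing raw evidence. The principle also permits resolved outcomes to leave the client; the sketch exports only a count and a metric for brevity.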
High-sensitivity evidence
The upper rungs of the evidence ladder are extraordinarily sensitive. Video, audio, screen recordings, and behavioral traces can capture:
- bystanders who have not consented
- children
- home and other addresses
- medical and financial information
- on-screen private content
Because of this, richer evidence tiers are admissible only under federated execution, strict minimization, and aggregate-only reporting. The scientific value of the evidence ladder does not override these constraints; it is bounded by them.
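The admissibility rule for richer tiers is a conjunction that a deployment could check before collecting anything. The sketch below assumes a hypothetical `DeploymentConfig` that lists requested modalities (the set of high-sensitivity modalities follows the list above; names such as `"self_report"` are illustrative, not a PWM-Bench schema) plus flags for the three safeguards. It checks only this rule; the general principles above still apply to every tier.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Upper-rung modalities named in this section (label spellings are assumptions).
HIGH_SENSITIVITY = frozenset({"video", "audio", "screen", "behavioral_trace"})


@dataclass(frozen=True)
class DeploymentConfig:
    """Hypothetical deployment description; field names are illustrative."""
    modalities: FrozenSet[str]
    federated_execution: bool
    strict_minimization: bool
    aggregate_only_reporting: bool


def admissibility_violations(cfg: DeploymentConfig) -> List[str]:
    """List the safeguards missing for any high-sensitivity modality requested.

    An empty list means this rule is satisfied; it does not replace consent,
    third-party, or ethics-review checks.
    """
    sensitive = sorted(cfg.modalities & HIGH_SENSITIVITY)
    if not sensitive:
        return []
    missing = []
    if not cfg.federated_execution:
        missing.append("federated_execution")
    if not cfg.strict_minimization:
        missing.append("strict_minimization")
    if not cfg.aggregate_only_reporting:
        missing.append("aggregate_only_reporting")
    return [f"{m} required for {', '.join(sensitive)}" for m in missing]
```

Returning the list of missing safeguards, rather than a bare boolean, makes a rejected deployment say which constraint it failed, which is useful when the scientific case for a richer tier has to yield to these bounds.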
Observe, do not intervene
A benchmark that rewarded systems for changing the participant would measure influence, not understanding — and would create an incentive to manipulate. PWM-Bench therefore scores forecasts about a future the system did not act to shape. Any system that intervenes on the participant during the scoring window is out of protocol.