Preprint · not peer reviewed

The paper

TargetSpace: Benchmarking Target-Specific Forecasting Under Partial Observation

Status: Pre-pilot draft (v2.4), preprint, not peer reviewed
Version: v2.4 / PWM-0.1
Date: 2026-06-09
Venue: Preprint
Version status. Pre-pilot draft (v2.4) — benchmark proposal and pre-registered research agenda; no experimental results yet. Only the personal track is implemented (synthetic). Empirical validation will be reported in a first pilot round.

Abstract

Understanding a partially observed system is a latent capability that cannot be measured directly. TargetSpace operationalizes it as target-specific forecasting: a system understands a target to the extent that it can predict what this target will do or become next better than two baselines, a population prior (R1) and the target's own routine (R2). TargetSpace is a prospective, proper-scored, leakage-controlled, federated apparatus. Forecasts are sealed and timestamped before outcomes occur; they are scored with proper scoring rules and gated on calibration; and apparent skill must collapse under a permutation specificity test to count as target-specific. An evidence-tier ablation lets the field measure which kinds of evidence actually improve target-specific skill. The apparatus is multi-track: personal world modeling (the predecessor PWM-Bench) is the first and highest-value track, with health, energy and markets, robotics, and enterprise tracks planned or under research. This document is a pre-pilot proposal; pre-registration of the first empirical round is pending.
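To make "better than two baselines" concrete, the following is a minimal sketch, not the benchmark's specified scoring procedure. It assumes binary outcomes and the Brier score as the proper scoring rule (the abstract does not name a rule), and the function names `brier` and `skill` and all numbers are hypothetical illustrations.

```python
from statistics import fmean

def brier(probs, outcomes):
    """Mean Brier score for binary outcomes (lower is better)."""
    return fmean((p - y) ** 2 for p, y in zip(probs, outcomes))

def skill(model_probs, baseline_probs, outcomes):
    """Skill score 1 - BS_model / BS_baseline.

    Positive values mean the model beats the baseline; zero means a tie.
    """
    base = brier(baseline_probs, outcomes)
    if base == 0:
        raise ValueError("baseline is perfect; skill score is undefined")
    return 1.0 - brier(model_probs, outcomes) / base

# Illustrative, made-up numbers for a single target over four forecast windows.
outcomes = [1, 0, 1, 1]
model = [0.9, 0.2, 0.8, 0.7]   # system under test
r1 = [0.5, 0.5, 0.5, 0.5]      # assumed stand-in for the population prior (R1)
r2 = [0.8, 0.4, 0.6, 0.6]      # assumed stand-in for the own-routine baseline (R2)

skill_vs_r1 = skill(model, r1, outcomes)
skill_vs_r2 = skill(model, r2, outcomes)

# Under this sketch, a system counts as skilful only if it beats both baselines.
beats_both = skill_vs_r1 > 0 and skill_vs_r2 > 0
print(round(skill_vs_r1, 3), round(skill_vs_r2, 3), beats_both)
```

Requiring positive skill against R2 as well as R1 is what separates knowledge of the target from knowledge of the population: a forecaster that only reproduces the target's routine scores zero against R2 even if it comfortably beats R1.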

Short summary

  • Problem. “Understanding a target” is claimed often and measured rarely. A forecast can look impressive while only reflecting population priors, base rates, or generic routine.
  • Proposal. Measure understanding as calibrated, target-specific forecasting skill, scored prospectively against a population prior (R1) and the target's own routine (R2).
  • Mechanism. Sealed forecasts, proper scoring, calibration gates, strict walk-forward evaluation, and a permutation specificity gate.
  • Contribution. The conjunction — anchored on the own-routine baseline (R2) and the permutation gate — plus an evidence-tier ablation, applied across architecture classes.
  • Status. Pre-pilot, multi-track apparatus. No results yet; only the personal track is implemented (synthetic).
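
The permutation specificity gate named under Mechanism can be sketched as follows. This is one plausible reading, assuming one binary forecast per target, the Brier score, and a simple shuffle of forecasts across targets; the function `permutation_specificity`, its parameters, and the data are hypothetical and are not taken from the benchmark.

```python
import random
from statistics import fmean

def mean_brier(forecasts, outcomes):
    """Mean Brier score across targets (lower is better)."""
    return fmean((p - y) ** 2 for p, y in zip(forecasts, outcomes))

def permutation_specificity(forecasts, outcomes, n_perm=2000, seed=0):
    """Shuffle forecasts across targets and compare scores.

    forecasts[i] and outcomes[i] belong to target i. Returns the observed
    score and a p-value: the share of shuffles that score at least as well
    as the true pairing (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = mean_brier(forecasts, outcomes)
    shuffled = list(forecasts)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_brier(shuffled, outcomes) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

outcomes = [1, 0, 1, 0, 1, 0, 1, 0]

# Forecasts that track each target: skill should collapse when shuffled.
specific = [0.9, 0.1, 0.8, 0.2, 0.9, 0.1, 0.7, 0.3]
_, p_specific = permutation_specificity(specific, outcomes)

# A generic forecast (same value for every target) is unaffected by shuffling.
generic = [0.5] * len(outcomes)
_, p_generic = permutation_specificity(generic, outcomes)

print(p_specific < 0.05, p_generic)
```

In this sketch a small p-value means that reassigning forecasts to the wrong targets destroys the skill, which is the behaviour the gate asks for. The generic forecast scores identically under every shuffle, so its p-value is 1.0 and it would not pass.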

Citation

Citation placeholder — author list to be finalized.

Plain

Yuri Sylvester. TargetSpace: Benchmarking Target-Specific Forecasting Under Partial Observation. Pre-pilot draft v2.4, preprint (not peer reviewed), 2026.

BibTeX

@misc{targetspace2026,
  title        = {TargetSpace: Benchmarking Target-Specific Forecasting Under Partial Observation},
  author       = {Sylvester, Yuri},
  year         = {2026},
  version      = {0.1.0},
  note         = {Pre-pilot draft v2.4. Preprint, not peer reviewed. Multi-track benchmark apparatus; personal world modeling is the first track. Synthetic data only; no empirical results.},
  howpublished = {\url{https://pwmbench.org}}
}

Important disclaimer

TargetSpace is currently a benchmark proposal and pre-registered research agenda. Only the personal track is implemented (synthetic); empirical validation will be reported in a first pilot round.