Brier

Brier Scoring Methodology

Methodology version: v1.1 Source of truth: distilled from docs/VENTURE-ANALYSIS.md Section 9 (Accuracy Score Framework). Where this document and Section 9 disagree, file an ADR and bump the methodology version; never silently edit. Status: specification only. The implementation lands with epic E1 (see TASKS.md) and must match this document exactly.

The framework is designed so that being vague, being lucky, and being loud all fail to pay. A naive percent-correct rewards vague, frequent, hedged bull calls in bull markets; every device below exists to remove one of those escape hatches.


1. Claim record

Each scored claim i is a tuple:

FieldMeaning
asset (A)Crypto asset, from a controlled vocabulary with an alias table
direction or targetBullish/bearish direction, or an explicit price target
horizon (T)Deadline date
stated_confidence (c)As stated, else imputed from language (table below)
p0Price at utterance
t0Utterance timestamp
specificity_classSee weights, Section 4
source pointerVideo ID + second offset

Imputed confidence conventions (published):

Languagec
"will"0.85
"likely"0.70
"could"excluded as non-falsifiable, unless paired with conditions

Default-horizon conventions (published):

Stated horizonT
"soon"30 days
"this year"December 31
none stated90 days

Conventions are part of the public methodology; consistency beats cleverness.

2. Resolution

Outcome y ∈ {0, 0.5, 1}.

2.1 Default-horizon materialisation (published table)

When an analyst does not state an explicit deadline, the horizon is computed at resolution time from the utterance date using this table:

horizon_basisEffective deadline
default_30dutterance date + 30 days
default_90dutterance date + 90 days
default_eoyDecember 31 of the utterance year
statedthe stated deadline; if absent, deferred

The materialised deadline is written back to the claim record so receipts display it consistently. Rule library entry: directional_at_horizon.v0 / target_by_deadline.v0 (same rules, applied to the computed deadline).

2.2 Conditional activation (rule: conditional_at_horizon.v0)

A conditional claim ("if BTC closes below $75 000, further decline follows") activates only when its trigger fires. The trigger is structured as:

FieldMeaning
trigger_assetAsset to watch for the trigger (own-asset only)
trigger_pricePrice level that trips the trigger
trigger_directionabove or below

The trigger observation window is the claim's own default horizon: it runs from the utterance date to the effective deadline computed in §2.1 (for a stated claim, to the stated deadline). The trigger must fire within that window. The first daily UTC close inside the window that satisfies the trigger is the activation date.

Once the trigger fires, the claim is scored over a fresh default horizon measured from the activation date — applying the §2.1 table to the activation date, not the utterance date. Because the scoring horizon is anchored on activation, a trigger that fires late in the observation window pushes the scoring window past the original observation window; that is intended.

Never-fires conventions (published):

Cross-asset triggers (trigger on a different asset than the claim) are outside MVP scope and are routed to QA for manual review.

The resolution rationale records the activation date, the trigger that fired, and the resulting horizon window for full auditability (NFR-2).

2.3 Explicit reversal (rule: reversal_close.v0, EC-11)

When an analyst explicitly reverses a prior position in a later video, the original claim is closed at the reversal date and scored to that date; the new (post-reversal) position opens as a fresh claim.

Representation: the new claim carries flags["reverses_claim_id"] pointing to the original claim's ID. The resolution engine closes the original as follows:

  1. The original claim's effective deadline is clamped to the reversal date.
  2. The applicable rule (target_by_deadline.v0, directional_at_horizon.v0, or conditional_at_horizon.v0 for a conditional original) is applied over the clamped window (score-to-date). A conditional original whose trigger has not fired by the reversal date cannot be scored-to-date; it is left for the normal conditional path (§2.2), which defers or voids it.
  3. A resolution with rule_id = "reversal_close.v0" is appended to the original claim (append-only; NFR-3 preserved).
  4. The original claim's status is set to resolved.
  5. The new claim is resolved independently via the normal pipeline.

2.4 Hedging contradictions (rule: contradiction_void.v0, EC-6)

When an analyst publishes two or more claims on the same asset with opposite directions (one bullish, one bearish) whose horizon windows overlap, all such claims are voided and a hedging flag is raised.

A horizon window is [utterance date, materialised deadline] (using the §2.1 table to compute the effective deadline for default-horizon claims). Two windows overlap when neither ends before the other begins. A claim with no computable deadline (a stated claim whose deadline was never captured) has no resolvable horizon, so overlap cannot be established — it is not voided as a hedge (it is left to the normal defer path). Such a claim never resolves and so is not a functional hedge; excluding it opens no scoring dodge.

Why this rule exists: an analyst who publishes simultaneous opposite-direction claims on the same asset over the same period cannot be wrong — whichever direction wins, one of their claims will look correct. This "both-bets dodge" must not earn scoring credit. Voiding both removes the incentive entirely.

Effect on scoring:

Three-claim case: if an analyst publishes two bullish claims and one bearish claim on the same asset with overlapping horizons, the bearish claim contradicts both bullish claims (pairwise). All three are voided. The most conservative precedent (analyst cannot cherry-pick any winner) is applied.

Precedence vs EC-11 reversal: contradiction detection runs first. A contradicted claim is voided before the reversal pre-pass runs, so it will not be reversal-closed as well.

3. Base rates: the honesty mechanism

For every claim, compute the base rate b = the empirical probability that a naive position matching the claim's direction succeeded over horizon T on that asset, using trailing 5-year history.

Example: a 30-day bullish BTC call in a trending regime can carry b ≈ 0.60. Skill is what remains after subtracting b. This single device deletes the perma-bull-in-a-bull-market illusion that destroys every naive leaderboard.

Published computation conventions (v1.1 pins, ADR-0009):

4. Weights

Specificity v:

Specificity classv
direction-only1.0
direction + magnitude1.5
explicit target + deadline2.0
conditional0.75
non-falsifiablescores nothing; counted in the falsifiability ratio (Section 5)

Difficulty d:

d = clamp( |ln(P_target / P0)| / (sigma_annual × sqrt(T_years)), 0.25, 2.0 )

Direction-only claims take d = 0.5. Bold, precise calls earn more; trivial calls earn little.

Claim weight:

w = v × d

with diminishing weight (divide by sqrt of count) for more than 3 claims per asset per week, neutralizing spam strategies.

5. Component scores

6. Composite and shrinkage

Raw composite, each component mapped to [0, 1]:

R = 0.45·norm(DS) + 0.25·C + 0.15·K + 0.15·F

Final score — Bayesian shrinkage, the IMDb-rating device:

FAS = 100 × ( n·R + k·R_prior ) / ( n + k )

with shrinkage constant k = 25 and R_prior = population median. Shrinkage prevents a 3-for-3 newcomer from topping the board.

Eligibility and the provisional flag (two tiers):

Reconciliation note: Section 9.6 of the venture analysis sets ranking eligibility at n ≥ 20; the Section 9.8 worked example flags Analyst B (n = 24) provisional until n ≥ 30. The two-tier reading above is the only one consistent with both texts and is the binding convention for implementation and tests.

Published computation conventions (v1.0 pins)

The formula above leaves certain details unspecified. The following pins are approved by the project owner (2026-06-12) and recorded in ADR-0002. They are definitional clarifications of unspecified details — not formula changes — so the methodology version remains v1.0.

  1. norm(DS): clamp((DS + 0.25) / 0.5, 0, 1). Zero skill maps to 0.5. A DS of −0.25 maps to 0; a DS of +0.25 maps to 1.

  2. Consistency K: Chronological rolling 10-claim windows (stride 1) over the analyst's resolved claims; each window computes its weighted DS; K is then clamp(1 − stdev(window DS values) / 0.25, 0, 1) using population standard deviation. Analysts with fewer than 2 windows (n < 11) receive K = 0.5 (neutral — not enough data to judge consistency).

  3. R_prior: Median of pre-shrinkage R across all analysts scored in the current score run (any n ≥ 1). If fewer than 3 analysts are in the run, R_prior = 0.5 (fallback to avoid a biased estimate from a tiny sample).

  4. direction_magnitude claims: Implied P_target = P0 × (1 ± magnitude_pct / 100), sign positive for bullish and negative for bearish, then fed into the deadline difficulty formula. sigma_annual is the standard deviation of the trailing 365 daily log-returns at t0, annualized by sqrt(365).

  5. Spam damping: Claims are grouped per (analyst, asset, ISO week). If a group has m > 3 claims, every claim in that group takes w / sqrt(m).

  6. Falsifiability F: The numerator is resolved-and-scored claims; the denominator is all extracted prediction-like statements (including non-falsifiable and void). Example: Analyst A with 60 scored claims out of 240 total prediction-like statements → F = 60 / 240 = 25%.

  7. Zero-confidence analysts: If an analyst has zero claims with a recorded stated_confidence, C = 0. No calibration evidence earns no calibration credit.

7. Anti-gaming inventory

  1. All-claims coverage: we extract everything; no self-submission, no cherry-picking.
  2. Deletion persistence: claims survive source deletion and the deletion itself is flagged publicly ("the tape does not forget").
  3. Contradiction detection: opposite-direction claims on the same asset with overlapping horizons void both and raise a hedging flag.
  4. Base-rate correction (Section 3) kills regime-riding.
  5. Brier penalty (Section 5) kills confidence inflation.
  6. Frequency damping (Section 4) kills spray-and-pray.
  7. Shrinkage + minimum n (Section 6) kills small-sample flukes.
  8. Versioned methodology with public changelog; full-history recomputation on every version; no silent retro-edits.

8. Worked examples (binding test cases)

These two examples are the acceptance fixture for the scoring engine (task E1-T2). The unit tests must encode both, including the ordering inversion: B outranks A despite a lower raw hit rate.

Analyst A, "Hype Caller"

Analyst B, "Precision Caller"

Lower raw accuracy, far higher score: the system is doing its job, and explaining exactly this example publicly is the methodology marketing.


Changelog

VersionDateChange
v1.02026-06-11Initial distillation from venture analysis Section 9. Two-tier provisional convention recorded.
v1.02026-06-12Convention pins recorded (ADR-0002): norm(DS), K windows, R_prior, direction_magnitude d, sigma_annual, spam damping, falsifiability denominator, zero-confidence C. No formula change; version remains v1.0.
v1.02026-06-15Full resolution rule library (E4-T1): §2.1 default-horizon materialisation table (30d/90d/eoy); §2.2 conditional activation — trigger observation window = the claim's own default horizon, scoring horizon anchored on the activation date, never-fires conventions (defer vs void), cross-asset out-of-scope note; §2.3 explicit reversal (EC-11) close-at-reversal-date convention (incl. conditional originals). rule_ids: conditional_at_horizon.v0, conditional_void.v0, reversal_close.v0. No formula change; version remains v1.0.
v1.02026-06-15Contradiction detection (E4-T4, EC-6): §2.4 hedging contradictions — opposite-direction claims on the same asset with overlapping horizons void both (not scored, not a miss); hedging flag recorded in claims.flags for auditability. rule_id: contradiction_void.v0. No formula change; version remains v1.0.
v1.12026-06-15Base rates from trailing history (E4-T2, FR-303, ADR-0009): §3 pins replace E1 fixture placeholders with real empirical base rates. Rolling T-day windows in trailing 5-year history; minimum 20 windows required, else 0.5 neutral prior. CoinGecko stdlib-REST (no SDK, ADR-0005 precedent) as the composite source; FakePriceSource stays CI/demo path. CCXT cross-check as an ADR-gated seam. PriceSource threaded into _load_resolved_claims / _execute_scoring_pass / run_score_pass / recompute_all. Full-history recompute required (FR-304, AC-4).