How scoring works

Three rules drive everything: proper scoring, daily snapshots, and relative scoring. Together they reward being right early and honestly calibrated, not being lucky, loud, or last-minute.

01 Log score: honesty wins

For a yes/no question where you forecast probability p, your score is the natural log of the probability you gave the outcome that actually happened:

score = ln(p)      if it happens
score = ln(1 − p)  if it doesn't

This is a strictly proper scoring rule: your expected score is highest when you report your true belief, and only then. No other number you could report does better in expectation than what you actually think.
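The claim can be checked numerically. The sketch below is illustrative only: the function names are made up for this example, and the 70% belief and the 1%-step grid are arbitrary choices. It scans candidate reports and finds the one with the highest expected log score.

```python
import math

def log_score(p: float, happened: bool) -> float:
    """Log score for a yes/no forecast, where p is the probability given to YES."""
    return math.log(p if happened else 1.0 - p)

def expected_score(report: float, belief: float) -> float:
    """Expected log score of reporting `report` when the true chance of YES is `belief`."""
    return (belief * log_score(report, True)
            + (1.0 - belief) * log_score(report, False))

belief = 0.70                            # what you actually think (arbitrary example)
grid = [i / 100 for i in range(1, 100)]  # candidate reports 0.01 .. 0.99
best = max(grid, key=lambda q: expected_score(q, belief))
print(best)  # 0.7
```

Shading the report in either direction, for example to 0.9 or 0.5, lowers the expected score, which is what "proper" means here.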

The key is the asymmetry. Say something happens. If you'd called it 90%, you score ln(0.90) = −0.11 — barely a scratch. But if you'd called it 10%, you score ln(0.10) = −2.30. Being confident and wrong costs far more than being confident and right ever pays. That is why padding your numbers toward 0 or 100 to look bold loses points on average.

A 70% forecast that comes true scores ln(0.70) = −0.36: worse than a correct 90% (−0.11), better than a correct 50% (−0.69). The more probability you put on what actually happened, the higher your score.

Multiple-choice questions score ln(p[correct]), the log of the probability you gave the correct option. Numeric questions score your 10th/50th/90th percentile estimates with pinball loss: closer and tighter is better. Probabilities are clipped to [0.001, 0.999], so a wrong 100% call scores ln(0.001) = −6.91, which is very painful but not infinitely so.
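A rough sketch of these three pieces follows. The clipping bounds come from the text and the pinball loss is the standard quantile loss, but how the three percentile losses are combined and turned into a score is not stated here, so the equal-weight average in `numeric_loss` is an assumption. All function names are hypothetical.

```python
import math

CLIP_LO, CLIP_HI = 0.001, 0.999  # clipping bounds quoted in the text

def clipped_log_score(p_outcome: float) -> float:
    """Log of the probability given to the realized outcome, after clipping."""
    return math.log(min(max(p_outcome, CLIP_LO), CLIP_HI))

def pinball_loss(estimate: float, actual: float, q: float) -> float:
    """Standard pinball (quantile) loss for a q-quantile estimate; lower is better."""
    diff = actual - estimate
    return q * diff if diff >= 0 else (q - 1.0) * diff

def numeric_loss(p10: float, p50: float, p90: float, actual: float) -> float:
    """ASSUMPTION: the three quantile losses are averaged with equal weight."""
    pairs = [(p10, 0.1), (p50, 0.5), (p90, 0.9)]
    return sum(pinball_loss(est, actual, q) for est, q in pairs) / len(pairs)

# A wrong 100% call: the realized outcome was given probability 0.
print(round(clipped_log_score(0.0), 2))         # -6.91

# Multiple choice: score the probability placed on the correct option.
probs = {"A": 0.2, "B": 0.5, "C": 0.3}          # hypothetical forecast
print(round(clipped_log_score(probs["B"]), 2))  # -0.69

# Numeric: a tight interval around the truth beats a wide one.
tight = numeric_loss(90, 100, 110, actual=100)
wide = numeric_loss(50, 100, 150, actual=100)
print(tight < wide)                             # True
```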

Feel it yourself

Open any yes/no question and drag the probability slider — you'll see your score for both outcomes update live. Push toward 99% and watch the wrong-side score fall off a cliff.

02 Daily snapshots: early beats late

Every day at midnight UTC, from a question opening until it locks, we snapshot your latest forecast and score it once the outcome is known. Your question score is the average across all snapshot days:

question score = mean(score at each daily snapshot)

Consequences, by design:

Call it correctly on day 1 and that good score accrues every remaining day. Day-29 snipers collect a single day's credit.
Updating after news is always worth it — your forecast is live, not one-shot. Re-submit whenever your belief moves.
Days before your first submission are scored at the baseline (50% for yes/no, uniform for multiple choice), so not submitting earns exactly what an uninformed guess would. Skipping is never better than guessing.
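The consequences above can be sketched for a yes/no question as follows. This is a minimal illustration, not the production scorer: the 30-day length, the function name, and the list-of-days input format are assumptions, and clipping is left out for brevity.

```python
import math
from typing import Optional

BASELINE = 0.5  # yes/no baseline from the text, used before the first submission

def question_score(daily_forecasts: list[Optional[float]], happened: bool) -> float:
    """Mean log score across snapshot days.

    daily_forecasts[i] is the probability of YES submitted on day i, or None
    if nothing new was submitted that day. The latest forecast carries forward.
    """
    current = BASELINE
    scores = []
    for submitted in daily_forecasts:
        if submitted is not None:
            current = submitted
        p_outcome = current if happened else 1.0 - current
        scores.append(math.log(p_outcome))
    return sum(scores) / len(scores)

days = 30                             # hypothetical question length
early = [0.9] + [None] * (days - 1)   # correct call on day 1, held to the end
sniper = [None] * (days - 1) + [0.9]  # same call, made only on the final day

print(round(question_score(early, True), 3))   # -0.105
print(round(question_score(sniper, True), 3))  # -0.674
```

The sniper's final-day forecast is just as good, but 29 of the 30 snapshots were scored at the 50% baseline, so the average stays close to ln(0.5).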

Submissions close at each question's lock time, which marks the start of the window in which the outcome is effectively public. Locked means locked: the API rejects late entries.

03 Relative scoring: hard questions count fairly

Your raw log score depends on how hard the question was. To compare forecasters fairly, we subtract the crowd's score:

relative score = your score − field median (same question, same day)
all-time score = sum of relative scores

Subtracting the median neutralizes difficulty. A concrete case: you say 80% YES, the field median says 60%, and it happens. Your score is ln(0.80) = −0.22; the median's is ln(0.60) = −0.51. Your relative score is −0.22 − (−0.51) = +0.29 — you beat the field, so you gain, even though your raw score was negative.
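The worked case can be reproduced in a few lines. This sketch follows the example's reading of "field median" as the score of the median forecast; the other forecasts in `field`, and whether your own forecast counts toward the median, are assumptions made for illustration.

```python
import math
from statistics import median

def log_score(p: float, happened: bool) -> float:
    """Log score for a yes/no forecast, where p is the probability given to YES."""
    return math.log(p if happened else 1.0 - p)

def relative_score(your_p: float, field_ps: list[float], happened: bool) -> float:
    """Your log score minus the log score of the field's median forecast."""
    field_median_p = median(field_ps)
    return log_score(your_p, happened) - log_score(field_median_p, happened)

field = [0.40, 0.60, 0.70]  # hypothetical field with a median of 60%

print(round(relative_score(0.80, field, happened=True), 2))   # 0.29
print(round(relative_score(0.80, field, happened=False), 2))  # -0.69
```

The second line shows the flip side: if the event had not happened, being more confident than the field would have cost you ln(0.20) − ln(0.40) = −0.69.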

Scores stack in two layers. Your all-time score sums relative scores across the whole question pool. Each predictathon also has its own board, summing only its bundled questions — and when a predictathon closes, its standings freeze.
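As a rough sketch of the two layers, assuming each question has already been reduced to one relative score per forecaster: the question IDs, the predictathon name, and the dictionary layout below are hypothetical.

```python
def board_totals(
    relative_scores: dict[str, float],
    bundles: dict[str, set[str]],
) -> tuple[float, dict[str, float]]:
    """Return (all-time total, per-predictathon totals) for one forecaster.

    relative_scores maps question id -> relative score.
    bundles maps predictathon name -> the set of question ids it bundles.
    """
    all_time = sum(relative_scores.values())
    per_board = {
        name: sum(score for qid, score in relative_scores.items() if qid in qids)
        for name, qids in bundles.items()
    }
    return all_time, per_board

scores = {"q1": 0.25, "q2": -0.5, "q3": 1.0}  # hypothetical relative scores
bundles = {"spring": {"q1", "q3"}}            # hypothetical predictathon
all_time, per_board = board_totals(scores, bundles)
print(all_time)   # 0.75
print(per_board)  # {'spring': 1.25}
```

The same question can count toward both layers: here q1 and q3 contribute to the all-time total and to the predictathon board.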

To appear on a leaderboard you must answer at least 70% of its questions — cherry-picking a few easy wins doesn't qualify.
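The threshold check might look like the sketch below. Only the 70% figure comes from the text; what counts as "answering" a question and how voided questions affect the denominator are not specified here, so both are left out.

```python
MIN_COVERAGE_PERCENT = 70  # threshold quoted in the text

def is_eligible(answered: set[str], board_questions: set[str]) -> bool:
    """True if at least 70% of the board's questions were answered."""
    covered = len(answered & board_questions)
    # Integer arithmetic avoids floating-point edge cases at the exact threshold.
    return covered * 100 >= MIN_COVERAGE_PERCENT * len(board_questions)

board = {f"q{i}" for i in range(1, 11)}                    # hypothetical 10-question board
print(is_eligible({"q1", "q2", "q3"}, board))              # False
print(is_eligible({f"q{i}" for i in range(1, 8)}, board))  # True
```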

04 Resolution and voids

Every question states its resolution source and criteria up front, verbatim, before it opens. You always know exactly what will decide it.

If a question turns out to be unjudgeable, it's voided and removed from scoring entirely — at every layer, all-time and predictathon alike. A void is the only thing that erases scores after the fact.

No entry fees, no betting, no payouts. Outcat is a skill tournament: the only thing at stake is the leaderboard.