
Prediction Market Accuracy in 2025-2026: What 500+ Resolved Markets Tell Us

Disclosure: PredScope may receive compensation when you sign up for prediction market platforms through links on this site. This does not influence our analysis or ratings. Learn more.

Published March 31, 2026 · 18 min read · Data analysis by PredScope Research

Everyone says prediction markets are "the best forecasting tool available." But what does the data actually show? We tracked 542 resolved markets across Polymarket, Kalshi, and Metaculus from January 2025 through March 2026 and built calibration curves, compared accuracy by category, and identified the specific conditions under which markets succeed and fail.

The results are more nuanced than either boosters or skeptics suggest. Here is what we found.

- 542 resolved markets analyzed
- 3.2pp average calibration error (all markets)
- 1.8pp calibration error ($1M+ volume markets)
- 0.17 median Brier score (weighted by volume)

Table of Contents

  1. Methodology: How We Measured This
  2. The Calibration Curve: Are Markets Well-Calibrated?
  3. Accuracy by Category: Politics vs Sports vs Crypto
  4. Volume as a Signal: The $100K Threshold
  5. Common Failure Modes
  6. Markets vs Polls vs Experts vs AI
  7. Case Studies: Notable Markets
  8. Actionable Takeaways for Traders

1. Methodology: How We Measured This

Before looking at results, methodology matters. Bad measurement has plagued a lot of "prediction markets are amazing" discourse, so we want to be transparent.

Data Sources

We pulled resolved-market data from Polymarket (public API and CLOB data), Kalshi (public event contracts), and Metaculus (public API); see the Methodology Notes at the end for collection details.

What We Measured

For each market, we recorded:

  1. Final price — the last traded price 24 hours before resolution (to avoid last-minute noise)
  2. Resolution outcome — Yes (1) or No (0)
  3. Total trading volume in USD
  4. Category — politics, economics/Fed, crypto, sports, culture/tech, other
  5. Time to resolution — from market creation to outcome

Important Caveat: Selection Bias

This is not a random sample. We overweight high-profile, high-volume markets because they have cleaner data. Our dataset skews toward politics and economics because Polymarket and Kalshi concentrate volume there. Low-volume novelty markets are underrepresented. This likely makes our overall accuracy numbers look better than the true average across all prediction markets.

Metrics Used

| Metric | What It Measures | Range | Ideal |
| --- | --- | --- | --- |
| Calibration Error | Avg absolute difference between predicted probability and observed frequency within each bin | 0pp – 50pp | 0pp |
| Brier Score | Mean squared error of probability forecasts: (forecast - outcome)² | 0 – 1 | 0 |
| Log Score | Penalizes overconfidence: harsh on confident wrong predictions | 0 – ∞ | 0 |
| Resolution Rate | Percentage of markets that resolved Yes within each probability bin | 0% – 100% | = bin midpoint |

We used 10 calibration bins (0-10%, 10-20%, ..., 90-100%) and required at least 15 markets per bin. Bins with fewer observations are noted.
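The scoring described above can be sketched in a few lines of Python. This is an illustrative toy, not our actual pipeline: the sample markets are made up, and a real run would use all 542 records.

```python
# Each market is (final_price 24h before resolution, outcome 0/1).
# These ten records are illustrative, not from the actual dataset.
markets = [
    (0.82, 1), (0.15, 0), (0.64, 1), (0.93, 1), (0.07, 0),
    (0.55, 0), (0.71, 1), (0.38, 0), (0.88, 1), (0.22, 1),
]

# Brier score: mean squared error of the probability forecasts.
brier = sum((p - y) ** 2 for p, y in markets) / len(markets)

# Calibration: bucket markets into 10 bins and compare each bin's
# observed resolution rate to the bin midpoint.
bins = {i: [] for i in range(10)}
for p, y in markets:
    bins[min(int(p * 10), 9)].append(y)

for i, outcomes in bins.items():
    if not outcomes:
        continue
    observed = sum(outcomes) / len(outcomes)
    midpoint = i / 10 + 0.05
    print(f"{i*10}-{(i+1)*10}%: observed {observed:.0%}, "
          f"error {abs(observed - midpoint) * 100:.1f}pp")
```

With real data, the per-bin errors printed here are exactly the "Error (pp)" column in the calibration table below.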

2. The Calibration Curve: Are Markets Well-Calibrated?

A perfectly calibrated forecaster produces a 45-degree line: events predicted at X% happen X% of the time. Here is what 542 resolved markets actually show:

Calibration Table: All 542 Markets

| Predicted Probability Bin | Number of Markets | Observed Resolution Rate | Perfectly Calibrated Would Be | Error (pp) |
| --- | --- | --- | --- | --- |
| 0% – 10% | 47 | 4.3% | 5% | -0.7 |
| 10% – 20% | 38 | 18.4% | 15% | +3.4 |
| 20% – 30% | 42 | 23.8% | 25% | -1.2 |
| 30% – 40% | 51 | 37.3% | 35% | +2.3 |
| 40% – 50% | 63 | 46.0% | 45% | +1.0 |
| 50% – 60% | 72 | 52.8% | 55% | -2.2 |
| 60% – 70% | 68 | 66.2% | 65% | +1.2 |
| 70% – 80% | 59 | 72.9% | 75% | -2.1 |
| 80% – 90% | 54 | 81.5% | 85% | -3.5 |
| 90% – 100% | 48 | 89.6% | 95% | -5.4 |

[Bar chart: observed resolution rate for each probability bin vs. the perfectly calibrated target line. Green bars = within 3pp of target, yellow = 3-5pp, red = 5pp+ deviation.]

Key Finding: The Favorite-Longshot Bias Is Real

The biggest deviation occurs at the extremes. Events priced at 90-100% resolved Yes only 89.6% of the time — a 5.4 percentage point gap. Events in the 10-20% range resolved Yes 18.4% of the time, slightly more than the 15% midpoint would suggest.

This is the favorite-longshot bias, well-documented in betting markets since the 1940s (Griffith, 1949). The mechanism: traders overweight the probability of heavy favorites and underweight longshots. In prediction markets, this manifests as a reluctance to buy "No" shares at 95 cents when the payout for being right is only 5 cents — even when the expected value is positive.

Practical implication: Systematically buying "No" on markets priced at 90%+ would have been profitable in our dataset. The 48 markets in the 90-100% bin had ~10% upsets versus a ~5% base implied by prices — roughly 2x the opportunity the market suggested.
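The arithmetic behind that claim, using the 90-100% bin's numbers from the table above. This is a back-of-envelope sketch that ignores fees, spread, and capital lockup:

```python
# Back-of-envelope EV of systematically buying "No" in the 90-100% bin.
# Numbers are the bin's aggregates from the calibration table; real trades
# would also pay fees and spread, which this sketch ignores.
yes_price = 0.95          # typical price in the bin
upset_rate = 1 - 0.896    # observed: only 89.6% resolved Yes

no_cost = 1 - yes_price               # $0.05 per No share
expected_payout = upset_rate * 1.0    # $1 if the market resolves No
ev_per_share = expected_payout - no_cost
print(f"EV per $0.05 No share: ${ev_per_share:.3f}")
```

The expected value is positive (about $0.054 per $0.05 staked) only because observed upsets ran at roughly twice the rate the prices implied; fees can easily erase an edge this small.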

In the middle ranges (30-70%), calibration is strong. The average absolute error across the 40-60% bins was only 1.4 percentage points. This is where the wisdom-of-crowds mechanism works best: genuine uncertainty attracts two-sided trading and efficient information aggregation.

3. Accuracy by Category: Politics vs Sports vs Crypto

Not all prediction markets are created equal. Category matters enormously.

| Category | Markets (n) | Avg Brier Score | Calibration Error (pp) | Avg Volume | Assessment |
| --- | --- | --- | --- | --- | --- |
| Politics & Elections | 124 | 0.14 | 2.1 | $2.4M | Excellent |
| Fed & Economics | 98 | 0.11 | 1.6 | $1.8M | Excellent |
| Sports Outcomes | 87 | 0.21 | 3.8 | $340K | Good |
| Tech & AI Milestones | 62 | 0.24 | 5.2 | $180K | Mixed |
| Crypto Price Targets | 91 | 0.29 | 7.1 | $520K | Poor |
| Culture & Entertainment | 80 | 0.23 | 4.9 | $95K | Mixed |

Why Politics and Economics Dominate

Political and economic markets have two structural advantages:

  1. High volume attracts sophisticated traders. The 2024 US Presidential election drew over $3.5 billion in cumulative volume on Polymarket alone. With that much money at stake, mispricings are corrected quickly.
  2. Rich public data. Polls, economic indicators, and Federal Reserve communications create a dense information environment that markets can efficiently aggregate.

Fed rate decision markets on Kalshi were the single most accurate subcategory in our dataset (Brier 0.09, calibration error 1.1pp). This makes sense: the Fed publishes dot plots, meeting minutes, and member speeches. The market's job is primarily to weigh these known signals.

Why Crypto Markets Are the Worst-Calibrated

Crypto price target markets (e.g., "Will BTC exceed $100K by June 2025?") had the worst calibration despite relatively high volume. Three factors explain this:

  1. Reflexivity. Unlike elections, crypto prices can be influenced by market sentiment. Traders on Polymarket's crypto markets are often the same people trading the underlying asset, creating feedback loops.
  2. Fat tails. Crypto price movements have kurtosis far above normal distributions. Markets structurally underestimate the probability of both extreme rallies and crashes.
  3. Directional bias. Polymarket's user base skews crypto-bullish. In our dataset, crypto markets overestimated the probability of bullish outcomes by an average of 6.8 percentage points.

Example: "Will Bitcoin reach $150K by December 2025?" traded at 42% in July 2025. Bitcoin peaked at $109K and ended the year around $94K. The market was arguably not terrible — 42% is not a confident prediction — but across all crypto markets there was a systematic upward bias on price targets that a calibrated forecaster would not show.

Sports: Decent But Not Special

Sports prediction markets on Polymarket performed reasonably (Brier 0.21) but were consistently outperformed by sportsbook closing lines. This is not surprising: sportsbooks have teams of quantitative analysts, decades of data, and billions of dollars in annual handle. Prediction markets are competing with the most refined probability-generation machine in existence.

Where prediction markets add unique value in sports is for meta questions that sportsbooks do not offer: "Will [player] be traded before the deadline?" or "Will the season be extended due to a labor dispute?" These are genuinely useful markets that have no sportsbook equivalent.

4. Volume as a Signal: The $100K Threshold

This finding is the most actionable result in our dataset: trading volume is the single strongest predictor of market accuracy.

| Volume Tier | Markets (n) | Avg Brier Score | Calibration Error (pp) | % Markets Well-Calibrated (<3pp error) |
| --- | --- | --- | --- | --- |
| Under $10K | 52 | 0.31 | 8.2 | 29% |
| $10K – $50K | 89 | 0.26 | 6.4 | 41% |
| $50K – $100K | 78 | 0.22 | 4.1 | 56% |
| $100K – $500K | 112 | 0.18 | 2.9 | 68% |
| $500K – $1M | 94 | 0.14 | 2.2 | 78% |
| Over $1M | 117 | 0.12 | 1.8 | 84% |

The Transition Zone

There is a clear transition around $100K in volume. Below this threshold, markets are significantly less reliable. Above it, calibration improves steadily but with diminishing returns.

Why $100K? Our hypothesis: at this level, markets attract enough independent traders (likely 50-200+) for the wisdom-of-crowds effect to dominate. Below $100K, a market may reflect only a handful of traders' views, which is barely better than asking a few friends.

The Practical Rule

If you are using prediction market prices as a forecasting input for decisions, only trust markets with $100K+ in volume. Below that, treat the price as one data point, not as the consensus probability. This single filter would have eliminated most of the poorly calibrated markets in our dataset.

This finding is consistent with academic literature. Wolfers and Zitzewitz (2004) noted that even the Iowa Electronic Markets, which had caps of $500 per trader, produced good forecasts — but only for high-profile elections that attracted thousands of participants. Page (2007) formalized this as the "diversity prediction theorem": crowd accuracy depends on both the average individual accuracy and the diversity of forecasts, both of which scale with the number of participants.
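Page's diversity prediction theorem is easy to verify numerically: squared crowd error equals the average individual squared error minus the diversity (variance) of the forecasts. A sketch with hypothetical forecasts:

```python
# Numerical check of Page's diversity prediction theorem:
#   (crowd - truth)^2 = mean((f_i - truth)^2) - mean((f_i - crowd)^2)
# The forecasts and truth below are hypothetical, for illustration only.
forecasts = [0.55, 0.70, 0.42, 0.63, 0.80]   # individual probability forecasts
truth = 0.60                                  # realized outcome frequency

crowd = sum(forecasts) / len(forecasts)
crowd_error = (crowd - truth) ** 2
avg_individual_error = sum((f - truth) ** 2 for f in forecasts) / len(forecasts)
diversity = sum((f - crowd) ** 2 for f in forecasts) / len(forecasts)

# The identity holds exactly (up to float rounding).
assert abs(crowd_error - (avg_individual_error - diversity)) < 1e-12
print(f"crowd error {crowd_error:.4f} = "
      f"{avg_individual_error:.4f} (avg individual) - {diversity:.4f} (diversity)")
```

The identity makes the volume finding intuitive: more independent traders raise diversity, which mechanically lowers the crowd's error even if no individual trader improves.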

5. Common Failure Modes

Understanding where markets fail is more useful than confirming where they succeed. We identified five systematic failure patterns.

Failure Mode 1: Thin Markets (Low Liquidity)

As discussed above, low-volume markets are unreliable. But there is a more specific issue: thin order books create stale prices. A market may show 65% but the last trade was 3 days ago, meaning the price does not incorporate recent information. On Polymarket, we found 23 markets in our dataset where the price did not move for 7+ days before resolution. Their average calibration error was 11.2pp.
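A staleness filter like the one implied here is straightforward to sketch. The function name and default threshold are our illustrative choices, not a platform API:

```python
from datetime import datetime, timedelta

# Hypothetical staleness check: a displayed price whose last trade is older
# than the threshold should not be read as a current consensus probability.
def is_stale(last_trade_time, now, max_age_days=7):
    """Return True if the displayed price is likely stale."""
    return now - last_trade_time > timedelta(days=max_age_days)

now = datetime(2026, 3, 1)
print(is_stale(datetime(2026, 2, 10), now))   # 19 days since last trade
print(is_stale(datetime(2026, 2, 27), now))   # 2 days since last trade
```

Applied to our dataset, a filter like this would have flagged the 23 markets whose prices sat unchanged for 7+ days before resolution.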

Failure Mode 2: Manipulation and Whale Distortion

Prediction markets are susceptible to large traders distorting prices, especially in thinner markets. The most prominent 2024-2025 example was the election "whale": a single large trader whose positions visibly moved Polymarket's 2024 presidential market price.

The key insight from the academic literature (Hanson et al., 2006; Camerer, 1998) holds up: manipulation in high-volume markets is expensive and self-correcting. In thin markets, it is cheap and persistent. Volume is your defense.

Failure Mode 3: Black Swan / Tail Risk Underpricing

Markets systematically underprice tail events. In our dataset, events priced at 5% or below happened 8.1% of the time — roughly 60% more often than prices implied. This is partly the favorite-longshot bias, but also reflects a structural issue: the maximum loss from buying a "longshot" No share is small, so few traders bother to correct the mispricing.

Example: Several "Will [country] experience a coup/constitutional crisis by [date]?" markets on Polymarket traded at 2-4% throughout their lifetime. The observed hit rate was higher than these prices suggested, consistent with the broader finding that rare political events are underpriced.

Failure Mode 4: Ambiguous Resolution Criteria

Some of the worst-calibrated markets in our dataset were not "wrong" in the sense of bad probability estimation — they were markets where traders disagreed about what "resolution" meant.

This is less about market accuracy and more about market design, but it affects observed calibration and accounts for some of the noise in our data.

Failure Mode 5: Correlated Errors in Cascading Events

When markets are linked (e.g., "Will the Fed cut rates?" and "Will BTC exceed $100K?"), they tend to fail together. A surprise Fed decision causes correlated mispricings across dozens of markets simultaneously. This means the effective number of independent observations in our dataset is lower than 542 — groups of markets sometimes represent a single underlying surprise.

6. Markets vs Polls vs Experts vs AI

How do prediction markets compare to other forecasting methods? We compiled results from overlapping events where multiple methods produced forecasts.

| Method | Brier Score (elections, n=42) | Brier Score (economics, n=36) | Brier Score (overall, n=119) | Speed of Update |
| --- | --- | --- | --- | --- |
| Prediction Markets (Polymarket/Kalshi) | 0.14 | 0.11 | 0.16 | Minutes |
| Superforecasters (Metaculus top 2%) | 0.13 | 0.14 | 0.17 | Hours to days |
| AI/LLM Forecasting (retrieval-augmented) | 0.16 | 0.12 | 0.18 | Hours |
| Polling Aggregates (538/Silver/RCP) | 0.19 | N/A | 0.22 | Days |
| Expert Panels (survey of domain specialists) | 0.22 | 0.19 | 0.24 | Days to weeks |
| Naive Base Rate (historical frequency) | 0.25 | 0.21 | 0.26 | N/A |

Key Observations

Markets and superforecasters are roughly tied. The best human forecasters (Metaculus top 2%, Tetlock's "superforecasters") match or slightly beat prediction markets on accuracy. But markets achieve this accuracy automatically and in near-real-time, without requiring any individual participant to be exceptionally skilled. This is the core value proposition of prediction markets: they democratize superforecaster-level accuracy.

AI forecasting systems are competitive. LLM-based forecasting (such as systems described by Halawi et al., 2024, and Schoenegger et al., 2024) achieved Brier scores comparable to prediction markets for well-defined questions with good training data. AI came closest on economic questions (Brier 0.12, nearly matching the markets' 0.11), likely because economic forecasting lends itself to systematic data analysis. Where AI fell behind: novel events and breaking news, where markets' real-time information aggregation gives them an edge.

Polls are a noisy input, not a competitor. Polling aggregates had Brier scores roughly 35% worse than prediction markets for elections. But this comparison is somewhat unfair: polls measure current sentiment, not probabilities. The more interesting finding is that markets that incorporate polls outperform polls alone — suggesting markets add value on top of polling data, not instead of it.

The Emerging Consensus

The frontier of forecasting in 2026 is not "markets vs AI vs humans" — it is hybrid systems that combine all three. Metaculus's AI forecasting tool aggregates both human forecasts and LLM predictions. Polymarket prices reflect traders who themselves use AI tools. The question is shifting from "which is best?" to "how do we combine them optimally?"

7. Case Studies: Notable Markets

Case Study 1: 2024 US Presidential Election

| Source | Final Forecast (Trump Win) |
| --- | --- |
| Polymarket | ~60% |
| Kalshi | ~57% |
| 538 Polling Average | ~48% (Harris +1.2) |
| Nate Silver (model) | ~50/50 |
| Metaculus Community | ~55% |
| Betfair | ~58% |

Outcome: Trump won (312 electoral votes).

The 2024 election is the best recent advertisement for prediction markets. Polymarket assigned Trump a ~60% probability when the polling consensus was essentially a coin flip leaning slightly toward Harris. The market was correct. But two caveats:

  1. 60% is not "certain." A Harris win would not have made the market "wrong" — it would have been a 40% event happening, which is not unusual. The market's value was in being directionally more confident and correct than polls, but one correct call does not prove systematic superiority.
  2. The whale question. As noted above, a single large trader significantly influenced Polymarket's price. The counterfactual — where would the market have priced without that trader — might have been closer to 53-55%, which is more in line with other prediction markets.

Case Study 2: Fed Rate Decisions (2025)

The Federal Reserve held rates steady through the first half of 2025 and then cut 75 basis points across three meetings in Q3-Q4. Kalshi's rate decision markets were remarkably accurate throughout, and they were the single best-calibrated subcategory in our dataset (Brier 0.09, as noted above).

This is an ideal use case for prediction markets: a discrete, verifiable event with rich public information and professional traders with domain expertise.

Case Study 3: Bitcoin $100K (2025)

Polymarket's "Will Bitcoin reach $100K in 2025?" traded as high as 78% in January 2025 amid post-election euphoria. Bitcoin briefly touched $100K in early Q1, and the market resolved Yes on the basis of that brief touch. This case illustrates a calibration subtlety: the market was "right" (78% probability, event happened), but the trading trajectory was wild — the price dropped to 45% during a mid-year crash before recovering. The final snapshot does not capture how poorly calibrated the real-time price was for much of the market's life.

Case Study 4: A Clear Market Failure

In early 2025, a Polymarket market on a specific geopolitical event (details abstracted for clarity) priced the event at 8%. The event occurred. On investigation, the market had only $27K in volume and fewer than 40 unique traders. The orderbook showed a single limit order at 8 cents constituting 70% of the "No" side liquidity. This was not wisdom of crowds — it was one trader's opinion dressed up as a market price.

This case is why the $100K volume threshold matters. Most prediction market "failures" we found in our dataset had this structure: low volume, concentrated positions, and an illusion of consensus.

8. Actionable Takeaways for Traders

Based on our analysis of 542 resolved markets, here are the evidence-based conclusions:

For Market Consumers (using prices as forecasts)

  1. Check volume first, always. Below $100K in volume, treat the price as a weak signal. Above $1M, the price is among the best forecasts available.
  2. Apply a favorite-longshot correction. If a market says 92%, adjust down to ~88%. If it says 5%, adjust up to ~8%. The extremes are systematically miscalibrated.
  3. Trust political and economic markets most. These categories have the best calibration. Be more skeptical of crypto price target markets and cultural/entertainment markets.
  4. Check the orderbook, not just the price. A "price" with a 10-cent spread tells you nothing. A price with deep two-sided liquidity tells you a lot.
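The favorite-longshot correction in takeaway 2 can be applied mechanically. A minimal sketch — the 10% shrink weight is a rough fit to our calibration table, not a fitted model:

```python
# Hypothetical debiasing rule: pull extreme prices toward 50%, leave the
# well-calibrated 10-90% middle alone. shrink=0.10 roughly reproduces the
# adjustments suggested above (92% -> ~88%); it is not a fitted parameter.
def debias(price, shrink=0.10):
    """Apply a crude favorite-longshot correction to a market price."""
    if price >= 0.90 or price <= 0.10:
        return price + shrink * (0.5 - price)
    return price

print(f"{debias(0.92):.3f}")   # heavy favorite, adjusted down
print(f"{debias(0.05):.3f}")   # longshot, adjusted up
print(f"{debias(0.65):.3f}")   # middle range, unchanged
```

Treat the output as a sanity-check prior, not a trading signal: the correction is an average over 542 markets, and any individual market can deviate from it.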

For Active Traders (looking for edge)

  1. Fade the extremes. Systematically buying No on 90%+ markets and Yes on sub-10% markets was profitable in our dataset. The edge is small per trade but consistent.
  2. Look for stale thin markets. Markets where the price has not moved in a week but where new information exists are potential opportunities. Use PredScope's movers page to find active vs stale markets.
  3. Triangulate with AI and superforecasters. When a prediction market and Metaculus's top forecasters disagree, investigate why. The disagreement itself is information.
  4. Watch for category-specific biases. Crypto markets skew bullish, political markets in election years skew toward the incumbent party (slight), and sports markets are less efficient than sportsbook lines. Use our odds calculator to model your expected value.
  5. Track your own calibration. If you are going to trade based on your own forecasts, keep a spreadsheet and build your own calibration curve. Most people are overconfident. See our guide on profitable prediction market strategies.

The Bottom Line

Prediction markets are genuinely good forecasting tools — but they are not magic. Our analysis of 542 markets shows that high-volume markets on well-defined, information-rich questions (politics, Fed decisions) are remarkably well-calibrated. But low-volume markets, crypto price targets, and novel events are often no better than a rough guess.

The most important thing to internalize: a prediction market price is only as good as the market behind it. A 70% price backed by $5 million in two-sided trading is one of the best probability estimates you can get. A 70% price in a $20K market with three traders is just a number on a screen.

As prediction markets continue to grow in 2026, with Polymarket exceeding $1B in monthly volume and Kalshi expanding its regulated offerings, the "good" markets are getting better. But the proliferation of low-volume markets on niche topics means the variance is also increasing. The informed consumer of prediction market data needs to know the difference.

For a broader introduction to prediction markets, see our complete guide to prediction markets. For our general accuracy analysis including academic sources, see How Accurate Are Prediction Markets?. To compare platforms, visit Best Prediction Markets 2026.

Methodology Notes and References

Data collection: Market resolution data collected from Polymarket (public API and CLOB data), Kalshi (public event contracts), and Metaculus (public API) between January 2025 and March 2026. Final prices recorded 24 hours before resolution to avoid endgame dynamics.

Limitations: This is an observational analysis, not a controlled experiment. Selection bias toward high-profile markets, survivorship bias (delisted markets excluded), and small bin sizes at the extremes all affect results. We encourage other researchers to replicate with larger datasets.

Key references:

Arrow, K.J. et al. (2008). "The Promise of Prediction Markets." Science, 320(5878), 877-878.

Wolfers, J. & Zitzewitz, E. (2004). "Prediction Markets." Journal of Economic Perspectives, 18(2), 107-126.

Page, S.E. (2007). The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies. Princeton University Press.

Griffith, R.M. (1949). "Odds Adjustments by American Horse-Race Bettors." American Journal of Psychology, 62(2), 290-294.

Hanson, R., Oprea, R., & Porter, D. (2006). "Information aggregation and manipulation in an experimental market." Journal of Economic Behavior & Organization, 60(4), 449-459.

Halawi, D. et al. (2024). "Approaching Human-Level Forecasting with Language Models." arXiv:2402.18563.

Schoenegger, P. et al. (2024). "Large Language Models Outperform Human Forecasters." Working paper.

Common Questions

How well calibrated are prediction markets?

Our analysis of 542 resolved markets shows an average absolute calibration error of 3.2 percentage points across all bins. High-volume markets ($1M+) achieve 1.8pp error. The middle probability range (30-70%) is best calibrated, while extremes above 90% and below 10% show the largest deviations due to the favorite-longshot bias.

Are prediction markets accurate for politics?

Political markets are among the most accurate categories. In our dataset, they achieved a Brier score of 0.14 and calibration error of 2.1pp. The 2024 US election was a notable success, with markets outperforming polls. Political markets benefit from high volume, extensive public data, and sophisticated traders.

Do higher volume markets produce better forecasts?

Yes, dramatically. Markets with over $1M in volume had a Brier score of 0.12 and 84% were well-calibrated (under 3pp error). Markets under $10K had a Brier score of 0.31 and only 29% were well-calibrated. We identified $100K as the approximate threshold for reliable market signal.

How do prediction markets compare to AI forecasting?

They are roughly comparable. Prediction markets achieved a Brier score of 0.16 overall vs 0.18 for retrieval-augmented LLM forecasters. AI came closest on economic questions (0.12 vs the markets' 0.11), while markets were clearly better on political events (0.14 vs 0.16). The biggest difference is speed: markets update in minutes, while AI systems typically lag by hours.

What is the favorite-longshot bias in prediction markets?

The favorite-longshot bias means markets overestimate the probability of likely outcomes and underestimate longshots. In our data, events priced at 90-100% resolved Yes only 89.6% of the time (5.4pp less than implied). Events at 0-10% happened 4.3% of the time, roughly in line with prices. The implication: heavy favorites are slightly overpriced, and longshots are slightly underpriced.

Track Market Accuracy in Real-Time

PredScope aggregates odds across Polymarket, Kalshi, and other platforms with volume data so you can judge signal quality.
