14-Column ClickHouse Schema for Data Scientists
Data scientists leverage Resolved Markets to build predictive models using 11.4M+ orderbook snapshots from Polymarket across crypto, sports, economics, and weather categories. The platform provides raw bid/ask depth arrays with millisecond timestamps—ideal for feature engineering, time-series analysis, and market microstructure modeling. With continuous 20Hz capture rates for crypto markets and comprehensive coverage of 100+ prediction markets, data scientists can train models on real market behavior patterns, sentiment evolution, and price discovery mechanisms. The unified API and historical data storage enable reproducible research, backtesting frameworks, and deployment of models via WebSocket streaming for live predictions.
Data scientists building ML models on prediction markets need structured, analysis-ready orderbook data. Resolved Markets ships its orderbook data in a 14-column ClickHouse schema, with bid/ask arrays, depth values, and millisecond timestamps optimized for feature engineering on best_bid, best_ask, mid_price, spread, bids[], and asks[].
Data challenges Data Scientists run into
14-Column ClickHouse Schema from Resolved Markets is built around the data gaps Data Scientists hit when they try to work with raw Polymarket feeds.
Fragmented data sources requiring extensive ETL and normalization
Building prediction models requires consolidating data from sports betting APIs, crypto exchanges, economics calendars, and weather databases. Each source has different schemas, timestamps, data quality standards, and update frequencies. Data scientists waste weeks building ETL pipelines just to get consistent data for model training. Resolved Markets eliminates this integration burden by providing all four categories through a single, normalized API with consistent timestamp precision and schema.
Insufficient orderbook depth granularity for sophisticated microstructure models
Most market data providers deliver only OHLCV candles—open, high, low, close, volume. This completely discards orderbook microstructure where the signal lives. Sophisticated traders and algorithms exploit bid/ask spreads, depth clustering, and order book imbalances minutes before price moves. Resolved Markets provides full depth arrays showing every bid and ask level, enabling feature engineering on fundamental market structure rather than derived price metrics.
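As a concrete illustration of the difference, a few of the depth-based features described above can be computed directly from the bid/ask arrays. This is a minimal sketch, assuming each side is a list of (price, size) tuples with the best level first; the sample values are illustrative, not real market data.

```python
# Sketch: microstructure features from one snapshot's depth arrays.
# Assumes bids/asks are lists of (price, size) tuples, best level first.

def depth_features(bids, asks):
    """Compute spread, mid price, and order book imbalance."""
    best_bid, _ = bids[0]
    best_ask, _ = asks[0]
    bid_qty = sum(size for _, size in bids)
    ask_qty = sum(size for _, size in asks)
    return {
        "mid_price": (best_bid + best_ask) / 2,
        "spread": best_ask - best_bid,
        # Imbalance in [-1, 1]: positive means more resting bid size.
        "imbalance": (bid_qty - ask_qty) / (bid_qty + ask_qty),
    }

bids = [(0.54, 1240), (0.53, 980), (0.52, 1560)]
asks = [(0.55, 1100), (0.56, 1450), (0.57, 890)]
features = depth_features(bids, asks)
```

None of these features can be recovered from OHLCV candles, which is the point: the signal lives in the depth arrays themselves.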
Limited historical data windows for training robust prediction models
Historical prediction market data is nearly impossible to acquire at scale. Most platforms don't archive snapshots, leaving data scientists with limited training windows of days or weeks. Resolved Markets maintains 11.4M+ snapshots across 100+ markets with millisecond precision. This depth enables training time-series models on diverse market regimes, economic cycles, election outcomes, and sports season progressions—impossible with limited data.
High operational overhead managing real-time data pipelines
Real-time data pipelines are operationally complex: maintaining WebSocket connections, handling reconnection logic, buffering, deduplication, and writing to analytical databases. Building this infrastructure takes months and requires dedicated engineering. Resolved Markets abstracts this complexity through simple API endpoints and WebSocket subscriptions, letting data scientists focus on modeling rather than infrastructure.
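To make the operational burden concrete, here is a minimal sketch of the kind of client-side dedup/buffer logic the platform abstracts away. The `"market"` and `"timestamp"` field names are illustrative assumptions, not the documented payload format.

```python
# Sketch: snapshot dedup/buffer of the kind Resolved Markets abstracts away.
# Assumes each snapshot dict carries a unique ("market", "timestamp") pair;
# these field names are illustrative, not the actual payload schema.

class SnapshotBuffer:
    def __init__(self):
        self._seen = set()
        self._rows = []

    def push(self, snapshot):
        key = (snapshot["market"], snapshot["timestamp"])
        if key in self._seen:  # drop duplicates caused by reconnects
            return False
        self._seen.add(key)
        self._rows.append(snapshot)
        return True

    def drain(self):
        """Hand off buffered rows (e.g. for a batch insert) and reset."""
        rows, self._rows = self._rows, []
        return rows

buf = SnapshotBuffer()
buf.push({"market": "BTC-5m", "timestamp": 1})
buf.push({"market": "BTC-5m", "timestamp": 1})  # duplicate, ignored
buf.push({"market": "BTC-5m", "timestamp": 2})
```

This is only one of several concerns (reconnection, backpressure, database writes); multiplied across markets, it is months of engineering that the hosted API replaces.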
Built for quantitative work on 14-Column ClickHouse Schema
Orderbook-level prediction-market data that doesn't exist anywhere else.
Millisecond-precision timestamps enable accurate microstructure feature engineering
Every orderbook update is timestamped to the millisecond, enabling precise sequence analysis and event-driven modeling. You can engineer features like 'time_to_next_large_buy_order', 'depth_concentration_ratio', and 'spread_evolution_velocity'—metrics that predict price moves seconds or minutes ahead. These precise timestamps turn raw orderbook data into predictive signals for sub-second market efficiency models.
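One of the named features can be sketched directly. Here 'spread_evolution_velocity' is taken to mean the spread's rate of change per second between consecutive snapshots; that definition is an assumption for illustration.

```python
# Sketch: a time-based microstructure feature from consecutive snapshots.
# Interprets 'spread_evolution_velocity' as spread change per second;
# this definition is an assumption, and the sample data is synthetic.

def spread_velocity(snapshots):
    """snapshots: list of (timestamp_ms, spread), time-ordered."""
    out = []
    for (t0, s0), (t1, s1) in zip(snapshots, snapshots[1:]):
        dt = (t1 - t0) / 1000.0  # milliseconds -> seconds
        out.append((s1 - s0) / dt)
    return out

snaps = [(1000, 0.010), (1500, 0.012), (2500, 0.009)]
vel = spread_velocity(snaps)  # per-second spread change between snapshots
```

With millisecond timestamps, the same pattern extends to inter-arrival times of large orders or depth-shift velocities.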
11.4M+ snapshots provide deep historical windows for robust model training
The 11.4M+ snapshot archive spans months of continuous Polymarket evolution. Your models can train on diverse market conditions: pre-election volatility (prediction markets repricing as new polls emerge), FOMC uncertainty (hourly probability shifts as economic data releases), sports event outcomes (live match developments changing contract prices), and crypto volatility (correlation with macro sentiment). This breadth prevents overfitting to narrow market regimes.
Full bid/ask depth enables advanced market structure analysis impossible with price data alone
Orderbook depth reveals market participant composition and conviction. When large bids appear at favorable odds, participants are building conviction. When depth clusters at certain levels, smart money is defending support/resistance. When spreads widen dramatically, information asymmetry is high. Resolved Markets' full depth arrays let you engineer these structural features directly, rather than inferring them from price changes that have already occurred.
Unified API across 4 market categories enables cross-domain transfer learning
Training a single-category model (just crypto, or just sports) limits generalization. Resolved Markets' unified API across crypto, sports, economics, and weather enables transfer learning: patterns in how BTC price predictions reprice ahead of US macro data might apply to EPL match predictions. Cross-category feature spaces create richer representations, improving model robustness when deploying to new markets.
How Data Scientists use 14-Column ClickHouse Schema
Seven categories, hundreds of markets
Prediction markets across crypto, sports, economics, weather, and more — live and historical orderbook data, all queryable through one API.
Crypto
BTC, ETH, SOL, XRP — up/down markets every 5m to 1d.
Equities
S&P 500 (SPX) daily open — up or down predictions.
Social
Elon Musk tweet counts — weekly prediction ranges.
Sports
NBA, NFL, EPL — game outcomes and season predictions.
Economics
Fed decisions, jobs reports — FOMC meetings and macro data.
Weather
44 cities daily — temperature, hurricanes, Arctic ice.
Hyperliquid
BTC, ETH, SOL, XRP perp orderbooks — 1/sec sampling.
Tick-level orderbook snapshots
Every snapshot includes full bid/ask depth, mid prices, spreads, and crypto spot price.
| Side | Bid | Size | Ask | Size | Spread |
|---|---|---|---|---|---|
| UP | 0.5400 | 1,240 | 0.5500 | 1,100 | 1.00% |
| UP | 0.5300 | 980 | 0.5600 | 1,450 | 3.00% |
| UP | 0.5200 | 1,560 | 0.5700 | 890 | 5.00% |
| UP | 0.5100 | 2,100 | 0.5800 | 2,300 | 7.00% |
| UP | 0.5000 | 1,800 | 0.5900 | 1,700 | 9.00% |
| UP | 0.4900 | 3,200 | 0.6000 | 3,100 | 11.00% |
| Column | Type | Example |
|---|---|---|
| crypto | LowCardinality(String) | BTC |
| timeframe | LowCardinality(String) | 5m |
| token_side | Enum8('UP','DOWN') | UP |
| timestamp | DateTime64(3) | 2026-05-09 03:14:12.061 |
| crypto_price | Float64 | $80,471.01 |
| best_bid | Float64 | 0.5400 |
| best_ask | Float64 | 0.5500 |
| mid_price | Float64 | 0.5450 |
| spread | Float64 | 0.0100 |
| bids | Array(Tuple(F64,F64)) | [(0.54,1240),...] |
| asks | Array(Tuple(F64,F64)) | [(0.55,1100),...] |

Comprehensive market coverage
Prediction markets across multiple categories, captured continuously with high-frequency precision.
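Snapshots in this shape map straight into pandas. This is a minimal sketch assuming a snapshot arrives as a dict keyed by the schema's column names; the values below are hand-made examples, not a real API payload.

```python
# Sketch: loading one snapshot into a pandas row shaped like the schema.
# The dict is a hand-made example keyed by the schema's column names.
import pandas as pd

snapshot = {
    "crypto": "BTC", "timeframe": "5m", "token_side": "UP",
    "timestamp": "2026-05-09 03:14:12.061", "crypto_price": 80471.01,
    "best_bid": 0.54, "best_ask": 0.55, "mid_price": 0.545, "spread": 0.01,
    "bids": [(0.54, 1240), (0.53, 980)], "asks": [(0.55, 1100), (0.56, 1450)],
}

df = pd.DataFrame([snapshot])
df["timestamp"] = pd.to_datetime(df["timestamp"])  # DateTime64(3) analogue
# Derived feature straight off the depth arrays: net resting size.
df["imbalance"] = df.apply(
    lambda r: sum(s for _, s in r["bids"]) - sum(s for _, s in r["asks"]),
    axis=1,
)
```

From here, feature engineering is ordinary DataFrame work; no schema mapping or type coercion layer is needed.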
Up and running in minutes
Three steps from signup to live 14-Column ClickHouse Schema in your application.
Get Your API Key
Generate a free API key instantly. No credit card. Just click and go.
Sign Up Free
Explore the API
Browse 11 endpoints with live examples. Test requests directly from the docs.
API Reference
Start Building
Integrate live 14-Column ClickHouse Schema into your research pipeline, trading bot, or analytics platform.
fetch('/v1/markets/live', { headers: { 'X-API-Key': key } })
curl -H 'X-API-Key: rm_xxx' 'https://api.resolvedmarkets.com/api/snapshot?crypto=BTC&timeframe=1h&includebook=true'
pd.json_normalize() on the response
rm-api download --crypto BTC --days 30 --format csv
Wiring 14-Column ClickHouse Schema into your workflow
Data scientists integrate 14-Column ClickHouse Schema via REST for exploratory work in Jupyter, bulk CSV exports for training pipelines, and WebSocket streaming for inference. The 14-column ClickHouse schema maps directly to pandas DataFrames.
- Native ClickHouse JDBC/ODBC connector
- Snowflake Snowpipe ingest for streaming 14-Column ClickHouse Schema
- AWS Glue catalog integration for 14-Column ClickHouse Schema Parquet files
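For the Jupyter/REST path, the flattening step can be sketched with `pd.json_normalize()`. The response shape below is an assumption inferred from the field names in this page; verify against the live API before relying on it, and substitute a real `requests.get(...).json()` call for the stand-in dict.

```python
# Sketch: flattening a snapshot-style JSON response into a DataFrame.
# The response shape is assumed, not the documented API payload.
import pandas as pd

response = {  # stand-in for requests.get(...).json()
    "crypto": "BTC",
    "timeframe": "1h",
    "book": {"best_bid": 0.54, "best_ask": 0.55, "spread": 0.01},
}

# Nested keys become dotted columns: book.best_bid, book.best_ask, ...
df = pd.json_normalize(response)
```

The same frame can then be appended to a ClickHouse table or a feature store without further reshaping.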
Why Data Scientists pick 14-Column ClickHouse Schema
- 11.4M+ millisecond-timestamped snapshots provide unprecedented depth for training time-series prediction models across market regimes
- Full bid/ask depth arrays enable microstructure-based feature engineering impossible with aggregated price data
- Unified API across crypto, sports, economics, and weather enables transfer learning and cross-domain model development
- WebSocket streaming API enables seamless deployment of trained models into production for live market probability predictions
Why 14-Column ClickHouse Schema matters
14-Column ClickHouse Schema matters for data science because it's structured. Most prediction-market data needs hours of cleanup; this dataset ships schema-aligned, with DateTime64(3) timestamps and full bid/ask arrays ready for ML pipelines on best_bid, best_ask, mid_price, spread, bids[], and asks[].
14-Column ClickHouse Schema in context
ML pipelines on prediction markets used to fight raw exchange data. 14-Column ClickHouse Schema from Resolved Markets removes that friction: schema, timestamps, and bid/ask arrays are already aligned for ingestion into pandas, ClickHouse, or any modern feature store.
Frequently asked: 14-Column ClickHouse Schema for Data Scientists
-
What features can we engineer from Resolved Markets orderbook data?
The full bid/ask depth enables dozens of microstructure features: bid-ask spread evolution, depth concentration ratios, order book imbalance (total_bid_quantity vs total_ask_quantity), volume-weighted midpoint shifts, time-to-best-execution, depth clustering entropy, and inter-arrival times between large orders. With millisecond timestamps, you can calculate volatility measures at sub-second timescales. These features capture market sentiment and conviction far better than price-only inputs.
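One of the features listed above, the volume-weighted midpoint, can be sketched from the top of book. Weighting each price by the opposite side's size (the microprice convention) is one common choice; treat the exact definition as an assumption.

```python
# Sketch: volume-weighted midpoint from the top of book.
# Uses microprice-style weighting (price leans toward the heavier side);
# the exact definition is an assumption, and the inputs are sample values.

def weighted_midpoint(best_bid, bid_size, best_ask, ask_size):
    return (best_bid * ask_size + best_ask * bid_size) / (bid_size + ask_size)

# With more resting bid size (1240 vs 1100), the weighted midpoint sits
# above the plain mid of 0.545, hinting at upward pressure.
wm = weighted_midpoint(0.54, 1240, 0.55, 1100)
```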
-
Can we use historical snapshots for backtesting prediction models?
Yes, our full historical archive of 11.4M+ snapshots enables authentic backtesting. You can train models on snapshots from period A, validate on period B, and backtest on period C with zero look-ahead bias. Each snapshot includes the exact timestamp and full orderbook state, enabling realistic simulation of your model's performance. Export snapshots in JSON or Parquet format for efficient processing in your training pipeline.
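The period A/B/C discipline reduces to splitting strictly by timestamp. A minimal sketch, with illustrative integer timestamps standing in for real capture times:

```python
# Sketch: zero-look-ahead train/validation/test splits by timestamp.
# Cutoffs and the integer timestamps are illustrative.

def time_split(rows, train_end, valid_end):
    """rows: list of (timestamp, features), sorted by timestamp."""
    train = [r for r in rows if r[0] < train_end]
    valid = [r for r in rows if train_end <= r[0] < valid_end]
    test = [r for r in rows if r[0] >= valid_end]
    return train, valid, test

rows = [(t, {"spread": 0.01}) for t in range(10)]
train, valid, test = time_split(rows, train_end=6, valid_end=8)
```

Because every snapshot carries its exact capture timestamp, the split boundaries are unambiguous and no future information leaks into training.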
-
How do we handle missing data or gaps in the snapshot stream?
Our capture process is continuous at 20Hz for crypto and variable intervals for other categories. Gaps occur only during platform maintenance (announced in advance). We provide metadata with each snapshot indicating the time since the last capture, enabling you to detect and interpolate over gaps. For production models, our WebSocket API guarantees delivery of every update; client-side buffering prevents data loss due to network transients.
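Gap detection and interpolation from the inter-capture intervals can be sketched with pandas. The 1-second expected cadence below is an assumption chosen for readability; crypto capture is documented as much faster.

```python
# Sketch: flagging capture gaps and interpolating across them.
# A 1-second expected cadence is assumed here purely for illustration.
import pandas as pd

ts = pd.to_datetime([
    "2026-05-09 03:14:10",
    "2026-05-09 03:14:11",
    "2026-05-09 03:14:15",  # 4-second gap before this capture
])
mid = pd.Series([0.545, 0.546, 0.550], index=ts)

# Flag rows whose time-since-last-capture exceeds the expected cadence.
gaps = mid.index.to_series().diff() > pd.Timedelta(seconds=1)

# Regularize to a 1-second grid and linearly fill the missing seconds.
filled = mid.resample("1s").mean().interpolate()
```

For production models, this interpolation is only a fallback; the WebSocket stream with client-side buffering avoids most gaps in the first place.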
-
Can we build models predicting Polymarket price movements before crypto spot markets move?
Yes, this is a primary use case. Polymarket prediction contracts for BTC and ETH price direction often reprice minutes before spot price changes, as sophisticated traders discover new information. Train models on orderbook features from prediction markets to predict subsequent spot price direction. The unified API makes it simple to correlate prediction market orderbook evolution with spot price candles from any exchange, enabling cross-market alpha research.
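The lead-lag relationship can be tested with a shifted correlation. Both series below are synthetic (spot is constructed to lag the prediction market by two steps), and the sign convention is an assumption: a peak at positive lag means the prediction market leads.

```python
# Sketch: lead-lag correlation between prediction-market mid_price and
# spot price. Both series are synthetic; spot is built to lag by 2 steps.
import pandas as pd

pred = pd.Series([0.50, 0.52, 0.55, 0.57, 0.58, 0.58])
spot = pred.shift(2)  # spot[t] = pred[t-2]: prediction market leads by 2

def lead_lag_corr(leader, follower, lag):
    """Correlation of leader[t] with follower[t + lag]."""
    return leader.corr(follower.shift(-lag))

# Correlation should peak at lag = 2 for this constructed example.
peak = lead_lag_corr(pred, spot, 2)
```

On real data, scanning `lag` over a range and locating the peak gives an estimate of how far ahead the prediction market reprices.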
-
What's the best way to handle the scale of 11.4M+ snapshots in training pipelines?
Export snapshots to Parquet format for efficient storage and query. Our API supports time-range and market-range filtering to limit export scope. Use distributed computing frameworks (Spark, Dask, Ray) to parallelize feature engineering across snapshot partitions. For live training, subscribe to WebSocket streams for specific markets rather than querying entire historical datasets. This hybrid approach—historical exports for model development, streaming for live updates—optimizes both training speed and inference latency.
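The partition-parallel pattern described above can be sketched with the standard library alone (Spark, Dask, or Ray follow the same map-over-chunks shape at larger scale). Chunk boundaries and the mean-spread feature are illustrative.

```python
# Sketch: partition-parallel feature engineering over snapshot chunks,
# mirroring the Spark/Dask/Ray advice with stdlib threads only.
# Chunk contents and the mean-spread feature are illustrative.
from concurrent.futures import ThreadPoolExecutor

def chunk_features(chunk):
    """chunk: list of (best_bid, best_ask); returns the mean spread."""
    spreads = [ask - bid for bid, ask in chunk]
    return sum(spreads) / len(spreads)

chunks = [
    [(0.54, 0.55), (0.53, 0.56)],  # partition 1
    [(0.52, 0.57), (0.51, 0.58)],  # partition 2
]
with ThreadPoolExecutor() as pool:
    means = list(pool.map(chunk_features, chunks))
```

In a real pipeline each chunk would be one Parquet partition bounded by time range, so workers never load the full 11.4M-snapshot archive at once.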
-
How big is the dataset behind 14-Column ClickHouse Schema?
11.4M+ snapshots across 100+ markets and 7 categories. Each snapshot includes full bid/ask arrays with millisecond timestamps — enough for deep learning and statistical modeling.
-
Can data scientists access live 14-Column ClickHouse Schema for inference?
Yes. WebSocket streaming pushes sub-second updates for real-time inference. The MCP server exposes 14-Column ClickHouse Schema as function calls for AI agents.
-
How do data scientists prepare 14-Column ClickHouse Schema for ML?
The dataset ships in a 14-column ClickHouse-optimized schema with bid prices, ask prices, depth at each level, market identifiers, and millisecond timestamps. It maps directly into pandas for feature engineering.
-
Is 14-Column ClickHouse Schema compatible with Apache Iceberg or Delta Lake?
Yes. Bulk Parquet exports of 14-Column ClickHouse Schema drop directly into Iceberg or Delta tables for time-travel queries and ACID semantics.
-
Can I use 14-Column ClickHouse Schema with dbt?
Yes. Most teams build dbt models that consume 14-Column ClickHouse Schema via the ClickHouse connector and derive downstream features (spread, depth imbalance, mid-price velocity).