When should you use a columnar database?

Al Brown
Last updated: May 8, 2026

A columnar database is the right choice when analytical queries aggregate over large ranges of a few columns from a wide table — SUM, AVG, COUNT, percentiles, and GROUP BY across event, log, or metric data. On these workloads, ClickHouse, Snowflake, BigQuery, or DuckDB will outperform Postgres or MySQL, typically by 10-100× on the same hardware.

TL;DR #

  • Use columnar when queries scan wide tables and aggregate, and ingest is append-mostly. Concurrency capacity varies by engine, so high-concurrency workloads drive the engine choice rather than rule columnar out.
  • Skip columnar when point lookups, per-row updates, or 10,000+ TPS sustained writes drive the workload. Postgres or MySQL will be faster and operationally simpler.

When should you use a columnar database? #

Use a columnar database when queries scan many rows but read few columns, the workload is aggregation-heavy, and ingest is append-mostly. The layout cuts I/O, compresses data 5-10×, and runs aggregations 10-100× faster than a row store on the same hardware.

Four conditions where columnar wins:

  • Wide tables, narrow queries. A 50-column table where typical queries select 3-10 columns. The I/O reduction is the main speedup: AWS Redshift's columnar storage docs work an example where the engine reads ~6% of bytes instead of 100% by skipping the unused columns.
  • Aggregation-heavy workloads. GROUP BY, SUM, COUNT, AVG, percentiles. Vectorised execution operates on 1024-4096 values per call, amortising interpreter overhead by 10-100× compared to row-at-a-time iteration.
  • Append-mostly ingest. Logs, metrics, events, transactions-as-history. Columnar engines are tuned for bulk INSERT paths. Heavy per-row UPDATE patterns are better served by a row store or by columnar upsert patterns like ReplacingMergeTree.
  • Compression matters. Columnar layouts hit 5-10× ratios routinely. ClickBench results show specific cases above 30× on low-cardinality columns. On data above 1 TB, storage savings often pay for the migration before runtime savings are counted.
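As a concrete sketch of the append-mostly pattern, a hypothetical ClickHouse events table might look like this (the table and column names are illustrative, not from a real deployment):

```sql
-- Hypothetical wide events table tuned for append-mostly ingest.
-- MergeTree sorts rows by ts, so time-windowed scans read contiguous
-- data, and LowCardinality columns compress heavily in this layout.
CREATE TABLE events
(
    ts      DateTime,
    user_id UInt64,
    country LowCardinality(String),
    url     String,
    bytes   UInt64
    -- ...dozens more columns in a realistic schema
)
ENGINE = MergeTree
ORDER BY ts;

-- Ingest arrives as bulk batches of rows, not one INSERT per row:
INSERT INTO events VALUES
    (now(), 1, 'GB', '/checkout', 5120),
    (now(), 2, 'DE', '/home',     1024);
```

Batching inserts (thousands of rows per statement, or via an async buffer) is what keeps the columnar write path cheap; the same schema with per-row INSERTs would forfeit much of the advantage.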

A simple decision tree captures the order to evaluate these:

Query shape?
├── Point lookups by key ──► Row store (Postgres, MySQL)
└── Aggregations / scans
    └── Write shape?
        ├── Per-row updates ──► Columnar database with UPDATE support (e.g. ClickHouse)
        └── Append-mostly
            └── Data size?
                ├── < 100 GB
                │   ├── Single user, laptop / embedded ──► DuckDB or ClickHouse
        │   └── Multi-user service ──► ClickHouse
                └── 100 GB+
                    └── Latency & concurrency profile?
                        ├── Internal BI, ad-hoc (seconds tolerable, tens of analysts)
                        │       ──► ClickHouse, Snowflake, BigQuery
                        └── User-facing / real-time (sub-second, 100s-1,000s of users)
                                ──► ClickHouse, Druid, Pinot

For a side-by-side latency and compression comparison on the same hardware, see the row vs column comparison.

When should you not use a columnar database? #

Skip a columnar database when point lookups are the main access pattern, per-row updates are frequent, the workload demands 10,000+ TPS sustained writes, or row-level transactional consistency is non-negotiable. Postgres, MySQL, or another row store will outperform columnar databases on these patterns and is operationally simpler to run.

Four conditions where columnar loses:

  • Point lookups are the typical query. Queries shaped like SELECT * FROM users WHERE id = 42. A column store opens 50 column files to materialise one row, while a row store walks one B-tree path. Postgres will typically return in well under a millisecond.
  • Per-row updates are the main write pattern. Column stores are tuned for bulk writes rather than per-row mutation. ClickHouse offers standard SQL UPDATE for scattered-row changes and ReplacingMergeTree for upsert workloads, but if UPDATE volume substantially exceeds INSERT volume, the row-store path is simpler and faster. See the row-vs-column comparison for the full write-path tradeoff.
  • Sustained 10,000+ TPS unbatched write concurrency. OLTP write-heavy systems (ledgers, order entry, session stores) are not what columnar engines are tuned for. Postgres on commodity hardware handles single-row writes at this rate.
  • Row-level transactional consistency is required. Columnar OLAP engines target batch-oriented atomicity by default; OLTP-grade row-level serialisation is not their target use case. Snowflake, BigQuery, and ClickHouse are designed for analytics. If two concurrent updates to the same row must serialise with row-level locking semantics, use a row store.
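Where an upsert-style pattern is unavoidable on the columnar side, the ReplacingMergeTree approach mentioned above can be sketched as follows (the schema is illustrative):

```sql
-- Hypothetical user_profiles table: the newest row per user_id wins,
-- using updated_at as the version column.
CREATE TABLE user_profiles
(
    user_id    UInt64,
    email      String,
    updated_at DateTime
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY user_id;

-- An "update" is just an insert of a newer version of the row:
INSERT INTO user_profiles VALUES (42, 'new@example.com', now());

-- Deduplication happens during background merges; FINAL forces the
-- latest-version view at query time before merges complete:
SELECT * FROM user_profiles FINAL WHERE user_id = 42;
```

This works well when updates are a minority of writes. If most writes are updates, the merge and FINAL overhead mounts, which is exactly the point where the row-store path becomes simpler and faster.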

What query shapes is a columnar database actually good at? #

Columnar engines win on queries that aggregate over large row ranges while reading only a handful of columns from a wide table. The recognisable shapes:

  • Aggregations over time windows. SELECT toStartOfHour(ts), SUM(bytes) FROM events WHERE ts > now() - INTERVAL 7 DAY GROUP BY 1. Tens of millions of rows scanned, two columns read.
  • GROUP BY across high-cardinality dimensions. Top-N URLs, top-N customers, conversion rates by cohort. The engine scans one or two columns per dimension and one or two per metric.
  • Percentile and quantile queries. P50/P95/P99 of latency or response size across a window. Vectorised execution amortises the per-value cost across thousands of values per call.
  • Wide-table dashboards with narrow projection. A 50-column events table where dashboards select 3-10 of them. Column pruning means the engine reads a small fraction of the bytes a row store would.
  • Full-table scans with selective filters. WHERE country = 'GB' AND product_id IN (...) over a billion rows. Sparse indexes plus compression skip most of the data without a B-tree per column.
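Putting a couple of these shapes together, a hypothetical latency-dashboard query in ClickHouse SQL might read (table and column names are assumptions for illustration):

```sql
-- P50/P95/P99 request latency per hour over the last 7 days.
-- Only two columns (ts, latency_ms) are read, however wide the table is;
-- the quantiles combinator computes all three percentiles in one scan.
SELECT
    toStartOfHour(ts) AS hour,
    quantiles(0.5, 0.95, 0.99)(latency_ms) AS p50_p95_p99
FROM requests
WHERE ts > now() - INTERVAL 7 DAY
GROUP BY hour
ORDER BY hour;
```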

The inverse pattern is what loses on columnar: read every column of one row, identified by primary key. That is what a row store's B-tree is built for. If the typical access pattern looks like SELECT * FROM users WHERE id = 42, stay on Postgres or MySQL.

What if the workload needs both OLTP and OLAP? #

Most applications need row-level reads and writes for live traffic alongside aggregation queries that scan millions of rows for analytics and reporting. No single engine handles both shapes well. Row stores slow dramatically once queries aggregate across millions of rows, and column stores are slow on single-record lookups by primary key. The right architecture pairs a specialised engine to each workload: a row store handles the live transactional path, a columnar store handles the analytical path, and CDC replication keeps them in sync.

Postgres + ClickHouse is the canonical implementation of this pattern, pairing two best-of-breed engines into a unified data stack for transactions and analytics. ClickHouse Cloud provisions the full stack out of the box — managed Postgres, managed ClickHouse, and ClickPipes CDC replication wired up by default. The pg_clickhouse extension takes it further, letting Postgres query ClickHouse directly with predicates and projections pushed down to the columnar side automatically.
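Under this split-path architecture, the same business domain is touched by two very different statements (table names are illustrative, and CDC replication between the two stores is assumed):

```sql
-- Transactional path, on Postgres: single-row write with
-- row-level locking and OLTP-grade consistency.
UPDATE accounts SET balance = balance - 25 WHERE id = 42;

-- Analytical path, on ClickHouse: scan-and-aggregate over the
-- CDC-replicated copy of the transaction history.
SELECT
    toStartOfDay(ts) AS day,
    sum(amount)      AS volume
FROM account_transactions
WHERE ts > now() - INTERVAL 30 DAY
GROUP BY day
ORDER BY day;
```

Each statement lands on the engine whose storage layout matches its access pattern, which is the whole point of the pairing.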

Frequently asked questions #

Is ClickHouse always faster than Postgres for analytics?

For analytical queries, yes. ClickHouse is typically 10-100× faster than Postgres on the same hardware. On point lookups by primary key, Postgres is typically 3-10× faster because a row store walks one index path while a column store opens many column files.

The public reproducible head-to-head is ClickBench.

How big does my data need to be before columnar pays off?

There is no single size threshold; query shape, concurrency, and latency requirements matter as much as volume. A 10 GB table hit by hundreds of concurrent aggregation queries can overwhelm Postgres, and a single complex report over a multi-TB table in Postgres can take hours and hog resources — both are cases for a columnar database. A one-off report on a 10 GB table, by contrast, is unlikely to trouble Postgres and wouldn't justify the change.
