ClickHouse Release 26.4

ClickHouse
May 8, 2026 Β· 13 minute read

Another month goes by, which means it’s time for another release!

The ClickHouse 26.4 release contains 39 new features 🌷 45 performance optimizations πŸ‡ 238 bug fixes 🐝

This release brings more SQL compatibility, faster COUNT DISTINCT, even prettier EXPLAIN, and more!

New contributors #

A special welcome to all the new contributors in 26.4!Β The growth of ClickHouse's community is humbling, and we are always grateful for the contributions that have made ClickHouse so popular.

Below are the names of the new contributors:

Alexander Kuleshov, Alsu, Anton Frost, Aruj Bansal, Asya, ClickGap AI Bot, Denys Melnyk, Diego Gomes Tome, Dustin Healy, Evgeny Kuzin, Farid Adam, Francisco Garcia Florez, Gagan Dhakrey, Gleb Popov, Groene AI, Ivan Mantova, Jaap Elst, Jack Knudson, James Cunningham, JingYanchao, K, Kc Balusu, Matheus Nerone, Michael Russell, MukundaKatta, Nikita Semenov, Pavel Kravtsov, Peng, RenzoMXD, Sergey Veletskiy, Takumi Hara, Timothy Kurniawan, Wenyu Chen, XiaoBinMu, Yuri Fedoseev, ashrithb, asyablue22, dwagner-decix, egor romanov, groeneai, liuguangliang, manerone, nerve-bot, simonhammes, sourcelliu, xiaobin

Hint: if you’re curious how we generate this list… here.

You can also view the slides from the presentation.

SQL compatibility: VALUES as table expression, EXTRACT, SET TIME ZONE #

The 26.4 release sees more features become compatible with standard SQL syntax. We’ll look at just a few of them, but you can see the presentation slide deckΒ for more.

VALUES as a table expression #

Contributed by Desel72 #

First up, VALUES. Before this release, you could call it like this:

SELECT *
FROM VALUES((1, 'a'), (2, 'b'), (3, 'c'));
   β”Œβ”€c1─┬─c2─┐
1. β”‚  1 β”‚ a  β”‚
2. β”‚  2 β”‚ b  β”‚
3. β”‚  3 β”‚ c  β”‚
   β””β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”˜

Whereas now, we can also call it as a table expression, as shown below:

SELECT *
FROM (VALUES (1, 'a'), (2, 'b'), (3, 'c'));
   β”Œβ”€c1─┬─c2─┐
1. β”‚  1 β”‚ a  β”‚
2. β”‚  2 β”‚ b  β”‚
3. β”‚  3 β”‚ c  β”‚
   β””β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”˜

We can now also alias columns, which is useful when using VALUESΒ inΒ a join query. For example, instead of the following:

SELECT
    c.c2,
    o.c2
FROM VALUES((1, 'Alice'), (2, 'Bob')) AS c
INNER JOIN VALUES((1, 250), (2, 100), (1, 75)) AS o
ON c.c1 = o.c1;
   β”Œβ”€c2────┬─o.c2─┐
1. β”‚ Alice β”‚  250 β”‚
2. β”‚ Alice β”‚   75 β”‚
3. β”‚ Bob   β”‚  100 β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”˜

We can name the columns, which makes the query easier to understand:

SELECT c.name, o.amount
FROM (VALUES (1, 'Alice'), (2, 'Bob')) AS c(id, name)
JOIN (VALUES (1, 250), (2, 100), (1, 75)) AS o(customer_id, amount)
ON c.id = o.customer_id;
   β”Œβ”€name──┬─amount─┐
1. β”‚ Alice β”‚    250 β”‚
2. β”‚ Alice β”‚     75 β”‚
3. β”‚ Bob   β”‚    100 β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”˜

EXTRACT #

Contributed by Alexey Milovidov #

The EXTRACTΒ operator (used when working with dates) now supports PostgreSQL-style units, as shown in the following query:

SELECT
  EXTRACT(EPOCH      FROM now())   AS epoch,
  EXTRACT(DOW        FROM today()) AS dayOfWeek,
  EXTRACT(DOY        FROM today()) AS dayOfYear,
  EXTRACT(ISODOW     FROM today()) AS isoDOW,
  EXTRACT(ISOYEAR    FROM today()) AS isoYear,
  EXTRACT(WEEK       FROM today()) AS isoWeek,
  EXTRACT(CENTURY    FROM today()) AS century,
  EXTRACT(DECADE     FROM today()) AS decade,
  EXTRACT(MILLENNIUM FROM today()) AS millennium;
Row 1:
──────
epoch:      1777992683
dayOfWeek:  2
dayOfYear:  125
isoDOW:     2
isoYear:    2026
isoWeek:    19
century:    21
decade:     202
millennium: 3
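
If you prefer ClickHouse’s native date functions, several of these units correspond to functions that have been around for a while. The mapping below is my own (hedged) reading, not from the release notes:

-- Assumed equivalents of some of the EXTRACT units above:
SELECT
  toDayOfYear(today()) AS dayOfYear, -- EXTRACT(DOY FROM today())
  toISOYear(today())   AS isoYear,   -- EXTRACT(ISOYEAR FROM today())
  toISOWeek(today())   AS isoWeek;   -- EXTRACT(WEEK FROM today())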

SET TIME ZONE #

Contributed by phulv94 #

There is also a new SQL standard alias for setting the time zone. First, let’s check my current time zone:

SELECT timezone(), formatDateTime(now(), '%Y-%m-%d %H:%M:%S %z');
   β”Œβ”€timezone()────┬─formatDateTimβ‹―H:%M:%S %z')─┐
1. β”‚ Europe/London β”‚ 2026-05-05 15:May:47 +0100 β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

And now, we’ll set it to be Amsterdam instead:

SET TIME ZONE 'Europe/Amsterdam';

And if we re-run the above query:

   β”Œβ”€timezone()───────┬─formatDateTimβ‹―H:%M:%S %z')─┐
1. β”‚ Europe/Amsterdam β”‚ 2026-05-05 16:May:59 +0200 β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Other compatibility improvements #

And that’s not all - there is also now support for NATURAL JOIN, OVERLAYΒ is drop-in compatible, compound INTERVAL literalsΒ are supported, and more!
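
As a quick sketch of the first of those: NATURAL JOIN joins on every column whose name appears in both tables. Reusing the VALUES syntax from earlier (hypothetical data, standard SQL semantics):

-- Joins on the shared column name `id`:
SELECT *
FROM (VALUES (1, 'Alice'), (2, 'Bob')) AS c(id, name)
NATURAL JOIN (VALUES (1, 250), (2, 100)) AS o(id, amount);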

LIKE uses text index #

Contributed by Elmi Ahmadov #

From ClickHouse 26.4, when a LIKE/ILIKEΒ query pattern is %<alpha-numeric-characters-without-spaces>%Β and the text index tokenizer is splitByNonAlpha, ClickHouse uses the inverted index to speed up those queries. It does this by scanning the inverted index dictionary rather than scanning the full column to find the matching pattern.
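
To make that concrete, the tokens function splits a string the same way the splitByNonAlpha tokenizer does when building the index dictionary (at least, that’s my understanding of its default behaviour):

-- Returns ['I', 'deployed', 'it', 'on', 'Kubernetes', 'last', 'week']
SELECT tokens('I deployed it on Kubernetes last week!');

A LIKE '%Kubernetes%' query can therefore be answered by scanning the dictionary for tokens containing 'Kubernetes', rather than reading every row.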

Let’s have a look at how this works with our trusty HackerNews dataset, first using chctlΒ to get ClickHouse 26.4 running on my laptop:

chctl local install 26.4

And then we’ll start it up:

chctl local server start --version 26.4

And connect using the ClickHouse client:

chctl local client --name default -mn

Next, we’ll create our HackerNews table:

CREATE TABLE hackernews
(
    `id` Int64,
    `deleted` Int64,
    `type` String,
    `by` String,
    `time` DateTime64(9),
    `text` String,
    `dead` Int64,
    `parent` Int64,
    `poll` Int64,
    `kids` Array(Int64),
    `url` String,
    `score` Int64,
    `title` String,
    `parts` Array(Int64),
    `descendants` Int64
)
ENGINE = MergeTree
ORDER BY time;

We’ll then insert the data:

INSERT INTO hackernews
SELECT *
FROM url('https://datasets-documentation.s3.eu-west-3.amazonaws.com/hackernews/hacknernews.csv.gz', 'CSVWithNames');

And next, we’ll add a text index on the textΒ column using the splitByNonAlphaΒ tokenizer:

ALTER TABLE hackernews
ADD INDEX text_tokens_idx text
TYPE text(tokenizer='splitByNonAlpha')
GRANULARITY 1;

And materialize that index:

ALTER TABLE hackernews
MATERIALIZE INDEX text_tokens_idx
SETTINGS mutations_sync = 1;
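
If you want to double-check that the index is in place before querying, the system.data_skipping_indices table should show it (a quick sanity check I like to do, not a new 26.4 feature):

SELECT name, type_full, granularity
FROM system.data_skipping_indices
WHERE table = 'hackernews';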

This optimization is enabled by default in 26.4, and can be controlled using the use_text_index_like_evaluation_by_dictionary_scanΒ setting. The following query counts how many Hacker News posts mentioned Kubernetes:

SELECT count()
FROM hackernews
WHERE text LIKE '%Kubernetes%'
SETTINGS use_text_index_like_evaluation_by_dictionary_scan=0;
   β”Œβ”€count()─┐
1. β”‚   20070 β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

1 row in set. Elapsed: 0.832 sec. Processed 18.25 million rows, 6.29 GB (21.93 million rows/s., 7.56 GB/s.)
Peak memory usage: 88.18 MiB.

1 row in set. Elapsed: 0.624 sec. Processed 18.25 million rows, 6.29 GB (29.23 million rows/s., 10.08 GB/s.)
Peak memory usage: 87.93 MiB.

1 row in set. Elapsed: 0.638 sec. Processed 18.25 million rows, 6.29 GB (28.60 million rows/s., 9.86 GB/s.)
Peak memory usage: 86.01 MiB.

And now using the optimization:

SELECT count()
FROM hackernews
WHERE text LIKE '%Kubernetes%'
SETTINGS use_text_index_like_evaluation_by_dictionary_scan=1;
   β”Œβ”€count()─┐
1. β”‚   20070 β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

1 row in set. Elapsed: 0.208 sec. Processed 18.25 million rows, 18.25 MB (87.53 million rows/s., 87.53 MB/s.)
Peak memory usage: 2.07 MiB.

1 row in set. Elapsed: 0.225 sec. Processed 18.25 million rows, 18.25 MB (80.98 million rows/s., 80.98 MB/s.)
Peak memory usage: 2.07 MiB.

1 row in set. Elapsed: 0.234 sec. Processed 18.25 million rows, 18.25 MB (77.83 million rows/s., 77.83 MB/s.)
Peak memory usage: 2.07 MiB.

The number of rows processed is the same, but the query using the optimization has a best runtime of 208 milliseconds, compared to 624 milliseconds without it: three times faster.

If we compare the query plans, we can see that the one using the optimization scans 1,286 fewer granules (2,247 of 3,533, rather than all of them).
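
The plans below can be reproduced with a query along these lines; the exact EXPLAIN options used for the output shown are an assumption on my part (flip the setting to 0 for the first plan):

EXPLAIN indexes = 1
SELECT count()
FROM hackernews
WHERE text LIKE '%Kubernetes%'
SETTINGS use_text_index_like_evaluation_by_dictionary_scan = 1;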

No use of inverted index:

    β”Œβ”€explain─────────────────────────────────────────────────────────┐
 1. β”‚ Output: count()                                                 β”‚
 2. β”‚                                                                 β”‚
 3. β”‚ Aggregating                                                     β”‚
 4. β”‚ └──Filter ((WHERE + Change column names to column identifiers)) β”‚
 5. β”‚    └──ReadFromMergeTree (default.hackernews)                    β”‚
 6. β”‚          Indexes:                                               β”‚
 7. β”‚            PrimaryKey                                           β”‚
 8. β”‚              Condition: true                                    β”‚
 9. β”‚              Parts: 6/6                                         β”‚
10. β”‚              Granules: 3533/3533                                β”‚
11. β”‚            Skip                                                 β”‚
12. β”‚              Name: text_tokens_idx                              β”‚
13. β”‚              Description: text GRANULARITY 100000000            β”‚
14. β”‚              Condition: (mode: All; tokens: [])                 β”‚
15. β”‚              Parts: 6/6                                         β”‚
16. β”‚              Granules: 3533/3533                                β”‚
17. β”‚            Ranges: 6                                            β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Uses inverted index:

    β”Œβ”€explain──────────────────────────────────────────────┐
 1. β”‚ Output: count()                                      β”‚
 2. β”‚                                                      β”‚
 3. β”‚ Aggregating                                          β”‚
 4. β”‚ └──Filter                                            β”‚
 5. β”‚    └──ReadFromMergeTree (default.hackernews)         β”‚
 6. β”‚          Indexes:                                    β”‚
 7. β”‚            PrimaryKey                                β”‚
 8. β”‚              Condition: true                         β”‚
 9. β”‚              Parts: 6/6                              β”‚
10. β”‚              Granules: 3533/3533                     β”‚
11. β”‚            Skip                                      β”‚
12. β”‚              Name: text_tokens_idx                   β”‚
13. β”‚              Description: text GRANULARITY 100000000 β”‚
14. β”‚              Condition: (mode: All; tokens: [])      β”‚
15. β”‚              Parts: 6/6                              β”‚
16. β”‚              Granules: 2247/3533                     β”‚
17. β”‚            Ranges: 190                               β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Faster COUNT DISTINCT #

Contributed by Jiebin Sun #

There are a couple of improvements to uniqExactΒ (used by COUNT(DISTINCT ...)) on high-core-count machines:

  • ClickHouse no longer spawns redundant threads during the merge phase. uniqExact uses a two-level hash table with 256 buckets, but previously, ClickHouse would spawn up to max_threadsΒ threadsΒ regardless, and many of them would have nothing to do and exit immediately.
  • When merging N intermediate hash tables (one per aggregation thread), the thread pool was initialized N times, causing O(N Γ— threads)Β total thread spawns and severe lock contention. Now, all N hash tables are merged in a single pass - each thread processes one bucket across all hash tables at once, reducing thread pool initializations from O(N)Β to O(1).
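
For reference, queries of the following shape benefit. By default, COUNT(DISTINCT ...) is executed as uniqExact (controlled by the count_distinct_implementation setting), so both forms below should take the same code path:

-- Both of these run uniqExact under the default settings:
SELECT count(DISTINCT by) FROM hackernews;
SELECT uniqExact(by) FROM hackernews;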

In some of our benchmarks, we saw speedups of 3 to 15 times on a 288-core machine.

This, however, is very much an optimization for machines with many cores - I tried it out on the HackerNews dataset on my Mac M2 Max (which has 12 cores) and didn’t see any improvement!

Even prettier EXPLAIN #

Contributed by Kirill Kopnev #

EXPLAIN PLAN pretty=1Β now prints expressions in a human-readable form, shows top-level output columns and per-step output columns, and labels JOINs with estimated row counts and locality.

Let’s see how this works with the following query:

EXPLAIN pretty = 1
SELECT by, count()
FROM hackernews
WHERE (text LIKE '%OpenAI%') AND (text LIKE '%Google%')
GROUP BY ALL
ORDER BY count() DESC, by
LIMIT 10;

26.3


   β”Œβ”€explain────────────────────────────────────────────────────────────────────────────┐
1. β”‚ Expression (Project names)                                                         β”‚
2. β”‚ └──Limit (preliminary LIMIT)                                                       β”‚
3. β”‚    └──Sorting (Sorting for ORDER BY)                                               β”‚
4. β”‚       └──Expression ((Before ORDER BY + Projection))                               β”‚
5. β”‚          └──Aggregating                                                            β”‚
6. β”‚             └──Expression (Before GROUP BY)                                        β”‚
7. β”‚                └──Expression ((WHERE + Change column names to column identifiers)) β”‚
8. β”‚                   └──ReadFromMergeTree (default.hackernews)                        β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

26.4

    β”Œβ”€explain─────────────────────────────────────────────────────┐
 1. β”‚ Output: by, count()                                         β”‚
 2. β”‚                                                             β”‚
 3. β”‚ Expression (Project names)                                  β”‚
 4. β”‚ └──Limit (preliminary LIMIT)                                β”‚
 5. β”‚    └──Sorting (Sorting for ORDER BY)                        β”‚
 6. β”‚       └──Expression ((Before ORDER BY + Projection))        β”‚
 7. β”‚          └──Aggregating                                     β”‚
 8. β”‚             └──Expression (Before GROUP BY)                 β”‚
 9. β”‚                └──Expression                                β”‚
10. β”‚                   └──ReadFromMergeTree (default.hackernews) β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

JSONAllValues + text index #

Contributed by Anton Popov #

ClickHouse 26.4 adds the JSONAllValues function, which returns every leaf value of a JSON column as an Array(String). We can create a text index on top of this, enabling more efficient filtering on JSON subcolumns.

Let’s have a look at how this works with help from the StatsBomb dataset. Let’s get a subset of the data on our machine by running the following:

git clone --filter=blob:none --sparse https://github.com/statsbomb/open-data.git
cd open-data
git sparse-checkout set data/events

We’ll create the following table using clickhouse-local:

CREATE TABLE events (
    match_id UInt32,
    json JSON(id String, index UInt32),
    INDEX vals JSONAllValues(json) TYPE text(tokenizer = 'ngrams') GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY (match_id, json.index);

And then insert the data:

INSERT INTO events
SELECT
  toUInt32(replaceRegexpOne(_file, '\\.json$', '')) AS match_id,
  json
FROM file('open-data/data/events/*.json', JSONAsObject);
12188949 rows in set. Elapsed: 1275.404 sec. Processed 12.19 million rows, 10.48 GB (9.56 thousand rows/s., 8.22 MB/s.)
Peak memory usage: 1.87 GiB.

To understand what the index is built on, let’s have a look at what the JSONAllValuesΒ function returns:

SELECT JSONAllValues(json) FROM events LIMIT 1
FORMAT Vertical;
JSONAllValues(json): ['[36.4,21.7]','1.013174','000000b5-8156-429d-9088-e62a6ac2ea0d','2529','[36.8,20]','60','2','4','From Throw In','10958','Chris Smalling','5','Left Center Back','123','39','Manchester United','[\'5fbbde9b-74ab-48e9-9873-ef956db384de\',\'fd43cc18-c37b-438a-8a40-a8bb50e59469\']','18','39','Manchester United','00:15:18.727','43','Carry']

The dataset has just over 12 million records, which isn’t really enough to see the impact of the index, so we’ll repeatedly double it by attaching the table’s partition back to itself. We ran the following statement five times:

ALTER TABLE events
ATTACH PARTITION ID 'all'
FROM events;
0 rows in set. Elapsed: 3.892 sec.
0 rows in set. Elapsed: 7.957 sec.
0 rows in set. Elapsed: 15.894 sec.
0 rows in set. Elapsed: 33.655 sec.
0 rows in set. Elapsed: 68.870 sec.

And now we’ve got a lot more records:

SELECT count()
FROM events;
   β”Œβ”€β”€β”€count()─┐
1. β”‚ 390046368 β”‚ -- 390.05 million
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The following query returns the number of rows related to Lionel Messi:

SELECT count()
FROM events
WHERE json.player.name = 'Lionel AndrΓ©s Messi Cuccittini'
SETTINGS use_skip_indexes = 1;

We can disable the text index by setting use_skip_indexes = 0, as shown below.
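
Here’s the same query with the index switched off; only the setting changes:

SELECT count()
FROM events
WHERE json.player.name = 'Lionel AndrΓ©s Messi Cuccittini'
SETTINGS use_skip_indexes = 0;

Both variants return the same count: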

   β”Œβ”€count()─┐
1. β”‚ 4268960 β”‚ -- 4.27 million
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

We’ll run it three times without the index:

1 row in set. Elapsed: 1.505 sec. Processed 390.05 million rows, 13.87 GB (259.20 million rows/s., 9.22 GB/s.)
Peak memory usage: 48.23 MiB.

1 row in set. Elapsed: 1.666 sec. Processed 390.05 million rows, 13.87 GB (234.12 million rows/s., 8.32 GB/s.)
Peak memory usage: 48.23 MiB.

1 row in set. Elapsed: 1.668 sec. Processed 390.05 million rows, 13.87 GB (233.88 million rows/s., 8.32 GB/s.)
Peak memory usage: 48.23 MiB.

And three times with the index:

1 row in set. Elapsed: 1.139 sec. Processed 80.64 million rows, 3.23 GB (70.80 million rows/s., 2.84 GB/s.)
Peak memory usage: 69.25 MiB.

1 row in set. Elapsed: 1.096 sec. Processed 80.64 million rows, 3.23 GB (73.61 million rows/s., 2.95 GB/s.)
Peak memory usage: 68.93 MiB.

1 row in set. Elapsed: 1.087 sec. Processed 80.64 million rows, 3.23 GB (74.21 million rows/s., 2.97 GB/s.)
Peak memory usage: 74.13 MiB.

From the processed rows, we can see that the index reduces the amount of data scanned by almost 5 times. The best time without the index is 1,505 milliseconds, compared to 1,087 milliseconds with the index, making the query roughly 1.4 times faster.

Get started today

Interested in seeing how ClickHouse works on your data? Get started with ClickHouse Cloud in minutes and receive $300 in free credits.