此类型允许在其中存储任意类型的值,而无需预先了解或枚举所有可能的类型。
要声明一个 Dynamic 类型的列,请使用以下语法:
<column_name> Dynamic(max_types=N)
其中 N 是一个可选参数,取值范围为 0 到 254,用于指示在一个单独存储的数据块内(例如 MergeTree 表的单个数据分片中),类型为 Dynamic 的列中可以作为独立子列存储的不同数据类型的数量上限。如果超过此限制,所有具有新类型的值将以二进制形式一起存储在一个特殊的共享数据结构中。max_types 的默认值为 32。
创建 Dynamic 类型
在表的列定义中使用 Dynamic 类型:
CREATE TABLE test (d Dynamic) ENGINE = Memory;
INSERT INTO test VALUES (NULL), (42), ('Hello, World!'), ([1, 2, 3]);
SELECT d, dynamicType(d) FROM test;
┌─d─────────────┬─dynamicType(d)─┐
│ ᴺᵁᴸᴸ │ None │
│ 42 │ Int64 │
│ Hello, World! │ String │
│ [1,2,3] │ Array(Int64) │
└───────────────┴────────────────┘
在普通列上使用 CAST:
SELECT 'Hello, World!'::Dynamic AS d, dynamicType(d);
┌─d─────────────┬─dynamicType(d)─┐
│ Hello, World! │ String │
└───────────────┴────────────────┘
对 Variant 列使用 CAST:
SET use_variant_as_common_type = 1;
SELECT multiIf((number % 3) = 0, number, (number % 3) = 1, range(number + 1), NULL)::Dynamic AS d, dynamicType(d) FROM numbers(3)
┌─d─────┬─dynamicType(d)─┐
│ 0 │ UInt64 │
│ [0,1] │ Array(UInt64) │
│ ᴺᵁᴸᴸ │ None │
└───────┴────────────────┘
以子列形式读取 Dynamic 中的嵌套类型
Dynamic 类型支持通过类型名作为子列名,从 Dynamic 列中读取单个嵌套类型。
因此,如果存在一列 d Dynamic,可以使用语法 d.T 读取任意有效类型 T 的子列,
当 T 可以被 Nullable 包裹时,该子列的类型为 Nullable(T),否则为 T。该子列的大小将与原始 Dynamic 列相同,
并且在原始 Dynamic 列中不包含类型 T 的所有行里,该子列将包含 NULL 值(或者在 T 不能被 Nullable 包裹时为空值)。
也可以使用函数 dynamicElement(dynamic_column, type_name) 来读取 Dynamic 的子列。
示例:
CREATE TABLE test (d Dynamic) ENGINE = Memory;
INSERT INTO test VALUES (NULL), (42), ('Hello, World!'), ([1, 2, 3]);
SELECT d, dynamicType(d), d.String, d.Int64, d.`Array(Int64)`, d.Date, d.`Array(String)` FROM test;
┌─d─────────────┬─dynamicType(d)─┬─d.String──────┬─d.Int64─┬─d.Array(Int64)─┬─d.Date─┬─d.Array(String)─┐
│ ᴺᵁᴸᴸ │ None │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ [] │ ᴺᵁᴸᴸ │ [] │
│ 42 │ Int64 │ ᴺᵁᴸᴸ │ 42 │ [] │ ᴺᵁᴸᴸ │ [] │
│ Hello, World! │ String │ Hello, World! │ ᴺᵁᴸᴸ │ [] │ ᴺᵁᴸᴸ │ [] │
│ [1,2,3] │ Array(Int64) │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ [1,2,3] │ ᴺᵁᴸᴸ │ [] │
└───────────────┴────────────────┴───────────────┴─────────┴────────────────┴────────┴─────────────────┘
SELECT toTypeName(d.String), toTypeName(d.Int64), toTypeName(d.`Array(Int64)`), toTypeName(d.Date), toTypeName(d.`Array(String)`) FROM test LIMIT 1;
┌─toTypeName(d.String)─┬─toTypeName(d.Int64)─┬─toTypeName(d.Array(Int64))─┬─toTypeName(d.Date)─┬─toTypeName(d.Array(String))─┐
│ Nullable(String) │ Nullable(Int64) │ Array(Int64) │ Nullable(Date) │ Array(String) │
└──────────────────────┴─────────────────────┴────────────────────────────┴────────────────────┴─────────────────────────────┘
SELECT d, dynamicType(d), dynamicElement(d, 'String'), dynamicElement(d, 'Int64'), dynamicElement(d, 'Array(Int64)'), dynamicElement(d, 'Date'), dynamicElement(d, 'Array(String)') FROM test;```
┌─d─────────────┬─dynamicType(d)─┬─dynamicElement(d, 'String')─┬─dynamicElement(d, 'Int64')─┬─dynamicElement(d, 'Array(Int64)')─┬─dynamicElement(d, 'Date')─┬─dynamicElement(d, 'Array(String)')─┐
│ ᴺᵁᴸᴸ │ None │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ [] │ ᴺᵁᴸᴸ │ [] │
│ 42 │ Int64 │ ᴺᵁᴸᴸ │ 42 │ [] │ ᴺᵁᴸᴸ │ [] │
│ Hello, World! │ String │ Hello, World! │ ᴺᵁᴸᴸ │ [] │ ᴺᵁᴸᴸ │ [] │
│ [1,2,3] │ Array(Int64) │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ [1,2,3] │ ᴺᵁᴸᴸ │ [] │
└───────────────┴────────────────┴─────────────────────────────┴────────────────────────────┴───────────────────────────────────┴───────────────────────────┴────────────────────────────────────┘
To know what variant is stored in each row function dynamicType(dynamic_column) can be used. It returns String with value type name for each row (or 'None' if row is NULL).
Example:
CREATE TABLE test (d Dynamic) ENGINE = Memory;
INSERT INTO test VALUES (NULL), (42), ('Hello, World!'), ([1, 2, 3]);
SELECT dynamicType(d) FROM test;
┌─dynamicType(d)─┐
│ None │
│ Int64 │
│ String │
│ Array(Int64) │
└────────────────┘
Conversion between Dynamic column and other columns
There are 4 possible conversions that can be performed with Dynamic column.
Converting an ordinary column to a Dynamic column
SELECT 'Hello, World!'::Dynamic AS d, dynamicType(d);
┌─d─────────────┬─dynamicType(d)─┐
│ Hello, World! │ String │
└───────────────┴────────────────┘
Converting a String column to a Dynamic column through parsing
To parse Dynamic type values from a String column you can enable setting cast_string_to_dynamic_use_inference:
SET cast_string_to_dynamic_use_inference = 1;
SELECT CAST(materialize(map('key1', '42', 'key2', 'true', 'key3', '2020-01-01')), 'Map(String, Dynamic)') as map_of_dynamic, mapApply((k, v) -> (k, dynamicType(v)), map_of_dynamic) as map_of_dynamic_types;
┌─map_of_dynamic──────────────────────────────┬─map_of_dynamic_types─────────────────────────┐
│ {'key1':42,'key2':true,'key3':'2020-01-01'} │ {'key1':'Int64','key2':'Bool','key3':'Date'} │
└─────────────────────────────────────────────┴──────────────────────────────────────────────┘
Converting a Dynamic column to an ordinary column
It is possible to convert a Dynamic column to an ordinary column. In this case all nested types will be converted to a destination type:
CREATE TABLE test (d Dynamic) ENGINE = Memory;
INSERT INTO test VALUES (NULL), (42), ('42.42'), (true), ('e10');
SELECT d::Nullable(Float64) FROM test;
┌─CAST(d, 'Nullable(Float64)')─┐
│ ᴺᵁᴸᴸ │
│ 42 │
│ 42.42 │
│ 1 │
│ 0 │
└──────────────────────────────┘
Converting a Variant column to Dynamic column
CREATE TABLE test (v Variant(UInt64, String, Array(UInt64))) ENGINE = Memory;
INSERT INTO test VALUES (NULL), (42), ('String'), ([1, 2, 3]);
SELECT v::Dynamic AS d, dynamicType(d) FROM test;
┌─d───────┬─dynamicType(d)─┐
│ ᴺᵁᴸᴸ │ None │
│ 42 │ UInt64 │
│ String │ String │
│ [1,2,3] │ Array(UInt64) │
└─────────┴────────────────┘
Converting a Dynamic(max_types=N) column to another Dynamic(max_types=K)
If K >= N than during conversion the data doesn't change:
CREATE TABLE test (d Dynamic(max_types=3)) ENGINE = Memory;
INSERT INTO test VALUES (NULL), (42), (43), ('42.42'), (true);
SELECT d::Dynamic(max_types=5) as d2, dynamicType(d2) FROM test;
┌─d─────┬─dynamicType(d)─┐
│ ᴺᵁᴸᴸ │ None │
│ 42 │ Int64 │
│ 43 │ Int64 │
│ 42.42 │ String │
│ true │ Bool │
└───────┴────────────────┘
If K < N, then the values with the rarest types will be inserted into a single special subcolumn, but still will be accessible:
CREATE TABLE test (d Dynamic(max_types=4)) ENGINE = Memory;
INSERT INTO test VALUES (NULL), (42), (43), ('42.42'), (true), ([1, 2, 3]);
SELECT d, dynamicType(d), d::Dynamic(max_types=2) as d2, dynamicType(d2), isDynamicElementInSharedData(d2) FROM test;
┌─d───────┬─dynamicType(d)─┬─d2──────┬─dynamicType(d2)─┬─isDynamicElementInSharedData(d2)─┐
│ ᴺᵁᴸᴸ │ None │ ᴺᵁᴸᴸ │ None │ false │
│ 42 │ Int64 │ 42 │ Int64 │ false │
│ 43 │ Int64 │ 43 │ Int64 │ false │
│ 42.42 │ String │ 42.42 │ String │ false │
│ true │ Bool │ true │ Bool │ true │
│ [1,2,3] │ Array(Int64) │ [1,2,3] │ Array(Int64) │ true │
└─────────┴────────────────┴─────────┴─────────────────┴──────────────────────────────────┘
Functions isDynamicElementInSharedData returns true for rows that are stored in a special shared data structure inside Dynamic and as we can see, resulting column contains only 2 types that are not stored in shared data structure.
If K=0, all types will be inserted into single special subcolumn:
CREATE TABLE test (d Dynamic(max_types=4)) ENGINE = Memory;
INSERT INTO test VALUES (NULL), (42), (43), ('42.42'), (true), ([1, 2, 3]);
SELECT d, dynamicType(d), d::Dynamic(max_types=0) as d2, dynamicType(d2), isDynamicElementInSharedData(d2) FROM test;
┌─d───────┬─dynamicType(d)─┬─d2──────┬─dynamicType(d2)─┬─isDynamicElementInSharedData(d2)─┐
│ ᴺᵁᴸᴸ │ None │ ᴺᵁᴸᴸ │ None │ false │
│ 42 │ Int64 │ 42 │ Int64 │ true │
│ 43 │ Int64 │ 43 │ Int64 │ true │
│ 42.42 │ String │ 42.42 │ String │ true │
│ true │ Bool │ true │ Bool │ true │
│ [1,2,3] │ Array(Int64) │ [1,2,3] │ Array(Int64) │ true │
└─────────┴────────────────┴─────────┴─────────────────┴──────────────────────────────────┘
Reading Dynamic type from the data
All text formats (TSV, CSV, CustomSeparated, Values, JSONEachRow, etc) supports reading Dynamic type. During data parsing ClickHouse tries to infer the type of each value and use it during insertion to Dynamic column.
Example:
SELECT
d,
dynamicType(d),
dynamicElement(d, 'String') AS str,
dynamicElement(d, 'Int64') AS num,
dynamicElement(d, 'Float64') AS float,
dynamicElement(d, 'Date') AS date,
dynamicElement(d, 'Array(Int64)') AS arr
FROM format(JSONEachRow, 'd Dynamic', $$
{"d" : "Hello, World!"},
{"d" : 42},
{"d" : 42.42},
{"d" : "2020-01-01"},
{"d" : [1, 2, 3]}
$$)
┌─d─────────────┬─dynamicType(d)─┬─str───────────┬──num─┬─float─┬───────date─┬─arr─────┐
│ Hello, World! │ String │ Hello, World! │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ [] │
│ 42 │ Int64 │ ᴺᵁᴸᴸ │ 42 │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ [] │
│ 42.42 │ Float64 │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 42.42 │ ᴺᵁᴸᴸ │ [] │
│ 2020-01-01 │ Date │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 2020-01-01 │ [] │
│ [1,2,3] │ Array(Int64) │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ [1,2,3] │
└───────────────┴────────────────┴───────────────┴──────┴───────┴────────────┴─────────┘
Using Dynamic type in functions
Most of the functions support arguments with type Dynamic. In this case the function is executed separately on each internal data type stored inside Dynamic column.
When the result type of the function depends on the arguments types, the result of such function executed with Dynamic arguments will be Dynamic. When the result type of the function doesn't depend on the arguments types - the result will be Nullable(T) where T the usual result type of this function.
Examples:
CREATE TABLE test (d Dynamic) ENGINE=Memory;
INSERT INTO test VALUES (NULL), (1::Int8), (2::Int16), (3::Int32), (4::Int64);
SELECT d, dynamicType(d) FROM test;
┌─d────┬─dynamicType(d)─┐
│ ᴺᵁᴸᴸ │ None │
│ 1 │ Int8 │
│ 2 │ Int16 │
│ 3 │ Int32 │
│ 4 │ Int64 │
└──────┴────────────────┘
SELECT d, d + 1 AS res, toTypeName(res), dynamicType(res) FROM test;
┌─d────┬─res──┬─toTypeName(res)─┬─dynamicType(res)─┐
│ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ Dynamic │ None │
│ 1 │ 2 │ Dynamic │ Int16 │
│ 2 │ 3 │ Dynamic │ Int32 │
│ 3 │ 4 │ Dynamic │ Int64 │
│ 4 │ 5 │ Dynamic │ Int64 │
└──────┴──────┴─────────────────┴──────────────────┘
SELECT d, d + d AS res, toTypeName(res), dynamicType(res) FROM test;
┌─d────┬─res──┬─toTypeName(res)─┬─dynamicType(res)─┐
│ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ Dynamic │ None │
│ 1 │ 2 │ Dynamic │ Int16 │
│ 2 │ 4 │ Dynamic │ Int32 │
│ 3 │ 6 │ Dynamic │ Int64 │
│ 4 │ 8 │ Dynamic │ Int64 │
└──────┴──────┴─────────────────┴──────────────────┘
SELECT d, d < 3 AS res, toTypeName(res) FROM test;
┌─d────┬──res─┬─toTypeName(res)─┐
│ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ Nullable(UInt8) │
│ 1 │ 1 │ Nullable(UInt8) │
│ 2 │ 1 │ Nullable(UInt8) │
│ 3 │ 0 │ Nullable(UInt8) │
│ 4 │ 0 │ Nullable(UInt8) │
└──────┴──────┴─────────────────┘
SELECT d, exp2(d) AS res, toTypeName(res) FROM test;
┌─d────┬──res─┬─toTypeName(res)───┐
│ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ Nullable(Float64) │
│ 1 │ 2 │ Nullable(Float64) │
│ 2 │ 4 │ Nullable(Float64) │
│ 3 │ 8 │ Nullable(Float64) │
│ 4 │ 16 │ Nullable(Float64) │
└──────┴──────┴───────────────────┘
TRUNCATE TABLE test;
INSERT INTO test VALUES (NULL), ('str_1'), ('str_2');
SELECT d, dynamicType(d) FROM test;
┌─d─────┬─dynamicType(d)─┐
│ ᴺᵁᴸᴸ │ None │
│ str_1 │ String │
│ str_2 │ String │
└───────┴────────────────┘
SELECT d, upper(d) AS res, toTypeName(res) FROM test;
┌─d─────┬─res───┬─toTypeName(res)──┐
│ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ Nullable(String) │
│ str_1 │ STR_1 │ Nullable(String) │
│ str_2 │ STR_2 │ Nullable(String) │
└───────┴───────┴──────────────────┘
SELECT d, extract(d, '([0-3])') AS res, toTypeName(res) FROM test;
┌─d─────┬─res──┬─toTypeName(res)──┐
│ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ Nullable(String) │
│ str_1 │ 1 │ Nullable(String) │
│ str_2 │ 2 │ Nullable(String) │
└───────┴──────┴──────────────────┘
TRUNCATE TABLE test;
INSERT INTO test VALUES (NULL), ([1, 2]), ([3, 4]);
SELECT d, dynamicType(d) FROM test;
┌─d─────┬─dynamicType(d)─┐
│ ᴺᵁᴸᴸ │ None │
│ [1,2] │ Array(Int64) │
│ [3,4] │ Array(Int64) │
└───────┴────────────────┘
SELECT d, d[1] AS res, toTypeName(res), dynamicType(res) FROM test;
┌─d─────┬─res──┬─toTypeName(res)─┬─dynamicType(res)─┐
│ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ Dynamic │ None │
│ [1,2] │ 1 │ Dynamic │ Int64 │
│ [3,4] │ 3 │ Dynamic │ Int64 │
└───────┴──────┴─────────────────┴──────────────────┘
If function cannot be executed on some type inside Dynamic column, the exception will be thrown:
INSERT INTO test VALUES (42), (43), ('str_1');
SELECT d, dynamicType(d) FROM test;
┌─d─────┬─dynamicType(d)─┐
│ 42 │ Int64 │
│ 43 │ Int64 │
│ str_1 │ String │
└───────┴────────────────┘
┌─d─────┬─dynamicType(d)─┐
│ ᴺᵁᴸᴸ │ None │
│ [1,2] │ Array(Int64) │
│ [3,4] │ Array(Int64) │
└───────┴────────────────┘
SELECT d, d + 1 AS res, toTypeName(res), dynamicType(d) FROM test;
收到异常:
Code: 43. DB::Exception: 函数 plus 的参数类型不合法:Array(Int64) 和 UInt8,执行 'FUNCTION plus(__table1.d : 3, 1_UInt8 :: 1) -> plus(__table1.d, 1_UInt8) Dynamic : 0' 时。(ILLEGAL_TYPE_OF_ARGUMENT)
We can filter out unneeded types:
SELECT d, d + 1 AS res, toTypeName(res), dynamicType(res) FROM test WHERE dynamicType(d) NOT IN ('String', 'Array(Int64)', 'None')
┌─d──┬─res─┬─toTypeName(res)─┬─dynamicType(res)─┐
│ 42 │ 43 │ Dynamic │ Int64 │
│ 43 │ 44 │ Dynamic │ Int64 │
└────┴─────┴─────────────────┴──────────────────┘
Or extract required type as subcolumn:
SELECT d, d.Int64 + 1 AS res, toTypeName(res) FROM test;
┌─d─────┬──res─┬─toTypeName(res)─┐
│ 42 │ 43 │ Nullable(Int64) │
│ 43 │ 44 │ Nullable(Int64) │
│ str_1 │ ᴺᵁᴸᴸ │ Nullable(Int64) │
└───────┴──────┴─────────────────┘
┌─d─────┬──res─┬─toTypeName(res)─┐
│ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ Nullable(Int64) │
│ [1,2] │ ᴺᵁᴸᴸ │ Nullable(Int64) │
│ [3,4] │ ᴺᵁᴸᴸ │ Nullable(Int64) │
└───────┴──────┴─────────────────┘
Using Dynamic type in ORDER BY and GROUP BY
During ORDER BY and GROUP BY values of Dynamic types are compared similar to values of Variant type:
The result of operator < for values d1 with underlying type T1 and d2 with underlying type T2 of a type Dynamic is defined as follows:
- If
T1 = T2 = T, the result will be d1.T < d2.T (underlying values will be compared).
- If
T1 != T2, the result will be T1 < T2 (type names will be compared).
By default Dynamic type is not allowed in GROUP BY/ORDER BY keys, if you want to use it consider its special comparison rule and enable allow_suspicious_types_in_group_by/allow_suspicious_types_in_order_by settings.
Examples:
CREATE TABLE test (d Dynamic) ENGINE=Memory;
INSERT INTO test VALUES (42), (43), ('abc'), ('abd'), ([1, 2, 3]), ([]), (NULL);
SELECT d, dynamicType(d) FROM test;
┌─d───────┬─dynamicType(d)─┐
│ 42 │ Int64 │
│ 43 │ Int64 │
│ abc │ String │
│ abd │ String │
│ [1,2,3] │ Array(Int64) │
│ [] │ Array(Int64) │
│ ᴺᵁᴸᴸ │ None │
└─────────┴────────────────┘
SELECT d, dynamicType(d) FROM test ORDER BY d SETTINGS allow_suspicious_types_in_order_by=1;
┌─d───────┬─dynamicType(d)─┐
│ [] │ Array(Int64) │
│ [1,2,3] │ Array(Int64) │
│ 42 │ Int64 │
│ 43 │ Int64 │
│ abc │ String │
│ abd │ String │
│ ᴺᵁᴸᴸ │ None │
└─────────┴────────────────┘
Note: values of dynamic types with different numeric types are considered as different values and not compared between each other, their type names are compared instead.
Example:
CREATE TABLE test (d Dynamic) ENGINE=Memory;
INSERT INTO test VALUES (1::UInt32), (1::Int64), (100::UInt32), (100::Int64);
SELECT d, dynamicType(d) FROM test ORDER BY d SETTINGS allow_suspicious_types_in_order_by=1;
┌─v───┬─dynamicType(v)─┐
│ 1 │ Int64 │
│ 100 │ Int64 │
│ 1 │ UInt32 │
│ 100 │ UInt32 │
└─────┴────────────────┘
SELECT d, dynamicType(d) FROM test GROUP BY d SETTINGS allow_suspicious_types_in_group_by=1;
┌─d───┬─dynamicType(d)─┐
│ 1 │ Int64 │
│ 100 │ UInt32 │
│ 1 │ UInt32 │
│ 100 │ Int64 │
└─────┴────────────────┘
Note: the described comparison rule is not applied during execution of comparison functions like </>/= and others because of special work of functions with Dynamic type
Reaching the limit in number of different data types stored inside Dynamic
Dynamic data type can store only limited number of different data types as separate subcolumns. By default, this limit is 32, but you can change it in type declaration using syntax Dynamic(max_types=N) where N is between 0 and 254 (due to implementation details, it's impossible to have more than 254 different data types that can be stored as separate subcolumns inside Dynamic).
When the limit is reached, all new data types inserted to Dynamic column will be inserted into a single shared data structure that stores values with different data types in binary form.
Let's see what happens when the limit is reached in different scenarios.
Reaching the limit during data parsing
During parsing of Dynamic values from the data, when the limit is reached for current block of data, all new values will be inserted into shared data structure:
SELECT d, dynamicType(d), isDynamicElementInSharedData(d) FROM format(JSONEachRow, 'd Dynamic(max_types=3)', '
{"d" : 42}
{"d" : [1, 2, 3]}
{"d" : "Hello, World!"}
{"d" : "2020-01-01"}
{"d" : ["str1", "str2", "str3"]}
{"d" : {"a" : 1, "b" : [1, 2, 3]}}
')
┌─d──────────────────────┬─dynamicType(d)─────────────────┬─isDynamicElementInSharedData(d)─┐
│ 42 │ Int64 │ false │
│ [1,2,3] │ Array(Int64) │ false │
│ Hello, World! │ String │ false │
│ 2020-01-01 │ Date │ true │
│ ['str1','str2','str3'] │ Array(String) │ true │
│ (1,[1,2,3]) │ Tuple(a Int64, b Array(Int64)) │ true │
└────────────────────────┴────────────────────────────────┴─────────────────────────────────┘
As we can see, after inserting 3 different data types Int64, Array(Int64) and String all new types were inserted into special shared data structure.
During merges of data parts in MergeTree table engines
During merge of several data parts in MergeTree table the Dynamic column in the resulting data part can reach the limit of different data types that can be stored in separate subcolumns inside and won't be able to store all types as subcolumns from source parts.
In this case ClickHouse chooses what types will remain as separate subcolumns after merge and what types will be inserted into shared data structure. In most cases ClickHouse tries to keep the most frequent types and store the rarest types in shared data structure, but it depends on the implementation.
Let's see an example of such merge. First, let's create a table with Dynamic column, set the limit of different data types to 3 and insert values with 5 different types:
CREATE TABLE test (id UInt64, d Dynamic(max_types=3)) ENGINE=MergeTree ORDER BY id;
SYSTEM STOP MERGES test;
INSERT INTO test SELECT number, number FROM numbers(5);
INSERT INTO test SELECT number, range(number) FROM numbers(4);
INSERT INTO test SELECT number, toDate(number) FROM numbers(3);
INSERT INTO test SELECT number, map(number, number) FROM numbers(2);
INSERT INTO test SELECT number, 'str_' || toString(number) FROM numbers(1);
Each insert will create a separate data pert with Dynamic column containing single type:
SELECT count(), dynamicType(d), isDynamicElementInSharedData(d), _part FROM test GROUP BY _part, dynamicType(d), isDynamicElementInSharedData(d) ORDER BY _part, count();
┌─count()─┬─dynamicType(d)──────┬─isDynamicElementInSharedData(d)─┬─_part─────┐
│ 5 │ UInt64 │ false │ all_1_1_0 │
│ 4 │ Array(UInt64) │ false │ all_2_2_0 │
│ 3 │ Date │ false │ all_3_3_0 │
│ 2 │ Map(UInt64, UInt64) │ false │ all_4_4_0 │
│ 1 │ String │ false │ all_5_5_0 │
└─────────┴─────────────────────┴─────────────────────────────────┴───────────┘
Now, let's merge all parts into one and see what will happen:
SYSTEM START MERGES test;
OPTIMIZE TABLE test FINAL;
SELECT count(), dynamicType(d), isDynamicElementInSharedData(d), _part FROM test GROUP BY _part, dynamicType(d), isDynamicElementInSharedData(d) ORDER BY _part, count() desc;
┌─count()─┬─dynamicType(d)──────┬─isDynamicElementInSharedData(d)─┬─_part─────┐
│ 5 │ UInt64 │ false │ all_1_5_2 │
│ 4 │ Array(UInt64) │ false │ all_1_5_2 │
│ 3 │ Date │ false │ all_1_5_2 │
│ 2 │ Map(UInt64, UInt64) │ true │ all_1_5_2 │
│ 1 │ String │ true │ all_1_5_2 │
└─────────┴─────────────────────┴─────────────────────────────────┴───────────┘
As we can see, ClickHouse kept the most frequent types UInt64 and Array(UInt64) as subcolumns and inserted all other types into shared data.
All JSONExtract* functions support Dynamic type:
SELECT JSONExtract('{"a" : [1, 2, 3]}', 'a', 'Dynamic') AS dynamic, dynamicType(dynamic) AS dynamic_type;
┌─dynamic─┬─dynamic_type───────────┐
│ [1,2,3] │ Array(Nullable(Int64)) │
└─────────┴────────────────────────┘
SELECT JSONExtract('{"obj" : {"a" : 42, "b" : "Hello", "c" : [1,2,3]}}', 'obj', 'Map(String, Dynamic)') AS map_of_dynamics, mapApply((k, v) -> (k, dynamicType(v)), map_of_dynamics) AS map_of_dynamic_types
┌─map_of_dynamics──────────────────┬─map_of_dynamic_types────────────────────────────────────┐
│ {'a':42,'b':'Hello','c':[1,2,3]} │ {'a':'Int64','b':'String','c':'Array(Nullable(Int64))'} │
└──────────────────────────────────┴─────────────────────────────────────────────────────────┘
SELECT JSONExtractKeysAndValues('{"a" : 42, "b" : "Hello", "c" : [1,2,3]}', 'Dynamic') AS dynamics, arrayMap(x -> (x.1, dynamicType(x.2)), dynamics) AS dynamic_types```
┌─dynamics───────────────────────────────┬─dynamic_types─────────────────────────────────────────────────┐
│ [('a',42),('b','Hello'),('c',[1,2,3])] │ [('a','Int64'),('b','String'),('c','Array(Nullable(Int64))')] │
└────────────────────────────────────────┴───────────────────────────────────────────────────────────────┘
在 RowBinary 格式中,Dynamic 类型的值以如下格式序列化:
<binary_encoded_data_type><value_in_binary_format_according_to_the_data_type>