Syntax
-- core commands
BACKUP|RESTORE
  -- what to back up/restore (or exclude)
  TABLE [db.]table_name [AS [db.]table_name_in_backup]
    [PARTITION[S] partition_expr [,...]] |
  DICTIONARY [db.]dictionary_name [AS [db.]name_in_backup] |
  DATABASE database_name [AS database_name_in_backup]
    [EXCEPT TABLES ...] |
  TEMPORARY TABLE table_name [AS table_name_in_backup] |
  VIEW view_name [AS view_name_in_backup] |
  ALL [EXCEPT {TABLES|DATABASES} ...] [,...]
  [ON CLUSTER 'cluster_name']
  -- where to back up to or restore from
  TO|FROM
    File('<path>/<filename>') |
    Disk('<disk_name>', '<path>/') |
    S3('<S3 endpoint>/<path>', '<Access key ID>', '<Secret access key>', '<extra_credentials>') |
    AzureBlobStorage('<connection string>/<url>', '<container>', '<path>', '<account name>', '<account key>')
  -- additional settings
  [SETTINGS ...]
  [ASYNC]
See "command summary" for more details
of each command.
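For example, the ASYNC keyword runs the backup in the background and returns immediately;
the backup's progress can then be followed in the system.backups table. A minimal sketch,
using the backups disk configured below (the table and destination names are illustrative):

-- Start the backup in the background; an id and status are returned immediately
BACKUP TABLE test_db.test_table TO Disk('backups', 'async_backup.zip') ASYNC;

-- Follow the backup's progress; status moves from CREATING_BACKUP to BACKUP_CREATED
SELECT id, name, status, error
FROM system.backups
ORDER BY start_time DESC;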
In the examples below you will see the backup destination specified as Disk('backups', '1.zip').
To use the Disk backup engine, it is necessary to first add a file specifying
the backup destination at the path below:
/etc/clickhouse-server/config.d/backup_disk.xml
For example, the configuration below defines a disk named backups and then adds that disk to
the backups allowed_disk list:
<clickhouse>
<storage_configuration>
<disks>
<!--highlight-next-line -->
<backups>
<type>local</type>
<path>/backups/</path>
</backups>
</disks>
</storage_configuration>
<!--highlight-start -->
<backups>
<allowed_disk>backups</allowed_disk>
<allowed_path>/backups/</allowed_path>
</backups>
<!--highlight-end -->
</clickhouse>
Configure a backup destination for S3 disk
It is also possible to BACKUP/RESTORE to S3 by configuring an S3 disk in the
ClickHouse storage configuration. Configure the disk by adding a file to
/etc/clickhouse-server/config.d, as was done above for the local disk:
<clickhouse>
<storage_configuration>
<disks>
<s3_plain>
<type>s3_plain</type>
<endpoint></endpoint>
<access_key_id></access_key_id>
<secret_access_key></secret_access_key>
</s3_plain>
</disks>
<policies>
<s3>
<volumes>
<main>
<disk>s3_plain</disk>
</main>
</volumes>
</s3>
</policies>
</storage_configuration>
<backups>
<allowed_disk>s3_plain</allowed_disk>
</backups>
</clickhouse>
BACKUP/RESTORE to an S3 disk works in the same way as for a local disk:
BACKUP TABLE data TO Disk('s3_plain', 'cloud_backup');
RESTORE TABLE data AS data_restored FROM Disk('s3_plain', 'cloud_backup');
Note
- This disk should not be used for
MergeTree itself, only for BACKUP/RESTORE.
- If your tables are backed by S3 storage and the disks are of different types,
BACKUP/RESTORE does not use CopyObject calls to copy parts to the destination bucket;
instead, it downloads and uploads them, which is very inefficient. In this case, prefer
the BACKUP ... TO S3(<endpoint>) syntax.
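With that syntax the backup is written directly to S3 rather than through a configured
disk. A minimal sketch, using placeholder endpoint and credentials:

-- Back up directly to S3 (endpoint and credentials are placeholders)
BACKUP TABLE data TO S3('https://<bucket>.s3.amazonaws.com/backups/my_backup', '<access_key_id>', '<secret_access_key>');

-- Restore directly from the same S3 destination
RESTORE TABLE data AS data_restored FROM S3('https://<bucket>.s3.amazonaws.com/backups/my_backup', '<access_key_id>', '<secret_access_key>');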
Usage examples of backup/restore to local disk
Backup and restore a table
Run the following commands to create the test database and table that we will
back up and restore in this example:
Setup commands
Create the database and table:
CREATE DATABASE test_db;
CREATE TABLE test_db.test_table (
id UUID,
name String,
email String,
age UInt8,
salary UInt32,
created_at DateTime,
is_active UInt8,
department String,
score Float32,
country String
) ENGINE = MergeTree()
ORDER BY id;
Insert one thousand rows of random data:
INSERT INTO test_db.test_table (id, name, email, age, salary, created_at, is_active, department, score, country)
SELECT
generateUUIDv4() as id,
concat('User_', toString(rand() % 10000)) as name,
concat('user', toString(rand() % 10000), '@example.com') as email,
18 + (rand() % 65) as age,
30000 + (rand() % 100000) as salary,
now() - toIntervalSecond(rand() % 31536000) as created_at,
rand() % 2 as is_active,
arrayElement(['Engineering', 'Marketing', 'Sales', 'HR', 'Finance', 'Operations'], (rand() % 6) + 1) as department,
rand() / 4294967295.0 * 100 as score,
arrayElement(['USA', 'UK', 'Germany', 'France', 'Canada', 'Australia', 'Japan', 'Brazil'], (rand() % 8) + 1) as country
FROM numbers(1000);
Next you will need to create a file specifying the backup destination at the
path below:
/etc/clickhouse-server/config.d/backup_disk.xml
<clickhouse>
<storage_configuration>
<disks>
<backups>
<type>local</type>
<path>/backups/</path> <!-- for macOS use: /Users/backups/ -->
</backups>
</disks>
</storage_configuration>
<backups>
<allowed_disk>backups</allowed_disk>
<allowed_path>/backups/</allowed_path> <!-- for macOS use: /Users/backups/ -->
</backups>
</clickhouse>
Note
If clickhouse-server is running, you will need to restart it for the changes to
take effect.
To backup the table you can run:
BACKUP TABLE test_db.test_table TO Disk('backups', '1.zip')
┌─id───────────────────────────────────┬─status─────────┐
1. │ 065a8baf-9db7-4393-9c3f-ba04d1e76bcd │ BACKUP_CREATED │
└──────────────────────────────────────┴────────────────┘
The table can be restored from the backup using the following command if the table is empty:
RESTORE TABLE test_db.test_table FROM Disk('backups', '1.zip')
┌─id───────────────────────────────────┬─status───┐
1. │ f29c753f-a7f2-4118-898e-0e4600cd2797 │ RESTORED │
└──────────────────────────────────────┴──────────┘
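To verify the restore, you could compare the row count with the one thousand rows
inserted during setup:

-- Should return 1000, the number of rows inserted during setup
SELECT count() FROM test_db.test_table;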
Note
The above RESTORE would fail if the table test_db.test_table contains data.
The setting allow_non_empty_tables=true allows RESTORE TABLE to insert data
into non-empty tables. This will mix earlier data in the table with the data extracted from the backup.
This setting can therefore cause data duplication in the table, and should be used with caution.
To restore the table with data already in it, run:
RESTORE TABLE test_db.test_table FROM Disk('backups', '1.zip')
SETTINGS allow_non_empty_tables=true
Tables can be restored, or backed up, with new names:
RESTORE TABLE test_db.test_table AS test_db.test_table_renamed FROM Disk('backups', '1.zip')
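To confirm that the renamed copy was created, you could list the tables in the database:

-- Both test_table and test_table_renamed should now be listed
SHOW TABLES FROM test_db;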
The backup archive for this backup has the following structure:
├── .backup
└── metadata
    └── test_db
        └── test_table.sql
Formats other than zip can be used. See "Backups as tar archives"
below for further details.
Incremental backups to disk
A base backup in ClickHouse is the initial, full backup from which subsequent
incremental backups are created. Incremental backups store only the changes
made since the base backup. The base backup destination can be set with the
base_backup setting.
Note
Incremental backups depend on the base backup. The base backup must be kept available
to be able to restore from an incremental backup.
To make an incremental backup of a table, first make a base backup:
BACKUP TABLE test_db.test_table TO Disk('backups', 'd.zip')
Then take the incremental backup, specifying the base backup with the base_backup setting:
BACKUP TABLE test_db.test_table TO Disk('backups', 'incremental-a.zip')
SETTINGS base_backup = Disk('backups', 'd.zip')
All data from the incremental backup and the base backup can be restored into a
new table test_db.test_table2 with the following command:
RESTORE TABLE test_db.test_table AS test_db.test_table2
FROM Disk('backups', 'incremental-a.zip');
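An incremental backup can in turn serve as the base for the next increment by pointing
base_backup at it. A minimal sketch (the filename incremental-b.zip is illustrative):

-- Chain a second incremental backup on top of the first
BACKUP TABLE test_db.test_table TO Disk('backups', 'incremental-b.zip')
SETTINGS base_backup = Disk('backups', 'incremental-a.zip')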
Securing a backup
Backups written to disk can have a password applied to the file.
The password can be specified using the password setting:
BACKUP TABLE test_db.test_table
TO Disk('backups', 'password-protected.zip')
SETTINGS password='qwerty'
To restore a password-protected backup, the password must again
be specified using the password setting:
RESTORE TABLE test_db.test_table
FROM Disk('backups', 'password-protected.zip')
SETTINGS password='qwerty'
Backups as tar archives
Backups can be stored not only as zip archives, but also as tar archives.
The functionality is the same as for zip, except that password protection is not
supported for tar archives. Additionally, tar archives support a variety of
compression methods.
To make a backup of a table as a tar archive:
BACKUP TABLE test_db.test_table TO Disk('backups', '1.tar')
To restore from a tar archive:
RESTORE TABLE test_db.test_table FROM Disk('backups', '1.tar')
To change the compression method, the correct file suffix should be appended to
the backup name. For example, to compress the tar archive using gzip run:
BACKUP TABLE test_db.test_table TO Disk('backups', '1.tar.gz')
The supported compression file suffixes are:
.tar.gz
.tgz
.tar.bz2
.tar.lzma
.tar.zst
.tzst
.tar.xz
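For example, to compress the tar archive using zstd instead, append one of the suffixes above:

BACKUP TABLE test_db.test_table TO Disk('backups', '1.tar.zst')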
Compression settings
The compression method and compression level can be specified using the
compression_method and compression_level settings respectively.
BACKUP TABLE test_db.test_table
TO Disk('backups', 'filename.zip')
SETTINGS compression_method='lzma', compression_level=3
Restore specific partitions
If only specific partitions of a table need to be restored, these can be specified.
Let's create a simple table partitioned into four parts, insert some data into it, and then
back up only the first and fourth partitions:
Setup
CREATE DATABASE IF NOT EXISTS test_db;
-- Create a partitioned table
CREATE TABLE test_db.partitioned (
id UInt32,
data String,
partition_key UInt8
) ENGINE = MergeTree()
PARTITION BY partition_key
ORDER BY id;
INSERT INTO test_db.partitioned VALUES
(1, 'data1', 1),
(2, 'data2', 2),
(3, 'data3', 3),
(4, 'data4', 4);
SELECT count() FROM test_db.partitioned;
SELECT partition_key, count()
FROM test_db.partitioned
GROUP BY partition_key
ORDER BY partition_key;
┌─count()─┐
1. │ 4 │
└─────────┘
┌─partition_key─┬─count()─┐
1. │ 1 │ 1 │
2. │ 2 │ 1 │
3. │ 3 │ 1 │
4. │ 4 │ 1 │
└───────────────┴─────────┘
Run the following command to back up partitions 1 and 4:
BACKUP TABLE test_db.partitioned PARTITIONS '1', '4'
TO Disk('backups', 'partitioned.zip')
Run the following command to restore partitions 1 and 4:
RESTORE TABLE test_db.partitioned PARTITIONS '1', '4'
FROM Disk('backups', 'partitioned.zip')
SETTINGS allow_non_empty_tables=true
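After the restore, you could check which partitions are present by querying the
system.parts table (a sketch; it counts only active parts):

-- Count active parts per partition in the restored table
SELECT partition, count() AS active_parts
FROM system.parts
WHERE database = 'test_db' AND table = 'partitioned' AND active
GROUP BY partition
ORDER BY partition;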