# DataLakeCatalog
The `DataLakeCatalog` database engine enables you to connect ClickHouse to external data catalogs and query open table format data without the need for data duplication. This transforms ClickHouse into a powerful query engine that works seamlessly with your existing data lake infrastructure.
## Supported Catalogs
The `DataLakeCatalog` engine supports the following data catalogs:
- AWS Glue Catalog - For Iceberg tables in AWS environments
- Databricks Unity Catalog - For Delta Lake and Iceberg tables
- Hive Metastore - Traditional Hadoop ecosystem catalog
- REST Catalogs - Any catalog supporting the Iceberg REST specification
## Creating a Database
To use the `DataLakeCatalog` engine, you first need to enable the setting that corresponds to your catalog type.
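For example, in a `clickhouse-client` session (the exact setting names follow the experimental-feature naming used in recent ClickHouse releases and may vary by version, so verify them against your server):

```sql
-- Enable only the setting that matches your catalog type
SET allow_experimental_database_iceberg = 1;        -- REST (Iceberg) catalogs
SET allow_experimental_database_unity_catalog = 1;  -- Databricks Unity Catalog
SET allow_experimental_database_glue_catalog = 1;   -- AWS Glue Catalog
SET allow_experimental_database_hms_catalog = 1;    -- Hive Metastore
```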
Databases with the `DataLakeCatalog` engine can be created using the following syntax:
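The shape below is a sketch; `database_name`, the catalog `url`, and the optional credential arguments are placeholders, and the settings you pass depend on your catalog type (see the table that follows):

```sql
CREATE DATABASE database_name
ENGINE = DataLakeCatalog(url [, auth_user, auth_password])
SETTINGS
    catalog_type = '...',
    warehouse = '...'
```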
The following settings are supported:
| Setting                 | Description                                                              |
|-------------------------|--------------------------------------------------------------------------|
| `catalog_type`          | Type of catalog: `glue`, `unity` (Delta), `rest` (Iceberg), or `hive`    |
| `warehouse`             | The warehouse/database name to use in the catalog                        |
| `catalog_credential`    | Authentication credential for the catalog (e.g., API key or token)       |
| `auth_header`           | Custom HTTP header for authentication with the catalog service          |
| `auth_scope`            | OAuth2 scope for authentication (if using OAuth)                         |
| `storage_endpoint`      | Endpoint URL for the underlying storage                                  |
| `oauth_server_uri`      | URI of the OAuth2 authorization server for authentication                |
| `vended_credentials`    | Boolean indicating whether to use vended credentials (AWS-specific)      |
| `aws_access_key_id`     | AWS access key ID for S3/Glue access (if not using vended credentials)   |
| `aws_secret_access_key` | AWS secret access key for S3/Glue access (if not using vended credentials) |
| `region`                | AWS region for the service (e.g., `us-east-1`)                           |
## Examples
See the pages below for examples of using the `DataLakeCatalog` engine.
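As a quick illustration in the meantime, here is a minimal sketch of connecting to an Iceberg REST catalog; the hostnames, warehouse name, and credentials are all illustrative placeholders:

```sql
-- Assumes a REST catalog at http://rest:8181/v1 backed by MinIO-style object
-- storage; substitute your own endpoints and credentials
CREATE DATABASE demo
ENGINE = DataLakeCatalog('http://rest:8181/v1', 'minio_user', 'minio_password')
SETTINGS
    catalog_type = 'rest',
    warehouse = 'demo',
    storage_endpoint = 'http://minio:9000/warehouse';

-- Tables registered in the catalog appear under namespaced names,
-- which must be quoted with backticks when queried
SHOW TABLES FROM demo;
SELECT count() FROM demo.`default.my_table`;
```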