ClickHouse

This connector materializes Estuary collections into tables in a ClickHouse database.

ClickHouse is a column-oriented OLAP database designed for real-time analytics. This connector writes batches directly to ClickHouse using the Native protocol and Native format.

When to use this connector

We recommend this direct materialization for most ClickHouse use cases. It writes to ClickHouse natively, supports hard and soft deletes, and works with both self-hosted and ClickHouse Cloud deployments.

Estuary also offers a Dekaf-based integration that ingests via ClickPipes. Choose Dekaf when:

  • You are required to use ClickPipes for data ingestion (e.g., organizational policy)
  • You want ClickHouse Cloud to manage the ingestion infrastructure via ClickPipes

Prerequisites

To use this connector, you'll need:

  • A ClickHouse database (self-hosted or ClickHouse Cloud) and user.
  • Network access to the ClickHouse native protocol port: 9440 by default with TLS enabled (the default), or 9000 with TLS disabled. The connector does not use the HTTP interface on port 8123.
  • At least one Estuary collection.
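Before configuring the materialization, you can verify network access and credentials with clickhouse-client over the native protocol. This is a sketch; the hostname, user, and password are placeholders for your own values:

```shell
# Connect over the native protocol with TLS (default port 9440).
# Drop --secure and use --port 9000 for a non-TLS deployment.
clickhouse-client --host clickhouse.example.com --port 9440 --secure \
  --user flow_user --password 'secret' \
  --query "SELECT version()"
```

If this prints a server version, the connector should be able to reach the same endpoint.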

Required Permissions

-- Target database access: CREATE TABLE, DROP TABLE, SELECT, INSERT, TRUNCATE, etc.
GRANT ALL ON <database>.* TO <user>;

-- System table access for metadata discovery and partition management.
-- These are NOT covered by the database grant above.
GRANT SELECT ON system.columns TO <user>;
GRANT SELECT ON system.parts TO <user>;
GRANT SELECT ON system.tables TO <user>;

Replace <user> and <database> with your actual user and database names.

Optionally, you can use row-level policies to restrict the user's system table access to only the target database:

CREATE ROW POLICY estuary_tables ON system.tables FOR SELECT USING database = '<database>' TO <user>;
CREATE ROW POLICY estuary_columns ON system.columns FOR SELECT USING database = '<database>' TO <user>;
CREATE ROW POLICY estuary_parts ON system.parts FOR SELECT USING database = '<database>' TO <user>;
Tip: If you haven't yet captured your data from its external source, start at the beginning of the guide to create a dataflow. You'll be referred back to this connector-specific documentation at the appropriate steps.

Configuration

To use this connector, begin with data in one or more Estuary collections. Use the properties below to configure a ClickHouse materialization, which will direct the contents of these collections into ClickHouse tables.

Properties

Endpoint

| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| /address | Address | Host and port of the database, in the form host[:port]. Port 9440 is used as the default when SSL is enabled (the default), or 9000 when SSL is disabled. | string | Required |
| /credentials | Authentication | | object | Required |
| /credentials/auth_type | Auth Type | Authentication type. Must be user_password. | string | Required |
| /credentials/username | Username | Database username. | string | Required |
| /credentials/password | Password | Database password. | string | Required |
| /database | Database | Name of the ClickHouse database to materialize to. | string | Required |
| /hardDelete | Hard Delete | If enabled, items deleted in the source will also be deleted from the destination. By default, deletions are tracked via _meta/op (soft delete). | boolean | false |
| /advanced/sslmode | SSL Mode | Controls the TLS connection behavior. Options: disable, require, verify-full. | string | verify-full |
| /advanced/no_flow_document | Exclude Flow Document | When enabled, the root document column will not be required for standard updates. | boolean | false |

Bindings

| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| /table | Table | Name of the database table to materialize to. The connector will create the table if it doesn't already exist. | string | Required |
| /delta_updates | Delta Update | Whether updates to this table should be done via delta updates. | boolean | false |

Sample

materializations:
  ${PREFIX}/${mat_name}:
    endpoint:
      connector:
        config:
          address: clickhouse.example.com:9440
          credentials:
            auth_type: user_password
            username: flow_user
            password: secret
          database: my_database
        image: ghcr.io/estuary/materialize-clickhouse:v1
    bindings:
      - resource:
          table: my_table
        source: ${PREFIX}/${source_collection}

ReplacingMergeTree and FINAL

In standard (non-delta) mode, the connector creates tables using the ReplacingMergeTree engine with flow_published_at as the version column. Updated records are inserted as new rows; ClickHouse deduplicates them in a background process, keeping the row with the highest flow_published_at value for each key.

The connector also configures automatic background cleanup merges so that superseded rows and tombstones are eventually removed from persistent storage.
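As an illustration of the resulting table shape (not the connector's exact generated DDL; the data columns and key here are hypothetical, only flow_published_at comes from the description above), a standard-mode table resembles:

```sql
-- Sketch of a standard-mode target table; the actual DDL the connector
-- emits may differ in column types, ordering, and settings.
CREATE TABLE my_database.my_table
(
    id UInt64,                        -- collection key (example)
    name String,                      -- example data column
    flow_published_at DateTime64(6)   -- version column used for deduplication
)
ENGINE = ReplacingMergeTree(flow_published_at)
ORDER BY id;
```

With this engine, two inserted rows sharing the same `id` are eventually collapsed during background merges, keeping the row with the greater flow_published_at.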

Your queries should use the FINAL modifier to get results with duplicate and tombstone rows removed:

SELECT * FROM my_table FINAL;

Handling deletions

By default, the connector materializes deletions as soft deletes. Set the hardDelete endpoint configuration property to true to enable hard deletes instead.

In soft delete mode, source deletions are tracked in the destination ClickHouse table via the _meta/op column, which indicates whether a row was created, updated, or deleted. The row itself remains in the table.
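In soft delete mode, you can exclude deleted rows in your own queries. Assuming the operation column is materialized under the name `_meta/op` with 'd' marking deletions (an assumption; check your materialized table's schema for the exact column name and values), a query might look like:

```sql
-- Filter out soft-deleted rows; column name and 'd' value are assumptions.
SELECT *
FROM my_table FINAL
WHERE `_meta/op` != 'd';
```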

In hard delete mode, the connector marks rows for deletion by inserting a new row with the original primary key, plus _is_deleted = 1. This is often called a "tombstone". The ReplacingMergeTree ClickHouse engine uses _is_deleted to exclude these rows from FINAL queries, and to automatically remove the tombstoned rows from persistent storage. Removing tombstoned rows is a background task in the ClickHouse server, configured by this connector.
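ClickHouse's ReplacingMergeTree engine accepts an optional is_deleted column after the version column, which is how this tombstone behavior is expressed in DDL. A hard-delete table sketch (illustrative only, with hypothetical data columns; not the connector's exact DDL):

```sql
-- Sketch: ReplacingMergeTree(version, is_deleted) treats rows with
-- _is_deleted = 1 as tombstones, excluding them from FINAL queries and
-- removing them during cleanup merges.
CREATE TABLE my_database.my_table
(
    id UInt64,
    name String,
    flow_published_at DateTime64(6),
    _is_deleted UInt8
)
ENGINE = ReplacingMergeTree(flow_published_at, _is_deleted)
ORDER BY id;
```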

Delta updates

This connector supports delta updates on a per-binding basis. When delta_updates is enabled for a binding, the table uses the MergeTree engine instead of ReplacingMergeTree. Every store operation is appended as-is with no deduplication — rows accumulate and are never removed.
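For comparison with the standard-mode engine, a delta-updates table sketch (again with hypothetical data columns) is a plain append-only MergeTree:

```sql
-- Sketch of a delta-updates table: plain MergeTree, append-only,
-- no version column semantics and no deduplication on merge.
CREATE TABLE my_database.my_table
(
    id UInt64,
    name String,
    flow_published_at DateTime64(6)
)
ENGINE = MergeTree
ORDER BY id;
```

Because nothing deduplicates rows, downstream queries or materialized views are responsible for aggregating or reducing the accumulated events.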