ClickHouse

This connector materializes Estuary collections into tables in a ClickHouse database.

ClickHouse is a column-oriented OLAP database designed for real-time analytics. This connector writes batches directly to ClickHouse using the Native protocol and Native format.

Estuary also provides a Dekaf-based integration for users who prefer to ingest via ClickPipes.

Prerequisites

To use this connector, you'll need:

  • A ClickHouse database (self-hosted or ClickHouse Cloud) and a user with the permissions listed below.
  • Network access to the ClickHouse native protocol port: 9440 by default with TLS enabled (the default), or 9000 with TLS disabled. The connector does not use the HTTP interface on port 8123.
  • At least one Estuary collection.

Required Permissions

-- Target database access: CREATE TABLE, DROP TABLE, SELECT, INSERT, TRUNCATE, etc.
GRANT ALL ON <database>.* TO <user>;

-- System table access for metadata discovery and partition management.
-- These are NOT covered by the database grant above.
GRANT SELECT ON system.columns TO <user>;
GRANT SELECT ON system.parts TO <user>;
GRANT SELECT ON system.tables TO <user>;

Replace <user> and <database> with your actual user and database names.

Optionally, you can use row-level policies to restrict the user's system table access to only the target database:

CREATE ROW POLICY estuary_tables ON system.tables FOR SELECT USING database = '<database>' TO <user>;
CREATE ROW POLICY estuary_columns ON system.columns FOR SELECT USING database = '<database>' TO <user>;
CREATE ROW POLICY estuary_parts ON system.parts FOR SELECT USING database = '<database>' TO <user>;
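
To sanity-check the row policies, you can connect as the connector's user and confirm that only the target database's metadata is visible. This is an illustrative verification query, not something the connector requires:

```sql
-- Run as the connector's user. With the row policies above in place,
-- this should list only the target database.
SELECT DISTINCT database FROM system.tables;
```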

Tip: If you haven't yet captured your data from its external source, start at the beginning of the guide to create a dataflow. You'll be referred back to this connector-specific documentation at the appropriate steps.

Configuration

To use this connector, begin with data in one or more Estuary collections. Use the properties below to configure a ClickHouse materialization, which will direct the contents of those collections into ClickHouse tables.

Properties

Endpoint

| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| `/address` | Address | Host and port of the database, in the form of `host[:port]`. Port 9440 is used as the default when SSL is enabled (the default), or 9000 when SSL is disabled. | string | Required |
| `/credentials` | Authentication | | object | Required |
| `/credentials/auth_type` | Auth Type | Authentication type. Must be `user_password`. | string | Required |
| `/credentials/username` | Username | Database username. | string | Required |
| `/credentials/password` | Password | Database password. | string | Required |
| `/database` | Database | Name of the ClickHouse database to materialize to. | string | Required |
| `/hardDelete` | Hard Delete | If enabled, items deleted in the source will also be deleted from the destination. By default, deletions are tracked via `_meta/op` (soft delete). | boolean | `false` |
| `/advanced/sslmode` | SSL Mode | Controls the TLS connection behavior. Options: `disable`, `require`, `verify-full`. | string | `verify-full` |
| `/advanced/no_flow_document` | Exclude Flow Document | When enabled, the root document column will not be required for standard updates. | boolean | `false` |

Bindings

| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| `/table` | Table | Name of the database table to materialize to. The connector will create the table if it doesn't already exist. | string | Required |
| `/delta_updates` | Delta Update | Whether to use delta updates for this table. | boolean | `false` |

Sample

materializations:
  ${PREFIX}/${mat_name}:
    endpoint:
      connector:
        config:
          address: clickhouse.example.com:9440
          credentials:
            auth_type: user_password
            username: flow_user
            password: secret
          database: my_database
        image: ghcr.io/estuary/materialize-clickhouse:v1
    bindings:
      - resource:
          table: my_table
        source: ${PREFIX}/${source_collection}

ReplacingMergeTree and FINAL

In standard (non-delta) mode, the connector creates tables using the ReplacingMergeTree engine with flow_published_at as the version column. Updated records are inserted as new rows; ClickHouse deduplicates them in a background process, keeping the row with the highest flow_published_at value for each key.

The connector also configures automatic background cleanup merges so that superseded rows and tombstones are eventually removed from disk.
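The generated table definition is roughly equivalent to the sketch below. The exact schema and settings are determined by the connector from your collection; the column names other than `flow_published_at` are illustrative assumptions:

```sql
-- Illustrative sketch of a connector-created table in standard mode.
CREATE TABLE my_table (
    id String,                       -- collection key field (illustrative)
    value String,                    -- materialized field (illustrative)
    flow_published_at DateTime64(6)  -- version column used for deduplication
)
ENGINE = ReplacingMergeTree(flow_published_at)
ORDER BY id;
```

Because deduplication happens asynchronously during background merges, a plain `SELECT` may transiently return multiple rows per key; hence the `FINAL` guidance below.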

Your queries should use the FINAL modifier to get results with duplicate and tombstone rows removed:

SELECT * FROM my_table FINAL;

Hard deletes

When hardDelete: true is set in the endpoint configuration, the connector adds an _is_deleted (UInt8) column to each table. When a source document is deleted, the connector inserts a tombstone row with _is_deleted = 1 and the same key columns as the original row. The ReplacingMergeTree engine uses _is_deleted to exclude these rows from FINAL queries, and automatic cleanup merges eventually remove the tombstoned records from disk.
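ClickHouse's ReplacingMergeTree supports an `is_deleted` column as its second engine parameter, which is the mechanism described above. A hedged sketch of the resulting behavior, with illustrative column names other than `flow_published_at` and `_is_deleted`:

```sql
-- Illustrative sketch of a table with hard deletes enabled.
CREATE TABLE my_table (
    id String,
    value String,
    flow_published_at DateTime64(6),
    _is_deleted UInt8 DEFAULT 0
)
ENGINE = ReplacingMergeTree(flow_published_at, _is_deleted)
ORDER BY id;

-- A tombstone row for key 'k1', as the connector would insert it:
INSERT INTO my_table (id, flow_published_at, _is_deleted)
VALUES ('k1', now64(6), 1);

-- FINAL queries exclude the tombstoned key:
SELECT * FROM my_table FINAL;
```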

Soft deletes

By default (when hardDelete is not enabled), source deletions are tracked in the destination via the _meta/op column, which indicates whether a row was created, updated, or deleted. The row itself remains in the table.
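With soft deletes, filtering tombstones is up to your queries. Assuming the operation metadata is projected as a `_meta/op` column whose value is `d` for deletions (check your table's actual schema; both the column name and value here are assumptions based on the description above), a query might look like:

```sql
-- Exclude soft-deleted rows. The `_meta/op` column name and the 'd'
-- value are assumptions — verify against your materialized table.
SELECT * FROM my_table FINAL
WHERE `_meta/op` != 'd';
```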

Delta updates

This connector supports delta updates on a per-binding basis. When delta_updates is enabled for a binding, the table uses the MergeTree engine instead of ReplacingMergeTree. Every store operation is appended as-is with no deduplication — rows accumulate and are never removed.
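A delta-updates table is therefore a plain append-only MergeTree, sketched below with illustrative column names. Since rows accumulate without deduplication, queries that need the latest value per key must deduplicate at read time, for example with `argMax`:

```sql
-- Illustrative sketch of a delta-updates table.
CREATE TABLE my_delta_table (
    id String,
    value String,
    flow_published_at DateTime64(6)
)
ENGINE = MergeTree
ORDER BY id;

-- Rows are append-only; reduce to the latest value per key at query time:
SELECT id, argMax(value, flow_published_at) AS latest_value
FROM my_delta_table
GROUP BY id;
```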