Google Cloud Bigtable
This connector materializes Estuary collections into tables in a Google Cloud Bigtable instance.
Prerequisites
To use this connector, you'll need:
- A Google Cloud project with the Bigtable API enabled.
- A Bigtable instance within that project, with at least one table already created (see the note on the first table below).
- A Google Cloud service account authorized for the Bigtable instance with both of the following roles:
  - `roles/bigtable.user`: for reading and writing rows.
  - `roles/bigtable.admin`: for creating tables and column families during the connector's Apply step.

Both roles are required: the connector both administers tables and reads/writes their data. See Setup for detailed steps.
Setup
To prepare your Bigtable instance and service account, complete the following steps.
- Create a Bigtable instance in the project of your choice, if one doesn't already exist. For example, using the `gcloud` CLI:

  ```
  gcloud bigtable instances create my-instance \
    --display-name=my-instance \
    --cluster-config=id=my-instance-c1,zone=us-east1-d,nodes=1 \
    --project=my-gcp-project
  ```

- Create a placeholder table in the instance if it has no tables yet (see The instance must contain at least one table):

  ```
  cbt -project=my-gcp-project -instance=my-instance createtable __keepalive
  ```

  `cbt` is part of the gcloud SDK and can be installed with `gcloud components install cbt`.

- Create a service account for the connector to use:

  ```
  gcloud iam service-accounts create bigtable-materialization \
    --display-name="Bigtable materialization" \
    --project=my-gcp-project
  ```

- Grant the service account both `roles/bigtable.user` and `roles/bigtable.admin` on the Bigtable instance:

  ```
  SA="<service-account-email>"
  gcloud bigtable instances add-iam-policy-binding my-instance \
    --member="serviceAccount:${SA}" \
    --role='roles/bigtable.user' \
    --project=my-gcp-project
  gcloud bigtable instances add-iam-policy-binding my-instance \
    --member="serviceAccount:${SA}" \
    --role='roles/bigtable.admin' \
    --project=my-gcp-project
  ```

  You can also grant these roles at the project level if you prefer broader scoping. IAM bindings can take several minutes to propagate.

- Authenticate the connector with the service account using one of:

  - Service account key: select the new service account in the Cloud console. On the Keys tab, click Add key and create a new JSON key. The key is automatically downloaded. You'll paste its contents into the connector's `credentials_json` field.
  - Google Cloud IAM (workload identity federation): follow the steps in the GCP IAM guide. This avoids managing a long-lived service account key.
The instance must contain at least one table
The Bigtable client library primes its connection pool with a PingAndWarm request when it starts. If the target instance has no tables, the server returns NotFound: No tables found for instance and the client treats this as a fatal startup error — so the connector cannot Validate or Apply against a brand-new empty instance.
Data model
Bigtable is a wide-column NoSQL store: each row has a single byte-string row key, and cell values are stored as bytes within column families. The connector maps Estuary data collections onto this model as follows:
- Tables correspond to bindings. Each binding writes to one Bigtable table.
- Row keys are derived from the source collection's primary key. Composite keys are encoded as FoundationDB-packed tuples, which preserves lexicographic ordering of the components — so range scans by a key prefix work efficiently.
- Column family: the connector uses a single column family named `f` for all cells. The column family is created automatically with the table.
- Columns: each selected field is stored under a column qualifier matching the field name. The materialized root document is stored under the column qualifier `flow_document` (or an alternate name if a projection is configured for the source collection's root document).
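The order-preserving property of the row-key encoding can be illustrated with a minimal sketch of the FoundationDB tuple layer's string packing (type code `0x02`, NUL-escaped UTF-8, `0x00` terminator). This is an illustration of the technique, not the connector's code:

```python
def pack_str(s: str) -> bytes:
    # FoundationDB tuple-layer string element: type code 0x02, UTF-8 bytes
    # with embedded NULs escaped as 0x00 0xFF, then a 0x00 terminator.
    return b"\x02" + s.encode("utf-8").replace(b"\x00", b"\x00\xff") + b"\x00"

def pack_key(*parts: str) -> bytes:
    # A composite key is the concatenation of its packed components.
    return b"".join(pack_str(p) for p in parts)

# Packing preserves the lexicographic order of the components, so a
# byte-wise range scan over row keys is also a scan over logical keys:
assert pack_key("a", "b") < pack_key("a", "c") < pack_key("b")

# All keys sharing a first component fall in one contiguous byte range:
assert pack_key("a", "z").startswith(pack_str("a"))
```

The terminator byte is what keeps a shorter component from sorting after a longer one that shares its prefix.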
Value encoding
Bigtable stores all cell values as raw bytes. The connector encodes field values as follows:
| Data type | Encoding |
|---|---|
| Boolean | A single byte: 0x00 for false, 0x01 for true. |
| Integer (fits in int64) | 8 bytes, big-endian. This matches the format Bigtable uses for atomic increment operations. |
| Integer (wider than int64) | Decimal text (for example "99999999999999999999"). Used when schema inference indicates the value range or string length exceeds int64. |
| Number (floating point) | 8 bytes, big-endian IEEE 754. Special values NaN, Infinity, and -Infinity are accepted. For values whose schema indicates a precision greater than 17 significant digits, the textual form is used instead. |
| String | UTF-8 bytes. |
| Binary | Raw bytes (base64-decoded from the source JSON). |
| Array, object, or multi-type | The original JSON encoding, stored as bytes. The root document is also stored in this form. |
| Null | An empty byte slice. |
A null value and a zero-length string or binary value are both stored as empty bytes and cannot be distinguished after the fact.
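These rules can be sketched in Python as follows. The `encode_value` helper is hypothetical and illustrative only, not the connector's implementation:

```python
import json
import struct

def encode_value(v):
    # Illustrative sketch of the cell encodings in the table above.
    if v is None:
        return b""                        # null: empty byte slice
    if isinstance(v, bool):               # check bool before int (bool is an int subtype)
        return b"\x01" if v else b"\x00"
    if isinstance(v, int):
        if -2**63 <= v < 2**63:
            return struct.pack(">q", v)   # int64: 8 bytes, big-endian
        return str(v).encode("utf-8")     # wider than int64: decimal text
    if isinstance(v, float):
        return struct.pack(">d", v)       # 8 bytes, big-endian IEEE 754
    if isinstance(v, str):
        return v.encode("utf-8")          # UTF-8 bytes
    if isinstance(v, (bytes, bytearray)):
        return bytes(v)                   # binary: raw bytes
    return json.dumps(v).encode("utf-8")  # arrays/objects: original JSON

# Null and the empty string collapse to the same stored bytes:
assert encode_value(None) == encode_value("") == b""
```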
Table names
Bigtable table IDs must match the pattern [_a-zA-Z0-9][-_.a-zA-Z0-9]* and are capped at 50 characters (reference). The connector sanitizes binding table names to fit these rules: characters outside the allowed set are replaced with _, leading - and . characters are stripped, and the name is truncated to 50 characters if needed.
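The sanitization can be sketched with a hypothetical helper (not the connector's code) that applies the three rules in order:

```python
import re

def sanitize_table_id(name: str) -> str:
    # Hypothetical sketch: table IDs must match [_a-zA-Z0-9][-_.a-zA-Z0-9]*
    # and be at most 50 characters.
    name = re.sub(r"[^-_.a-zA-Z0-9]", "_", name)  # replace disallowed characters
    name = name.lstrip("-.")                      # first character may not be '-' or '.'
    return name[:50]                              # truncate to 50 characters

assert sanitize_table_id("acme/orders v2") == "acme_orders_v2"
```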
Configuration
You configure connectors either in the Estuary web app, or by directly editing the catalog specification file. See connectors to learn more about using connectors. The values and specification sample below provide configuration details specific to the Bigtable materialization connector.
Properties
Endpoint
| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| /project_id | Project ID | Google Cloud Project ID that owns the Bigtable instance. | string | Required |
| /instance_id | Instance ID | Bigtable instance ID for the materialized tables. | string | Required |
| /credentials | Authentication | Credentials for authentication. | Credentials | Required |
| /hardDelete | Hard Delete | If enabled, items deleted in the source will also be deleted from the destination. Otherwise, _meta/op in the destination will signify whether rows have been deleted (soft-delete). | boolean | false |
| /advanced/endpoint | Bigtable Endpoint | The Bigtable endpoint URI to connect to. Use if you're materializing to a Bigtable-compatible API that isn't provided by Google. | string | |
Credentials
Credentials for authenticating with GCP. Use one of the following sets of options:
| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| /auth_type | Auth Type | Method to use for authentication. | string | Required: CredentialsJSON |
| /credentials_json | Service Account JSON | The JSON credentials of the service account to use for authorization. | string | Required |
| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| /auth_type | Auth Type | Method to use for authentication. | string | Required: GCPIAM |
| /gcp_service_account_to_impersonate | Service Account | GCP service account email to impersonate. | string | Required |
| /gcp_workload_identity_pool_audience | Workload Identity Pool Audience | GCP Workload Identity Pool Audience in the format https://iam.googleapis.com/projects/123/locations/global/workloadIdentityPools/test-pool/providers/test-provider. | string | Required |
Bindings
| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| /table | Table Name | The name of the Bigtable table to materialize to. | string | Required |
Sample
```yaml
materializations:
  ${PREFIX}/${MATERIALIZATION_NAME}:
    endpoint:
      connector:
        image: ghcr.io/estuary/materialize-bigtable:v1
        config:
          project_id: my-gcp-project
          instance_id: my-bigtable-instance
          credentials:
            auth_type: CredentialsJSON
            credentials_json: <secret>
    bindings:
      - resource:
          table: ${TABLE_NAME}
        source: ${PREFIX}/${COLLECTION_NAME}
```
Hard delete
By default, deletions in the source surface as soft-deletes in Bigtable: the row is rewritten with the deletion document and the _meta/op field set to d, and downstream consumers can filter on that field. To instead remove the row from Bigtable when its source is deleted, set hardDelete: true in the endpoint configuration.
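For example, the endpoint `config` block from the sample above with hard deletes enabled:

```yaml
config:
  project_id: my-gcp-project
  instance_id: my-bigtable-instance
  hardDelete: true
  credentials:
    auth_type: CredentialsJSON
    credentials_json: <secret>
```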