
Google BigQuery

This Flow connector materializes Flow collections into tables within a Google BigQuery dataset. It allows both standard and delta updates.

The connector uses your Google Cloud service account to materialize to BigQuery tables by way of files in a Google Cloud Storage (GCS) bucket. The files in the bucket act as a temporary staging area for data storage and retrieval.

It is available for use in the Flow web application. For local development or open-source workflows, ghcr.io/estuary/materialize-bigquery:dev provides the latest version of the connector as a Docker image. You can also follow the link in your browser to see past image versions.

Performance considerations

Like other Estuary connectors, this is a real-time connector that materializes documents using continuous transactions. In practice, however, there are speed limitations: standard BigQuery tables are limited to 1,500 operations per day, which means the connector is limited to 1,500 transactions per day.

To avoid running up against this limit, set the minimum transaction time to the recommended value of 2 minutes (1 minute at minimum). You do this by configuring the materialization's task shard. This causes an apparent delay in the materialization, but is necessary to prevent errors. It also makes transactions more efficient, which reduces costs in BigQuery, especially for large datasets.

Instructions to set the minimum transaction time are detailed below.
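In the materialization specification, this setting takes the form of a shards stanza; the value shown here reflects the 2-minute recommendation above:

```yaml
shards:
  minTxnDuration: 2m
```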


To use this connector, you'll need:

  • A Google Cloud project with an existing BigQuery dataset
  • A Google Cloud Storage bucket in the same region as the BigQuery dataset
  • A Google Cloud service account with the roles/bigquery.dataEditor, roles/bigquery.jobUser, and roles/storage.objectAdmin roles
  • The service account's JSON key

The Flow collections you materialize must accommodate the following naming restrictions:

  • Field names may not contain hyphens (-), or the materialization will fail.
  • Field names must begin with a letter or underscore (_), or the materialization will fail.
  • Field names may contain non-alphanumeric characters, but these are replaced with underscores in the corresponding BigQuery column name.
  • If two field names become identical after special characters are replaced with underscores (for example, field! and field$ both become field_), the materialization will fail.
  • Collection names may contain non-alphanumeric characters, but all such characters except hyphens are replaced with underscores in the BigQuery table name.

If necessary, you can add projections to your collection specification to change field names.
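For example, a projection can expose a hyphenated JSON field under a BigQuery-safe name. In this sketch, the collection name, schema file, and field names are all hypothetical:

```yaml
collections:
  acmeCo/orders:
    schema: orders.schema.yaml
    key: [/order_id]
    projections:
      # Map the hyphenated JSON location /order-type to an underscore-safe field name
      order_type: /order-type
```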


If you haven't yet captured your data from its external source, start at the beginning of the guide to create a dataflow. You'll be referred back to this connector-specific documentation at the appropriate steps.


To configure your service account, complete the following steps.

  1. Log into the Google Cloud console and create a service account. During account creation:

    1. Grant the user access to the project.
    2. Grant the user the roles/bigquery.dataEditor, roles/bigquery.jobUser, and roles/storage.objectAdmin roles.
    3. Click Done.
  2. Select the new service account from the list of service accounts. On the Keys tab, click Add key and create a new JSON key.

    The key is automatically downloaded. You'll use it to configure the connector.


To use this connector, begin with data in one or more Flow collections. Use the properties below to configure a BigQuery materialization, which will direct one or more of your Flow collections to your desired tables within a BigQuery dataset.

A BigQuery dataset is the top-level container within a project, and comprises multiple tables. You can think of a dataset as somewhat analogous to a schema in a relational database. For a complete introduction to resource organization in BigQuery, see the BigQuery docs.



Endpoint

| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| `/project_id` | Project ID | The project ID for the Google Cloud Storage bucket and BigQuery dataset. | string | Required |
| `/billing_project_id` | Billing project ID | The project ID to which these operations are billed in BigQuery. Typically, you want this to be the same as `project_id` (the default). | string | Same as `project_id` |
| `/dataset` | Dataset | Name of the target BigQuery dataset. | string | Required |
| `/region` | Region | The GCS region. | string | Required |
| `/bucket` | Bucket | Name of the GCS bucket. | string | Required |
| `/bucket_path` | Bucket path | Base path within the GCS bucket. Also called "Folder" in the GCS console. | string | Optional |
| `/credentials_json` | Service Account JSON | The JSON credentials of the service account to use for authorization. | string | Required |

To learn more about project billing, see the BigQuery docs.


Bindings

| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| `/table` | Table | Table name. | string | Required |
| `/delta_updates` | Delta updates | Whether to use standard or delta updates. | boolean | `false` |
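Put together, a single binding's resource configuration might look like the following sketch (the table and collection names are hypothetical):

```yaml
bindings:
  - resource:
      table: orders_v1        # Required: name of the target BigQuery table
      delta_updates: false    # Optional: defaults to false (standard updates)
    source: acmeCo/orders
```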

Shard configuration


UI controls for this workflow will be added to the Flow web app soon. For now, you must edit the materialization specification manually, either in the web app or using the CLI.

To avoid exceeding your BigQuery tables' daily operation limits as discussed in Performance considerations, complete the following steps when configuring your materialization:

  1. Using the Flow web application or the flowctl CLI, create a draft materialization as you normally would.

    1. If using the web app, input the required values and click Discover Endpoint.
    2. If using flowctl, create your materialization specification manually.
  2. Add the shards configuration to the materialization specification at the same indentation level as endpoint and bindings. Set the minTxnDuration property to at least 1m (we recommend 2m). In the web app, you do this in the Catalog Editor.

    shards:
      minTxnDuration: 2m

    A full sample is included below.

  3. Continue to test, save, and publish the materialization as usual.


materializations:
  ${PREFIX}/${mat_name}:
    endpoint:
      connector:
        image: ghcr.io/estuary/materialize-bigquery:dev
        config:
          project_id: our-bigquery-project
          dataset: materialized-data
          region: US
          bucket: our-gcs-bucket
          bucket_path: bucket-path/
          credentials_json: <secret>
    bindings:
      - resource:
          table: ${table_name}
        source: ${PREFIX}/${source_collection}
    shards:
      minTxnDuration: 2m

Delta updates

This connector supports both standard (merge) and delta updates. The default is to use standard updates.

Enabling delta updates will prevent Flow from querying for documents in your BigQuery table, which can reduce latency and costs for large datasets. If you're certain that all events will have unique keys, enabling delta updates is a simple way to improve performance with no effect on the output. However, enabling delta updates is not suitable for all workflows, as the resulting table in BigQuery won't be fully reduced.

You can enable delta updates on a per-binding basis:

- resource:
    table: ${table_name}
    delta_updates: true
  source: ${PREFIX}/${source_collection}