Google Analytics Data API

This connector captures data from Google Analytics 4 properties into Flow collections via the Google Analytics Data API.

It’s available for use in the Flow web application. For local development or open-source workflows, ghcr.io/estuary/source-google-analytics-data-api-native:dev provides the latest version of the connector as a Docker image. You can also follow the link in your browser to see past image versions.

Supported data resources

The following data resources are supported:

  • Daily active users
  • Devices
  • Four-weekly active users
  • Locations
  • Pages
  • Traffic sources
  • Website overview
  • Weekly active users

Each is fetched as a report and mapped to a Flow collection through a separate binding.

You can also capture custom reports.

Prerequisites

To use this connector, you'll need:

  • The Google Analytics Data API enabled on the Google project associated with your Analytics property. (Unless you actively develop with Google Cloud, you'll likely have just one option.)

  • Your Google Analytics 4 property ID.

Authentication

Your Google username and password are required to authenticate the connector using OAuth2.

Configuration

You configure connectors either in the Flow web app, or by directly editing a specification file. See connectors to learn more about using connectors. The values and specification sample below provide configuration details specific to the Google Analytics Data API source connector.

Properties

Endpoint

The following properties reflect the manual authentication method. If you authenticate directly with Google in the Flow web app, some of these properties aren't required.

| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| /property_id | Property ID | A Google Analytics GA4 property identifier whose events are tracked. | string | Required |
| /custom_reports | Custom Reports | A JSON array describing the custom reports you want to sync from Google Analytics. Learn more about custom reports. | string | |
| /start_date | Start Date | The date from which you'd like to replicate data, in the format YYYY-MM-DDT00:00:00Z. All data generated after this date will be replicated. | string | Defaults to 30 days before the present |
| /credentials | Credentials | Credentials for the service | object | |
| /credentials/credentials_title | Authentication Method | Set to OAuth Credentials. | string | Required |
| /credentials/client_id | OAuth Client ID | The OAuth app's client ID. | string | Required |
| /credentials/client_secret | OAuth Client Secret | The OAuth app's client secret. | string | Required |
| /credentials/refresh_token | Refresh Token | The refresh token received from the OAuth app. | string | Required |
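
To illustrate the Start Date format, here is a minimal Python sketch that produces a value 30 days before the present, at midnight UTC. The helper name `default_start_date` is hypothetical, and this is not necessarily how the connector computes its own default; it only shows a string in the documented YYYY-MM-DDT00:00:00Z shape.

```python
from datetime import datetime, timedelta, timezone


def default_start_date(now=None, days_back=30):
    """Return midnight UTC `days_back` days before `now` as YYYY-MM-DDT00:00:00Z.

    Hypothetical helper: it mirrors the documented format and default window,
    not the connector's internal logic.
    """
    now = now or datetime.now(timezone.utc)
    day = (now - timedelta(days=days_back)).date()
    return f"{day.isoformat()}T00:00:00Z"


if __name__ == "__main__":
    # Prints a value suitable for the start_date property.
    print(default_start_date())
```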

Bindings

| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| /name | Data resource | Name of the data resource. | string | Required |
| /interval | Interval | Interval between data syncs | string | |

Custom reports

You can include data beyond the default data resources with Custom Reports. These replicate the functionality of Custom Reports in the Google Analytics Web console.

Fill out the Custom Reports property with a JSON array as a string with the following schema:

[{"name": "<report-name>", "dimensions": ["<dimension-name>", ...], "metrics": ["<metric-name>", ...]}]

Filters are also supported. See Google's documentation for examples of filters and valid filter syntax.

[{"name": "<report-name>", "dimensions": ["<dimension-name>", ...], "metrics": ["<metric-name>", ...], "dimensionFilter": "<filter-object>", "metricFilter": "<another-filter-object>"}]
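
Because the property takes a JSON array encoded as a string, it can be easier to build the reports as ordinary data and serialize them than to hand-write the quoting. The sketch below is one way to do that in Python. The helper name `build_custom_reports` is hypothetical, and treating `name`, `dimensions`, and `metrics` as mandatory keys is an assumption drawn from the schema above rather than a documented rule.

```python
import json


def build_custom_reports(reports):
    """Serialize report definitions into a JSON-array string.

    Hypothetical helper. The required-key check is an assumption based on
    the schema shown in this section.
    """
    for report in reports:
        for key in ("name", "dimensions", "metrics"):
            if key not in report:
                raise ValueError(f"custom report is missing {key!r}: {report}")
    return json.dumps(reports)


# A report filtered to one browser; the filter is a nested object.
reports = [
    {
        "name": "my_custom_report_with_a_filter",
        "dimensions": ["browser"],
        "metrics": ["totalUsers"],
        "dimensionFilter": {
            "filter": {"fieldName": "browser", "stringFilter": {"value": "Chrome"}}
        },
    }
]

custom_reports = build_custom_reports(reports)
```

The resulting `custom_reports` string can be pasted into the Custom Reports property, or into the `custom_reports` field of a specification file.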

Sample

This sample reflects the manual authentication method.

captures:
  ${PREFIX}/${CAPTURE_NAME}:
    endpoint:
      connector:
        image: ghcr.io/estuary/source-google-analytics-data-api-native:dev
        config:
          custom_reports: '[{"name": "my_custom_report_with_a_filter", "dimensions": ["browser"], "metrics": ["totalUsers"], "dimensionFilter": {"filter": {"fieldName": "browser", "stringFilter": {"value": "Chrome"}}}}]'
          credentials:
            credentials_title: OAuth Credentials
            client_id: <secret>
            client_secret: <secret>
            refresh_token: <secret>
          start_date: "2025-02-07T17:00:00Z"
          property_id: "123456789"
    bindings:
      - resource:
          name: daily_active_users
          interval: PT5M
        target: ${PREFIX}/daily_active_users

Performance considerations

Data sampling

The Google Analytics Data API enforces compute thresholds for ad-hoc queries and reports. If a threshold is exceeded, the API applies sampling to limit the number of sessions analyzed for the specified time range. The thresholds are listed in Google's documentation. If your account is on the Analytics 360 tier, you're less likely to run into these limitations.
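
If you query the Data API directly to investigate a suspicious report, a sketch like the following can tell you whether a response was sampled. It assumes the raw report response is a dictionary whose `metadata.samplingMetadatas` entries carry `samplesReadCount` and `samplingSpaceSize`, which is how the response metadata is understood to be shaped. Verify the field names against Google's API reference before relying on it. The helper name `sampled_fraction` is hypothetical, and whether the connector surfaces this metadata in captured documents is not stated here.

```python
def sampled_fraction(response):
    """Return the fraction of events read per date range, or [] if unsampled.

    Assumes `response` is a raw report response parsed from JSON, with
    optional metadata.samplingMetadatas entries (an assumption to verify).
    """
    metadata = response.get("metadata", {})
    fractions = []
    for entry in metadata.get("samplingMetadatas", []):
        # 64-bit integers are typically encoded as strings in JSON responses.
        read = int(entry["samplesReadCount"])
        space = int(entry["samplingSpaceSize"])
        fractions.append(read / space if space else 0.0)
    return fractions


# Illustrative response fragment, not real data.
example = {
    "metadata": {
        "samplingMetadatas": [
            {"samplesReadCount": "250000", "samplingSpaceSize": "1000000"}
        ]
    }
}
fractions = sampled_fraction(example)
```

An empty list means the response reported no sampling, while a value below 1.0 means only that share of events was analyzed for the date range.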