Starburst

This connector transactionally materializes Flow collections into Iceberg or Delta Lake tables in Starburst Galaxy, backed by AWS S3 storage. The Starburst Galaxy connector supports only standard (merge) updates.

The connector also uses AWS S3 to store temporary data during the materialization process.

ghcr.io/estuary/materialize-starburst:dev provides the latest connector image. You can also follow the link in your browser to see past image versions.

Prerequisites

To use this connector, you'll need:

  • A Starburst Galaxy account (to create one, see Starburst Galaxy start) that includes:
    • A running cluster containing an Amazon S3 catalog
    • A schema, which is a logical grouping of tables
    • S3 storage for temporary data, with an awsAccessKeyId and awsSecretAccessKey that correspond to the catalog in use
    • A user with a role that grants permission to create, modify, and drop tables in the specified Amazon S3 catalog
  • At least one Flow collection

Setup

To find your host, go to your Cluster -> Connection info -> Other clients (Connect clients).

You also need to grant the role access to the temporary storage location: go to Roles and privileges -> select the specific role -> Privileges -> Add privilege -> Location, and select "Create schema and table in location". See the Starburst Galaxy documentation for details.

Configuration

To use this connector, begin with data in one or more Flow collections. Use the properties below to configure a Starburst materialization, which will direct one or more of your Flow collections to new Starburst tables.

Properties

Endpoint

| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| /host | Host and optional port | | string | Required |
| /catalog | Catalog Name | Galaxy catalog | string | Required |
| /schema | Schema Name | Default schema to materialize to | string | Required |
| /account | Account | Galaxy account name | string | Required |
| /password | Password | Galaxy account password | string | Required |
| /awsAccessKeyId | AWS Access Key ID | | string | Required |
| /awsSecretAccessKey | AWS Secret Access Key | | string | Required |
| /region | AWS Region | Region of AWS storage | string | Required |
| /bucket | Bucket name | | string | Required |
| /bucketPath | Bucket path | A prefix that will be used to store objects in S3. | string | Required |

Bindings

| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| /table | Table | Table name | string | Required |
| /schema | Alternative Schema | Alternative schema for this table | string | |

Sample


materializations:
  ${PREFIX}/${mat_name}:
    endpoint:
      connector:
        config:
          host: HOST:PORT
          account: ACCOUNT
          password: PASSWORD
          catalog: CATALOG_NAME
          schema: SCHEMA_NAME
          awsAccessKeyId: AWS_ACCESS_KEY_ID
          awsSecretAccessKey: AWS_SECRET_KEY_ID
          region: REGION
          bucket: BUCKET
          bucketPath: BUCKET_PATH
        image: ghcr.io/estuary/materialize-starburst:dev
    # If you have multiple collections you need to materialize, add a binding for each one
    # to ensure complete data flow-through
    bindings:
      - resource:
          table: ${table_name}
          schema: default
        source: ${PREFIX}/${source_collection}
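
As the comment in the sample notes, each collection needs its own binding. The fragment below is a minimal sketch of what a `bindings` list with two entries might look like. The table names, collection names, and the `analytics` schema are hypothetical placeholders, and the remark about the fallback schema is an assumption based on the property descriptions above rather than confirmed connector behavior.

```yaml
bindings:
  # First collection. No per-binding schema is set, so the table is assumed
  # to land in the endpoint-level schema (SCHEMA_NAME in the sample).
  - resource:
      table: orders                  # hypothetical table name
    source: ${PREFIX}/orders         # hypothetical collection name
  # Second collection. The optional `schema` property names an alternative
  # schema for this table only.
  - resource:
      table: customers               # hypothetical table name
      schema: analytics              # hypothetical alternative schema
    source: ${PREFIX}/customers      # hypothetical collection name
```

This list would sit under the materialization at the same level as `endpoint`, in place of the single-entry `bindings` list from the sample.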

Sync Schedule

This connector supports configuring a schedule for sync frequency. You can read about how to configure this here.

Reserved words

Starburst Galaxy has a list of reserved words that must be quoted in order to be used as an identifier. Flow automatically quotes fields that are in the reserved words list. You can find this list in Trino's documentation here and in the table below.

Caution: In Starburst Galaxy, objects created with quoted identifiers must always be referenced exactly as created, including the quotes. Otherwise, SQL statements and queries can result in errors. See the Trino docs.

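| Reserved words | | |
|---|---|---|
| CUBE | INSERT | TABLE |
| CURRENT_CATALOG | INTERSECT | THEN |
| CURRENT_DATE | INTO | TRIM |
| CURRENT_PATH | IS | TRUE |
| CURRENT_ROLE | JOIN | UESCAPE |
| CURRENT_SCHEMA | JSON_ARRAY | UNION |
| CURRENT_TIME | JSON_EXISTS | UNNEST |
| CURRENT_TIMESTAMP | JSON_OBJECT | USING |
| CURRENT_USER | JSON_QUERY | VALUES |
| DEALLOCATE | JSON_TABLE | WHEN |
| DELETE | JSON_VALUE | WHERE |
| DESCRIBE | LEFT | WITH |
| DISTINCT | LIKE | |
| DROP | LISTAGG | |
| ELSE | LOCALTIME | |
| END | LOCALTIMESTAMP | |
| ESCAPE | NATURAL | |
| EXCEPT | NORMALIZE | |
| EXECUTE | NOT | |
| EXISTS | NULL | |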