This connector materializes Flow collections into tables in an Amazon Redshift database.
The connector uses your AWS account to materialize to Redshift tables by way of files in an S3 bucket. The files in the bucket as as a temporary staging area for data storage and retrieval.
provides the latest connector image. You can also follow the link in your browser to see past image
To use this connector, you'll need:
- A Redshift cluster accessible either directly or using an SSH tunnel. The user configured to connect to Redshift must have at least "create table" permissions for the configured schema. The connector will create new tables in the database per your specification. Tables created manually in advance are not supported. See setup for more information.
- An S3 bucket for staging temporary files. For best performance the bucket should be in the same region as your Redshift cluster. See this guide for instructions on setting up a new S3 bucket.
- An AWS root or IAM user with read and write access to the S3 bucket. For this user, you'll need the access key and secret access key. See the AWS blog for help finding these credentials.
Use the below properties to configure an Amazon Redshift materialization, which will direct one or more of your Flow collections to your desired tables in the database.
|Address||Host and port of the database. Example: red-shift-cluster-name.account.us-east-2.redshift.amazonaws.com:5439||string||Required|
|User||Database user to connect as.||string||Required|
|Password||Password for the specified database user.||string||Required|
|Database||Name of the logical database to materialize to. The materialization will attempt to connect to the default database for the provided user if omitted.||string|
|Database Schema||Database schema for bound collection tables (unless overridden within the binding resource configuration) as well as associated materialization metadata tables.||string|
|Access Key ID||AWS Access Key ID for reading and writing data to the S3 staging bucket.||string||Required|
|Secret Access Key||AWS Secret Access Key for reading and writing data to the S3 staging bucket.||string||Required|
|S3 Staging Bucket||Name of the S3 bucket to use for staging data loads.||string||Required|
|Region||Region of the S3 staging bucket. For optimal performance this should be in the same region as the Redshift database cluster.||string||Required|
|Bucket Path||A prefix that will be used to store objects in S3.||string|
|Advanced Options||Options for advanced users. You should not typically need to modify these.||object|
|Update Delay||Potentially reduce active cluster time by increasing the delay between updates. Defaults to 30 minutes if unset.||string|
|Table||Name of the database table.||string||Required|
|Delta Update||Should updates to this table be done via delta updates. Default is false.||boolean|
|Alternative Schema||Alternative schema for this table (optional).||string|
|Additional Table Create SQL||Additional SQL statement(s) to be run in the same transaction that creates the table.||string|
You must configure your cluster to allow connections from Estuary. This can be accomplished by
making your cluster accessible over the internet for Estuary Flow's IP address
using an SSH tunnel. Connecting to the S3 staging bucket does not use the network tunnel and
connects over HTTPS only.
Instructions for making a cluster accessible over the internet can be found here. When using this option, database connections are made over SSL only.
For allowing secure connections via SSH tunneling:
Refer to the guide to configure an SSH server on using an AWS EC2 instance.
Configure your connector as described in the configuration section above, with the additional of the
networkTunnelstanza to enable the SSH tunnel, if using. See Connecting to endpoints on secure networks for additional details and a sample.
Redshift has requirements for names and
identifiers and this connector will
automatically apply quoting when needed. All table identifiers and column identifiers (corresponding
to Flow collection fields) are treated as lowercase. Table names for bindings must be unique on a
case-insensitive basis, as well as field names of the source collection. If any names are not unique
on a case-insensitive basis (ex:
MyField) the materialization will fail to apply.
If necessary, you can add projections to your collection specification to change field names.
For best performance there should at most one Redshift materialization active per Redshift schema. Additional collections to be materialized should be added as bindings to this single materialization rather than creating a separate materialization for each collection.
In order to achieve exactly-once processing of collection documents, the materialization creates and
uses metadata tables located in the schema configured by the endpoint
schema property. To commit a
transaction, a table-level lock is acquired on these metadata tables. If there are multiple
materializations using the same metadata tables, they will need to take turns acquiring these locks.
This locking behavior prevents "serializable isolation violation" errors in the case of multiple
materializations sharing the same metadata tables at the expense of allowing only a single
materialization to be actively committing a transaction.
Maximum record size
The maximum size of a single input document is 4 MB. Attempting to materialize collections with documents larger than 4 MB will result in an error. To materialize this data you can use a derivation to create a derived collection with smaller documents, or exclude fields containing excessive amounts of data by customizing the materialized fields.
This connector supports both standard (merge) and delta updates. The default is to use standard updates.