Microsoft SQL Server

This connector uses change data capture (CDC) to continuously capture updates in a Microsoft SQL Server database into one or more Flow collections.

It’s available for use in the Flow web application. For local development or open-source workflows, ghcr.io/estuary/source-sqlserver:dev provides the latest version of the connector as a Docker image. Past image versions are also available at that registry path.

Supported versions and platforms

This connector works with self-hosted SQL Server deployments as well as managed offerings from all major cloud providers. It is designed for databases running any version of SQL Server with CDC support, and is regularly tested against SQL Server 2017 and later.
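If you're not sure which version of SQL Server you're running, one quick way to check is to query the server directly. For example:

-- Report the server version and edition.
SELECT @@VERSION;
SELECT SERVERPROPERTY('ProductVersion') AS product_version, SERVERPROPERTY('Edition') AS edition;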

Setup instructions are provided for the following platforms:

  • Self-hosted SQL Server
  • Azure SQL Database

Prerequisites

To capture change events from SQL Server tables using this connector, you need:

  • For each table to be captured, a primary key should be specified in the database. If a table doesn't have a primary key, you must manually specify a key in the associated Flow collection definition while creating the capture. See Specifying Flow collection keys below for detailed steps.

  • CDC enabled on the database and the individual tables to be captured. (This creates change tables in the database, from which the connector reads.)

  • A watermarks table. The watermarks table is a small “scratch space” to which the connector occasionally writes a small amount of data to ensure accuracy when backfilling preexisting table contents.

  • A user role with:

    • SELECT permissions on the CDC schema and the schemas that contain tables to be captured.
    • Access to the change tables created as part of the SQL Server CDC process.
    • SELECT, INSERT, and UPDATE permissions on the watermarks table.

Setup

To meet these requirements, follow the steps for your hosting type.

Self-hosted SQL Server

  1. Connect to the server and issue the following commands:
USE <database>;
-- Enable CDC for the database.
EXEC sys.sp_cdc_enable_db;
-- Create user and password for use with the connector.
CREATE LOGIN flow_capture WITH PASSWORD = 'secret';
CREATE USER flow_capture FOR LOGIN flow_capture;
-- Grant the user permissions on the CDC schema and schemas with data.
-- This assumes all tables to be captured are in the default schema, `dbo`.
-- Add similar queries for any other schemas that contain tables you want to capture.
GRANT SELECT ON SCHEMA :: dbo TO flow_capture;
GRANT SELECT ON SCHEMA :: cdc TO flow_capture;
-- Create the watermarks table and grant permissions.
CREATE TABLE dbo.flow_watermarks(slot INTEGER PRIMARY KEY, watermark TEXT);
GRANT SELECT, INSERT, UPDATE ON dbo.flow_watermarks TO flow_capture;
-- Enable CDC on tables. The query below enables CDC on the watermarks table ONLY.
-- Add a similar query for each additional table you intend to capture.
EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'flow_watermarks', @role_name = 'flow_capture';
  2. Allow secure connections to Estuary Flow from your hosting environment. Either grant direct access to Flow's IP address, 34.121.207.128, or set up an SSH server for tunneling.
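After running the commands in step 1, you can optionally confirm that CDC is active before moving on. The queries below are one way to do so; substitute your own database name:

-- Confirm CDC is enabled at the database level.
SELECT name, is_cdc_enabled FROM sys.databases WHERE name = '<database>';
-- List the tables currently tracked by CDC; each table you intend to capture should appear here.
SELECT name, is_tracked_by_cdc FROM sys.tables WHERE is_tracked_by_cdc = 1;
-- Show details of each capture instance, including the tracked columns.
EXEC sys.sp_cdc_help_change_data_capture;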

Azure SQL Database

  1. Allow connections between the database and Estuary Flow. There are two ways to do this: by granting direct access to Flow's IP or by creating an SSH tunnel.

    • To allow direct access, create a new firewall rule that grants access to the IP address 34.121.207.128.
    • To allow secure connections via SSH tunneling, set up an SSH server in your environment for the connector to tunnel through.

  2. In your SQL client, connect to your instance as the default sqlserver user and issue the following commands.

USE <database>;
-- Enable CDC for the database.
EXEC sys.sp_cdc_enable_db;
-- Create user and password for use with the connector.
CREATE LOGIN flow_capture WITH PASSWORD = 'secret';
CREATE USER flow_capture FOR LOGIN flow_capture;
-- Grant the user permissions on the CDC schema and schemas with data.
-- This assumes all tables to be captured are in the default schema, `dbo`.
-- Add similar queries for any other schemas that contain tables you want to capture.
GRANT SELECT ON SCHEMA :: dbo TO flow_capture;
GRANT SELECT ON SCHEMA :: cdc TO flow_capture;
-- Create the watermarks table and grant permissions.
CREATE TABLE dbo.flow_watermarks(slot INTEGER PRIMARY KEY, watermark TEXT);
GRANT SELECT, INSERT, UPDATE ON dbo.flow_watermarks TO flow_capture;
-- Enable CDC on tables. The query below enables CDC on the watermarks table ONLY.
-- Add a similar query for each additional table you intend to capture.
EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'flow_watermarks', @role_name = 'flow_capture';
  3. Note the following important items for configuration:

    • Find the instance's host under Server Name. The port is always 1433. Together, you'll use the host:port as the address property when you configure the connector.
    • Format user as username@databasename; for example, flow_capture@myazuredb.
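Putting these together, the endpoint configuration for an Azure SQL Database might look like the following sketch (the server and database names here are hypothetical):

address: "myazuredb.database.windows.net:1433"
database: "my_db"
user: "flow_capture@myazuredb"
password: "secret"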

Configuration

You configure connectors either in the Flow web app, or by directly editing the catalog specification file. See connectors to learn more about using connectors. The values and specification sample below provide configuration details specific to the SQL Server source connector.

Properties

Endpoint

| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| /address | Server Address | The host or host:port at which the database can be reached. | string | Required |
| /database | Database | Logical database name to capture from. | string | Required |
| /user | User | The database user to authenticate as. | string | Required, "flow_capture" |
| /password | Password | Password for the specified database user. | string | Required |
| /advanced | Advanced Options | Options for advanced users. You should not typically need to modify these. | object | |
| /advanced/backfill_chunk_size | Backfill Chunk Size | The number of rows which should be fetched from the database in a single backfill query. | integer | 4096 |
| /advanced/skip_backfills | Skip Backfills | A comma-separated list of fully-qualified table names which should not be backfilled. | string | |
| /advanced/watermarksTable | Watermarks Table | The name of the table used for watermark writes during backfills. Must be fully-qualified in '<schema>.<table>' form. | string | "dbo.flow_watermarks" |
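For example, you might set the advanced options like this in the endpoint configuration; the skipped table names below are purely illustrative:

config:
  address: "<host>:1433"
  database: "my_db"
  user: "flow_capture"
  password: "secret"
  advanced:
    backfill_chunk_size: 4096
    skip_backfills: "dbo.archived_orders,dbo.audit_log"
    watermarksTable: "dbo.flow_watermarks"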

Bindings

| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| /namespace | Namespace | The namespace/schema of the table. | string | Required |
| /stream | Stream | Table name. | string | Required |
| /primary_key | Primary Key Columns | The columns which together form the primary key of the table. | array | |

Sample

captures:
  ${PREFIX}/${CAPTURE_NAME}:
    endpoint:
      connector:
        image: "ghcr.io/estuary/source-sqlserver:dev"
        config:
          address: "<host>:1433"
          database: "my_db"
          user: "flow_capture"
          password: "secret"
    bindings:
      - resource:
          stream: ${TABLE_NAME}
          namespace: dbo
          primary_key: ["id"]
        target: ${PREFIX}/${COLLECTION_NAME}

Your capture definition will likely be more complex, with additional bindings for each table in the source database.
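For instance, a capture of two tables in the dbo schema might include bindings like these (table and collection names are hypothetical):

bindings:
  - resource:
      stream: customers
      namespace: dbo
      primary_key: ["id"]
    target: ${PREFIX}/customers
  - resource:
      stream: orders
      namespace: dbo
      primary_key: ["order_id"]
    target: ${PREFIX}/orders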

Learn more about capture definitions.

Specifying Flow collection keys

Every Flow collection must have a key. As long as your SQL Server tables have a primary key specified, the connector will set the corresponding collection's key accordingly.

In cases where a SQL Server table you want to capture doesn't have a primary key, you can manually add it to the collection definition during the capture creation workflow.

  1. After you input the endpoint configuration and click Next, the tables in your database are mapped to Flow collections. Click each collection's Specification tab and identify any collection whose "key": [ ] is empty.

  2. Click inside the empty key value in the editor and input the name of a column in the table to use as the key, formatted as a JSON pointer. For example, "key": ["/foo"].

    Make sure the key field is required, not nullable, and of an allowed type. Make any other necessary changes to the collection specification to accommodate this. A simplified example follows these steps.

  3. Repeat with other missing collection keys, if necessary.

  4. Save and publish the capture as usual.
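As a simplified sketch, a collection specification with a manually added key might look something like this; the actual specification generated in the web app will include the full schema inferred from your table, and /foo stands in for whichever column you choose:

{
  "schema": {
    "type": "object",
    "properties": {
      "foo": { "type": "string" }
    },
    "required": ["foo"]
  },
  "key": ["/foo"]
}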