SQL Server Batch Query Connector

This connector captures data from SQL Server into Flow collections by periodically executing queries and translating the results into JSON documents.

For local development or open-source workflows, ghcr.io/estuary/source-sqlserver-batch:dev provides the latest version of the connector as a Docker image. You can also follow the link in your browser to see past image versions.

We recommend using our SQL Server CDC Connector instead if possible. CDC provides lower-latency capture, produces delete and update events, and usually has a smaller impact on the source database.

However, there are circumstances where this might not be feasible. Perhaps you need to capture from a managed SQL Server instance which doesn't support CDC. Or perhaps you need to capture the contents of a view or the result of an ad-hoc query. That's the sort of situation this connector is intended for.

The most important caveat to be aware of when using this connector is that it executes its update query over and over on a schedule. For example, if you set the polling interval to 5 minutes, a naive `SELECT * FROM foo` query against a 100 MiB view will produce nearly 30 GiB/day of ingested data, most of it duplicated. The default polling interval is 24 hours to minimize the impact of this behavior, but even then it can mean a lot of duplicated data being processed, depending on the size of your tables.
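To spell out that arithmetic: a 5-minute interval means 24 h ÷ 5 min = 288 query executions per day, and 288 × 100 MiB = 28,800 MiB ≈ 28 GiB of captured data per day, nearly all of it rows that were already captured on previous polls.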

If you edit these queries or manually add capture bindings for views or ad-hoc queries, you need to either restrict the query to return only the rows that are new since the previous execution (see the sketch below), or set the polling interval high enough that the resulting data rate, <DatasetSize> / <PollingInterval>, is an amount of data you're willing to deal with.
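As a sketch of what "just the new rows since last time" can look like, assume the table has a monotonically increasing `updated_at` column that is declared as the binding's cursor, so the last value seen is persisted between executions. The table, column, and parameter names here are illustrative, not the connector's actual query template:

```sql
-- A hypothetical incremental query. @last_cursor stands in for the cursor
-- value persisted from the previous execution; only rows added or modified
-- since then are returned, instead of the full table on every poll.
SELECT id, amount, updated_at
FROM dbo.transactions
WHERE updated_at > @last_cursor
ORDER BY updated_at;
```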

Configuration

You configure connectors either in the Flow web app, or by directly editing the catalog specification file. See connectors to learn more about using connectors. The values and specification sample below provide configuration details specific to the SQL Server batch source connector.

Properties

Endpoint

| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| `/address` | Server Address | The host or host:port at which the database can be reached. | string | Required |
| `/database` | Database | Logical database name to capture from. | string | Required |
| `/user` | User | The database user to authenticate as. | string | Required, `"flow_capture"` |
| `/password` | Password | Password for the specified database user. | string | Required |
| `/advanced` | Advanced Options | Options for advanced users. You should not typically need to modify these. | object | |
| `/advanced/discover_views` | Discover Views | When set, views will be automatically discovered as resources. If unset, only tables will be discovered. | boolean | `false` |
| `/advanced/timezone` | Timezone | The IANA timezone name in which datetime columns will be converted to RFC3339 timestamps. | string | `"UTC"` |
| `/advanced/poll` | Polling Schedule | When and how often to execute fetch queries. Accepts a Go duration string like `'5m'` or `'6h'` for frequency-based polling, or a string like `'daily at 12:34Z'` to poll at a specific time (specified in UTC) every day. | string | `"24h"` |
| `/advanced/discover_schemas` | Discovery Schema Selection | If this is specified, only tables in the selected schema(s) will be automatically discovered. Omit all entries to discover tables from all schemas. | string[] | |
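As an illustration of the advanced options above, the endpoint config below polls once a day at a fixed UTC time and enables view discovery. The address, database, credential, and schedule values are placeholders, not recommendations:

```yaml
config:
  address: "<host>:1433"
  database: "my_db"
  user: "flow_capture"
  password: "secret"
  advanced:
    poll: "daily at 06:00Z"  # execute fetch queries once per day at 06:00 UTC
    discover_views: true     # discover views as resources, not just tables
```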

Bindings

| Property | Title | Description | Type | Required/Default |
|---|---|---|---|---|
| `/name` | Name | The name of the resource. | string | Required |
| `/schema` | Schema | Schema where the table is located. | string | |
| `/table` | Table | The name of the table to be captured. | string | |
| `/cursor` | Cursor | The names of columns which should be persisted between query executions as a cursor. | string[] | |

Sample

```yaml
captures:
  ${PREFIX}/${CAPTURE_NAME}:
    endpoint:
      connector:
        image: "ghcr.io/estuary/source-sqlserver-batch:dev"
        config:
          address: "<host>:1433"
          database: "my_db"
          user: "flow_capture"
          password: "secret"
    bindings:
      - resource:
          name: "transactions"
          schema: "dbo"
          table: "transactions"
          cursor:
            - "id"
        target: ${PREFIX}/${COLLECTION_NAME}
```