MariaDB
This is a change data capture (CDC) connector that captures change events from a MariaDB database via the Binary Log. It's derived from the MySQL capture connector, so the same configuration applies, but the setup steps look somewhat different.
This connector is available for use in the Flow web application. For local development or open-source workflows, ghcr.io/estuary/source-mariadb:dev
provides the latest version of the connector as a Docker image. You can also follow the link in your browser to see past image versions.
Prerequisites
To use this connector, you'll need a MariaDB database setup with the following.
- The
binlog_format
system variable must be set toROW
. - The binary log expiration period should be at least 7 days.
- This value may be set lower if necessary, but we discourage doing so as this may increase the likelihood of unrecoverable failures.
- A database user with appropriate permissions:
REPLICATION CLIENT
andREPLICATION SLAVE
privileges.- Permission to read the tables being captured.
- Permission to read from
information_schema
tables, if automatic discovery is used.
- If the table(s) to be captured include columns of type
DATETIME
, thetime_zone
system variable must be set to an IANA zone name or numerical offset or the capture configured with atimezone
to use by default.
To configure this connector to capture data from databases hosted on your internal network, you must set up SSH tunneling. For more specific instructions on setup, see configure connections with SSH tunneling.
Setup
Self Hosted MariaDB
To meet these requirements, do the following:
- Create the
flow_capture
user with replication permission, and the ability to read all tables.
The SELECT
permission can be restricted to just the tables that need to be
captured, but automatic discovery requires information_schema
access as well.
CREATE USER IF NOT EXISTS flow_capture IDENTIFIED BY 'secret'
GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'flow_capture';
GRANT SELECT ON *.* TO 'flow_capture';
- Configure the binary log to retain data for at least 7 days. We recommend 30 days where possible.
SET PERSIST binlog_expire_logs_seconds = 2592000;
- Configure the database's time zone. See below for more information.
SET PERSIST time_zone = '-05:00'
Azure Database for MariaDB
You can use this connector for MariaDB instances on Azure Database for MariaDB using the following setup instructions.
-
Allow connections to the database from the Estuary Flow IP address.
- Create a new firewall rule that grants access to the Estuary Flow IP addresses.
infoAlternatively, you can allow secure connections via SSH tunneling. To do so:
- Follow the guide to configure an SSH server for tunneling
- When you configure your connector as described in the configuration section above,
including the additional
networkTunnel
configuration to enable the SSH tunnel. See Connecting to endpoints on secure networks for additional details and a sample.
-
Set the
binlog_expire_logs_seconds
server perameter to2592000
. -
Using your preferred MariaDB client, create the
flow_capture
user with replication permission, and the ability to read all tables.The
SELECT
permission can be restricted to just the tables that need to be captured, but automatic discovery requiresinformation_schema
access as well.
Your username must be specified in the format username@servername
.
CREATE USER IF NOT EXISTS flow_capture IDENTIFIED BY 'secret'
GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'flow_capture';
GRANT SELECT ON *.* TO 'flow_capture';
- Note the instance's host under Server name, and the port under Connection Strings (usually
3306
). Together, you'll use the host:port as theaddress
property when you configure the connector.
Capturing from Read Replicas
This connector supports capturing from a read replica of your database, provided that binary logging is enabled on the replica and all other requirements are met.
Setting the MariaDB time zone
MariaDB's time_zone
server system variable is set to SYSTEM
by default.
Flow is not able to detect your time zone when it's set this way, so you must explicitly set the variable for your database.
If you intend to capture tables including columns of the type DATETIME
,
and time_zone
is set to SYSTEM
,
Flow won't be able to detect the time zone and convert the column to RFC3339 format.
To avoid this, you must explicitly set the time zone for your database.
You can:
-
Specify a numerical offset from UTC.
-
Specify a named timezone in IANA timezone format.
For example, if you're located in New Jersey, USA, you could set time_zone
to -05:00
or -04:00
, depending on the time of year.
Because this region observes daylight savings time, you'd be responsible for changing the offset.
Alternatively, you could set time_zone
to America/New_York
, and time changes would occur automatically.
If using IANA time zones, your database must include time zone tables. Learn more in the MariaDB docs.
If you are unable to set the time_zone
in the database and need to capture tables with DATETIME
columns, the capture can be configured to assume a time zone using the timezone
configuration property (see below). The timezone
configuration property can be set as a numerical offset or IANA timezone format.