OracleDB

This connector captures data from OracleDB into Flow collections using Oracle LogMiner.

It is available for use in the Flow web application. For local development or open-source workflows, ghcr.io/estuary/source-oracle:dev provides the latest version of the connector as a Docker image. You can also browse the container registry to see past image versions.

Prerequisites

  • Oracle 11g or above
  • Allow connections from Estuary Flow to your Oracle database (if they exist in separate VPCs)
  • Create a dedicated read-only Estuary Flow user with access to all tables needed for replication

Setup

Follow the steps below to set up the OracleDB connector.

Create a Dedicated User

Creating a dedicated database user with read-only access is recommended for better permission control and auditing.

  1. To create the user, run the following commands against your database:
CREATE USER estuary_flow_user IDENTIFIED BY <your_password_here>;
GRANT CREATE SESSION TO estuary_flow_user;
  2. Next, grant the user read-only access to the relevant tables. The simplest way is to grant read access to all tables in the schema as follows:
GRANT SELECT ANY TABLE TO estuary_flow_user;
  3. Alternatively, you can be more granular and grant access to specific tables in different schemas:
GRANT SELECT ON "<schema_a>"."<table_1>" TO estuary_flow_user;
GRANT SELECT ON "<schema_b>"."<table_2>" TO estuary_flow_user;
  4. Create a watermarks table:
CREATE TABLE estuary_flow_user.FLOW_WATERMARKS(SLOT varchar(1000) PRIMARY KEY, WATERMARK varchar(4000));
  5. Grant the user access to use LogMiner, read metadata from the database, and write to the watermarks table:
GRANT SELECT_CATALOG_ROLE TO estuary_flow_user;
GRANT EXECUTE_CATALOG_ROLE TO estuary_flow_user;
GRANT SELECT ON V$DATABASE TO estuary_flow_user;
GRANT SELECT ANY TABLE TO estuary_flow_user;
GRANT SELECT ON V$LOG TO estuary_flow_user;
GRANT LOGMINING TO estuary_flow_user;

GRANT INSERT, UPDATE ON estuary_flow_user.FLOW_WATERMARKS TO estuary_flow_user;
  6. Enable supplemental logging:

For self-managed (non-RDS) instances, use:

ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

For Amazon RDS instances use:

BEGIN rdsadmin.rdsadmin_util.alter_supplemental_logging(p_action => 'ADD', p_type => 'ALL'); END;
  7. Your database user should now be ready for use with Estuary Flow.

Include Schemas for Discovery

In your Oracle configuration, you can specify the schemas that Flow should look at when discovering tables. The schema names are case-sensitive. If the user does not have access to a certain schema, no tables from that schema will be discovered.
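For example, here is a minimal sketch of the relevant advanced section of the endpoint configuration, assuming your tables live in two schemas named SALES and HR (both the schema names and the list form shown here are illustrative; adjust them to your environment):

advanced:
  # Only tables in these schemas will be discovered; names are case-sensitive.
  discover_schemas:
    - SALES
    - HR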

Configuration

You configure connectors either in the Flow web app, or by directly editing the catalog specification file. See connectors to learn more about using connectors. The values and specification sample below provide configuration details specific to the OracleDB source connector.

To allow secure connections via SSH tunneling, include the networkTunnel configuration in your endpoint config, as shown in the sample below.

Properties

Endpoint

Property | Title | Description | Type | Required/Default
/address | Address | The host or host:port at which the database can be reached. | string | Required
/user | Username | The database user to authenticate as. | string | Required
/password | Password | Password for the specified database user. | string | Required
/database | Database | Logical database name to capture from. Defaults to ORCL. | string | Required
/historyMode | History Mode | Capture change events without reducing them to a final state. | boolean | false
/advanced/skip_backfills | Skip Backfills | A comma-separated list of fully-qualified table names which should not be backfilled. | string |
/advanced/watermarksTable | Watermarks Table | The name of the table used for watermark writes during backfills. Must be fully-qualified in '<schema>.table' form. | string | <USER>.FLOW_WATERMARKS
/advanced/backfill_chunk_size | Backfill Chunk Size | The number of rows which should be fetched from the database in a single backfill query. | integer | 50000
/advanced/incremental_chunk_size | Incremental Chunk Size | The number of rows which should be fetched from the database in a single incremental query. | integer | 10000
/advanced/incremental_scn_range | Incremental SCN Range | The SCN range captured at every iteration. | integer | 50000
/advanced/dictionary_mode | Dictionary Mode | How dictionaries should be used in LogMiner: one of online or extract. When using online mode, schema changes to the table may break the capture, but resource usage is limited. When using extract mode, schema changes are handled gracefully, but more resources of your database (including disk) are used by the process. Defaults to extract. | string | extract
/advanced/discover_schemas | Discover Schemas | If this is specified, only tables in the selected schema(s) will be automatically discovered. Omit all entries to discover tables from all schemas. | string |
/advanced/node_id | Node ID | Node ID for the capture. Each node in a replication cluster must have a unique 32-bit ID. The specific value doesn't matter so long as it is unique. If unset or zero the connector will pick a value. | integer |

Bindings

Property | Title | Description | Type | Required/Default
/namespace | Namespace | The owner/schema of the table. | string | Required
/stream | Stream | Table name. | string | Required

Sample

captures:
  ${PREFIX}/${CAPTURE_NAME}:
    endpoint:
      connector:
        image: ghcr.io/estuary/source-oracle:dev
        config:
          address: database-1.ukqdmzdqvvsz.us-east-2.rds.amazonaws.com:1521
          user: "flow_capture"
          password: secret
          database: ORCL
          historyMode: false
          advanced:
            incremental_scn_range: 50000
            dictionary_mode: extract
          networkTunnel:
            sshForwarding:
              privateKey: -----BEGIN RSA PRIVATE KEY-----\n...
              sshEndpoint: ssh://ec2-user@19.220.21.33:22

    bindings:
      - resource:
          namespace: ${TABLE_NAMESPACE}
          stream: ${TABLE_NAME}
        target: ${PREFIX}/${COLLECTION_NAME}

Dictionary Modes

Oracle writes redo log files using a triplet of object ID, data object ID, and object version to identify objects in the database, rather than their names. This applies to table names as well as column names. When reading data from the redo log files with LogMiner, a "dictionary" is used to translate these object identifiers into the user-facing names of those objects. When interacting with the database directly, the online dictionary (essentially the latest dictionary, which knows how to translate currently existing table and column names) is used by the database and by LogMiner. However, when capturing historical data, it is possible that the names of these objects, or even their identifiers, have changed (due to an ALTER TABLE statement, for example). In these cases the online dictionary is insufficient for translating the object identifiers into names, and LogMiner will report a dictionary mismatch.

To resolve this issue, the dictionary can be extracted into the redo log files themselves, so that when there are schema changes, LogMiner can automatically use the appropriate dictionary for the time period an event belongs to. This operation, however, uses CPU and RAM, and consumes disk space over time.

Estuary's Oracle connector lets you choose which mode to operate in (a configuration sketch follows this list):

  1. To extract the dictionary into the redo log files, the extract mode can be used (this is the default mode). Be aware that this mode leads to more resource usage (CPU, RAM and disk).
  2. To always use the online dictionary, the online mode can be used. This mode is more efficient, but it cannot handle schema changes in tables, so only use this mode with caution and when table schemas are known not to change.
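
As an illustration, here is a minimal sketch of switching to the online mode in the endpoint config's advanced section (the rest of the configuration follows the sample above):

advanced:
  # Use the online dictionary: lighter on database resources, but schema
  # changes to captured tables may break the capture.
  dictionary_mode: online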

Incremental SCN Range and Events Rate

At every iteration, the connector fetches changes in a specific SCN (System Change Number) range, which is roughly equivalent to a specific time range. Depending on how many events occur on the captured tables (by default, a range of 50,000 SCNs is captured in each iteration), the advanced.incremental_scn_range option can be adjusted to fit your needs:

  1. If the database processes a large number of events per unit of time, the connector and/or the database may run short of resources while processing the data. For example, you may see the error PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT, which indicates that the memory usage of the database instance has hit a limit. This can happen if too many events are processed in one iteration. In these cases we recommend lowering the SCN range (see the sketch after this list) until the database and the connector can handle the load.
  2. If the database does not have many events per time unit, a higher value can help with faster processing, although this is usually not necessary.
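
As an illustration, here is a minimal sketch of lowering the range in the endpoint config's advanced section (the value 25000 is purely illustrative; tune it until the load is manageable):

advanced:
  # Capture a smaller SCN window per iteration to reduce memory pressure
  # on the database instance and the connector.
  incremental_scn_range: 25000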

Troubleshooting

  1. If you see the following error when trying to connect:
ORA-01950: no privileges on tablespace 'USERS'

The SQL command below may resolve the issue:

ALTER USER estuary_flow_user QUOTA UNLIMITED ON USERS;

Known Limitations

  1. Table and column names longer than 30 characters are not supported by LogMiner, so they are also not supported by this connector.