Oracle CDC
Last updated
Last updated
Oracle CDC source connector
The Oracle CDC connector allows for reading snapshot data and incremental data from Oracle database. This document describes how to set up the Oracle CDC connector to run SQL queries against Oracle databases.
The Debezium Oracle connector does not rely on the continuous mining option. The connector is responsible for detecting log switches and adjusting the logs that are mined automatically, which the continuous mining option did for you automatically. So, you can not set this property named log.mining.continuous.mine
in the debezium.
To enable Oracle CDC (Change Data Capture) using Logminer in Nexus, which is a built-in tool provided by Oracle, follow the steps below:
The operating system creates an empty file directory to store Oracle archived logs and user tablespaces.
Login as admin and enable Oracle archived logs.
Login as admin and create an account called logminer_user with the password "oracle", and grant it privileges to read tables and logs.
The operating system creates an empty file directory to store Oracle archived logs and user tablespaces.
Login as admin and enable logging
Executing in CDB
Creating debeziume account
Operating in CDB
Operating in PDB
Operating in CDB
INTEGER
INT
FLOAT
DECIMAL(38, 18)
NUMBER(precision <= 9, scale == 0)
INT
NUMBER(9 < precision <= 18, scale == 0)
BIGINT
NUMBER(18 < precision, scale == 0)
DECIMAL(38, 0)
NUMBER(precision == 0, scale == 0)
DECIMAL(38, 18)
NUMBER(scale != 0)
DECIMAL(38, 18)
BINARY_DOUBLE
DOUBLE
BINARY_FLOAT REAL
FLOAT
CHAR NCHAR NVARCHAR2 VARCHAR2 LONG ROWID NCLOB CLOB
STRING
DATE
DATE
TIMESTAMP TIMESTAMP WITH LOCAL TIME ZONE
TIMESTAMP
BLOB RAW LONG RAW BFILE
BYTES
base-url
String
Yes
-
The URL of the JDBC connection. Refer to a case: idbc:oracle:thin:datasource01:1523:xe
.
username
String
Yes
-
Name of the database to use when connecting to the database server.
password
String
Yes
-
Password to use when connecting to the database server.
database-names
List
No
-
Database name of the database to monitor.
schema-names
List
No
-
Schema name of the database to monitor.
table-names
List
Yes
-
Table name of the database to monitor. The table name needs to include the database name, for example: database_name.table_name
table-names-config
List
No
-
Table config list. for example: [{"table": "db1.schema1.table1","primaryKeys":["key1"]}]
startup.mode
Enum
No
INITIAL
Optional startup mode for Oracle CDC consumer, valid enumerations are initial
, earliest
, latest
and specific
.
initial
: Synchronize historical data at startup, and then synchronize incremental data.
earliest
: Startup from the earliest offset possible.
latest
: Startup from the latest offset.
specific
: Startup from user-supplied specific offsets.
startup.specific-offset.file
String
No
-
Start from the specified binlog file name. Note, This option is required when the startup.mode
option used specific
.
startup.specific-offset.pos
Long
No
-
Start from the specified binlog file position. Note, This option is required when the startup.mode
option used specific
.
stop.mode
Enum
No
NEVER
Optional stop mode for Oracle CDC consumer, valid enumerations are never
, latest
or specific
.
never
: Real-time job don't stop the source.
latest
: Stop from the latest offset.
specific
: Stop from user-supplied specific offset.
stop.specific-offset.file
String
No
-
Stop from the specified binlog file name. Note, This option is required when the stop.mode
option used specific
.
stop.specific-offset.pos
Long
No
-
Stop from the specified binlog file position. Note, This option is required when the stop.mode
option used specific
.
snapshot.split.size
Integer
No
8096
The split size (number of rows) of table snapshot, captured tables are split into multiple splits when read the snapshot of table.
snapshot.fetch.size
Integer
No
1024
The maximum fetch size for per poll when read table snapshot.
server-time-zone
String
No
UTC
The session time zone in database server. If not set, then ZoneId.systemDefault() is used to determine the server time zone.
connect.timeout.ms
Duration
No
30000
The maximum time that the connector should wait after trying to connect to the database server before timing out.
connect.max-retries
Integer
No
3
The max retry times that the connector should retry to build database server connection.
connection.pool.size
Integer
No
20
The jdbc connection pool size.
chunk-key.even-distribution.factor.upper-bound
Double
No
100
The upper bound of the chunk key distribution factor. This factor is used to determine whether the table data is evenly distributed. If the distribution factor is calculated to be less than or equal to this upper bound (i.e., (MAX(id) - MIN(id) + 1) / row count), the table chunks would be optimized for even distribution. Otherwise, if the distribution factor is greater, the table will be considered as unevenly distributed and the sampling-based sharding strategy will be used if the estimated shard count exceeds the value specified by sample-sharding.threshold
. The default value is 100.0.
chunk-key.even-distribution.factor.lower-bound
Double
No
0.05
The lower bound of the chunk key distribution factor. This factor is used to determine whether the table data is evenly distributed. If the distribution factor is calculated to be greater than or equal to this lower bound (i.e., (MAX(id) - MIN(id) + 1) / row count), the table chunks would be optimized for even distribution. Otherwise, if the distribution factor is less, the table will be considered as unevenly distributed and the sampling-based sharding strategy will be used if the estimated shard count exceeds the value specified by sample-sharding.threshold
. The default value is 0.05.
sample-sharding.threshold
Integer
No
1000
This configuration specifies the threshold of estimated shard count to trigger the sample sharding strategy. When the distribution factor is outside the bounds specified by chunk-key.even-distribution.factor.upper-bound
and chunk-key.even-distribution.factor.lower-bound
, and the estimated shard count (calculated as approximate row count / chunk size) exceeds this threshold, the sample sharding strategy will be used. This can help to handle large datasets more efficiently. The default value is 1000 shards.
inverse-sampling.rate
Integer
No
1000
The inverse of the sampling rate used in the sample sharding strategy. For example, if this value is set to 1000, it means a 1/1000 sampling rate is applied during the sampling process. This option provides flexibility in controlling the granularity of the sampling, thus affecting the final number of shards. It's especially useful when dealing with very large datasets where a lower sampling rate is preferred. The default value is 1000.
exactly_once
Boolean
No
false
Enable exactly once semantic.
use_select_count
Boolean
No
false
Use select count for table count rather then other methods in full stage.In this scenario, select count directly is used when it is faster to update statistics using sql from analysis table
skip_analyze
Boolean
No
false
Skip the analysis of table count in full stage.In this scenario, you schedule analysis table sql to update related table statistics periodically or your table data does not change frequently
format
Enum
No
DEFAULT
Optional output format for Oracle CDC, valid enumerations are DEFAULT
、COMPATIBLE_DEBEZIUM_JSON
.
debezium
Config
No
-
common-options
no
-
Support multi-table reading
Use the select count(*) instead of analysis table for count table rows in full stage
Use the select NUM_ROWS from all_tables for the table rows but skip the analyze table.
Enabling Logminer without CDB (Container Database) mode.
Oracle 11g is not supported
Grant privileges only to the tables that need to be collected
To enable Logminer in Oracle with CDB (Container Database) + PDB (Pluggable Database) mode, follow the steps below:
Pass-through to Debezium Embedded Engine which is used to capture data changes from Oracle server.
Source plugin common parameters, please refer to for details
Must be used with kafka connector sink, see for details