HiveJdbc
JDBC Hive Source Connector
Hive 3.1.3 and 3.1.2 are confirmed to work; other versions need to be tested.
Supports query SQL, which can be used to achieve column projection.
Reads data from an external data source through JDBC.
| Datasource | Supported versions | Driver | URL |
|---|---|---|---|
| Hive | Different dependency versions have different driver classes. | org.apache.hive.jdbc.HiveDriver | jdbc:hive2://localhost:10000/default |
| Hive data type | Nexus data type |
|---|---|
| BOOLEAN | BOOLEAN |
| TINYINT, SMALLINT | SHORT |
| INT, INTEGER | INT |
| BIGINT | LONG |
| FLOAT | FLOAT |
| DOUBLE, DOUBLE PRECISION | DOUBLE |
| DECIMAL(x,y), NUMERIC(x,y) (when the column's specified size is < 38) | DECIMAL(x,y) |
| DECIMAL(x,y), NUMERIC(x,y) (when the column's specified size is > 38) | DECIMAL(38,18) |
| CHAR, VARCHAR, STRING | STRING |
| DATE | DATE |
| DATETIME, TIMESTAMP | TIMESTAMP |
| BINARY, ARRAY, INTERVAL, MAP, STRUCT, UNIONTYPE | Not supported yet |
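The mapping above can be read as a small lookup rule. The following is a minimal sketch of that rule, not the connector's actual implementation: the function name `map_hive_type` is hypothetical, and treating a column size of exactly 38 like the "< 38" case is an assumption, because the table only states the behaviour for "< 38" and "> 38".

```python
import re

# Scalar mappings copied from the table above (Hive type -> engine type).
SIMPLE_TYPES = {
    "BOOLEAN": "BOOLEAN",
    "TINYINT": "SHORT", "SMALLINT": "SHORT",
    "INT": "INT", "INTEGER": "INT",
    "BIGINT": "LONG",
    "FLOAT": "FLOAT",
    "DOUBLE": "DOUBLE", "DOUBLE PRECISION": "DOUBLE",
    "CHAR": "STRING", "VARCHAR": "STRING", "STRING": "STRING",
    "DATE": "DATE",
    "DATETIME": "TIMESTAMP", "TIMESTAMP": "TIMESTAMP",
}
UNSUPPORTED = {"BINARY", "ARRAY", "INTERVAL", "MAP", "STRUCT", "UNIONTYPE"}

# Matches DECIMAL(x,y) and NUMERIC(x,y), capturing x and y.
_DECIMAL = re.compile(r"^(?:DECIMAL|NUMERIC)\((\d+)\s*,\s*(\d+)\)$")


def map_hive_type(hive_type: str) -> str:
    """Hypothetical helper: map a Hive column type to the type listed in the table."""
    name = hive_type.strip().upper()
    match = _DECIMAL.match(name)
    if match:
        precision, scale = int(match.group(1)), int(match.group(2))
        # Assumption: a size of exactly 38 is kept as declared.
        if precision > 38:
            return "DECIMAL(38,18)"
        return f"DECIMAL({precision},{scale})"
    # Drop length or element-type arguments, e.g. VARCHAR(20) or ARRAY<INT>.
    base = re.split(r"[(<]", name, maxsplit=1)[0].strip()
    if base in UNSUPPORTED:
        raise ValueError(f"{hive_type} is not supported yet")
    if base in SIMPLE_TYPES:
        return SIMPLE_TYPES[base]
    raise ValueError(f"{hive_type} is not listed in the mapping table")
```

For example, `map_hive_type("varchar(20)")` yields `STRING`, while `map_hive_type("array<int>")` raises an error because ARRAY is listed as not supported yet.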
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| url | String | Yes | - | The URL of the JDBC connection, for example: jdbc:hive2://localhost:10000/default |
| driver | String | Yes | - | The JDBC class name used to connect to the remote data source. For Hive, the value is org.apache.hive.jdbc.HiveDriver. |
| user | String | No | - | User name for the connection instance. |
| password | String | No | - | Password for the connection instance. |
| query | String | Yes | - | Query statement. |
| connection_check_timeout_sec | Int | No | 30 | The time in seconds to wait for the database operation used to validate the connection to complete. |
| partition_column | String | No | - | The column used to partition the read for parallelism. Only a numeric primary key column is supported, and only one column can be configured. |
| partition_lower_bound | BigDecimal | No | - | The minimum partition_column value for the scan. If not set, Nexus queries the database to get the minimum value. |
| partition_upper_bound | BigDecimal | No | - | The maximum partition_column value for the scan. If not set, Nexus queries the database to get the maximum value. |
| partition_num | Int | No | job parallelism | The number of partitions. Only positive integers are supported. The default value is the job parallelism. |
| fetch_size | Int | No | 0 | For queries that return a large number of objects, you can configure the row fetch size used in the query to improve performance by reducing the number of database hits required to satisfy the selection criteria. Zero means the JDBC default value is used. |
| common-options | | No | - | Source plugin common parameters. |
| useKerberos | Boolean | No | false | Whether to enable Kerberos. The default is false. |
| kerberos_principal | String | No | - | When Kerberos is used, set the Kerberos principal, such as 'test_user@xxx'. |
| kerberos_keytab_path | String | No | - | When Kerberos is used, set the keytab file path for the Kerberos principal, such as '/home/test/test_user.keytab'. |
| krb5_path | String | No | /etc/krb5.conf | When Kerberos is used, set the krb5 configuration file path, such as '/nexus/krb5.conf', or use the default path '/etc/krb5.conf'. |
If partition_column is not set, the source runs with a single task. If partition_column is set, the read is executed in parallel according to the task parallelism. When your shard read field is a large numeric type such as bigint and above, and the data is not evenly distributed, it is recommended to set the parallelism level to 1 so that the data skew problem is avoided.
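To see why unevenly distributed values cause skew, it helps to sketch how a numeric range might be divided among parallel readers. The code below is an illustration only and assumes the range between the lower and upper bound is cut into equally wide slices; the helper names `split_ranges` and `rows_per_range` are hypothetical, and the connector's real splitting logic may differ.

```python
def split_ranges(lower: int, upper: int, partition_num: int) -> list[tuple[int, int]]:
    """Split the inclusive range [lower, upper] into contiguous, near-equal slices."""
    if partition_num <= 0:
        raise ValueError("partition_num must be a positive integer")
    if upper < lower:
        raise ValueError("upper bound must not be smaller than lower bound")
    total = upper - lower + 1
    base, extra = divmod(total, partition_num)
    ranges = []
    start = lower
    for i in range(partition_num):
        # The first `extra` slices get one additional value each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # more partitions requested than distinct values
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def rows_per_range(values: list[int], ranges: list[tuple[int, int]]) -> list[int]:
    """Count how many of the given column values fall into each slice."""
    return [sum(1 for v in values if lo <= v <= hi) for lo, hi in ranges]


# Evenly spread keys: every reader gets the same amount of work.
even = rows_per_range(list(range(1, 101)), split_ranges(1, 100, 4))

# Skewed keys: four small ids and one very large one.
skewed_values = [1, 2, 3, 4, 1_000_000]
skewed = rows_per_range(skewed_values, split_ranges(1, 1_000_000, 4))
```

Here `even` is `[25, 25, 25, 25]`, while `skewed` is `[4, 0, 0, 1]`: two of the four readers receive no rows at all. In a situation like this, parallelism adds little, which is the case the recommendation above addresses.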
This example reads 16 rows from the 'type_bin' table in your test database with a single task and queries all of its fields. You can also specify which fields to query. The final output goes to the console.
Read the queried table in parallel using the shard field and shard data you configured. Use this approach if you want to read the whole table.
It is more efficient to specify the upper and lower bounds of the query, so that the data source is read according to the boundaries you configured.
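One reason explicit bounds help is that the minimum and maximum of the partition column no longer have to be looked up before the read starts. The sketch below illustrates that idea under stated assumptions: `resolve_bounds` and the `fetch_min_max` callback are hypothetical names, and the MIN/MAX statement is only one plausible form of the lookup, not the SQL the connector is known to issue.

```python
from typing import Callable, Optional, Tuple

Number = int


def resolve_bounds(
    query: str,
    column: str,
    lower: Optional[Number],
    upper: Optional[Number],
    fetch_min_max: Callable[[str], Tuple[Number, Number]],
) -> Tuple[Number, Number]:
    """Return the scan bounds, querying the database only for missing ones."""
    if lower is not None and upper is not None:
        # Both bounds configured: no extra round trip to the database.
        return lower, upper
    # Assumed form of the lookup; the subquery alias `t` is illustrative.
    sql = f"SELECT MIN({column}), MAX({column}) FROM ({query}) t"
    db_min, db_max = fetch_min_max(sql)
    return (
        lower if lower is not None else db_min,
        upper if upper is not None else db_max,
    )


# Stub standing in for a real database call; it records each statement it receives.
issued = []


def fake_fetch(sql: str) -> Tuple[int, int]:
    issued.append(sql)
    return 1, 500


configured = resolve_bounds("select * from type_bin", "id", 1, 500, fake_fetch)
lookups_with_bounds = len(issued)

discovered = resolve_bounds("select * from type_bin", "id", None, None, fake_fetch)
lookups_without_bounds = len(issued)
```

With both bounds configured the stub is never called, whereas omitting them triggers one additional aggregate query over the source data before any rows are read.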
For source plugin common parameters, please refer to the source common options documentation for details.