Hive
Hive sink connector
Description
Write data to Hive.
Key features
By default, we use two-phase commit (2PC) to ensure exactly-once semantics.
Supported compress codec: lzo
Options
| name | type | required | default value |
|------|------|----------|---------------|
| table_name | string | yes | - |
| metastore_uri | string | yes | - |
| compress_codec | string | no | none |
| hdfs_site_path | string | no | - |
| hive_site_path | string | no | - |
| hive.hadoop.conf | Map | no | - |
| hive.hadoop.conf-path | string | no | - |
| krb5_path | string | no | /etc/krb5.conf |
| kerberos_principal | string | no | - |
| kerberos_keytab_path | string | no | - |
| abort_drop_partition_metadata | boolean | no | true |
| common-options | - | no | - |
table_name [string]
Target Hive table name, e.g. db1.table1. If the source is in multiple-table mode, you can use ${database_name}.${table_name} to generate the table name; the placeholders will be replaced with the values from the CatalogTable generated by the source.
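A minimal sketch of the single-table form (the metastore address below is a placeholder); the templated multiple-table form appears in example 2 further down:

```hocon
sink {
  Hive {
    # Single-table mode: a fixed target table.
    table_name = "db1.table1"
    metastore_uri = "thrift://metastore-host:9083"
  }
}
```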
metastore_uri [string]
The URI of the Hive metastore, e.g. thrift://host:9083.
hdfs_site_path [string]
The path of hdfs-site.xml, used to load the HA configuration of NameNodes.
hive_site_path [string]
The path of hive-site.xml, also used when authenticating against the Hive metastore.
hive.hadoop.conf [map]
Properties from the Hadoop configuration files ('core-site.xml', 'hdfs-site.xml', 'hive-site.xml').
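As a sketch, the keys of this map are plain Hadoop property names; the values below are placeholders:

```hocon
sink {
  Hive {
    table_name = "db1.table1"
    metastore_uri = "thrift://metastore-host:9083"
    # Hypothetical Hadoop properties, passed through to the Hadoop configuration.
    hive.hadoop.conf = {
      dfs.replication = "1"
      hadoop.tmp.dir = "/tmp/seatunnel"
    }
  }
}
```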
hive.hadoop.conf-path [string]
The directory from which to load the 'core-site.xml', 'hdfs-site.xml', and 'hive-site.xml' files.
krb5_path [string]
The path of krb5.conf, used for Kerberos authentication.
kerberos_principal [string]
The Kerberos principal used for authentication.
kerberos_keytab_path [string]
The path of the Kerberos keytab file.
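On a Kerberized cluster the three options are typically set together; a minimal sketch with a placeholder principal and paths:

```hocon
sink {
  Hive {
    table_name = "db1.table1"
    metastore_uri = "thrift://metastore-host:9083"
    # Placeholder Kerberos settings; adjust realm, principal, and keytab location.
    krb5_path = "/etc/krb5.conf"
    kerberos_principal = "hive/metastore-host@EXAMPLE.COM"
    kerberos_keytab_path = "/etc/security/keytabs/hive.keytab"
  }
}
```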
abort_drop_partition_metadata [boolean]
Flag that decides whether to drop partition metadata from the Hive Metastore during an abort operation. Note: this only affects the metadata in the metastore; the data written to the partition during the synchronization process is always deleted.
common options
Sink plugin common parameters; please refer to Sink Common Options for details.
Example
example 1
Suppose we have a source Hive table and need to read its data and write it to another Hive table. The job config file can look like this:
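A minimal sketch, assuming a source table test_hive.test_hive_source and a target table test_hive.test_hive_sink (both table names and the metastore address are placeholders):

```hocon
env {
  parallelism = 3
  job.mode = "BATCH"
}

source {
  Hive {
    table_name = "test_hive.test_hive_source"
    metastore_uri = "thrift://metastore-host:9083"
  }
}

sink {
  Hive {
    table_name = "test_hive.test_hive_sink"
    metastore_uri = "thrift://metastore-host:9083"
  }
}
```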
Hive on s3
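A hedged sketch for a Hive table whose warehouse lives on S3; the bucket, conf directory, and credentials provider below are placeholders:

```hocon
sink {
  Hive {
    table_name = "test_hive.test_hive_sink_on_s3"
    metastore_uri = "thrift://metastore-host:9083"
    # Directory holding core-site.xml / hdfs-site.xml / hive-site.xml (placeholder path).
    hive.hadoop.conf-path = "/home/ec2-user/hadoop-conf"
    hive.hadoop.conf = {
      bucket = "s3://my-bucket"
      fs.s3a.aws.credentials.provider = "com.amazonaws.auth.InstanceProfileCredentialsProvider"
    }
  }
}
```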
Hive on oss
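Similarly, a hedged sketch for a warehouse on Aliyun OSS; the bucket and conf directory are placeholders:

```hocon
sink {
  Hive {
    table_name = "test_hive.test_hive_sink_on_oss"
    metastore_uri = "thrift://metastore-host:9083"
    # Placeholder Hadoop conf directory and OSS bucket.
    hive.hadoop.conf-path = "/tmp/hadoop"
    hive.hadoop.conf = {
      bucket = "oss://my-bucket"
    }
  }
}
```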
Run the job.
example 2
Suppose we have multiple source tables and need to read data from them and write it to matching target tables. The job config file can look like this:
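A minimal sketch of the multiple-table case; the source block is left as a placeholder, since any source running in multiple-table mode will do:

```hocon
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  # Any multiple-table-mode source goes here; each table it emits carries a CatalogTable.
}

sink {
  Hive {
    # Each upstream table is routed to the Hive table named by this template.
    table_name = "${database_name}.${table_name}"
    metastore_uri = "thrift://metastore-host:9083"
  }
}
```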