Hive

Hive sink connector

Write data to Hive.

Key features

By default, the connector uses a two-phase commit (2PC) to ensure exactly-once semantics.

| name | type | required | default value |
|-------------------------------|---------|----------|----------------|
| table_name | string | yes | - |
| metastore_uri | string | yes | - |
| compress_codec | string | no | none |
| hdfs_site_path | string | no | - |
| hive_site_path | string | no | - |
| hive.hadoop.conf | Map | no | - |
| hive.hadoop.conf-path | string | no | - |
| krb5_path | string | no | /etc/krb5.conf |
| kerberos_principal | string | no | - |
| kerberos_keytab_path | string | no | - |
| abort_drop_partition_metadata | boolean | no | true |
| common-options | | no | - |

table_name [string]

Target Hive table name, e.g. db1.table1. If the source runs in multiple-table mode, you can use ${database_name}.${table_name} as the table name; the ${database_name} and ${table_name} placeholders are replaced with the values of the CatalogTable generated by the source, as in the sketch below.
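For instance, in multiple-table mode the sink block might use the placeholders like this (a minimal sketch; the metastore address is an assumed placeholder value):

```
sink {
  Hive {
    # Each upstream table is written to a Hive table named after its source
    # database and table; the metastore URI below is a placeholder.
    table_name = "${database_name}.${table_name}"
    metastore_uri = "thrift://metastore-host:9083"
  }
}
```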

metastore_uri [string]

The URI of the Hive metastore.

hdfs_site_path [string]

The path of hdfs-site.xml, used to load the HA configuration of the NameNodes.

hive_site_path [string]

The path of hive-site.xml, used to authenticate against the Hive metastore.

hive.hadoop.conf [map]

Properties in Hadoop conf ('core-site.xml', 'hdfs-site.xml', 'hive-site.xml').
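For example, HA-related Hadoop properties could be passed inline as a map; a hedged sketch, in which the property names and values are illustrative assumptions rather than required settings:

```
sink {
  Hive {
    table_name = "db1.table1"
    metastore_uri = "thrift://metastore-host:9083"
    hive.hadoop.conf = {
      # Assumed HA NameNode properties; replace with your cluster's values.
      "dfs.nameservices" = "mycluster"
      "dfs.ha.namenodes.mycluster" = "nn1,nn2"
    }
  }
}
```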

hive.hadoop.conf-path [string]

The specified load path for the 'core-site.xml', 'hdfs-site.xml', and 'hive-site.xml' files.

krb5_path [string]

The path of krb5.conf, used for Kerberos authentication.

kerberos_principal [string]

The principal of Kerberos.

kerberos_keytab_path [string]

The keytab file path of Kerberos.
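On a Kerberos-secured cluster, the three options above are typically set together. A minimal sketch, assuming placeholder principal, keytab, and metastore values:

```
sink {
  Hive {
    table_name = "default.test_table"
    metastore_uri = "thrift://metastore-host:9083"
    # All three values below are illustrative placeholders.
    krb5_path = "/etc/krb5.conf"
    kerberos_principal = "hive/metastore-host@EXAMPLE.COM"
    kerberos_keytab_path = "/path/to/hive.keytab"
  }
}
```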

abort_drop_partition_metadata [boolean]

Flag that decides whether to drop partition metadata from the Hive Metastore during an abort operation. Note: this only affects the metadata in the metastore; the data written to the partition during the synchronization process is always deleted.

common options

Sink plugin common parameters, please refer to Sink Common Options for details.

Suppose we have a source table, and we need to read data from it and write it to a Hive table. The job config file can look like this:
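A minimal sketch of such a job, assuming a FakeSource standing in for the source table and assumed target table and metastore values (adapt them to your environment):

```
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  # Hypothetical in-memory source standing in for the real source table.
  FakeSource {
    row.num = 100
    schema = {
      fields {
        pk_id = "bigint"
        name = "string"
        score = "int"
      }
    }
  }
}

sink {
  Hive {
    # Assumed target table and metastore address.
    table_name = "test_hive.test_table"
    metastore_uri = "thrift://metastore-host:9083"
  }
}
```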

Then run the job.

Suppose instead we have multiple source tables, and we need to read data from them and write each one to its own Hive table. The job config file can look like this:
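A minimal sketch for multiple-table mode, assuming a MySQL-CDC source; the connection details and table list are illustrative assumptions, so substitute your real multi-table source:

```
env {
  parallelism = 1
  job.mode = "STREAMING"
}

source {
  # Hypothetical multi-table source; replace with your actual source connector.
  MySQL-CDC {
    base-url = "jdbc:mysql://mysql-host:3306/db1"
    username = "st_user"
    password = "mysql_password"
    table-names = ["db1.table1", "db1.table2"]
  }
}

sink {
  Hive {
    # Each source table is routed to a Hive table named after its database and table.
    table_name = "${database_name}.${table_name}"
    metastore_uri = "thrift://metastore-host:9083"
  }
}
```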
