Kafka

Kafka source connector

Key Features

Description

Source connector for Apache Kafka.

Source Options

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| topic | String | Yes | - | Topic name(s) to read from when the table is used as a source. Multiple topics can be given as a comma-separated list, for example 'topic-1,topic-2'. |
| table_list | Map | No | - | List of per-topic configurations. Only one of table_list and topic can be configured at a time. |
| bootstrap.servers | String | Yes | - | Comma-separated list of Kafka brokers. |
| pattern | Boolean | No | false | If set to true, topic is treated as a regular expression, and the consumer subscribes to every topic whose name matches it. |
| consumer.group | String | No | Nexus-Consumer-Group | Kafka consumer group ID, used to distinguish different consumer groups. |
| commit_on_checkpoint | Boolean | No | true | If true, the consumer's offsets are periodically committed in the background. |
| kafka.config | Map | No | - | Besides the required parameters above, optional consumer client parameters can be set here. All consumer parameters listed in the official Kafka documentation are supported. |
| schema | Config | No | - | The structure of the data, including field names and field types. |
| format | String | No | json | Data format. The default is json. Other supported formats are text, canal_json, debezium_json, ogg_json and avro. For the json or text format, the default field separator is ", "; to use a different delimiter, set the field_delimiter option. For the canal format, see Canal Format for details. For the debezium format, see Debezium Format for details. |
| format_error_handle_way | String | No | fail | How data format errors are handled. Valid values are fail and skip; the default is fail. With fail, a format error blocks processing and throws an exception. With skip, the offending record is skipped. |
| field_delimiter | String | No | - | Custom field delimiter for the data format. |
| start_mode | StartMode[earliest],[group_offsets],[latest],[specific_offsets],[timestamp] | No | group_offsets | The initial consumption mode of the consumer. |
| start_mode.offsets | Config | No | - | The offsets to start from; required when start_mode is specific_offsets. |
| start_mode.timestamp | Long | No | - | The timestamp to start from; required when start_mode is timestamp. |
| partition-discovery.interval-millis | Long | No | -1 | The interval, in milliseconds, for dynamically discovering topics and partitions. |
| common-options | | No | - | Source plugin common parameters; see Source Common Options for details. |
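
The start_mode options above can be combined as in the following sketch. It is an illustration only, not a verified configuration: the "<topic>-<partition>" form of the keys under start_mode.offsets and the epoch-millisecond unit of start_mode.timestamp are assumptions, and the topic name and values are placeholders.

```hocon
source {
  Kafka {
    topic = "topic_1"
    bootstrap.servers = "localhost:9092"
    consumer.group = "nexus_group"

    # Start reading from explicit offsets.
    start_mode = specific_offsets
    # Assumption: each key has the form "<topic>-<partition>".
    start_mode.offsets = {
      topic_1-0 = 70
      topic_1-1 = 10
    }

    # Alternative: start reading from a point in time.
    # Assumption: the value is an epoch timestamp in milliseconds.
    # start_mode = timestamp
    # start_mode.timestamp = 1667179890315
  }
}
```

Only one start mode applies per source, so the timestamp variant is left commented out.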

Task Example

Simple

This example reads data from the Kafka topics topic_1, topic_2, and topic_3 and prints it to the console.

# Defining the runtime environment
env {
  parallelism = 2
  job.mode = "BATCH"
}
source {
  Kafka {
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
    format = text
    field_delimiter = "#"
    topic = "topic_1,topic_2,topic_3"
    bootstrap.servers = "localhost:9092"
    kafka.config = {
      client.id = client_1
      max.poll.records = 500
      auto.offset.reset = "earliest"
      enable.auto.commit = "false"
    }
  }  
}
sink {
  Console {}
}
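
A variant of the source block above, sketched to show the format_error_handle_way option from Source Options together with the default json format. The topic and schema are reused from the Simple example as placeholders; records that fail to parse are skipped rather than failing the job.

```hocon
source {
  Kafka {
    topic = "topic_1"
    bootstrap.servers = "localhost:9092"
    # json is the default format; it is set explicitly here for clarity.
    format = json
    # Skip records that cannot be parsed instead of throwing an exception.
    format_error_handle_way = skip
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}
```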

Regex Topic

source {
    Kafka {
        # Subscribe to every topic whose name contains "nexus"
        topic = ".*nexus.*"
        pattern = "true"
        bootstrap.servers = "localhost:9092"
        consumer.group = "nexus_group"
    }
}
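
A pattern subscription can be paired with the partition-discovery.interval-millis option so that topics and partitions are re-discovered periodically. The sketch below is a minimal illustration under assumptions: the topic naming scheme is hypothetical, and the 10000 ms interval is an arbitrary example value.

```hocon
source {
    Kafka {
        # Hypothetical naming scheme: all topics prefixed with "nexus-"
        topic = "nexus-.*"
        pattern = "true"
        bootstrap.servers = "localhost:9092"
        consumer.group = "nexus_group"
        # Look for topics and partitions again every 10 seconds
        # (illustrative value; milliseconds, per the option name).
        partition-discovery.interval-millis = 10000
    }
}
```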

AWS MSK SASL/SCRAM

Replace the username and password placeholders in sasl.jaas.config below with the credentials configured in AWS MSK.

source {
    Kafka {
        topic = "nexus"
        bootstrap.servers = "xx.amazonaws.com.cn:9096,xxx.amazonaws.com.cn:9096,xxxx.amazonaws.com.cn:9096"
        consumer.group = "nexus_group"
        kafka.config = {
            security.protocol=SASL_SSL
            sasl.mechanism=SCRAM-SHA-512
            sasl.jaas.config="org.apache.kafka.common.security.scram.ScramLoginModule required username=\"username\" password=\"password\";"
            #security.protocol=SASL_SSL
            #sasl.mechanism=AWS_MSK_IAM
            #sasl.jaas.config="software.amazon.msk.auth.iam.IAMLoginModule required;"
            #sasl.client.callback.handler.class="software.amazon.msk.auth.iam.IAMClientCallbackHandler"
        }
    }
}

AWS MSK IAM

Make sure the IAM policy includes the "kafka-cluster:Connect" action, for example:

"Effect": "Allow",
"Action": [
    "kafka-cluster:Connect",
    "kafka-cluster:AlterCluster",
    "kafka-cluster:DescribeCluster"
],

Source Config

source {
    Kafka {
        topic = "nexus"
        bootstrap.servers = "xx.amazonaws.com.cn:9098,xxx.amazonaws.com.cn:9098,xxxx.amazonaws.com.cn:9098"
        consumer.group = "nexus_group"
        kafka.config = {
            #security.protocol=SASL_SSL
            #sasl.mechanism=SCRAM-SHA-512
            #sasl.jaas.config="org.apache.kafka.common.security.scram.ScramLoginModule required username=\"username\" password=\"password\";"
            security.protocol=SASL_SSL
            sasl.mechanism=AWS_MSK_IAM
            sasl.jaas.config="software.amazon.msk.auth.iam.IAMLoginModule required;"
            sasl.client.callback.handler.class="software.amazon.msk.auth.iam.IAMClientCallbackHandler"
        }
    }
}

Kerberos Authentication Example

Source Config

source {
    Kafka {
        topic = "nexus"
        bootstrap.servers = "127.0.0.1:9092"
        consumer.group = "nexus_group"
        kafka.config = {
            security.protocol=SASL_PLAINTEXT
            sasl.kerberos.service.name=kafka
            sasl.mechanism=GSSAPI
            java.security.krb5.conf="/etc/krb5.conf"
            sasl.jaas.config="com.sun.security.auth.module.Krb5LoginModule required \n        useKeyTab=true \n        storeKey=true  \n        keyTab=\"/path/to/xxx.keytab\" \n        principal=\"[email protected]\";"
        }
    }
}

Multiple Kafka Source

This example parses Kafka topics in different formats and writes them to the same PostgreSQL table, performing upsert operations based on the id.


env {
  execution.parallelism = 1
  job.mode = "BATCH"
}

source {
  Kafka {
    bootstrap.servers = "kafka_e2e:9092"
    table_list = [
      {
        topic = "^test-ogg-sou.*"
        pattern = "true"
        consumer.group = "ogg_multi_group"
        start_mode = earliest
        schema = {
          fields {
            id = "int"
            name = "string"
            description = "string"
            weight = "string"
          }
        },
        format = ogg_json
      },
      {
        topic = "test-cdc_mds"
        start_mode = earliest
        schema = {
          fields {
            id = "int"
            name = "string"
            description = "string"
            weight = "string"
          }
        },
        format = canal_json
      }
    ]
  }
}

sink {
  Jdbc {
    driver = org.postgresql.Driver
    url = "jdbc:postgresql://postgresql:5432/test?loggerLevel=OFF"
    user = test
    password = test
    generate_sink_sql = true
    database = test
    table = public.sink
    primary_keys = ["id"]
  }
}
