StarRocks

StarRocks source connector

Reads data from external data sources through StarRocks. Internally, the StarRocks source connector obtains the query plan from the frontend (FE), passes the query plan as a parameter to the BE nodes, and then fetches the result data from the BE nodes.

Key features

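| name                    | type   | required | default value     |
|-------------------------|--------|----------|-------------------|
| node_urls               | list   | yes      | -                 |
| username                | string | yes      | -                 |
| password                | string | yes      | -                 |
| database                | string | yes      | -                 |
| table                   | string | yes      | -                 |
| scan_filter             | string | no       | -                 |
| schema                  | config | yes      | -                 |
| request_tablet_size     | int    | no       | Integer.MAX_VALUE |
| scan_connect_timeout_ms | int    | no       | 30000             |
| scan_query_timeout_sec  | int    | no       | 3600              |
| scan_keep_alive_min     | int    | no       | 10                |
| scan_batch_rows         | int    | no       | 1024              |
| scan_mem_limit          | long   | no       | 2147483648        |
| max_retries             | int    | no       | 3                 |
| scan.params.*           | string | no       | -                 |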

node_urls [list]

The StarRocks cluster address, in the format ["fe_ip:fe_http_port", ...].

username [string]

The StarRocks username.

password [string]

The password of the StarRocks user.

database [string]

The name of the StarRocks database.

table [string]

The name of the StarRocks table.

scan_filter [string]

The filter expression of the query, which is passed through to StarRocks unchanged. StarRocks uses this expression to filter data on the source side.

e.g.
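A filter could look like the following sketch. The column names `age` and `city` are hypothetical and stand in for columns of your own table; the expression is handed to StarRocks as written.

```hocon
source {
  StarRocks {
    # Required options (node_urls, username, password, database, table, schema) omitted.

    # Hypothetical columns; only rows matching this expression are read.
    scan_filter = "age > 18 and city = 'Berlin'"
  }
}
```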

schema [config]

fields [Config]

The schema (field names and types) of the StarRocks data that you want to generate.

e.g.
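A schema block might be written as in this sketch. The field names `name` and `age` are hypothetical; use the columns of the table you read from.

```hocon
schema {
  fields {
    # field name = field type (hypothetical fields)
    name = string
    age  = int
  }
}
```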

request_tablet_size [int]

The number of StarRocks tablets assigned to one partition. The smaller this value, the more partitions are generated. This increases parallelism on the engine side, but it also puts more pressure on StarRocks.

The following example shows how request_tablet_size controls the generation of partitions:
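As an illustration only, assume a table whose tablets are spread over two BE nodes as described in the comments below. The tablet IDs and node names are hypothetical, and the grouping shown assumes that each partition only covers tablets from a single BE node.

```hocon
# Hypothetical tablet distribution:
#   be_node_1 holds tablets [1, 2, 3, 4, 5]
#   be_node_2 holds tablets [6, 7, 8, 9, 10]
#
# With request_tablet_size left at its default (Integer.MAX_VALUE), the number
# of tablets per partition is effectively unlimited, so one would expect:
#   partition[0] reads tablets [1, 2, 3, 4, 5]  from be_node_1
#   partition[1] reads tablets [6, 7, 8, 9, 10] from be_node_2
#
# With request_tablet_size = 3, each partition covers at most 3 tablets:
#   partition[0] reads tablets [1, 2, 3] from be_node_1
#   partition[1] reads tablets [4, 5]    from be_node_1
#   partition[2] reads tablets [6, 7, 8] from be_node_2
#   partition[3] reads tablets [9, 10]   from be_node_2
source {
  StarRocks {
    # Required options (node_urls, username, password, database, table, schema) omitted.
    request_tablet_size = 3
  }
}
```

Under these assumptions, lowering the value from the default to 3 doubles the number of partitions from 2 to 4, which is the trade-off between engine parallelism and load on StarRocks described above.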

scan_connect_timeout_ms [int]

The connection timeout, in milliseconds, for requests sent to StarRocks.

scan_query_timeout_sec [int]

The timeout, in seconds, for queries sent to StarRocks. The default value is 3600 (1 hour); -1 means no timeout limit.

scan_keep_alive_min [int]

The keep-alive duration of the query task, in minutes. The default value is 10. We recommend setting this parameter to a value greater than or equal to 5.

scan_batch_rows [int]

The maximum number of data rows to read from BE at a time. Increasing this value reduces the number of connections established between the engine and StarRocks and therefore mitigates the overhead caused by network latency.

scan_mem_limit [long]

The maximum memory space allowed for a single query in the BE node, in bytes. The default value is 2147483648 (2 GB).

max_retries [int]

The number of retries for requests sent to StarRocks.
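The optional scan settings described above could be combined as in the following sketch. The values shown are simply the documented defaults, written out to make the unit of each option explicit.

```hocon
source {
  StarRocks {
    # Required options (node_urls, username, password, database, table, schema) omitted.

    scan_connect_timeout_ms = 30000       # connection timeout, in milliseconds
    scan_query_timeout_sec  = 3600        # query timeout, in seconds (1 hour)
    scan_keep_alive_min     = 10          # keep-alive of the query task, in minutes
    scan_batch_rows         = 1024        # maximum rows read from BE at a time
    scan_mem_limit          = 2147483648  # per-query memory limit on a BE node, in bytes (2 GB)
    max_retries             = 3           # retries for requests sent to StarRocks
  }
}
```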

scan.params.* [string]

Parameters for scanning data from the BE nodes.
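Putting the options together, a source block might look like the sketch below. The host, port, database, table, and field names are hypothetical placeholders, and `scanner_thread_pool_thread_num` is an assumed parameter name used only to show how the scan.params.* prefix is written; check which scan parameters your StarRocks version accepts.

```hocon
source {
  StarRocks {
    node_urls = ["fe_host:8030"]   # hypothetical FE address, in fe_ip:fe_http_port form
    username  = "root"             # hypothetical credentials
    password  = ""
    database  = "example_db"       # hypothetical database and table
    table     = "example_table"
    schema {
      fields {
        id   = bigint              # hypothetical fields
        name = string
      }
    }
    # Assumed parameter name, shown only to illustrate the scan.params.* prefix.
    scan.params.scanner_thread_pool_thread_num = "3"
  }
}
```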
