
StarRocks



StarRocks source connector

Description

Read data from external sources through StarRocks. Internally, the StarRocks source connector obtains the query plan from the frontend (FE) nodes, delivers the query plan as a parameter to the backend (BE) nodes, and then retrieves the data results from the BE nodes.

Key features

Options

| name | type | required | default value |
|------|------|----------|---------------|
| node_urls | list | yes | - |
| username | string | yes | - |
| password | string | yes | - |
| database | string | yes | - |
| table | string | yes | - |
| scan_filter | string | no | - |
| schema | config | yes | - |
| request_tablet_size | int | no | Integer.MAX_VALUE |
| scan_connect_timeout_ms | int | no | 30000 |
| scan_query_timeout_sec | int | no | 3600 |
| scan_keep_alive_min | int | no | 10 |
| scan_batch_rows | int | no | 1024 |
| scan_mem_limit | long | no | 2147483648 |
| max_retries | int | no | 3 |
| scan.params.* | string | no | - |

node_urls [list]

StarRocks cluster address, in the format ["fe_ip:fe_http_port", ...].

username [string]

StarRocks user name.

password [string]

StarRocks user password.

database [string]

The name of the StarRocks database.

table [string]

The name of the StarRocks table.

scan_filter [string]

Filter expression of the query, which is passed transparently to StarRocks. StarRocks uses this expression to filter data on the source side.

e.g.

"tinyint_1 = 100"

schema [config]

fields [Config]

The schema of the StarRocks data that you want to generate, defined under fields as field name and field type pairs.

e.g.

schema {
    fields {
        name = string
        age = int
    }
}

request_tablet_size [int]

The number of StarRocks tablets corresponding to a partition. The smaller this value is, the more partitions are generated. This increases the parallelism on the engine side, but also puts greater pressure on StarRocks.

The following example explains how request_tablet_size controls the generation of partitions.

Assume the tablets of a StarRocks table are distributed across the cluster as follows:

be_node_1 tablet[1, 2, 3, 4, 5]
be_node_2 tablet[6, 7, 8, 9, 10]
be_node_3 tablet[11, 12, 13, 14, 15]

1. If request_tablet_size is not set, there is no limit on the number of tablets in a single partition, and the partitions are generated as follows:

partition[0] reads data of tablet[1, 2, 3, 4, 5] from be_node_1
partition[1] reads data of tablet[6, 7, 8, 9, 10] from be_node_2
partition[2] reads data of tablet[11, 12, 13, 14, 15] from be_node_3

2. If request_tablet_size = 3, a single partition contains at most 3 tablets, and the partitions are generated as follows:

partition[0] reads data of tablet[1, 2, 3] from be_node_1
partition[1] reads data of tablet[4, 5] from be_node_1
partition[2] reads data of tablet[6, 7, 8] from be_node_2
partition[3] reads data of tablet[9, 10] from be_node_2
partition[4] reads data of tablet[11, 12, 13] from be_node_3
partition[5] reads data of tablet[14, 15] from be_node_3

scan_connect_timeout_ms [int]

Connection timeout, in milliseconds, for requests sent to StarRocks. The default value is 30000.

scan_query_timeout_sec [int]

Timeout, in seconds, of the query on StarRocks. The default value is 3600 (1 hour); -1 means no timeout limit.

scan_keep_alive_min [int]

The keep-alive duration of the query task, in minutes. The default value is 10. We recommend setting this parameter to a value greater than or equal to 5.

scan_batch_rows [int]

The maximum number of data rows to read from a BE node at a time. Increasing this value reduces the number of connections established between the engine and StarRocks, and therefore mitigates the overhead caused by network latency.

scan_mem_limit [long]

The maximum memory allowed for a single query on a BE node, in bytes. The default value is 2147483648 (2 GB).

max_retries [int]

The number of retries for requests sent to StarRocks. The default value is 3.
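To make these tuning options concrete, the following is a hedged sketch that overrides them together with request_tablet_size; the connection details are placeholders and the numbers are illustrative, not recommendations:

source {
  StarRocks {
    # Placeholder connection settings; replace with your own cluster details.
    nodeUrls = ["fe_host:8030"]
    username = "user"
    password = "password"
    database = "test"
    table = "demo_table"
    schema {
        fields {
            id = BIGINT
        }
    }
    # At most 5 tablets per partition, so the read is split into more partitions.
    request_tablet_size = 5
    # Fail connection attempts after 10 s, but allow scans to run for up to 2 hours.
    scan_connect_timeout_ms = 10000
    scan_query_timeout_sec = 7200
    # Read larger batches per request and allow up to 4 GB per query on a BE node.
    scan_batch_rows = 4096
    scan_mem_limit = 4294967296
    # Retry failed requests up to 5 times.
    max_retries = 5
  }
}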

scan.params.* [string]

Additional parameters of the scan request sent to the BE nodes, set as scan.params.<name> entries (for example, scan.params.scanner_thread_pool_thread_num in the example below).

Example

source {
  StarRocks {
    nodeUrls = ["starrocks_e2e:8030"]
    username = root
    password = ""
    database = "test"
    table = "e2e_table_source"
    scan_batch_rows = 10
    max_retries = 3
    schema {
        fields {
           BIGINT_COL = BIGINT
           LARGEINT_COL = STRING
           SMALLINT_COL = SMALLINT
           TINYINT_COL = TINYINT
           BOOLEAN_COL = BOOLEAN
           DECIMAL_COL = "DECIMAL(20, 1)"
           DOUBLE_COL = DOUBLE
           FLOAT_COL = FLOAT
           INT_COL = INT
           CHAR_COL = STRING
           VARCHAR_11_COL = STRING
           STRING_COL = STRING
           DATETIME_COL = TIMESTAMP
           DATE_COL = DATE
        }
    }
    scan.params.scanner_thread_pool_thread_num = "3"
  }
}
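For comparison, here is a minimal hedged sketch that sets only the options marked as required in the table above, using the nodeUrls spelling from the example; every value is a placeholder to replace with your own cluster details:

source {
  StarRocks {
    # Required connection settings (placeholders).
    nodeUrls = ["fe_host:8030"]
    username = "user"
    password = "password"
    database = "my_database"
    table = "my_table"
    # Required schema: field name = field type.
    schema {
        fields {
            id = BIGINT
            name = STRING
        }
    }
    # All other options (scan_filter, request_tablet_size, scan_* timeouts, ...) keep their defaults.
  }
}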
