Schema Feature

Why We Need Schema

Some NoSQL databases and message queues do not enforce a strict schema, so the schema cannot be retrieved through an API. In such cases, you must define a schema yourself so that the data can be converted to a TableSchema and accessed.

SchemaOptions

You can use SchemaOptions to define the schema in Nexus. SchemaOptions includes various configurations to specify the schema, such as columns, primary keys, and constraint keys.

schema = {
    table = "database.schema.table"
    schema_first = false
    comment = "comment"
    columns = [
        ...
    ]
    primaryKey {
        ...
    }
    constraintKeys {
        ...
    }
}

Table

The table configuration specifies the full name of the table identifier to which the schema belongs. This includes the database, schema, and table name. Examples are database.schema.table, database.table, or just table.
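For example, a minimal sketch (the identifier parts are illustrative):

schema = {
    # database "sales_db", schema "public", table "orders"
    table = "sales_db.public.orders"
}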

schema_first

The default value is false.

If schema_first is set to true, the schema will be prioritized. This means that if table is set to "a.b", a will be interpreted as the schema rather than the database. This allows you to specify the table in the format "schema.table".
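As a brief sketch (the schema and table names are illustrative), the following two-part identifier is interpreted as schema.table rather than database.table:

schema = {
    # with schema_first = true, "reporting.orders" means schema "reporting", table "orders"
    table = "reporting.orders"
    schema_first = true
}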

comment

This field allows you to add a comment to the CatalogTable to which the schema belongs.

Columns

The columns configuration is a list of settings used to define columns in the schema. Each column can include fields such as name, type, nullable, defaultValue, and comment.

columns = [
       {
          name = id
          type = bigint
          nullable = false
          columnLength = 20
          defaultValue = 0
          comment = "primary key id"
       }
]
| Field | Required | Default Value | Description |
| ----- | -------- | ------------- | ----------- |
| name | Yes | - | The name of the column |
| type | Yes | - | The data type of the column |
| nullable | No | true | Whether the column can be null |
| columnLength | No | 0 | The length of the column; useful when you need to define the length |
| columnScale | No | - | The scale of the column; useful when you need to define the scale |
| defaultValue | No | null | The default value of the column |
| comment | No | null | The comment of the column |
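As a minimal sketch (the column name is illustrative), columnLength is typically set on columns whose type carries a length, such as strings:

columns = [
    {
        name = description        # illustrative column name
        type = string
        nullable = true
        columnLength = 255        # maximum length of the string value
        comment = "free-text description"
    }
]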

Supported Data Types

| Data type | Value type in Java | Description |
| --------- | ------------------ | ----------- |
| string | java.lang.String | string |
| boolean | java.lang.Boolean | boolean |
| tinyint | java.lang.Byte | -128 to 127 signed; 0 to 255 unsigned |
| smallint | java.lang.Short | -32,768 to 32,767 signed; 0 to 65,535 unsigned |
| int | java.lang.Integer | all numbers from -2,147,483,648 to 2,147,483,647 |
| bigint | java.lang.Long | all numbers from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
| float | java.lang.Float | single-precision floating-point number |
| double | java.lang.Double | double-precision floating-point number; handles most decimals |
| decimal | java.math.BigDecimal | exact numeric value with a fixed precision and scale |
| null | java.lang.Void | null |
| bytes | byte[] | byte array |
| date | java.time.LocalDate | stores a date only, without a time component |
| time | java.time.LocalTime | stores a time of day only, without a date component |
| timestamp | java.time.LocalDateTime | stores a date and time without a time zone |
| row | org.apache.seatunnel.api.table.type.SeaTunnelRow | row type; can be nested |
| map | java.util.Map | maps keys to values. Key types: int, string, boolean, tinyint, smallint, bigint, float, double, decimal, date, time, timestamp, null. Value types: all key types plus array, map, and row |
| array | ValueType[] | a collection of elements. Element types: int, string, boolean, tinyint, smallint, bigint, float, double |

How to Declare Supported Types

Nexus provides a straightforward way to declare basic types. The basic type keywords are string, boolean, tinyint, smallint, int, bigint, float, double, date, time, timestamp, and null. These keywords can be used directly as type declarations, either bare or enclosed in double quotes, and they are case-insensitive. For instance, to declare an integer field you can write int or "int".

The null type declaration must be enclosed in double quotes, like "null", to avoid confusion with HOCON's null type, which signifies an undefined object.
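A minimal sketch of basic type declarations (the field names are illustrative):

schema {
  fields {
    c_string = string      # bare keyword
    c_int = "int"          # quoted form, equivalent to the bare keyword
    c_boolean = BOOLEAN    # type keywords are case-insensitive
    c_null = "null"        # the null type must always be quoted
  }
}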

When declaring complex types such as decimal, array, map, and row, special considerations are required:

  • Decimal Type: Precision and scale must be specified, and the type definition should follow the format "decimal(precision, scale)". The type name must be enclosed in double quotes. For example, to declare a decimal field with precision 10 and scale 2, you would specify the field type as "decimal(10,2)".

  • Array Type: You need to specify the element type, using the format "array<T>", where T represents the element type. Possible element types include int, string, boolean, tinyint, smallint, bigint, float, and double. The type declaration must be enclosed in double quotes. For example, to declare a field with an array of integers, you specify the field type as "array<int>".

  • Map Type: You must specify both the key and value types. The map type definition follows the format "map<K,V>", where K represents the key type and V represents the value type. K can be any basic or decimal type, while V can be any type supported by Nexus. This declaration must also be enclosed in double quotes. For example, to declare a field with a map where the key type is string and the value type is int, you can declare it as "map<string, int>".

  • Row Type: You need to define a HOCON object to describe the fields and their types. Field types can include any type supported by Nexus. For example, to declare a row type with an integer field a and a string field b, you can specify it as {a = int, b = string}. This definition can also be enclosed in double quotes as a string, so "{a = int, b = string}" is equivalent to {a = int, b = string}. Since HOCON is compatible with JSON, {"a":"int", "b":"string"} is also equivalent to {a = int, b = string}.

Here is an example of complex type declarations:

schema {
  fields {
    c_decimal = "decimal(10, 2)"
    c_array = "array<int>"
    c_row = {
        c_int = int
        c_string = string
        c_row = {
            c_int = int
        }
    }
    # HOCON-style declaration of a row type inside a generic type
    map0 = "map<string, {c_int = int, c_string = string, c_row = {c_int = int}}>"
    # JSON-style declaration of a row type inside a generic type
    map1 = "map<string, {\"c_int\":\"int\", \"c_string\":\"string\", \"c_row\":{\"c_int\":\"int\"}}>"
  }
}

PrimaryKey

primaryKey is a configuration block used to define the primary key of the schema. It contains the name and columns fields.

| Field | Required | Default Value | Description |
| ----- | -------- | ------------- | ----------- |
| name | Yes | - | The name of the primary key |
| columns | Yes | - | The column list of the primary key |
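A minimal sketch, mirroring the Recommended example later on this page (which uses columnNames for the column list):

primaryKey {
    name = "id"
    columnNames = [id]
}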

ConstraintKeys

constraintKeys is a list of configuration blocks used to define the constraint keys of the schema. Each entry contains the constraintName, constraintType, and constraintColumns fields.

constraintKeys = [
      {
         constraintName = "id_index"
         constraintType = KEY
         constraintColumns = [
            {
                columnName = "id"
                sortType = ASC
            }
         ]
      },
   ]
| Field | Required | Default Value | Description |
| ----- | -------- | ------------- | ----------- |
| constraintName | Yes | - | The name of the constraint key |
| constraintType | No | KEY | The type of the constraint key |
| constraintColumns | Yes | - | The column list of the constraint key; each column contains columnName and sortType. sortType supports ASC and DESC and defaults to ASC |

Supported Constraint Types

| ConstraintType | Description |
| -------------- | ----------- |
| INDEX_KEY | key |
| UNIQUE_KEY | unique key |

How to Use Schema

Recommended:

source {
  FakeSource {
    parallelism = 2
    result_table_name = "fake"
    row.num = 16
    schema {
      table = "FakeDatabase.FakeTable"
      columns = [
        {
          name = id
          type = bigint
          nullable = false
          defaultValue = 0
          comment = "primary key id"
        },
        {
          name = name
          type = "string"
          nullable = true
          comment = "name"
        },
        {
          name = age
          type = int
          nullable = true
          comment = "age"
        }
      ]
      primaryKey {
        name = "id"
        columnNames = [id]
      }
      constraintKeys = [
        {
          constraintName = "unique_name"
          constraintType = UNIQUE_KEY
          constraintColumns = [
            {
              columnName = "name"
              sortType = ASC
            }
          ]
        },
      ]
    }
  }
}

When We Should Use It or Not

If a connector's options include a schema configuration, the connector supports a user-defined schema and you can declare one as shown above. This applies to source connectors such as FakeSource, Pulsar, and Http.
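For instance, a minimal sketch of an Http source with a user-defined schema (the URL and field names are illustrative, and the connector options shown are assumptions; see the Http source connector page for the authoritative option list):

source {
  Http {
    url = "https://example.com/api/events"    # illustrative endpoint
    method = "GET"                            # assumed option; check the Http source docs
    format = "json"
    schema = {
      fields {
        event_id = bigint
        event_type = string
        created_at = timestamp
      }
    }
  }
}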
