Schema Feature

Why We Need Schema

Some NoSQL databases and message queues do not enforce a strict schema, so their schema cannot be retrieved through an API. In such cases, you need to define a schema yourself so that it can be converted to a TableSchema and used to access the data.

SchemaOptions

You can use SchemaOptions to define the schema in Nexus. SchemaOptions includes various configurations to specify the schema, such as columns, primary keys, and constraint keys.

schema = {
    table = "database.schema.table"
    schema_first = false
    comment = "comment"
    columns = [
    ...
    ]
    primaryKey {
    ...
    }
    
    constraintKeys = [
    ...
    ]
}

Table

The table configuration specifies the full name of the table identifier to which the schema belongs. This includes the database, schema, and table name. Examples are database.schema.table, database.table, or just table.
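
As a quick illustration, the three accepted forms could be written as shown below. The database, schema, and table names are placeholders rather than names from a real system, and an actual schema block would contain only one table entry.

```hocon
# Fully qualified: database, schema, and table (placeholder names)
table = "my_database.my_schema.my_table"

# Database and table only
# table = "my_database.my_table"

# Table name only
# table = "my_table"
```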

schema_first

The default value is false.

If schema_first is set to true, the schema will be prioritized. This means that if table is set to "a.b", a will be interpreted as the schema rather than the database. This allows you to specify the table in the format "schema.table".
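
A minimal sketch of this behavior, using the placeholder identifier "a.b" from the paragraph above and a single illustrative column:

```hocon
schema = {
    # With schema_first = true, "a" is read as the schema and "b" as the table.
    # With the default (false), "a" would be read as the database instead.
    table = "a.b"
    schema_first = true
    columns = [
        # Illustrative column only
        { name = id, type = bigint }
    ]
}
```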

comment

This field allows you to add a comment to the CatalogTable to which the schema belongs.

Columns

The columns configuration is a list of settings used to define columns in the schema. Each column can include fields such as name, type, nullable, defaultValue, and comment.

columns = [
       {
          name = id
          type = bigint
          nullable = false
          columnLength = 20
          defaultValue = 0
          comment = "primary key id"
       }
]

| Field | Required | Default Value | Description |
|---|---|---|---|
| name | Yes | | The name of the column |
| type | Yes | | The data type of the column |
| nullable | | true | Whether the column can be null |
| columnLength | | | The length of the column; useful when you need to define the length |
| columnScale | | | The scale of the column; useful when you need to define the scale |
| defaultValue | | null | The default value of the column |
| comment | | null | The comment of the column |

Supported Data Types

| Data type | Value type in Java | Description |
|---|---|---|
| string | java.lang.String | String |
| boolean | java.lang.Boolean | Boolean |
| tinyint | java.lang.Byte | -128 to 127 signed; 0 to 255 unsigned. The maximum number of digits can be specified in parentheses. |
| smallint | java.lang.Short | -32768 to 32767 signed; 0 to 65535 unsigned. The maximum number of digits can be specified in parentheses. |
| int | java.lang.Integer | Integers from -2,147,483,648 to 2,147,483,647. |
| bigint | java.lang.Long | Integers from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. |
| float | java.lang.Float | Single-precision floating-point number, approximately -3.4E+38 to 3.4E+38. |
| double | java.lang.Double | Double-precision floating-point number, approximately -1.79E+308 to 1.79E+308. Handles most decimal values. |
| decimal | java.math.BigDecimal | Exact decimal number with a fixed precision and scale. |
| null | java.lang.Void | null |
| bytes | byte[] | Bytes |
| date | java.time.LocalDate | Stores only the date, from January 1, 0001 to December 31, 9999. |
| time | java.time.LocalTime | Stores only the time. Accuracy is 100 nanoseconds. |
| timestamp | java.time.LocalDateTime | Stores a date and time without a time zone. |
| row | org.apache.seatunnel.api.table.type.SeaTunnelRow | Row type; can be nested. |
| map | java.util.Map | An object that maps keys to values. The key type can be int, string, boolean, tinyint, smallint, bigint, float, double, decimal, date, time, timestamp, or null. The value type can be int, string, boolean, tinyint, smallint, bigint, float, double, decimal, date, time, timestamp, null, array, map, or row. |
| array | ValueType[] | A collection of elements. The element type can be int, string, boolean, tinyint, smallint, bigint, float, or double. |

How to Declare Supported Types

Nexus provides a straightforward method for declaring basic types. The basic type keywords include string, boolean, tinyint, smallint, int, bigint, float, double, date, time, timestamp, and null. These keywords can be used directly as type declarations, and Nexus is case-insensitive with respect to these type keywords. For instance, to declare a field with an integer type, you can use int or "int".

The null type declaration must be enclosed in double quotes, like "null", to avoid confusion with HOCON's null type, which signifies an undefined object.
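
The rules above for basic types could look like this in a columns list. This is only a sketch; the column names are hypothetical and chosen for illustration.

```hocon
columns = [
    # Unquoted and quoted keywords are equivalent
    { name = c_int, type = int },
    { name = c_string, type = "string" },
    # Type keywords are case-insensitive
    { name = c_bigint, type = BIGINT },
    # The null type must be quoted so HOCON does not read it as its own null
    { name = c_null, type = "null" }
]
```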

When declaring complex types such as decimal, array, map, and row, special considerations are required:

Decimal Type: Precision and scale must be specified, and the type definition should follow the format "decimal(precision, scale)". The type name must be enclosed in double quotes. For example, to declare a decimal field with precision 10 and scale 2, you would specify the field type as "decimal(10,2)".
Array Type: You need to specify the element type, using the format "array<T>", where T represents the element type. Possible element types include int, string, boolean, tinyint, smallint, bigint, float, and double. The type declaration must be enclosed in double quotes. For example, to declare a field with an array of integers, you specify the field type as "array<int>".
Map Type: You must specify both the key and value types. The map type definition follows the format "map<K,V>", where K represents the key type and V represents the value type. K can be any basic or decimal type, while V can be any type supported by Nexus. This declaration must also be enclosed in double quotes. For example, to declare a field with a map where the key type is string and the value type is int, you can declare it as "map<string, int>".
Row Type: You need to define a HOCON object to describe the fields and their types. Field types can include any type supported by Nexus. For example, to declare a row type with an integer field a and a string field b, you can specify it as {a = int, b = string}. This definition can also be enclosed in double quotes as a string, so "{a = int, b = string}" is equivalent to {a = int, b = string}. Since HOCON is compatible with JSON, {"a":"int", "b":"string"} is also equivalent to {a = int, b = string}.

Here is an example of complex type declarations:

schema {
  fields {
    c_decimal = "decimal(10, 2)"
    c_array = "array<int>"
    c_row = {
        c_int = int
        c_string = string
        c_row = {
            c_int = int
        }
    }
    # HOCON-style row type declared inside a generic type
    map0 = "map<string, {c_int = int, c_string = string, c_row = {c_int = int}}>"
    # JSON-style row type declared inside a generic type
    map1 = "map<string, {\"c_int\":\"int\", \"c_string\":\"string\", \"c_row\":{\"c_int\":\"int\"}}>"
  }
}

PrimaryKey

The primaryKey configuration defines the primary key of the schema. It contains the name and columnNames fields.

| Field | Required | Default Value | Description |
|---|---|---|---|
| name | Yes | | The name of the primaryKey |
| columnNames | Yes | | The column list in the primaryKey |
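
For reference, a standalone primaryKey block might look like the sketch below. It uses the columnNames key as it appears in the complete example under "How to use schema"; the key name "pk_id" is a placeholder, and the id column is assumed to be declared in columns.

```hocon
primaryKey {
    # Name of the primary key (placeholder)
    name = "pk_id"
    # Columns that make up the primary key
    columnNames = [id]
}
```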

ConstraintKeys

The constraintKeys configuration is a list that defines the constraint keys of the schema. Each entry contains the constraintName, constraintType, and constraintColumns fields.

constraintKeys = [
      {
         constraintName = "id_index"
         constraintType = INDEX_KEY
         constraintColumns = [
            {
                columnName = "id"
                sortType = ASC
            }
         ]
      },
   ]

| Field | Required | Default Value | Description |
|---|---|---|---|
| constraintName | Yes | | The name of the constraintKey |
| constraintType | | KEY | The type of the constraintKey |
| constraintColumns | Yes | | The column list in the constraintKey. Each column should contain columnName and sortType. sortType supports ASC and DESC; the default is ASC. |

Supported Constraint Types

| ConstraintType | Description |
|---|---|
| INDEX_KEY | key |
| UNIQUE_KEY | unique key |
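
As a further sketch, a unique key spanning two columns with different sort orders could be declared as follows. The constraint name is hypothetical, the name and age columns are assumed to exist in columns, and this assumes constraintColumns accepts more than one entry, as its list form suggests.

```hocon
constraintKeys = [
    {
        # Hypothetical constraint name
        constraintName = "unique_name_age"
        constraintType = UNIQUE_KEY
        constraintColumns = [
            { columnName = "name", sortType = ASC },
            { columnName = "age", sortType = DESC }
        ]
    }
]
```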

How to use schema

source {
  FakeSource {
    parallelism = 2
    result_table_name = "fake"
    row.num = 16
    schema {
      table = "FakeDatabase.FakeTable"
      columns = [
        {
          name = id
          type = bigint
          nullable = false
          defaultValue = 0
          comment = "primary key id"
        },
        {
          name = name
          type = "string"
          nullable = true
          comment = "name"
        },
        {
          name = age
          type = int
          nullable = true
          comment = "age"
        }
      ]
      primaryKey {
        name = "id"
        columnNames = [id]
      }
      constraintKeys = [
        {
          constraintName = "unique_name"
          constraintType = UNIQUE_KEY
          constraintColumns = [
            {
              columnName = "name"
              sortType = ASC
            }
          ]
        }
      ]
    }
  }
}

When to Use Schema

If a connector's options include a schema configuration item, that connector supports a user-defined schema. Examples include the Fake, Pulsar, and Http source connectors.
