Schema Feature
Why We Need Schema
Some NoSQL databases and message queues do not enforce a strict schema, so the schema cannot be retrieved through an API. In such cases, you need to define a schema that can be converted to a TableSchema in order to access the data.
SchemaOptions
You can use SchemaOptions to define the schema in Nexus. SchemaOptions includes various configurations to specify the schema, such as columns, primary keys, and constraint keys.
Table
The table configuration specifies the full name of the table identifier to which the schema belongs. This includes the database, schema, and table name. Examples are database.schema.table, database.table, or just table.
schema_first
The default value is false.
If schema_first is set to true, the schema will be prioritized. This means that if table is set to "a.b", a will be interpreted as the schema rather than the database. This allows you to specify the table in the format "schema.table".
comment
This field allows you to add a comment to the CatalogTable to which the schema belongs.
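The three options above can be combined as in the following minimal sketch. It assumes the options live inside an enclosing block named schema; the table name and comment text are hypothetical.

```hocon
schema {
  # Two-part identifier. Because schema_first is true, "analytics" is
  # read as the schema name, not the database name (hypothetical names).
  table = "analytics.orders"
  schema_first = true

  # Free-text comment attached to the CatalogTable.
  comment = "Orders ingested from a message queue"
}
```

With schema_first left at its default of false, the same identifier would be read as database.table instead.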
Columns
The columns configuration is a list of settings used to define columns in the schema. Each column can include fields such as name, type, nullable, defaultValue, and comment.
| Field | Required | Default Value | Description |
|---|---|---|---|
| name | Yes | - | The name of the column |
| type | Yes | - | The data type of the column |
| nullable | No | true | Whether the column can be null |
| columnLength | No | 0 | The length of the column, for types that require a length |
| columnScale | No | - | The scale of the column, for types that require a scale |
| defaultValue | No | null | The default value of the column |
| comment | No | null | The comment of the column |
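A minimal sketch of a columns list using the fields from the table above. The column names and values are hypothetical, and the fragment is assumed to sit inside the schema definition.

```hocon
columns = [
  {
    # Required fields: name and type.
    name = id
    type = bigint
    nullable = false
    comment = "Order identifier"
  },
  {
    name = customer_name
    type = string
    # Optional fields; omitted ones fall back to the defaults listed above.
    columnLength = 64
    defaultValue = "unknown"
  }
]
```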
| Data type | Value type in Java | Description |
|---|---|---|
| string | java.lang.String | string |
| boolean | java.lang.Boolean | boolean |
| tinyint | java.lang.Byte | -128 to 127 signed; 0 to 255 unsigned. Specify the maximum number of digits in parentheses. |
| smallint | java.lang.Short | -32768 to 32767 signed; 0 to 65535 unsigned. Specify the maximum number of digits in parentheses. |
| int | java.lang.Integer | All numbers from -2,147,483,648 to 2,147,483,647 are allowed. |
| bigint | java.lang.Long | All numbers from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 are allowed. |
| float | java.lang.Float | Single-precision floating point, approximately -3.4E+38 to 3.4E+38. |
| double | java.lang.Double | Double-precision floating point. Handles most decimals. |
| decimal | java.math.BigDecimal | Exact numeric value with a fixed decimal point (fixed precision and scale). |
| null | java.lang.Void | null |
| bytes | byte[] | bytes |
| date | java.time.LocalDate | Only the date is stored. From January 1, 0001 to December 31, 9999. |
| time | java.time.LocalTime | Only the time is stored. Accuracy is 100 nanoseconds. |
| timestamp | java.time.LocalDateTime | Stores a date and time without a time zone. |
| row | org.apache.seatunnel.api.table.type.SeaTunnelRow | Row type, can be nested. |
| map | java.util.Map | A map is an object that maps keys to values. The key type can be int, string, boolean, tinyint, smallint, bigint, float, double, decimal, date, time, timestamp, or null. The value type can be int, string, boolean, tinyint, smallint, bigint, float, double, decimal, date, time, timestamp, null, array, map, or row. |
| array | ValueType[] | An array is a data type that represents a collection of elements. The element type can be int, string, boolean, tinyint, smallint, bigint, float, or double. |
How to Declare Types Supported
Nexus provides a straightforward method for declaring basic types. The basic type keywords include string, boolean, tinyint, smallint, int, bigint, float, double, date, time, timestamp, and null. These keywords can be used directly as type declarations, and Nexus is case-insensitive with respect to these type keywords. For instance, to declare a field with an integer type, you can use int or "int".
The null type declaration must be enclosed in double quotes, like "null", to avoid confusion with HOCON's null type, which signifies an undefined object.
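The rules for basic types can be illustrated with a short sketch. It assumes the types are declared through the type field of a columns list; the column names are hypothetical.

```hocon
columns = [
  # Unquoted and quoted keywords are equivalent.
  { name = quantity, type = int },
  { name = total, type = "int" },

  # Keywords are case-insensitive.
  { name = created_at, type = TIMESTAMP },

  # The null type must be quoted; an unquoted null would be parsed
  # as HOCON's own null value.
  { name = placeholder, type = "null" }
]
```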
When declaring complex types such as decimal, array, map, and row, special considerations are required:
Decimal Type: Precision and scale must be specified, and the type definition should follow the format "decimal(precision, scale)". The type name must be enclosed in double quotes. For example, to declare a decimal field with precision 10 and scale 2, you would specify the field type as "decimal(10,2)".
Array Type: You need to specify the element type, using the format "array<T>", where T represents the element type. Possible element types include int, string, boolean, tinyint, smallint, bigint, float, and double. The type declaration must be enclosed in double quotes. For example, to declare a field with an array of integers, you specify the field type as "array<int>".
Map Type: You must specify both the key and value types. The map type definition follows the format "map<K,V>", where K represents the key type and V represents the value type. K can be any basic or decimal type, while V can be any type supported by Nexus. This declaration must also be enclosed in double quotes. For example, to declare a field with a map where the key type is string and the value type is int, you can declare it as "map<string, int>".
Row Type: You need to define a HOCON object to describe the fields and their types. Field types can include any type supported by Nexus. For example, to declare a row type with an integer field a and a string field b, you can specify it as {a = int, b = string}. This definition can also be enclosed in double quotes as a string, so "{a = int, b = string}" is equivalent to {a = int, b = string}. Since HOCON is compatible with JSON, {"a":"int", "b":"string"} is also equivalent to {a = int, b = string}.
Here is an example of complex type declarations:
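A minimal sketch of such declarations, assuming they are written in the type field of a columns list. The column names are hypothetical; the type strings follow the formats described above.

```hocon
columns = [
  # Decimal with precision 10 and scale 2; quotes are required.
  { name = price, type = "decimal(10,2)" },

  # Array of integers.
  { name = scores, type = "array<int>" },

  # Map from string keys to int values.
  { name = counters, type = "map<string, int>" },

  # Row declared as a HOCON object.
  { name = pair, type = { a = int, b = string } },

  # The same row declared as a quoted string.
  { name = pair_quoted, type = "{a = int, b = string}" }
]
```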
The primary key configuration defines the primary key of the schema. It contains the name and columns fields.

| Field | Required | Default Value | Description |
|---|---|---|---|
| name | Yes | - | The name of the primaryKey |
| columns | Yes | - | The column list in the primaryKey |
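A minimal sketch of a primary key definition. It assumes the configuration key is named primaryKey and sits inside the schema definition; the key name and column name are hypothetical.

```hocon
primaryKey {
  # Name of the primary key (hypothetical).
  name = "pk_orders"
  # Columns that make up the primary key; each must be a declared column.
  columns = [id]
}
```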
The constraint keys configuration is a list that defines the constraint keys of the schema. Each entry contains the constraintName, constraintType, and constraintColumns fields.

| Field | Required | Default Value | Description |
|---|---|---|---|
| constraintName | Yes | - | The name of the constraintKey |
| constraintType | No | KEY | The type of the constraintKey |
| constraintColumns | Yes | - | The column list in the constraintKey. Each column should contain constraintType and sortType. sortType supports ASC and DESC; the default is ASC. |
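A minimal sketch of a constraint key list. Several names here are assumptions that the text above does not confirm: the enclosing key constraintKeys, the columnName key used to reference a column, and the constraint and column names themselves.

```hocon
constraintKeys = [
  {
    constraintName = "uk_customer_name"
    constraintType = UNIQUE_KEY
    constraintColumns = [
      {
        # columnName is an assumed key for naming the column.
        columnName = "customer_name"
        # ASC is the default; DESC is the other supported value.
        sortType = ASC
      }
    ]
  }
]
```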
| ConstraintType | Description |
|---|---|
| INDEX_KEY | key |
| UNIQUE_KEY | unique key |
Recommended
If a connector's options include a schema configuration item, the connector can use a customized schema. Examples include the Fake, Pulsar, and Http source connectors.