SQL

SQL transform plugin

Description

Use SQL to transform given input row.

SQL transform use memory SQL engine, we can via SQL functions and ability of SQL engine to implement the transform task.

Options

name

type

required

default value

source_table_name

string

yes

result_table_name

string

yes

query

string

yes

source_table_name [string]

The source table name, the query SQL table name must match this field.

query [string]

The query SQL, it's a simple SQL supported base function and criteria filter operation. But the complex SQL unsupported yet, include: multi source table/rows JOIN and AGGREGATE operation and the like.

the query expression can be select [table_name.]column_a to query the column that named column_a. and the table name is optional. or select c_row.c_inner_row.column_b to query the inline struct column that named column_b within c_row column and c_inner_row column. In this query expression, can't have table name.

Example

The data read from source is a table like this:

name

age

Joy Ding

May Ding

Kin Dom

Joy Dom

We use SQL query to transform the source data like this:

transform {
  Sql {
    source_table_name = "fake"
    result_table_name = "fake1"
    query = "select id, concat(name, '_') as name, age+1 as age from fake where id>0"
  }
}

Then the data in result table fake1 will update to

name

age

Joy Ding_

May Ding_

Kin Dom_

Joy Dom_

Struct query

if your upstream data schema is like this:

source {
  FakeSource {
    result_table_name = "fake"
    row.num = 100
    string.template = ["innerQuery"]
    schema = {
      fields {
        name = "string"
        c_date = "date"
        c_row = {
          c_inner_row = {
            c_inner_int = "int"
            c_inner_string = "string"
            c_inner_timestamp = "timestamp"
            c_map_1 = "map<string, string>"
            c_map_2 = "map<string, map<string,string>>"
          }
          c_string = "string"
        }
      }
    }
  }
}

Those query all are valid:

select 
name,
c_date,
c_row,
c_row.c_inner_row,
c_row.c_string,
c_row.c_inner_row.c_inner_int,
c_row.c_inner_row.c_inner_string,
c_row.c_inner_row.c_inner_timestamp,
c_row.c_inner_row.c_map_1,
c_row.c_inner_row.c_map_1.some_key

But this query are not valid:

select 
c_row.c_inner_row.c_map_2.some_key.inner_map_key

The map must be the latest struct, can't query the nesting map.

Job Config Example

env {
  job.mode = "BATCH"
}

source {
  FakeSource {
    result_table_name = "fake"
    row.num = 100
    schema = {
      fields {
        id = "int"
        name = "string"
        age = "int"
      }
    }
  }
}

transform {
  Sql {
    source_table_name = "fake"
    result_table_name = "fake1"
    query = "select id, concat(name, '_') as name, age+1 as age from fake where id>0"
  }
}

sink {
  Console {
    source_table_name = "fake1"
  }
}

PreviousSQL Functions NextIntegrations

Last updated 9 months ago

Description​

Options​

source_table_name [string]​

query [string]​

Example​

Struct query​

Job Config Example​

Description

Options

source_table_name [string]

query [string]

Example

Struct query

Job Config Example