Config

Nexus is driven by a configuration file that describes which data to synchronize and how. This page explains how to write that file.

The config file is written primarily in HOCON; see the HOCON guide for details. Nexus also supports JSON, provided the file name ends with .json.

Configurations can also be written in SQL format; see the SQL configuration guide for more information.

Example

Before proceeding, you can explore the example config files in the config directory of the binary package.

Config File Structure

The structure of the configuration file is generally similar to the following example:

HOCON format:

env {
  job.mode = "BATCH"
}

source {
  FakeSource {
    result_table_name = "fake"
    row.num = 100
    schema = {
      fields {
        name = "string"
        age = "int"
        card = "int"
      }
    }
  }
}

transform {
  Filter {
    source_table_name = "fake"
    result_table_name = "fake1"
    fields = [name, card]
  }
}

sink {
  Clickhouse {
    host = "clickhouse:8123"
    database = "default"
    table = "nexus_console"
    fields = ["name", "card"]
    username = "default"
    password = ""
    source_table_name = "fake1"
  }
}

Multi-line Support

In HOCON, multi-line strings are supported, enabling you to include long blocks of text without needing to manage newline characters or special formatting manually. This is done by enclosing the text within triple quotes ("""). For example:

var = """
Nexus is a
next-generation high-performance,
distributed, massive data integration tool.
"""
sql = """ select * from "table" """

The configuration from the HOCON example above, in JSON format:

{
  "env": {
    "job.mode": "BATCH"
  },
  "source": [
    {
      "plugin_name": "FakeSource",
      "result_table_name": "fake",
      "row.num": 100,
      "schema": {
        "fields": {
          "name": "string",
          "age": "int",
          "card": "int"
        }
      }
    }
  ],
  "transform": [
    {
      "plugin_name": "Filter",
      "source_table_name": "fake",
      "result_table_name": "fake1",
      "fields": ["name", "card"]
    }
  ],
  "sink": [
    {
      "plugin_name": "Clickhouse",
      "host": "clickhouse:8123",
      "database": "default",
      "table": "nexus_console",
      "fields": ["name", "card"],
      "username": "default",
      "password": "",
      "source_table_name": "fake1"
    }
  ]
}

As you can see, the configuration file consists of four sections: env, source, transform, and sink. Each section, or module, serves a specific function. Once you understand how these modules operate, you will see how Nexus functions as a whole.

env

The env section configures optional engine parameters. Whichever engine you use (Zeta, Spark, or Flink), its optional parameters are specified here.

Note that the parameters are organized by engine. Common parameters, such as job.mode in the example above, are set directly in env. For the Flink and Spark engines, refer to JobEnvConfig for the specific configuration rules.

source

The source module defines where Nexus fetches data from; this data is then used in subsequent steps. Multiple sources can be defined at once. The list of supported sources can be found in the Nexus source documentation. Each source has its own parameters that define how the data is fetched. Some parameters are common to all sources, such as result_table_name, which names the data produced by the current source so that subsequent modules can refer to it.
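
As a minimal sketch of defining multiple sources at once, the fragment below declares two FakeSource entries, each publishing its data under its own result_table_name. The table names fake_users and fake_cards are hypothetical, and the sketch assumes Nexus accepts repeated blocks of the same plugin type in HOCON, in the same way the JSON format lists plugins in an array.

```hocon
source {
  # First source; "fake_users" is a hypothetical table name.
  FakeSource {
    result_table_name = "fake_users"
    row.num = 100
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }

  # Second source (assumes repeated plugin blocks are accepted).
  FakeSource {
    result_table_name = "fake_cards"
    row.num = 100
    schema = {
      fields {
        name = "string"
        card = "int"
      }
    }
  }
}
```

A downstream transform or sink would then select one of these by setting source_table_name to the corresponding name.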

transform

After defining the data source, further processing may be required, which is where the transform module comes in. The transform module is optional: you can skip it and go directly from source to sink, as shown below.

env {
  job.mode = "BATCH"
}

source {
  FakeSource {
    result_table_name = "fake"
    row.num = 100
    schema = {
      fields {
        name = "string"
        age = "int"
        card = "int"
      }
    }
  }
}

sink {
  Clickhouse {
    host = "clickhouse:8123"
    database = "default"
    table = "nexus_console"
    fields = ["name", "age", "card"]
    username = "default"
    password = ""
    source_table_name = "fake"
  }
}

As with sources, each transform has its own specific parameters. The supported transforms can be found in the Transform V2 documentation of Nexus.

sink

The primary goal of Nexus is to synchronize data from one location to another, so defining how and where the data is written is crucial. The sink module is where you define this. Sinks are configured much like sources; the key difference is that a source reads data while a sink writes it. For a list of supported sinks, please refer to the Supported Sinks documentation.

other

When defining multiple sources and sinks, you need to specify which data each sink reads and which data each transform handles. This is managed through two configurations: result_table_name and source_table_name. Each source uses result_table_name to name the data it generates. Transform and sink modules then reference that name with source_table_name to indicate which data they should read. A transform can use both at once: source_table_name for its input and result_table_name for its output.

If these parameters are not specified, Nexus uses the data generated by the last module of the previous node by default. This convention simplifies configurations, especially when there is only one source.
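
The routing described above can be sketched as follows, reusing the plugins from the earlier examples. One source publishes fake, a Filter transform reads it and publishes fake1, and two sinks each pick their input with source_table_name. The second Clickhouse table name, nexus_console_full, is hypothetical, and the sketch assumes that repeated blocks of the same plugin type are accepted in HOCON.

```hocon
env {
  job.mode = "BATCH"
}

source {
  FakeSource {
    # Publishes its rows under the name "fake".
    result_table_name = "fake"
    row.num = 100
    schema = {
      fields {
        name = "string"
        age = "int"
        card = "int"
      }
    }
  }
}

transform {
  Filter {
    # Reads "fake" and publishes the filtered rows as "fake1".
    source_table_name = "fake"
    result_table_name = "fake1"
    fields = [name, card]
  }
}

sink {
  # Receives the filtered data.
  Clickhouse {
    host = "clickhouse:8123"
    database = "default"
    table = "nexus_console"
    fields = ["name", "card"]
    username = "default"
    password = ""
    source_table_name = "fake1"
  }

  # Receives the unfiltered data; the table name is hypothetical.
  Clickhouse {
    host = "clickhouse:8123"
    database = "default"
    table = "nexus_console_full"
    fields = ["name", "age", "card"]
    username = "default"
    password = ""
    source_table_name = "fake"
  }
}
```

With only one source and one sink, both table-name settings could be omitted and the default behavior would apply.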

