Http

Http source connector

Description

Used to read data from HTTP endpoints.

Key features

Supported DataSource Info

| Datasource | Supported Versions |
|------------|--------------------|
| Http       | universal          |

Source Options

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| url | String | Yes | | HTTP request URL. |
| schema | Config | | | Mapping between the HTTP data structure and the Nexus data structure. |
| schema.fields | Config | | | The schema fields of the upstream data. |
| json_field | Config | | | Helps you configure the schema; must be used together with schema. |
| pageing | Config | | | Configures paginated queries. |
| pageing.page_field | String | | | Name of the page-number field in the request parameters. |
| pageing.total_page_size | Int | | | Total number of pages to read. |
| pageing.batch_size | Int | | | Number of records returned per request. When the total number of pages is unknown, it determines whether to request the next page. |
| content_field | String | | | JSONPath expression that selects part of the JSON response. For example, to keep only the 'book' section, configure content_field = "$.store.book.*". |
| format | String | | text | Format of the upstream data. Only json and text are supported; default text. |
| method | String | | get | HTTP request method. Only GET and POST are supported. |
| headers | Map | | | HTTP headers. |
| params | Map | | | HTTP params. The connector automatically adds the HTTP header application/x-www-form-urlencoded. |
| body | String | | | HTTP body, as a JSON string. The connector automatically adds the HTTP header application/json. |
| poll_interval_millis | Int | | | Interval (ms) between HTTP requests in stream mode. |
| retry | Int | | | Maximum number of retries when an HTTP request throws an IOException. |
| retry_backoff_multiplier_ms | Int | | 100 | Retry-backoff multiplier (ms) used when an HTTP request fails. |
| retry_backoff_max_ms | Int | | 10000 | Maximum retry-backoff time (ms) when an HTTP request fails. |
| enable_multi_lines | Boolean | | false | |
| connect_timeout_ms | Int | | 12000 | Connection timeout; default 12 s. |
| socket_timeout_ms | Int | | 60000 | Socket timeout; default 60 s. |
| common-options | | | | Source plugin common parameters; see Source Common Options for details. |
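The options above do not state exactly how retry, retry_backoff_multiplier_ms and retry_backoff_max_ms combine. The Python sketch below assumes a common capped exponential backoff, where the wait before attempt n is min(multiplier × 2^n, max). That formula is an assumption for illustration, not taken from the connector's source; the function name backoff_delays is hypothetical.

```python
def backoff_delays(retry: int, multiplier_ms: int = 100, max_ms: int = 10000) -> list:
    """Sleep time (ms) before each retry attempt.

    ASSUMPTION: capped exponential backoff,
    delay = min(multiplier_ms * 2**attempt, max_ms).
    The defaults mirror retry_backoff_multiplier_ms and retry_backoff_max_ms.
    """
    return [min(multiplier_ms * 2 ** attempt, max_ms) for attempt in range(retry)]


# retry has no documented default; 8 is an arbitrary example value.
print(backoff_delays(8))  # → [100, 200, 400, 800, 1600, 3200, 6400, 10000]
```

Under this assumption, the cap only matters once the doubled delay exceeds retry_backoff_max_ms, so raising retry beyond a few attempts mostly adds waits of the maximum length.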

How to Create an Http Data Synchronization Job

env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  Http {
    result_table_name = "http"
    url = "http://mockserver:1080/example/http"
    method = "GET"
    format = "json"
    schema = {
      fields {
        c_map = "map<string, string>"
        c_array = "array<int>"
        c_string = string
        c_boolean = boolean
        c_tinyint = tinyint
        c_smallint = smallint
        c_int = int
        c_bigint = bigint
        c_float = float
        c_double = double
        c_bytes = bytes
        c_date = date
        c_decimal = "decimal(38, 18)"
        c_timestamp = timestamp
        c_row = {
          C_MAP = "map<string, string>"
          C_ARRAY = "array<int>"
          C_STRING = string
          C_BOOLEAN = boolean
          C_TINYINT = tinyint
          C_SMALLINT = smallint
          C_INT = int
          C_BIGINT = bigint
          C_FLOAT = float
          C_DOUBLE = double
          C_BYTES = bytes
          C_DATE = date
          C_DECIMAL = "decimal(38, 18)"
          C_TIMESTAMP = timestamp
        }
      }
    }
  }
}

# Console printing of the read Http data
sink {
  Console {
    parallelism = 1
  }
}

Parameter Interpretation

format

When format is set to json, you should also set the schema option. For example, if the upstream data is:

{
  "code": 200,
  "data": "get success",
  "success": true
}

Configure schema as follows:

schema {
  fields {
    code = int
    data = string
    success = boolean
  }
}

The connector generates the following data:

| code | data        | success |
|------|-------------|---------|
| 200  | get success | true    |
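To make the mapping concrete, here is a Python sketch that parses the JSON body and emits one value per schema field, in schema order. The to_row() helper and its CONVERTERS table are hypothetical and cover only the int, string and boolean types used here; turning a missing key into None is a choice of this sketch, not documented connector behavior.

```python
import json

# Hypothetical converters for the schema types used in this example only.
CONVERTERS = {"int": int, "string": str, "boolean": bool}


def to_row(body: str, fields: dict) -> list:
    """Map a JSON object to a row ordered like schema.fields (illustrative only)."""
    record = json.loads(body)
    row = []
    for name, type_name in fields.items():
        value = record.get(name)  # a missing key becomes None in this sketch
        row.append(None if value is None else CONVERTERS[type_name](value))
    return row


upstream = '{"code": 200, "data": "get success", "success": true}'
schema_fields = {"code": "int", "data": "string", "success": "boolean"}
print(to_row(upstream, schema_fields))  # → [200, 'get success', True]
```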

When format is set to text, the connector passes the upstream data through unchanged. For example, if the upstream data is:

{
  "code": 200,
  "data": "get success",
  "success": true
}

The connector generates the following data:

| content                                               |
|-------------------------------------------------------|
| {"code": 200, "data": "get success", "success": true} |

content_field

This parameter extracts part of the JSON response. For example, if you only need the data in the 'book' section, configure content_field = "$.store.book.*".

Suppose the response looks like this:

{
  "store": {
    "book": [
      {
        "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      {
        "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  },
  "expensive": 10
}

If you configure content_field = "$.store.book.*", the result looks like this:

[
  {
    "category": "reference",
    "author": "Nigel Rees",
    "title": "Sayings of the Century",
    "price": 8.95
  },
  {
    "category": "fiction",
    "author": "Evelyn Waugh",
    "title": "Sword of Honour",
    "price": 12.99
  }
]

You can then read these records with a simpler schema, for example:

Http {
  url = "http://mockserver:1080/contentjson/mock"
  method = "GET"
  format = "json"
  content_field = "$.store.book.*"
  schema = {
    fields {
      category = string
      author = string
      title = string
      price = string
    }
  }
}

For a complete example, the test data is in mockserver-config.json and the job configuration is in http_contentjson_to_assert.conf.
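The effect of "$.store.book.*" can be illustrated with a tiny Python evaluator. The select() function below is hypothetical and deliberately minimal: it handles only dotted object keys and a trailing .* wildcard, which is enough to show what this expression selects. It is not the connector's JSONPath engine.

```python
import json


def select(document, path: str):
    """Evaluate a minimal JSONPath such as "$.store.book.*" (illustrative only).

    Supports dotted object keys and an optional trailing ".*" wildcard.
    """
    parts = path.split(".")
    if parts[0] != "$":
        raise ValueError("path must start with '$'")
    node = document
    for index, part in enumerate(parts[1:], start=1):
        if part == "*":
            if index != len(parts) - 1:
                raise ValueError("this sketch only supports '*' as the last step")
            # Wildcard: every element of a list, or every value of an object.
            return list(node) if isinstance(node, list) else list(node.values())
        node = node[part]
    return node


# Abridged copy of the response shown above.
response = json.loads(
    '{"store": {"book": ['
    '{"category": "reference", "title": "Sayings of the Century", "price": 8.95},'
    '{"category": "fiction", "title": "Sword of Honour", "price": 12.99}],'
    '"bicycle": {"color": "red", "price": 19.95}}, "expensive": 10}'
)
books = select(response, "$.store.book.*")
print([book["title"] for book in books])  # → ['Sayings of the Century', 'Sword of Honour']
```

Because the wildcard returns each array element as its own object, the schema only needs the per-book fields rather than the full nested structure.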

json_field

This parameter helps you configure the schema, so it must be used together with schema.

If your data looks something like this:

{ 
  "store": {
    "book": [
      {
        "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      {
        "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  },
  "expensive": 10
}

You can get the contents of 'book' by configuring the task as follows:

source {
  Http {
    url = "http://mockserver:1080/jsonpath/mock"
    method = "GET"
    format = "json"
    json_field = {
      category = "$.store.book[*].category"
      author = "$.store.book[*].author"
      title = "$.store.book[*].title"
      price = "$.store.book[*].price"
    }
    schema = {
      fields {
        category = string
        author = string
        title = string
        price = string
      }
    }
  }
}

The test data is in mockserver-config.json and the job configuration is in http_jsonpath_to_assert.conf.
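A reasonable mental model for json_field is that each schema field gets one JSONPath expression, and the i-th match of every expression is combined into the i-th row. The Python sketch below illustrates that column-to-row step under stated assumptions: column() and rows_from_json_field() are hypothetical helpers, the path parser handles only the $.a.b[*].leaf shape used above, and values are kept as parsed (price is not converted to the string type the schema declares).

```python
def column(document, path: str) -> list:
    """Evaluate a path of the form "$.a.b[*].leaf" (hypothetical, minimal)."""
    prefix, sep, leaf = path.partition("[*].")
    if not prefix.startswith("$") or not sep:
        raise ValueError("expected a path like $.store.book[*].title")
    node = document
    for key in prefix.split(".")[1:]:  # skip the leading "$"
        node = node[key]
    return [item[leaf] for item in node]


def rows_from_json_field(document, json_field: dict) -> list:
    """Zip one column per field into rows: row i takes the i-th match of every path."""
    columns = {name: column(document, path) for name, path in json_field.items()}
    lengths = {len(values) for values in columns.values()}
    if len(lengths) != 1:
        raise ValueError("all json_field paths must match the same number of values")
    (count,) = lengths
    return [{name: values[i] for name, values in columns.items()} for i in range(count)]


response = {
    "store": {
        "book": [
            {"category": "reference", "author": "Nigel Rees",
             "title": "Sayings of the Century", "price": 8.95},
            {"category": "fiction", "author": "Evelyn Waugh",
             "title": "Sword of Honour", "price": 12.99},
        ]
    }
}
json_field = {
    "category": "$.store.book[*].category",
    "author": "$.store.book[*].author",
    "title": "$.store.book[*].title",
    "price": "$.store.book[*].price",
}
rows = rows_from_json_field(response, json_field)
print(rows[1])
```

Checking that every path matches the same number of values guards against silently misaligned rows when one field is missing from some records.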

pageing

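The following configuration reads 20 pages. page_field names the request parameter that carries the page number, and "${page}" in params is replaced with the current page number on each request.

source {
  Http {
    url = "http://localhost:8080/mock/queryData"
    method = "GET"
    format = "json"
    params = {
      page: "${page}"
    }
    content_field = "$.data.*"
    pageing = {
      total_page_size = 20
      page_field = page
      # If total_page_size is unknown, set batch_size instead: reading stops when
      # a request returns fewer than batch_size records, otherwise it continues.
      # batch_size = 10
    }
    schema = {
      fields {
        name = string
        age = string
      }
    }
  }
}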
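The two paging modes described in this example can be sketched in Python: with total_page_size the reader requests exactly that many pages; with only batch_size it keeps requesting until a page returns fewer than batch_size records. The read_all_pages() function, the fetch_page callback (standing in for substituting ${page} and sending the request), and page numbering starting at 1 are all assumptions for illustration.

```python
from typing import Callable, Optional


def read_all_pages(
    fetch_page: Callable[[int], list],
    total_page_size: Optional[int] = None,
    batch_size: Optional[int] = None,
    first_page: int = 1,  # ASSUMPTION: page numbering starts at 1
) -> list:
    """Collect records page by page, mirroring the pageing options (sketch only)."""
    if total_page_size is None and batch_size is None:
        raise ValueError("set total_page_size or batch_size")
    records: list = []
    if total_page_size is not None:
        # Known page count: request exactly total_page_size pages.
        for page in range(first_page, first_page + total_page_size):
            records.extend(fetch_page(page))
        return records
    # Unknown page count: stop after the first page shorter than batch_size.
    page = first_page
    while True:
        batch = fetch_page(page)
        records.extend(batch)
        if len(batch) < batch_size:
            return records
        page += 1


# Hypothetical server holding 25 records, served 10 per page.
DATA = [{"name": f"user{i}", "age": str(20 + i)} for i in range(25)]


def fake_fetch(page: int) -> list:
    start = (page - 1) * 10
    return DATA[start:start + 10]


print(len(read_all_pages(fake_fetch, batch_size=10)))      # → 25 (pages 1, 2, 3)
print(len(read_all_pages(fake_fetch, total_page_size=2)))  # → 20
```

Note that in batch_size mode, a data set whose size is an exact multiple of batch_size costs one extra request that returns an empty page.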


Last updated 11 months ago
