Aggregation

Aggregation Nodes in Cortex provide a powerful way to incrementally aggregate events across various time granularities, allowing for real-time analytics with millisecond precision.

  • Every Aggregation Node is defined by a specific schema for a window definition. (cf. Step 1 - General)

  • The aggregation logic and schema are detailed in the aggregation definition, which specifies names, inputs, logic, time granularities, and output attributes. (cf. Step 2 - Aggregation Parameters)

  • You also have the option of enabling automatic data purging, which helps manage data growth and ensures efficiency. (cf. Step 3 - Purge and Retention)

  • While these aggregations are typically stored in-memory for quick access, Store Element also allows for storage in external databases, enhancing data longevity and distribution. (cf. Step 4 - Store Type)

Aggregation Nodes are essential for producing detailed reports and dashboards, and for making informed decisions by analyzing time-series data over extended periods.

Aggregation Nodes directly consume events coming from Stream Nodes, so they are beyond the scope of Node Units.

Step 1 - General

Name and Description

When configuring an Aggregation Node in Cortex, Step 1 involves several key actions.

  • Assign a unique name to the Node, distinct from other Nodes in your Application.

  • Optionally, add a Description for detail and clarity.

  • The Node Name and Description will help distinguish it in the Canvas via Node Preview.

Enable Vault

You can also integrate Application-level Vaults into your Aggregation Node configuration.

  • Activate the toggle to use Application-level Vaults.

  • Choose from existing Vaults and enter the corresponding Personal Encryption Key for access.

Attributes

For the Aggregation Node in Cortex, the Attributes are automatically derived as inputs from Sink or Stream Nodes connected to its left side.

  • Attributes in the Aggregation Node automatically originate from Sink or Stream Nodes to its left.

  • The Attributes Table displays the Attribute Name, Input Node Name, and Attribute Type for each input attribute.

Step 2 - Aggregation Parameters

Aggregated Attributes

In Aggregated Attributes, you will create your aggregation logic using input node attributes. The Aggregation Node should have at least one Aggregated Attribute to be functional; a conceptual sketch of what such an attribute computes follows the list below.

  • You can add as many Aggregated Attributes as you need by clicking the Add Attributes button. Each click adds a new Aggregated Attribute to the bottom of the list below.

  • You can use the Expression Builder to construct the processing logic for your Aggregated Attributes.

  • After you name your Aggregated Attributes and utilize the Function Builder, the Attribute Type of each Aggregated Attribute is displayed automatically once you Save your expression.
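The following is a minimal Python sketch, not Cortex itself, of what an Aggregated Attribute such as a running Sum conceptually computes as events arrive; the SaleAmount attribute name is purely illustrative.

```python
# Minimal sketch (not Cortex itself): what an Aggregated Attribute such as a
# running Sum over SaleAmount conceptually computes as events arrive one by one.
class RunningSum:
    """Incrementally maintains a sum and count so each event updates the aggregate in O(1)."""
    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def add(self, value: float) -> None:
        self.total += value
        self.count += 1

agg = RunningSum()
for event in [{"SaleAmount": 10.0}, {"SaleAmount": 4.5}, {"SaleAmount": 25.0}]:
    agg.add(event["SaleAmount"])

print(agg.total, agg.count)  # 39.5 3
```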

Event Grouping

Event Grouping aggregates events based on unique combinations of Attributes. As an optional selection, you may choose to have no Event Grouping or create a combination that uses some or all Attributes.

  • With Event Grouping, aggregation functions run independently for each unique combination of the selected Attributes.

Event Grouping is optional, and if not specified, all events are aggregated collectively.

e.g. Imagine you have a stream of sales transactions, each with attributes like SaleAmount, Date, and ProductType. Without Event Grouping, an aggregation (e.g., Sum on SaleAmount) would calculate the total sales amount across all transactions. However, if you use Event Grouping with the ProductType attribute, the aggregation calculates the total sales amount for each type of product, giving you separate sums for each ProductType. This way, the aggregation is segmented based on the unique values of the ProductType attribute.
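As a rough illustration of the example above (a sketch in Python rather than Cortex configuration; the attribute names come from the example):

```python
# Sketch of the Event Grouping example: grouping by ProductType yields one
# independent Sum(SaleAmount) per product type instead of a single total.
from collections import defaultdict

sales = [
    {"ProductType": "book",  "SaleAmount": 12.0},
    {"ProductType": "phone", "SaleAmount": 450.0},
    {"ProductType": "book",  "SaleAmount": 8.0},
]

# Without Event Grouping: one total across all events.
total = sum(e["SaleAmount"] for e in sales)        # 470.0

# With Event Grouping on ProductType: one aggregate per unique value.
per_type = defaultdict(float)
for e in sales:
    per_type[e["ProductType"]] += e["SaleAmount"]  # {'book': 20.0, 'phone': 450.0}
```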

Aggregation Criteria

Aggregation Criteria let users define the granularity of data aggregation, from seconds to years, and select a Timestamp Attribute to be used as the aggregation timestamp. This flexibility facilitates tailored data analysis, using either a range of granularities for comprehensive insights or specific ones for focused observations. If a Timestamp Attribute is not provided, aggregations default to using the event's time, ensuring versatility in handling temporal data within aggregations.

By Time Period

  • To specify the time-based limits for aggregations, from seconds up to years, use the checkboxes below.

  • You must specify at least one Time Period to continue, as without a time granularity, aggregations cannot take place.

The Aggregation Criteria calculate each time interval with precision according to the UTC timestamp, considering every turn of the designated unit, whether it's seconds, minutes, hours, days, or years, down to the nanosecond level.

e.g. For an aggregation with Hours selected as a Time Interval, aggregations occur at intervals such as every hour between the start of an hour (like 13:00:00.000000000) and the start of the next hour (like 14:00:00.000000000).
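A minimal sketch of this UTC-aligned bucketing, assuming the event time is already a UTC timestamp (Python datetimes carry only microsecond precision, so the nanosecond detail is omitted):

```python
# Sketch: derive the UTC-aligned Hours bucket for an event by truncating its
# timestamp to the start of the hour, so every event between 13:00:00 and
# 14:00:00 lands in the same bucket.
from datetime import datetime, timezone

def hour_bucket(ts: datetime) -> datetime:
    return ts.replace(minute=0, second=0, microsecond=0)

event_time = datetime(2024, 5, 1, 13, 37, 21, 482113, tzinfo=timezone.utc)
print(hour_bucket(event_time))  # 2024-05-01 13:00:00+00:00
```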

By Timestamp Attribute

Enabling an optional Timestamp Attribute comes in very handy when you want to use it as the reference timestamp for aggregations.

The Aggregation Criteria with a Timestamp Attribute calculate each time interval with precision according to the Timestamp Attribute's timestamp, considering every turn of the designated unit, whether it's seconds, minutes, hours, days, or years, down to the nanosecond level.

e.g. For an aggregation with Hours selected as a Time Interval and with a Timestamp Attribute at 02:04:12.982920345, aggregations occur at intervals such as every hour between the start of the hour with respect to the Timestamp Attribute (like 02:04:12.982920345) and the start of the next hour (like 03:04:12.982920345).
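One way to picture these anchored intervals is the sketch below; it assumes the Timestamp Attribute acts as the reference point from which fixed-size buckets are counted, and the names and logic are illustrative only.

```python
# Sketch: with a Timestamp Attribute as the reference point, hour buckets start
# at that reference instead of at the top of the UTC hour.
from datetime import datetime, timedelta, timezone

def anchored_bucket(event_time: datetime, anchor: datetime, step: timedelta) -> datetime:
    elapsed = event_time - anchor
    return anchor + (elapsed // step) * step

anchor = datetime(2024, 5, 1, 2, 4, 12, 982920, tzinfo=timezone.utc)
event = datetime(2024, 5, 1, 3, 30, 0, tzinfo=timezone.utc)
print(anchored_bucket(event, anchor, timedelta(hours=1)))  # 2024-05-01 03:04:12.982920+00:00
```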

Step 3 - Purge and Retention

Data purging is an essential aspect of managing aggregated data within Aggregation Nodes. It ensures that memory resources are efficiently utilized and that only relevant data is retained for analysis purposes.

Purging Interval

Purging Interval sets how frequently aggregated data is removed from the Aggregation Node. It determines the cadence at which the system cleanses outdated aggregated data.

Data purging in the Aggregation Node is enabled by default with a 15-minute Purging Interval.

This default setting is configurable using Purging Interval and Retention Periods.

Retention Period

Retention period specifies the duration for which aggregated data remains accessible before it is purged. It governs the lifespan of aggregated data, ensuring its availability for analysis within defined timeframes.

Multiple Retention Periods for Different Time Periods of the Aggregation Criteria

Aggregation Node provides the flexibility to configure multiple Retention Periods for different Time Periods in the Aggregation Criteria By Time Period.

Users can also specify different Retention Periods per Time Period; however, user-defined Retention Periods cannot be smaller than the Minimum Retention Period of each Time Period.

Aggregation Criteria - By Time Period, with the corresponding Default Retention Period and Minimum Retention Period:

  • Seconds - Default Retention Period: 120 Seconds, Minimum Retention Period: 120 Seconds

  • Minutes - Default Retention Period: 24 Hours, Minimum Retention Period: 120 Minutes

  • Hours - Default Retention Period: 30 Days, Minimum Retention Period: 25 Hours

  • Days - Default Retention Period: 1 Year, Minimum Retention Period: 32 Days

  • Months - Default Retention Period: All, Minimum Retention Period: 13 Months

  • Years - Default Retention Period: All, Minimum Retention Period: None

Configuring Purge Intervals and Retention Periods allows users to manage aggregated data effectively, tailoring the data lifecycle to suit their analysis requirements. Adjusting these parameters ensures optimal resource utilization and timely access to relevant aggregated data for analytical insights.
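As a rough sketch of how the two settings interact (illustrative only, using default values from the table above): on every purge cycle, buckets older than the retention window for their granularity are dropped.

```python
# Sketch: a purge pass runs once per Purging Interval (15 minutes by default)
# and removes buckets older than the Retention Period for their granularity.
from datetime import datetime, timedelta, timezone

RETENTION = {
    "Seconds": timedelta(seconds=120),
    "Minutes": timedelta(hours=24),
    "Hours":   timedelta(days=30),
}

def purge(buckets: dict, granularity: str, now: datetime) -> dict:
    cutoff = now - RETENTION[granularity]
    return {start: agg for start, agg in buckets.items() if start >= cutoff}

now = datetime.now(timezone.utc)
minute_buckets = {now - timedelta(hours=30): 12.5, now - timedelta(minutes=5): 3.0}
minute_buckets = purge(minute_buckets, "Minutes", now)  # only the recent bucket survives
```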

Step 4 - Store Type

You can choose to store Aggregations using external stores. This way, you can create, update, and read aggregation tables in your Application flow.

Store Elements in Cortex are interfaces for external data stores like RDBMS, MongoDB, and Elasticsearch, using Aggregation Node tables as a proxy.

For further information you can refer to Store and Cache.
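As a loose illustration of what an external store amounts to, the sketch below writes an aggregated bucket into a SQL table. The table name hourly_sales and the use of SQLite as a stand-in are assumptions for the example, not the Cortex Store Element API.

```python
# Sketch: persisting an aggregation bucket to an external relational store so
# other systems can query it; SQLite stands in for the external RDBMS.
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for an external RDBMS
conn.execute(
    "CREATE TABLE IF NOT EXISTS hourly_sales "
    "(bucket_start TEXT, product_type TEXT, total_amount REAL, "
    "PRIMARY KEY (bucket_start, product_type))"
)
conn.execute(
    "INSERT OR REPLACE INTO hourly_sales VALUES (?, ?, ?)",
    ("2024-05-01T13:00:00Z", "book", 20.0),
)
conn.commit()
print(conn.execute("SELECT * FROM hourly_sales").fetchall())
```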

Enable Store

You can Enable Store and then choose from Store Types. In Step 4, you can configure the chosen Store Type with your requirements.

Step 5 - Store Parameters

RDBMS

You can manipulate aggregation tables that are kept in an RDBMS that resides outside of Cortex.

For further information and configuration instructions refer to RDBMS.

MongoDB

You can manipulate aggregation tables that are kept in a MongoDB instance that resides outside of Cortex.

For further information and configuration instructions refer to MongoDB.

Elasticsearch

You can manipulate aggregation tables that are kept in an Elasticsearch DB that resides outside of Cortex.

For further information and configuration instructions refer to Elasticsearch.

Step 6 - Enable Cache

Using external Stores to manipulate aggregation tables in the Aggregation Node may lead to significant I/O latency. As Application performance is directly related to latency, you should be aware of the performance advantages and disadvantages of enabling Stores.

Enable Cache

Working with external data Stores in Cortex can result in higher I/O latency compared to in-memory tables.

  • To address this performance decrease, defining a Cache can be effective (see the sketch below).

  • By caching recently accessed data in-memory, retrieval times are significantly improved, leading to faster and more efficient data access.

  • This approach helps mitigate the latency challenges associated with external data store interactions.
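A minimal read-through cache sketch in Python; load_from_store is a placeholder for the external read, not a Cortex API.

```python
# Sketch: recently accessed aggregates are kept in memory so repeated reads
# avoid a round trip to the external Store.
from functools import lru_cache

def load_from_store(bucket_key: str) -> float:
    # Placeholder for a (slow) external store lookup, e.g. an RDBMS query.
    ...
    return 0.0

@lru_cache(maxsize=10_000)
def get_aggregate(bucket_key: str) -> float:
    # First call hits the external store; subsequent calls are served from memory.
    return load_from_store(bucket_key)
```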

For further information and configuration instructions refer to Cache.

Step 7 - Preview

In Preview Step, you're provided with a concise summary of all the changes you've made to the Aggregation Node. This step is pivotal for reviewing and ensuring that your configurations are as intended before completing node setup.

  • Viewing Configurations: Preview Step presents a consolidated view of your node setup.

  • Saving and Exiting: Use the Complete button to save your changes, exit the node, and return to the Canvas.

  • Revisions: Use the Back button to return to any Step and modify the node setup.

The Preview Step offers a user-friendly summary to manage and finalize node settings in Cortex.
