Aggregation
Aggregation Nodes in Cortex provide a powerful way to incrementally aggregate events across various time granularities, allowing for real-time analytics with millisecond precision.
Every Aggregation Node is defined by a name and a schema derived from its input attributes. (cf. Step 1 - General)
The aggregation logic and schema are detailed in the aggregation definition, which specifies names, inputs, logic, time granularities, and output attributes. (cf. Step 2 - Aggregation Parameters)
You also have the option of enabling automatic data purging, which helps manage data growth and keeps memory usage efficient. (cf. Step 3 - Purge and Retention)
While these aggregations are typically stored in-memory for quick access, Store Elements also allow for storage in external databases, enhancing data longevity and distribution. (cf. Step 4 - Store Type)
Aggregation Nodes are essential for producing detailed reports and dashboards, and for making informed decisions by analyzing time-series data over extended periods.
Aggregation Nodes directly consume events coming from Stream Nodes; therefore, they are beyond the scope of Node Units.
Step 1 - General
Name and Description
When configuring an Aggregation Node in Cortex, Step 1 involves several key actions.
Assign a unique name to the Node, distinct from other Nodes in your Application.
Optionally, add a Description for detail and clarity.
The Node Name and Description will help distinguish it in the Canvas via Node Preview.
Enable Vault
You can also integrate Application-level Vaults into your Aggregation Node configuration.
Activate the toggle to use Application-level Vaults.
Choose from existing Vaults and enter the corresponding Personal Encryption Key for access.
Attributes
For the Aggregation Node in Cortex, Attributes are automatically derived as inputs from the Sink or Stream Nodes connected to its left side.
The Attributes Table displays the Attribute Name, Input Node Name, and Attribute Type for each input attribute.
Step 2 - Aggregation Parameters
Aggregated Attributes
In Aggregated Attributes, you create your aggregation logic using input node attributes. The Aggregation Node must have at least one Aggregated Attribute to be functional.
You can add as many Aggregated Attributes as you need by clicking the Add Attributes button. After clicking, a new Aggregated Attribute is added to the bottom of the list below.
You can use the Expression Builder to construct the processing logic for your Aggregated Attributes.
Unlike other Buffer Nodes (Stream, Table, Window, Trigger), you must name each Aggregated Attribute to the right of the 'as' clause.
After you name your Aggregated Attributes and build their expressions in the Expression Builder, the Attribute Type of each Aggregated Attribute is displayed automatically once you Save your expression.
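For example, an expression such as sum(SaleAmount) as TotalSales (a hypothetical illustration; the actual function names and syntax depend on the Expression Builder) would produce an Aggregated Attribute named TotalSales with a numeric Attribute Type.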
Event Grouping
Event Grouping aggregates events based on unique combinations of Attributes. It is optional: you may choose to have no Event Grouping, or create a combination that uses some or all of the Attributes.
With Event Grouping, aggregation functions are computed independently for each unique combination of the selected Attribute values.
e.g. Imagine you have a stream of sales transactions, each with attributes like SaleAmount, Date, and ProductType. Without Event Grouping, an aggregation (e.g., Sum on SaleAmount) would calculate the total sales amount across all transactions. However, if you use Event Grouping with the ProductType attribute, the aggregation calculates the total sales amount per type of product, giving you separate sums for each ProductType. This way, the aggregation is segmented based on the unique values of the ProductType attribute.
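As a minimal Python sketch of this behavior (illustrative only, not how Cortex implements it), using the hypothetical transactions below:

from collections import defaultdict

# Hypothetical sales transactions
transactions = [
    {"SaleAmount": 100.0, "Date": "2024-05-01", "ProductType": "Book"},
    {"SaleAmount": 250.0, "Date": "2024-05-01", "ProductType": "Laptop"},
    {"SaleAmount": 40.0, "Date": "2024-05-02", "ProductType": "Book"},
]

# Without Event Grouping: a single Sum over all transactions
total_sales = sum(t["SaleAmount"] for t in transactions)
print(total_sales)  # 390.0

# With Event Grouping on ProductType: one Sum per product type
sales_by_product = defaultdict(float)
for t in transactions:
    sales_by_product[t["ProductType"]] += t["SaleAmount"]
print(dict(sales_by_product))  # {'Book': 140.0, 'Laptop': 250.0}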
Aggregation Criteria
Aggregation Criteria let users define the granularity of data aggregation, from seconds to years, and select a Timestamp Attribute to be used as the aggregation timestamp. This flexibility facilitates tailored data analysis, using either a range of granularities for comprehensive insights or specific ones for focused observations. If a Timestamp Attribute is not provided, aggregations default to using the event's time, ensuring versatility in handling temporal data within aggregations.
By Time Period
To specify the time granularities for aggregations, from seconds up to years, use the checkboxes below.
You must specify at least one Time Period to continue, as without a time granularity, aggregations cannot take place.
e.g. For an aggregation with Hours selected as a Time Interval, aggregations occur at intervals such as every hour between the start of an hour (like 13:00:00.000000000) and the start of the next hour (like 14:00:00.000000000).
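A rough Python sketch of this clock-aligned bucketing (illustrative only, not Cortex's implementation):

from datetime import datetime

def hour_bucket(ts: datetime) -> datetime:
    # Truncate a reference timestamp to the start of its hour,
    # e.g. 13:27:45.123 -> 13:00:00.000
    return ts.replace(minute=0, second=0, microsecond=0)

event_time = datetime(2024, 5, 1, 13, 27, 45, 123000)
print(hour_bucket(event_time))  # 2024-05-01 13:00:00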
By Timestamp Attribute
Enabling an optional Timestamp Attribute comes in very handy when you want to use it as the reference timestamp for aggregations.
e.g. For an aggregation with Hours selected as a Time Interval and with a Timestamp Attribute at 02:04:12.982920345, aggregations occur at intervals such as every hour between the start of the hour with respect to the Timestamp Attribute (like 02:04:12.982920345) and the start of the next hour (like 03:04:12.982920345).
Step 3 - Purge and Retention
Data purging is an essential aspect of managing aggregated data within Aggregation Nodes. It ensures that memory resources are efficiently utilized and that only relevant data is retained for analysis purposes.
Purging Interval
Purging Interval sets how frequently aggregated data is removed from the Aggregation Node. It determines the cadence at which the system cleanses outdated aggregated data.
The default purging behavior is configurable through the Purging Interval and Retention Period settings.
Retention Period
Retention period specifies the duration for which aggregated data remains accessible before it is purged. It governs the lifespan of aggregated data, ensuring its availability for analysis within defined timeframes.
Multiple Retention Periods for Different Time Periods of the Aggregation Criteria
The Aggregation Node provides the flexibility to configure a separate Retention Period for each Time Period selected in the Aggregation Criteria (By Time Period), as in the sample values below.
Seconds | 120 Seconds | 120 Seconds
Minutes | 24 Hours | 120 Minutes
Hours | 30 Days | 25 Hours
Days | 1 Year | 32 Days
Months | All | 13 Months
Years | All | None
Configuring Purging Intervals and Retention Periods allows users to manage aggregated data effectively, tailoring the data lifecycle to their analysis requirements. Adjusting these parameters ensures optimal resource utilization and timely access to relevant aggregated data for analytical insights.
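A minimal Python sketch of the purge idea (illustrative only, with hypothetical retention values; not Cortex's implementation): on every Purging Interval tick, aggregated rows older than the Retention Period configured for their Time Period are dropped.

from datetime import datetime, timedelta

# Hypothetical per-Time-Period Retention Periods (None = keep everything)
retention = {
    "Seconds": timedelta(seconds=120),
    "Minutes": timedelta(minutes=120),
    "Hours": timedelta(hours=25),
    "Days": timedelta(days=32),
    "Months": None,
}

def purge(aggregates, now):
    # Keep rows whose Time Period has no limit or that are still within retention.
    return [
        row for row in aggregates
        if retention.get(row["period"]) is None
        or now - row["bucket_start"] <= retention[row["period"]]
    ]

# A purge pass like this would run on every Purging Interval tick.
rows = [{"period": "Hours", "bucket_start": datetime(2024, 5, 1, 13), "TotalSales": 140.0}]
print(purge(rows, datetime(2024, 5, 3, 13)))  # [] because the row is older than 25 hours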
Step 4 - Store Type
You can choose to store Aggregations using external stores. This way, you can create, update, and read aggregation tables in your Application flow.
Store Elements in Cortex are interfaces to external data stores such as RDBMS, MongoDB, and Elasticsearch, using Aggregation Node tables as a proxy.
Enable Store
You can Enable Store and then choose from the available Store Types. In Step 5, you configure the chosen Store Type according to your requirements.
Step 5 - Store Parameters
RDBMS
You can manipulate aggregation tables that are kept in an RDBMS that resides outside of Cortex.
MongoDB
You can manipulate aggregation tables that are kept in a MongoDB instance that resides outside of Cortex.
Elasticsearch
You can manipulate aggregation tables that are kept in an Elasticsearch database that resides outside of Cortex.
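As a purely illustrative sketch (not Cortex's storage mechanism; the database, collection, and field names are assumptions), persisting one aggregated row to an external MongoDB could look like this:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # external store outside Cortex
collection = client["analytics"]["hourly_sales"]   # hypothetical database/collection

# Upsert one aggregated row keyed by its group and time bucket.
aggregate_row = {
    "ProductType": "Book",
    "bucket_start": "2024-05-01T13:00:00Z",
    "TotalSales": 140.0,
}
collection.update_one(
    {"ProductType": aggregate_row["ProductType"], "bucket_start": aggregate_row["bucket_start"]},
    {"$set": aggregate_row},
    upsert=True,
)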
Step 6 - Enable Cache
Using external Stores to manipulate aggregation tables in the Aggregation Node may lead to significant I/O latency. As Application performance is directly related to latency, you should be aware of the performance advantages and disadvantages of enabling Stores.
For highly latency-critical Applications, we do not recommend enabling Stores.
Enable Cache
Working with external data Stores in Cortex can result in higher I/O latency compared to in-memory tables.
To address this performance decrease, defining a Cache can be effective.
By caching recently accessed data in-memory, retrieval times are significantly improved, leading to faster and more efficient data access.
This approach helps mitigate the latency challenges associated with external data store interactions.
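A minimal Python sketch of the caching idea (illustrative only; Cortex's actual cache configuration and eviction policy may differ): keep recently read rows in memory so repeated lookups skip the external store.

import time
from functools import lru_cache

def read_from_external_store(key):
    # Stand-in for a high-latency read from an external store.
    time.sleep(0.05)
    return {"key": key, "TotalSales": 140.0}  # hypothetical aggregate row

@lru_cache(maxsize=10_000)
def read_aggregate(key):
    # The first lookup pays the external-store latency; repeated lookups
    # for the same key are served from the in-memory cache.
    return read_from_external_store(key)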
Step 7 - Preview
In Preview Step, you're provided with a concise summary of all the changes you've made to the Aggregation Node. This step is pivotal for reviewing and ensuring that your configurations are as intended before completing node setup.
Viewing Configurations: Preview Step presents a consolidated view of your node setup.
Saving and Exiting: Use the Complete button to save your changes, exit the node, and return to the Canvas.
Revisions: Use the Back button to return to any Step and modify the node setup.
The Preview Step offers a user-friendly summary to manage and finalize node settings in Cortex.