By Jingyu Liu
Currently, stream processing is widely used in the Internet of Things (IoT) and edge computing, with use cases in industries ranging from smart grids to finance. In IoT, it supports real-time data collection, monitoring, and on-the-fly computation of metrics from IoT devices.
Challenges of stream processing in resource utilization
In this big data era, most applications (including AI ones) stream data into Stream Processing Engines (SPEs) like Apache Flink, Spark Streaming, or Kafka Streams. These systems can hardly cope with data volumes that are increasing exponentially. Furthermore, optimizing such systems often involves making trade-offs among multiple performance metrics. For instance, improving throughput might come at the cost of reduced analysis accuracy. Similarly, minimizing data processing latency might require increased resource consumption. In general, navigating the existing optimization space involves carefully balancing metrics such as resource utilization, latency, accuracy, and throughput, depending on the application's constraints and goals.
The problem we want to solve is how to optimally trade off an application's resource utilization and performance within given constraints by leveraging AI techniques. Despite decades of research in operator scheduling, elasticity, resource allocation, and related areas, the dynamic adaptation of stream processing systems remains quite limited, and we believe live-tuning tools, such as those based on AI techniques like Reinforcement Learning (RL), can overcome existing limitations.
Approach
According to the Dataflow model, a data stream (e.g., a sequence of position reports from a moving car) is an unbounded sequence of tuples sharing the same schema. Edge devices such as vehicles typically contain many embedded computers or sensors that can feed data pipelines, which transform edge data into meaningful insights within cloud-based systems. These pipelines often rely on data-intensive processing paradigms like stream processing, where applications are defined as Directed Acyclic Graphs (DAGs) of operators and run by SPEs, which control how the pipelines are deployed through the distribution and parallelization of the operators in a DAG.
Stream processing queries are composed of sources, operators, and sinks. Sources forward tuples (e.g., events reported by sensors) to operators. Operators, connected in a DAG, process and forward/produce tuples. Eventually, tuples are fed to sinks, which deliver results to end-users or other applications. Operators are either stateless or stateful: stateless ones do not maintain a state that evolves with the tuples they process, while stateful ones produce results from a state that depends on one or more tuples. Because data streams are unbounded, stateful operators usually rely on windows to bound the amount of state they maintain. In this project, we focus on stateful operators, specifically Aggregates – central for data summarization and often bottlenecks in large pipelines with workload fluctuations – to explore how to balance resource utilization and performance. To reach this trade-off, we consider using AI-based auto-tuning tools, and we think RL is one of the most mature and popular methods.
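The keyed, windowed state such a stateful Aggregate maintains can be sketched in a few lines. The following Python example is our own minimal illustration, not tied to any particular SPE: the tumbling-window choice, class and field names, and the watermark-based eviction are all simplifying assumptions. It sums a value per key in fixed-size windows and evicts a window's state once a watermark passes its end.

```python
from collections import defaultdict

class TumblingSumAggregate:
    """Stateful Aggregate: per-key tumbling windows summing a value field."""

    def __init__(self, window_size):
        self.window_size = window_size   # window size in time units
        self.state = defaultdict(float)  # (key, window_id) -> running sum
        self.results = []                # emitted (key, window_id, sum) tuples

    def process(self, key, timestamp, value):
        window_id = timestamp // self.window_size
        self.state[(key, window_id)] += value

    def flush(self, watermark):
        """Emit and evict every window that ended at or before the watermark."""
        for (key, window_id) in sorted(self.state):
            if (window_id + 1) * self.window_size <= watermark:
                self.results.append((key, window_id, self.state.pop((key, window_id))))
        return self.results

agg = TumblingSumAggregate(window_size=10)
agg.process("car-1", 3, 5.0)
agg.process("car-1", 7, 2.0)
agg.process("car-2", 12, 1.0)
out = agg.flush(watermark=10)
print(out)   # car-1's first window closes; car-2's window stays in memory
```

Note how `car-2`'s window survives the flush: exactly this kind of lingering, rarely touched state is what the memory-compression idea below targets.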
Memory compression through RL
When using a streaming Aggregate, data is typically grouped into windows that summarize recent events, and these windows are partitioned by keys such as user IDs, sensors, or devices. Consider, for example, two vehicles: one transmits data frequently, say once per second, resulting in constantly evolving windows that are actively processed. The other sends data infrequently, maybe once per day. While the total volume of data from the second vehicle is small, the system still needs to keep its window in memory even though it is rarely updated or accessed, which leads to poorly used memory. In resource-constrained environments, such as edge deployments, this accumulation of rarely updated windows can cause memory pressure, which motivates on-demand memory management, particularly for rarely updated streams. To address this, our idea is to compress the windows belonging to infrequently updated partitions first and decompress them later for outputting results or shifting the window.
We investigated an algorithm that uses a single parameter, a “knob”, to control how many of the window instances maintained by the Aggregate should be compressed. This is expressed as a compression ratio ranging from 0 – all windows compressed – to 100 – no window compressed. As for how to adjust this “knob”, we introduce an Agent, trained by a neural network, that acts on the live Aggregate.
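A minimal sketch of the knob's mechanics, under stated assumptions: the knob semantics follow the text (0 compresses all windows, 100 compresses none), the "compress the least recently updated partitions first" policy is as described above, and the use of `zlib`/`pickle` and the class names are illustrative choices of ours, not the actual design.

```python
import pickle
import zlib

class CompressedWindowStore:
    """Per-partition window state with a compression-ratio knob:
    0 = compress every window, 100 = compress none (semantics from the text)."""

    def __init__(self):
        self.hot = {}    # key -> (last_update_ts, window_state)
        self.cold = {}   # key -> zlib-compressed pickled window_state

    def update(self, key, value, ts):
        if key in self.cold:   # decompress on demand before updating
            state = pickle.loads(zlib.decompress(self.cold.pop(key)))
        else:
            state = self.hot.get(key, (None, []))[1]
        state.append(value)
        self.hot[key] = (ts, state)

    def apply_knob(self, knob):
        """Compress the least recently updated fraction of the windows."""
        n_compress = round(len(self.hot) * (100 - knob) / 100)
        oldest = sorted(self.hot, key=lambda k: self.hot[k][0])[:n_compress]
        for key in oldest:
            _, state = self.hot.pop(key)
            self.cold[key] = zlib.compress(pickle.dumps(state))

store = CompressedWindowStore()
store.update("car-hot", 1.0, ts=100)
store.update("car-cold", 2.0, ts=1)
store.apply_knob(50)        # compress the colder half of the partitions
print(sorted(store.cold))   # ['car-cold']
```

A later update to `car-cold` transparently pulls its state back out of the compressed store, matching the decompress-on-demand behavior described above.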
The framework takes the following steps: (1) define an environment, mainly provided by the Stream Processing Engine (SPE); (2) find a controller, serving as the communication channel between the environment and the Reinforcement Learning (RL) Agent; (3) construct the RL Agent, with an appropriate training algorithm embedded inside it. First, the environment gives an initial state (such as input rate, throughput, output rate, latency, compression ratio, CPU consumption) to the Agent, which usually relies on a neural network. Second, the Agent chooses an action (such as compress more, stay unchanged, compress less) that interacts with the environment, updating the state and generating a reward, which is sent back to the Agent to reinforce itself toward increasingly optimal actions. The whole process iterates continuously until it reaches a terminal state or condition.
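The loop in steps (1)–(3) can be sketched with a toy, self-contained example. Everything here is an illustrative assumption of ours: the "environment" is a mock whose state is just a discretized memory-pressure level (a real state would also include input rate, latency, and so on), the reward shape is invented, and the Agent is tabular Q-learning rather than a neural network. The interaction pattern, however, is the one described: state → action → reward → policy update.

```python
import random

ACTIONS = ["compress_more", "stay", "compress_less"]

class MockSPEEnvironment:
    """Stand-in for the SPE: state is a memory-pressure level in 0..9."""
    def __init__(self):
        self.pressure = 5

    def step(self, action):
        latency_penalty = 0
        if action == "compress_more":
            self.pressure = max(0, self.pressure - 1)
            latency_penalty = 1          # decompress-on-read costs latency
        elif action == "compress_less":
            self.pressure = min(9, self.pressure + 1)
        # invented reward: keep pressure low without overpaying in latency
        reward = -self.pressure - latency_penalty
        return self.pressure, reward

random.seed(0)
q = {(s, a): 0.0 for s in range(10) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1        # learning rate, discount, exploration

for _ in range(200):                     # episodes
    env = MockSPEEnvironment()
    state = env.pressure
    for _ in range(20):                  # steps per episode
        if random.random() < eps:        # explore
            action = random.choice(ACTIONS)
        else:                            # exploit current policy
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state, reward = env.step(action)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state

# with no memory pressure, the learned policy should avoid paying for compression
policy_at_zero = max(ACTIONS, key=lambda a: q[(0, a)])
print(policy_at_zero)
```

In a real deployment the controller from step (2) would replace `MockSPEEnvironment`, forwarding live SPE metrics as the state and applying the chosen action to the Aggregate's compression knob.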
Window-adaption through RL
While SPEs offer flexibility in defining application logic through operator graphs, they often lack dynamic adaptability once deployed. For windowed operators in particular, performance and resource usage are directly influenced by how windows are defined and maintained. In most SPEs, the window parameters – window advance (WA) and window size (WS) – are fixed at deployment time, which may result in suboptimal performance under dynamically changing workloads. For example, many fine-grained windows (e.g., with small WA and large WS) can introduce considerable computational and memory pressure in periods of high input rate. Conversely, during low-load periods, coarse-grained windows may lead to under-utilization of available resources and delayed output. This motivates us to explore adaptive window configuration to balance the trade-off between performance (e.g., latency and throughput) and resource utilization (e.g., memory and CPU).
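To make the granularity trade-off concrete: assuming time-based sliding windows, each tuple falls into roughly ceil(WS / WA) overlapping windows, so a small advance combined with a large size multiplies the state and computation spent per tuple. A two-line sketch of this arithmetic (the parameter values are arbitrary examples):

```python
import math

def windows_per_tuple(ws, wa):
    """Number of overlapping sliding windows each tuple belongs to."""
    return math.ceil(ws / wa)

# fine-grained: small advance, large size -> heavy overlap, more state
print(windows_per_tuple(ws=60, wa=1))    # 60
# coarse-grained: tumbling case (wa == ws) -> exactly one window per tuple
print(windows_per_tuple(ws=60, wa=60))   # 1
```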
We will propose a mechanism that allows the Aggregate to dynamically adjust its window configuration during runtime. Specifically, the Aggregate switches between a pre-defined set of (WA, WS) pairs in response to changes in the SPE state. By enabling such dynamic reconfiguration, the Aggregate will better align its behavior with the characteristics of the incoming data and the current system load.
Similar to the approach in 2.1, we introduce a tunable “knob” that determines how frequently the window configuration may be changed. For example, when the “knob” is set to a low value, the system is allowed to reconfigure more frequently, which may help it respond quickly to workload changes but at the cost of increased overhead. On the other hand, higher values result in less frequent changes, reducing reconfiguration cost but potentially missing opportunities for optimization. To set this “knob”, we can also use RL: the RL Agent observes states from the environment, selects actions, such as larger/smaller WS/WA or keeping them unchanged, to change the window configurations, and learns to adapt its policy over time based on the observed reward, likely defined in terms of system performance and resource consumption.
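A minimal sketch of such gated reconfiguration. The set of (WA, WS) pairs, the class name, and counting the knob in decisions between switches are our own assumptions for illustration; a real Aggregate would also have to migrate in-flight window state when the configuration changes.

```python
# hypothetical pre-defined configurations: (window advance, window size)
CONFIGS = [(1, 10), (5, 30), (10, 60)]

class AdaptiveAggregate:
    def __init__(self, knob):
        self.knob = knob          # min decisions between reconfigurations
        self.config_idx = 0       # start with the finest-grained config
        self.since_change = 0

    def on_action(self, action):
        """Apply an RL action ('coarser', 'finer', 'stay'); return active config."""
        self.since_change += 1
        if self.since_change < self.knob or action == "stay":
            return CONFIGS[self.config_idx]      # gated: keep current config
        if action == "coarser" and self.config_idx < len(CONFIGS) - 1:
            self.config_idx += 1
        elif action == "finer" and self.config_idx > 0:
            self.config_idx -= 1
        self.since_change = 0
        return CONFIGS[self.config_idx]

agg = AdaptiveAggregate(knob=3)
print(agg.on_action("coarser"))   # too soon, still (1, 10)
print(agg.on_action("coarser"))   # still (1, 10)
print(agg.on_action("coarser"))   # knob reached: switches to (5, 30)
```

Raising `knob` makes the Aggregate more conservative, which is exactly the overhead-versus-responsiveness trade-off the RL Agent would learn to navigate.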
Conclusions
This project is situated within the broader vision of building adaptive, intelligent stream processing systems that can respond effectively to dynamic data and workloads. In particular, as modern applications demand increasingly flexible and resource-efficient processing pipelines, it becomes crucial to enable automatic trade-offs between performance and resource usage.

