
Hazelcast updates its Jet distributed processing engine

Posted Wednesday, June 14, 2017 by MICHAEL HAYNES, Associate Editor

Hazelcast, the open source in-memory data grid (IMDG) provider, has announced the 0.4 release of Hazelcast Jet - an application-embeddable, distributed processing engine for stream and batch processing of big data. Major new functionality in Jet 0.4 includes event-time processing with tumbling, sliding and session windowing. These capabilities give users a flexible mechanism to build and evaluate windows over continuous data streams. Easy to use, deploy and program, Jet is suited to applications such as sensor updates in IoT architectures (house thermostats, lighting systems), in-store e-commerce systems and social media platforms.

Stream processing has overtaken batch processing as the preferred method of processing big data sets for companies that require immediate insight into their data. To get value from a stream, however, it must be partitioned: a fragment (window) of the stream is taken and analyzed. To assign data to windows during processing, each element in the stream needs to be associated with a timestamp. In Hazelcast Jet 0.4 this is achieved via event-time processing, which uses a logical, data-dependent timestamp embedded in the event itself. A major drawback of event-time processing is that events may arrive out of order or late, so you can never be sure that you have seen all the events in a given time window.
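The effect of late arrivals can be illustrated with a small Python sketch. It is not Jet code: the window length, the sample events and the function names are all hypothetical, chosen only to show how grouping by the embedded timestamp differs from a naive scheme that closes a window as soon as a later one has been seen.

```python
from collections import Counter

WINDOW_MS = 1000  # hypothetical window length in milliseconds

# (event_time_ms, payload) pairs listed in ARRIVAL order.
# The event stamped 900 arrives after one stamped 1200, i.e. it is late.
arrivals = [(100, "a"), (1200, "b"), (900, "c"), (1700, "d")]


def window_start(event_time_ms, size_ms=WINDOW_MS):
    """Start of the fixed-size window that contains the given event time."""
    return event_time_ms - event_time_ms % size_ms


# Grouping by the embedded timestamp gives the same counts for any arrival order.
counts_by_event_time = Counter(window_start(t) for t, _ in arrivals)


def count_with_eager_close(events):
    """Naive scheme: a window is closed once an event from a later window arrives."""
    counts = Counter()
    dropped = []
    latest_start = 0
    for event_time_ms, payload in events:
        start = window_start(event_time_ms)
        if start < latest_start:
            dropped.append(payload)  # its window was already closed
            continue
        latest_start = start
        counts[start] += 1
    return counts, dropped


eager_counts, dropped = count_with_eager_close(arrivals)
```

In this sample the late event "c" belongs to the first window by its timestamp, yet the eager scheme has already closed that window and drops it.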

To alleviate this issue, the latest release of Jet also includes windowing functionality which enables users to evaluate stream processing jobs at regular time intervals, regardless of how many incoming messages the job is processing.

Three types of windows:


- Fixed/tumbling: time is partitioned into same-length, non-overlapping chunks. Each event belongs to exactly one window.

- Sliding: windows have a fixed length, but successive windows start a fixed time interval (step) apart, and the step can be smaller than the window length, so windows may overlap. Typically the window length is a multiple of the step.

- Session: windows vary in size and are defined based on the data, which should carry some kind of session identifier.
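The three window types can be sketched in a few lines of plain Python. This is an illustration only, not the Jet API: the function names are hypothetical, timestamps are plain integers, and the inactivity gap used to split one session identifier into several session windows is an assumption that the article does not spell out.

```python
def tumbling_window(ts, size):
    """The single [start, end) window of the given size that contains ts."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts, size, step):
    """Every [start, end) window of the given size, with starts aligned to
    multiples of step, that contains ts. Starts may be negative for small ts."""
    windows = []
    start = ts - ts % step  # latest aligned start that is not after ts
    while start > ts - size:
        windows.append((start, start + size))
        start -= step
    return sorted(windows)


def session_windows(events, gap):
    """Group (session_id, ts) events into per-session (first_ts, last_ts) spans.
    A new span starts when a session has been silent for longer than gap
    (the gap rule is an assumption of this sketch)."""
    sessions = {}
    for session_id, ts in sorted(events, key=lambda e: e[1]):
        spans = sessions.setdefault(session_id, [])
        if spans and ts - spans[-1][1] <= gap:
            spans[-1] = (spans[-1][0], ts)  # extend the open span
        else:
            spans.append((ts, ts))  # open a new span
    return sessions


events = [("u1", 1), ("u2", 2), ("u1", 3), ("u1", 20)]
spans = session_windows(events, gap=5)
```

With a window length of 10 and a step of 5, an event at time 7 falls into one tumbling window, (0, 10), but into two sliding windows, (0, 10) and (5, 15).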

Key new features


- Users are now able to use the ICache/Hazelcast integration as a source and sink of data.

- java.util.stream can be used on top of ICache to enable basic data processing.

- Streaming File Connector - improved connector allows users to watch files and directories for changes.

- Numerous code samples are now available which can be used as building blocks for Jet applications, providing a gradual learning experience.

In a new latency benchmark study published by Hazelcast, Jet "outperformed its competitors with a 40ms average latency for stream processing computations which remained flat as messages increased. Flink and Spark’s execution latencies were hundreds of ms rising to seconds at the higher message throughputs."

The study compares the average latencies of Hazelcast Jet, Flink and Spark Streaming under varying conditions such as message rate and window size. Results of the test can be viewed in the tables below (all results are given in milliseconds).

*Latencies increased as the framework was not able to keep up with input

With IMDG providing storage functionality, Jet is an Apache 2 licensed open source project that performs parallel execution to enable data-intensive applications to operate in near real-time. Built on a one-record-at-a-time architecture (sometimes known as continuous operators), Jet processes incoming records as soon as possible, as opposed to accumulating records into micro-batches, which lowers latency for applications.
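Why micro-batching adds latency can be seen in a toy Python model. It is a sketch under simplifying assumptions, not a reproduction of the benchmark: processing cost is ignored, and the arrival times and the 500 ms batch interval are hypothetical values.

```python
def micro_batch_wait(arrival_ms, batch_interval_ms):
    """Time a record waits until the batch interval it arrived in closes."""
    batch_close = (arrival_ms // batch_interval_ms + 1) * batch_interval_ms
    return batch_close - arrival_ms


def record_at_a_time_wait(arrival_ms):
    """A record is handed to the operator on arrival, so it does not wait
    (processing cost is ignored in this toy model)."""
    return 0


arrivals_ms = [0, 120, 250, 480, 990]  # hypothetical arrival times
BATCH_INTERVAL_MS = 500                # hypothetical micro-batch interval

batch_waits = [micro_batch_wait(t, BATCH_INTERVAL_MS) for t in arrivals_ms]
stream_waits = [record_at_a_time_wait(t) for t in arrivals_ms]

mean_batch_wait = sum(batch_waits) / len(batch_waits)
mean_stream_wait = sum(stream_waits) / len(stream_waits)
```

In this model a record that arrives just after a batch opens waits almost the full interval before any computation starts, while the record-at-a-time path adds no waiting of its own.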

Greg Luck, CEO of Hazelcast, said: “The Jet project is progressing faster than we could have hoped. The new functionality in 0.4 brings stream processing for the first time. As with batch, we are achieving a new performance level, giving us a real edge over alternative market solutions. Jet’s architecture is performance and low latency driven, which is why there are no real surprises in the results of our latest benchmark. Driven by the community, Jet is an easy to deploy fast data solution for programmers built on the premise of simplicity.” 


READ MORE: https://hazelcast.com/products/jet/...



