Look out, Spark and Storm, here comes Apache Apex

Ian Pointer | April 21, 2016
A new open source streaming analytics solution derived from DataTorrent's RTS platform, Apex offers blazing speed and simplified programmability. Let's give it a spin.

{INFO=1}

{ERROR=1, INFO=1}

{ERROR=1, INFO=2}

{ERROR=1, INFO=2, DEBUG=1}

...

And so on.
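The running totals above can be mimicked with a minimal plain-Java sketch. It deliberately does not use the Apex API: the class name `LogLevelCounter` and its `process` method are hypothetical names chosen for illustration, and the input sequence (INFO, ERROR, INFO, DEBUG) is inferred from the output shown above. It only illustrates the idea of a small stateful step that emits an updated count for every incoming log level.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a counting operator; not the Apex operator API.
public class LogLevelCounter {
    // Running count per log level, kept across calls.
    private final Map<String, Integer> counts = new HashMap<>();

    // Handle one incoming log level and return a snapshot of the totals so far.
    public Map<String, Integer> process(String level) {
        counts.merge(level, 1, Integer::sum);
        return new HashMap<>(counts); // copy, so callers cannot mutate internal state
    }

    public static void main(String[] args) {
        LogLevelCounter counter = new LogLevelCounter();
        // Input sequence inferred from the sample output in the article.
        for (String level : new String[] {"INFO", "ERROR", "INFO", "DEBUG"}) {
            System.out.println(counter.process(level));
        }
    }
}
```

Each call prints the totals after one more line has been seen. Because a `HashMap` is used, the order of the keys within each printed map is not guaranteed.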

Malhar: A box of useful bricks

The great thing about operators is that they are small, well-defined bits of code, so they're easy to construct and test. They snap together like Lego bricks, with the slight difference that you don't normally have to make your own Lego bricks.

Enter Malhar, essentially a giant bucket of Lego that includes everything from your standard 2-by-4 to that up-down bit you "just need" from time to time. Do you need to read from Splunk, merge that information with text files stored on an FTP site, then store the result in HBase? Malhar has you covered.

Malhar is a large part of Apex's appeal: it comes with such an array of operators that you often only have to worry about your business logic. The documentation on the Malhar operators is sometimes a bit sparse, but almost everything in the repository has a brace of tests, so with a little effort you can see how they work.

Apex has a few more tricks up its sleeve, too. Along with the usual assortment of metrics and reporting, the dtCli application allows you to dynamically change a submitted application at runtime. Do you want to add a set of operators that write the logging lines to HDFS without bringing your entire application down? You can do that with Apex, unlike most other DAG-based systems available today.

The world of open source data stream processing engines is crowded, but Apex is a formidable entrant. With the Malhar library providing an impressive array of connectors, and Apex itself providing a stable base of fault tolerance, low latency, and scalability, it's easy to get up to speed and be productive with the framework. One caveat is that the operator concept sits a little closer to the nuts and bolts of processing than Flink and Spark's higher-level constructs.

DataTorrent would be wise to implement an Apex runner for Apache Beam to make it easier for developers to port their applications from existing frameworks. Nonetheless, I'd definitely recommend giving Apex a whirl when you evaluate streaming data processing engines.

Source: InfoWorld

 
