Apache Arrow aims to accelerate analytical workloads

Thor Olavsrud | Feb. 18, 2016
Arrow is designed to serve as a common data representation for big data processing and storage systems, allowing data to be shared between systems and processes without the CPU overhead caused by serialization, deserialization or memory copies.
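Arrow's first formal release had not yet shipped when this article ran, so the following is only an illustrative sketch using the Python bindings (pyarrow) as they exist today; the names used here (pa.RecordBatch.from_arrays, pa.ipc.new_stream) come from the current library, not the 2016 codebase. It shows the idea described above: bytes written in Arrow's IPC stream format share the layout of the in-memory columns, so handing a record batch to another process involves no per-row serialization or deserialization.

    import pyarrow as pa

    # Build a columnar record batch entirely in memory.
    batch = pa.RecordBatch.from_arrays(
        [pa.array([1, 2, 3]), pa.array(["a", "b", "c"])],
        names=["id", "label"],
    )

    # Write it out in Arrow's IPC stream format. The stream layout
    # mirrors the in-memory layout, so no row-by-row conversion occurs.
    sink = pa.BufferOutputStream()
    with pa.ipc.new_stream(sink, batch.schema) as writer:
        writer.write_batch(batch)
    buf = sink.getvalue()

    # A consumer (potentially another process mapping the same memory)
    # reads the columns back without copying or decoding them.
    with pa.ipc.open_stream(buf) as reader:
        same_batch = reader.read_next_batch()
    print(same_batch.column(0))  # -> [1, 2, 3]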

In addition to traditional relational data, Arrow supports complex data and dynamic schemas. It can handle the JSON data commonly found in Internet of Things (IoT) workloads, modern applications and log files, and implementations are already available (or underway) for programming languages including Java, C++ and Python. Nadeau says implementations for R and JavaScript should arrive by the end of the year, and that Drill, Ibis, Impala, Kudu, Parquet and Spark will all adopt Arrow in the same timeframe, with additional projects expected to follow.
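To make the complex-data claim concrete, here is a small sketch, again using today's pyarrow rather than anything that existed in 2016: a single column holds nested, JSON-like records, and the schema is inferred dynamically from the data.

    import pyarrow as pa

    # One Arrow column holding nested, JSON-like values: structs with
    # a string field and a variable-length list of doubles.
    events = pa.array([
        {"device": "sensor-1", "readings": [21.5, 21.7]},
        {"device": "sensor-2", "readings": [19.0]},
    ])

    print(events.type)
    # struct<device: string, readings: list<item: double>>

    # Individual fields remain columnar and can be accessed directly.
    print(events.field("device"))  # -> ["sensor-1", "sensor-2"]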

"Real-world use cases often include complex combinations of structured and rapidly growing complex-data," says Parth Chandra, Apache Drill PMC and Apache Arrow PMC. "Already tested with Apache Drill, the efficient in-memory columnar representation and processing in Arrow will enable users to enjoy the performance of columnar processing with the flexibility of JSON."

Nadeau expects the first formal release of Arrow to come within a few months.
