Hadoop specialist Hortonworks today announced that the newest version of its Hadoop distribution, Hortonworks Data Platform (HDP) 2.2, will become generally available in November.
Jim Walker, director of product marketing for Hortonworks, says the release incorporates the last six months of innovation in the Apache Hadoop community, including more than 100 new features from Apache Hadoop and its related projects.
Walker notes that Hortonworks strives to expand its distribution both horizontally and vertically. Vertically, Hortonworks is integrating the projects in its Hadoop distribution with YARN and HDFS so that the platform can span batch, interactive and real-time workloads. The deep integration of both Apache Storm and Apache Spark in HDP 2.2 reflects this approach, he says.
With more Hadoop deployments moving into production, HDP 2.2 also expands horizontally, addressing a slate of enterprise requirements. "This release represents the next major step forward for enterprise-readiness of Hadoop," Walker says. "Hadoop needs to act like any of the other technologies you have in the data center, meeting requirements around governance, security and operations."
HDP 2.2 Now Supports Apache Argus, Ambari and Kafka; SQL Semantics, Too
For instance, this release is the first to include Apache Argus, an Apache Incubator project based on the technology of security company XA Secure. Hortonworks acquired XA Secure in May and contributed its technology to the open source community via the Apache Software Foundation.
"Argus allows you to administer a central policy for security across a cluster and then enforce that policy across all different engines," Walker says.
HDP 2.2 also brings new capabilities from the Apache Ambari project that improve cluster management and monitoring. The Ambari Views Framework offers a systematic way to plug UI capabilities into the Ambari Web console, surfacing custom visualization, management and monitoring features, and it lets third parties add new resource types along with the APIs, providers and UI to support them. Meanwhile, Ambari Blueprints provide a template approach to cluster deployment, allowing you to specify a stack, component layout and configuration for cluster instances without having to use the Ambari Cluster Install Wizard.
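To make the template idea concrete, here is a minimal Python sketch that describes a blueprint-style stack and component layout, plus a cluster template that maps concrete hosts onto it, as plain dictionaries. The field names (`Blueprints`, `stack_name`, `host_groups`, `cardinality`) approximate the Ambari Blueprint JSON format and should be checked against the Ambari documentation; the blueprint name, host names and the `check_host_groups` helper are hypothetical, and nothing here calls the Ambari REST API.

```python
import json

# Blueprint-style template: stack, component layout and configuration.
# Field names approximate the Ambari Blueprint JSON format (verify before use).
blueprint = {
    "Blueprints": {
        "blueprint_name": "small-hdp",  # hypothetical name
        "stack_name": "HDP",
        "stack_version": "2.2",
    },
    "host_groups": [
        {
            "name": "master",
            "cardinality": "1",
            "components": [{"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"}],
        },
        {
            "name": "workers",
            "cardinality": "3",
            "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}],
        },
    ],
    "configurations": [
        {"hdfs-site": {"dfs.replication": "3"}},
    ],
}

# Cluster-instance template: assigns concrete (hypothetical) hosts to host groups.
cluster = {
    "blueprint": "small-hdp",
    "host_groups": [
        {"name": "master", "hosts": [{"fqdn": "master1.example.com"}]},
        {"name": "workers",
         "hosts": [{"fqdn": f"worker{i}.example.com"} for i in range(1, 4)]},
    ],
}


def check_host_groups(bp: dict, cl: dict) -> list[str]:
    """Hypothetical pre-deployment check of a cluster template against a blueprint.

    Simplification: treats each cardinality as an exact host count.
    """
    problems = []
    if cl["blueprint"] != bp["Blueprints"]["blueprint_name"]:
        problems.append("cluster template references a different blueprint")
    declared = {g["name"]: int(g["cardinality"]) for g in bp["host_groups"]}
    for group in cl["host_groups"]:
        name = group["name"]
        if name not in declared:
            problems.append(f"unknown host group: {name}")
        elif len(group["hosts"]) != declared[name]:
            problems.append(
                f"{name}: expected {declared[name]} hosts, got {len(group['hosts'])}"
            )
    return problems


if __name__ == "__main__":
    print(json.dumps(blueprint, indent=2))
    print(check_host_groups(blueprint, cluster))  # prints []
```

The point of separating the two documents is reuse: one blueprint can be instantiated many times by pairing it with different host assignments.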
To further support enterprise SQL at scale in Hadoop, HDP 2.2 adds transactional SQL semantics to Hive, including UPDATE and DELETE statements, along with a cost-based optimizer that Walker says gives Hive a major performance increase.
To support the Internet of Things, HDP 2.2 adds Apache Kafka, a high-scale, fault-tolerant publish-subscribe messaging system that is often used with Storm or Spark to stream events into Hadoop in real time.
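A toy, in-memory model can illustrate the publish-subscribe pattern described above: producers append events to a topic's log, and each subscriber group reads from its own offset, so several consumers (a Storm topology and a Spark job, for example) can read the same stream independently. This is a conceptual Python sketch with hypothetical class names; it does not use the Kafka client API and omits partitioning, replication and persistence, which are what give Kafka its scale and fault tolerance.

```python
from collections import defaultdict


class Topic:
    """Append-only event log, loosely modeled on a single Kafka partition."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def publish(self, event: str) -> int:
        """Append an event and return its offset in the log."""
        self.log.append(event)
        return len(self.log) - 1


class Broker:
    """Hypothetical broker: named topics plus a read offset per (group, topic)."""

    def __init__(self) -> None:
        self.topics: dict[str, Topic] = defaultdict(Topic)
        self.offsets: dict[tuple[str, str], int] = defaultdict(int)

    def publish(self, topic: str, event: str) -> int:
        return self.topics[topic].publish(event)

    def poll(self, group: str, topic: str, max_events: int = 10) -> list[str]:
        """Return unread events for a subscriber group and advance its offset."""
        start = self.offsets[(group, topic)]
        batch = self.topics[topic].log[start:start + max_events]
        self.offsets[(group, topic)] = start + len(batch)
        return batch


if __name__ == "__main__":
    broker = Broker()
    for reading in ["sensor-1:20.5", "sensor-2:19.8", "sensor-1:21.0"]:
        broker.publish("iot-readings", reading)

    # Two independent subscriber groups each see the full stream.
    print(broker.poll("storm-topology", "iot-readings"))
    print(broker.poll("spark-job", "iot-readings", max_events=2))
    print(broker.poll("spark-job", "iot-readings"))  # resumes at offset 2
```

Note that reading never removes events from the log; each group only advances its own offset, which is why a single stream can feed several processing engines at once.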
Finally, the new release provides support for rolling upgrades and an automated policy for cloud backup to Microsoft Azure or Amazon S3.