The Hadoop security project called Ranger supposedly was named in tribute to Chuck Norris in his "Walker, Texas Ranger" role. The project has its roots in XA Secure, which was acquired by Hortonworks, then renamed to Argus before settling in at the Apache Software Foundation as Ranger.
When Hadoop started, it was a set of loosely coupled parts primarily used in the back end of the big Internet companies like Yahoo. These parts were wrapped into distributions and marketed as Hadoop by the likes of MapR, Cloudera, and Hortonworks.
Such piecemeal architecture isn't unusual in the world of open source or even in the wide world of commercial software. It does, however, result in security challenges. Some will read this as "it's insecure," but that isn't necessarily the case, though it can be. The real problem is how to authenticate users to every part of this system of parts and, once they are authenticated, how to authorize them to do only what you mean to allow.
Each part of Hadoop has its own LDAP and Kerberos authentication, as well as its own means and rules of authorization (and in most cases totally separate implementations of the same). This means you get to configure Kerberos or LDAP for each individual part, then define those rules in each separate configuration. Apache Ranger provides a plug-in for each of these parts of Hadoop and a common authentication repository, and it lets you define policies in a centralized location.
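To make the centralized model concrete, here is a minimal Python sketch of the idea. It is not Ranger's API: the class names CentralPolicyStore and ComponentPlugin, the rule layout, and the sample groups and paths are all hypothetical, chosen only to show how rules defined in one place can be enforced by a small plug-in inside each component.

```python
from collections import defaultdict


class CentralPolicyStore:
    """Hypothetical stand-in for a central policy location (not Ranger's real API)."""

    def __init__(self):
        # component name -> list of (group, resource, action) rules
        self._rules = defaultdict(list)

    def allow(self, component, group, resource, action):
        """Record that `group` may perform `action` on `resource` in `component`."""
        self._rules[component].append((group, resource, action))

    def rules_for(self, component):
        """Return a copy of the rules that apply to one component."""
        return list(self._rules[component])


class ComponentPlugin:
    """Hypothetical plug-in living inside one component, such as HDFS or HBase."""

    def __init__(self, component, store):
        self.component = component
        self.store = store

    def is_allowed(self, user_groups, resource, action):
        # Deny by default; allow only when some central rule matches exactly.
        return any(
            group in user_groups and res == resource and act == action
            for group, res, act in self.store.rules_for(self.component)
        )


# All rules are defined once, in one place (sample values are made up).
store = CentralPolicyStore()
store.allow("hdfs", "analysts", "/data/sales", "read")
store.allow("hbase", "analysts", "sales_table", "read")

# Each component gets its own plug-in, but they all consult the same store.
hdfs = ComponentPlugin("hdfs", store)
hbase = ComponentPlugin("hbase", store)

print(hdfs.is_allowed({"analysts"}, "/data/sales", "read"))
print(hdfs.is_allowed({"analysts"}, "/data/sales", "write"))
print(hbase.is_allowed({"interns"}, "sales_table", "read"))
```

The point of the sketch is the shape, not the details: the rules live in one store, and each component only needs a thin layer that asks it the same question.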
Ranger is clearly a Hortonworks-sponsored project (as opposed to a Cloudera, MapR, or now Databricks project). You can tell in part by the way it's skinned (green) and in part by what it supports. At present, Ranger supports the following:
Except for HDFS and HBase, which are supported as part of the core of Hadoop, and Solr, these are some of the more "Hortonworksy" projects. In a modern deployment, you'll likely see other components, such as Spark or possibly Impala (from Cloudera). Nonetheless, Ranger is a great thing.
How Ranger works
In Ranger, you work with a repository for each component. Each repository is based on an underlying plug-in or agent that operates with that component.
The repository manager from Hortonworks' Ranger documentation
Each repository has a set of policies. A policy ties together the resource you are protecting (a table, folder, or column), a group (such as administrators), and what that group is allowed to do with the resource (read, write, and so on). You give each policy a name, say, "Only the grp_nixon can read the apac_china table."
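The pieces of a policy described above can be sketched as a small data structure. This is an illustration only, not Ranger's actual policy format or API: the Policy class, its field names, the can_access helper, and the repository name "example_repo" are assumptions made for the example. Only the policy name, group, and table come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy record; the field names are illustrative, not Ranger's."""
    name: str               # human-readable policy name
    repository: str         # which repository (component) the policy belongs to
    resource: str           # the table, folder, or column being protected
    group: str              # the group the policy applies to
    permissions: frozenset  # what the group may do, e.g. {"read"}


def can_access(policies, repository, resource, user_groups, permission):
    """Return True only if some policy explicitly grants the permission."""
    for p in policies:
        if (
            p.repository == repository
            and p.resource == resource
            and p.group in user_groups
            and permission in p.permissions
        ):
            return True
    return False  # nothing matched, so deny by default


policies = [
    Policy(
        name="Only the grp_nixon can read the apac_china table",
        repository="example_repo",  # assumed name, not from the article
        resource="apac_china",
        group="grp_nixon",
        permissions=frozenset({"read"}),
    )
]

print(can_access(policies, "example_repo", "apac_china", {"grp_nixon"}, "read"))
print(can_access(policies, "example_repo", "apac_china", {"grp_nixon"}, "write"))
print(can_access(policies, "example_repo", "apac_china", {"grp_other"}, "read"))
```

Because the check denies anything no policy grants, the single policy here really does mean "only" grp_nixon can read the table: a member of another group, or a write attempt by grp_nixon, falls through to the default denial.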