Security for big data analytics is challenging. Here's why: When you can't analyze data in place, you need to copy it -- at which point all the rules about who can see or change which data, and under what circumstances, should be replicated, too. Today, that's nearly impossible to do.
On the Hadoop/Spark side, we have only role-based, limited access control lists (ACLs) or the Wild West. But I believe there's a way forward: Adopt the policy-based approach that has arisen in the broader security market. To explore how that could work, we need to revisit the history of access control and how it evolved to produce a policy-based model.
A three-minute history of access control
In the beginning, there were usernames and passwords to keep out everyone who might want in, despite what Richard Stallman said.
There was an inherent problem with this system. The number of user/password combinations tended to explode as new applications were written, so we ended up with a different user/password for each application. Worse, some applications asked for different passwords to reach different levels of security.
We became smarter and divided up "roles" from usernames. We'd have one "user/password," but to access the administrative functions, that user/password would also need an "admin" role, for example. However, each application tended to implement this on its own, so you still had a growing list of passwords to remember.
We became even smarter and created central systems that eventually became LDAP, Active Directory, and the like. These united the user/password in a core repository and established one place to look up the roles for a given user -- but this replaced one problem with another.
In an ideal world, each new application looks at the list of roles in Active Directory and maps them to application roles, so there's a clean, one-to-one relationship. In reality, most applications think of roles differently, and besides, simply because you're an admin for one application doesn't mean you should be an admin for another. In the end, you've replaced an explosion of user/password combinations with an explosion in the number of roles.
Which raises the question: Who ends up in charge of adding new roles? It tends to be either some IT-administrative or shared-HR function. Since there's a good chance none of the people with the menial task of adding roles will actually understand the application very well, this usually ends up being a "manager approval" or "rubber stamp," and that isn't, as they say, good.
Many applications still punt on the question of roles by using AD for authentication and having the application handle its own local role implementation. There's a lot to be said for this approach, because it's clearly the application administrator who knows who should have what level of access.
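To make that split concrete, here is a minimal sketch in Python of an application that delegates authentication to a central directory but keeps its own role table. Everything in it is hypothetical and for illustration only: directory_authenticate stands in for a real bind against AD or LDAP (no real directory API is called), and ReportingApp, its role names, and the sample users are invented.

```python
from dataclasses import dataclass, field


def directory_authenticate(username: str, password: str) -> bool:
    """Hypothetical stand-in for authenticating against AD/LDAP.

    A real application would bind to the directory here; this stub
    checks an in-memory table so the sketch stays self-contained.
    """
    fake_directory = {"alice": "s3cret", "bob": "hunter2"}
    return fake_directory.get(username) == password


@dataclass
class ReportingApp:
    """Hypothetical application with its own local role table."""

    # Maintained by the application administrator, not by central IT.
    local_roles: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, username: str, role: str) -> None:
        """Give a user a role that means something only inside this app."""
        self.local_roles.setdefault(username, set()).add(role)

    def login(self, username: str, password: str) -> set[str]:
        """Authenticate centrally, then look up roles locally."""
        if not directory_authenticate(username, password):
            raise PermissionError("authentication failed")
        # A valid directory user with no local entry gets no roles.
        return set(self.local_roles.get(username, set()))


app = ReportingApp()
app.grant("alice", "admin")
alice_roles = app.login("alice", "s3cret")  # known to the app
bob_roles = app.login("bob", "hunter2")     # valid user, no local roles
```

The directory answers only "is this really Alice?" What Alice may do is decided by a table the application owner controls, so a second application could grant her nothing at all without anyone touching the directory.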