To keep data secure, public sector organisations that adopt Hadoop need to apply well-known security measures to the underlying infrastructure and systems. For example:
- Turning off services that are not required
- Restricting access to users
- Limiting super-user permissions
- Locking down network ports and protocols
- Enabling audit, logging and monitoring
- Applying all the latest OS security patches
- Using centralised corporate authentication
- Enabling encryption in-transit
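On a typical Linux-based cluster node, several of the steps above can be sketched as shell commands. This is a minimal sketch only; the service names, subnet, port, and account names below are illustrative assumptions, not details from the article.

```shell
# Illustrative hardening steps for a Linux Hadoop node.

# Turn off a service that is not required (printing is rarely needed on a data node).
sudo systemctl disable --now cups

# Lock down network ports: allow the NameNode RPC port (8020 by default)
# only from an internal subnet, using firewalld as an example.
sudo firewall-cmd --permanent \
  --add-rich-rule='rule family="ipv4" source address="10.0.0.0/8" port port="8020" protocol="tcp" accept'
sudo firewall-cmd --reload

# Apply the latest OS security patches (RHEL/CentOS example).
sudo yum -y update --security

# Limit super-user permissions: grant a narrow sudo rule instead of full root.
echo 'hadoopops ALL=(ALL) /usr/bin/systemctl restart hadoop-hdfs-namenode' \
  | sudo tee /etc/sudoers.d/hadoopops
```

Equivalent steps exist for other distributions and firewall tools; the point is that each bullet above maps to a concrete, auditable configuration change.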
2) What type of perimeter security is in place?
With a Hadoop cluster installed on a secure platform, the next questions to address revolve around perimeter security: who can access the Hadoop cluster, from where, and how are users authenticated?
Perimeter security restricts users by requiring entry through a secure gateway, over secured networks, and with approved credentials. Just as agencies need multiple data sources and multiple frameworks to truly instill a data-driven workflow within their organisations, government leaders also need a network perimeter that is both secure and flexible enough to support a varied workforce.
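One common way to put a secure gateway in front of a Hadoop cluster is Apache Knox, which proxies cluster REST APIs behind a single authenticated endpoint. Knox is not named in the article, so this is an assumed example; the host name, topology name, path, and credentials are hypothetical placeholders.

```shell
# List an HDFS directory through a Knox perimeter gateway instead of
# contacting the NameNode directly. Users authenticate at the gateway;
# cluster ports stay closed to the outside network.
curl -ku alice:changeme \
  'https://knox.example.gov:8443/gateway/default/webhdfs/v1/data?op=LISTSTATUS'
```

In production the `-k` flag would be dropped in favour of a trusted TLS certificate, and the gateway would delegate authentication to the agency's central directory (for example, LDAP or Kerberos).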
3) What security regulations must I meet?
There are two kinds of organisations when it comes to compliance: those that must be compliant, and those that choose to follow compliance guidelines.
Those that must be compliant are usually operating under a mandate such as FISMA, which establishes the required controls and regulations, including data encryption. Data encryption is the safety lock on the most sensitive data an organisation holds.
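In Hadoop, encryption at rest is provided by HDFS transparent encryption zones. A minimal sketch, assuming a Hadoop KMS is configured and running; the key name and directory path are illustrative.

```shell
# Create an encryption key in the Hadoop KMS (key name is an example).
hadoop key create fismaKey

# Create an empty directory and turn it into an encryption zone;
# files written under /secure are then encrypted at rest transparently.
hdfs dfs -mkdir /secure
hdfs crypto -createZone -keyName fismaKey -path /secure

# Verify the zone exists.
hdfs crypto -listZones
```

Because encryption and decryption happen in the HDFS client, applications reading and writing under the zone need no code changes, which matters when retrofitting compliance onto existing workloads.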
As for those that are following compliance guidelines, they typically do so to establish differentiation, mitigate risks, and promote a culture and mindset of security.
However, public sector organisations need to keep in mind that compliance is not just about the technology. It is also about the people and processes. Organisations first need to have a security culture in place before adopting the technology. For instance, users need to consistently adhere to simple security guidelines like encrypting sensitive data and locking devices with secure passwords.
4) Who are the 'need to know' users on the Hadoop platform?
It is important for a public sector organisation to only share data internally on a need-to-know basis. This is, however, where many public sector agencies struggle the most. There are sub-groups and divisions built into larger agencies, and with increased organisational complexity comes increased difficulty in monitoring and controlling access to data.
The power to bring data together, like that of a Hadoop-powered EDH, also comes with a challenge: who are the 'need-to-know' users within a larger organisation that require access to critical data?
Solutions like Apache Sentry, which enables role-based access control over fine-grained data sets, may be useful here. Users are defined by 'need-to-know' roles rather than organisational structures. Essentially, Sentry is the critical, central authorisation framework that gives the Hadoop platform the ability to store sensitive data, while providing secure access to the agency's 'need-to-knows.'
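Sentry policies for Hive are managed with SQL-style GRANT statements issued through HiveServer2. A sketch of defining a 'need-to-know' role, assuming a Kerberised cluster; the JDBC URL, database, table, role, and group names are all illustrative.

```shell
# Define a role, grant it read access to one table, and bind it to a
# directory group, so access follows the role rather than the org chart.
beeline -u 'jdbc:hive2://hive.example.gov:10000/default;principal=hive/_HOST@EXAMPLE.GOV' <<'SQL'
CREATE ROLE case_analyst;
GRANT SELECT ON TABLE casework.incidents TO ROLE case_analyst;
GRANT ROLE case_analyst TO GROUP analysts;
SQL
```

Membership in the `analysts` group, typically synchronised from the agency's central directory, is then the single switch that grants or revokes a user's need-to-know access.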