This vendor-written piece has been edited by Executive Networks Media to eliminate product promotion, but readers should note it will likely favour the submitter's approach.
Doug Cutting, Chief Architect, Cloudera
Because of my long-standing association with the Apache Software Foundation, I'm often asked the question, "What's next for open source technology?" My typical response ranges from "I don't know" to "the possibilities are endless."
Over the past year, we've seen open source technology make strong inroads into the mainstream of enterprise technology. Who would have thought that my work on Hadoop ten years ago would affect so many industries, from manufacturing to telecom to finance? All of them have tapped the power of the open source ecosystem not only to improve the customer experience, become more innovative and grow the bottom line, but also to support work toward the greater good of society: genomic research, precision medicine and programs to stop human trafficking, to name just a few examples.
Below I've listed five tips for folks who are curious about how to begin working with open source and what to expect from the ever-changing ecosystem.
1. Embrace the Constant Change and Evolution of Open Source
Constant change: this is the first lesson anyone new to open source technology needs to learn, and it is one of open source's biggest differentiators from traditional software. Open source is fluid and flexible, with new projects regularly being invented for specific use cases. This dynamic cycle drives products to improve faster. To reap the full benefits of open source, companies must be open to this change. The Spark vs. MapReduce debate is a perfect illustration of why this is important:
It's true that folks are building fewer new applications on MapReduce and are instead using Spark as their default data-processing engine. MapReduce is gradually being replaced as the underlying engine in tools like Hive and Pig, but that doesn't make MapReduce obsolete. It will continue to work well for existing applications for many years and, for certain large-scale batch workloads, may remain the superior tool. This trend follows the natural evolution of open source technology: MapReduce was the 1.0 engine for the open source data ecosystem, Spark is its 2.0 engine, and someday there will be a 3.0 that makes Spark the legacy engine.
2. When Introducing a New Technology Stack, Start Small and Go From the Top Down
Instead of architecting and deploying point solutions, organizations can now use general-purpose data platforms whose many tools can be combined flexibly for search, streaming, machine learning and more. This shift requires not just a different set of skills but also a cultural change in management style and organizational structure. For this reason, it's important to gain high-level support within the organization and to make data management a boardroom-level discussion. I'd also recommend starting with one specific use case and gradually building a new culture around a few new applications, rather than replacing everything at once, to help everyone acclimate.