SAS chief Dr Jim Goodnight discusses what people are doing wrong with Hadoop, which industries are best at data analysis, and how improving education for pre-schoolers from low-income families could help with the skills crisis.
He also spoke to CIO about why healthcare is the $3.09 billion company's biggest opportunity, and why SAS may find it tough to maintain its impeccable 38-year run of growth this year.
CIO: The data analytics market is getting fairly crowded; there are lots of new players out there. Has this impacted your business? SAS has been such a dominant player for so long -- how will you maintain that position and stay ahead of your competitors?
Goodnight: Things like massively parallel computing that we have been working on for many years, this stuff is several years ahead of the market right now. As a matter of fact, we are a couple of years ahead of our customers, and [convincing them] to convert from the old ways of doing things to the new massively parallel approach -- is somewhat difficult.
[Customers] are using Hadoop as a data store and to analyse the data, they are pulling it out of Hadoop and putting it into another server to analyse. That is a horrible approach because Hadoop by its nature is resting in a Hadoop cluster of servers. We can pull the data directly out of Hadoop straight up into the memory in parallel and analyse it right there.
So pulling it over the network -- just treating it like a SAN device -- that's a terrible waste. We should analyse it where it is right there. It's not just one single [Hadoop] file, it's hundreds of little files and each one of those little files can be read directly into memory.
So we have 100 little files we can read straight up into memory in a couple of seconds. But if it's 100 million records and you have to send it over the internal network, it's a huge waste of time.
CIO: And you find that a lot of customers are doing that?
Goodnight: There are and I think we've seen IT be at the forefront of Hadoop adoption because they look at it as a way to save money for storage. It's one-third of the price of some of the big SAN devices.
It's definitely cheaper storage but it's a shame that more companies aren't taking advantage of the fact that with SAS we can analyse the data right there where it sits and do it 1000 times faster than they can do by bringing it over the network. It's a hard one to get across.