
Lack of data scientists is the new Von Neumann bottleneck

Brian Proffitt | Jan. 31, 2012
Data is a huge presence within much of business and technology, and the next installment of the O'Reilly Strata Conference will provide attendees a look into the revolutionary ways data is driving, well, everything.

The Winter 2012 edition of the O'Reilly Strata Conference will offer sessions for everyone, from the businessperson trying to figure out just what this whole big data thing is all about to the hard-core data scientist wonks who are bringing all this new technology to the fore.

Big data has gotten a lot of attention in the past couple of years, as Hadoop, Cassandra, MapReduce, and other open source technologies have enabled businesses and governments to use data in ways unheard of when using relational database technology. The Strata Conference is the first and most prominent gathering for any party interested in learning about just what makes big data tick.

And that, according to Founding Chair Edd Dumbill, is part of the whole point of Strata: educating users and data scientists about the benefits and applications of big data.

"There are three main themes examined at Strata," Dumbill said in a recent interview. "The increasing of data and the growth of ubiquitous computing are two, which form the start of an arc to the third aspect."

The arc, Dumbill continued, leads to a much higher level of interconnectivity, the so-called "Internet of Things," which describes the billions of objects tagged and otherwise connected to the Internet, each providing massive amounts of data to be collected and processed.

But processed by whom? Stored how? And utilized in what manner? Those are the key questions that gatherings like Strata hope to address, particularly the third part of the arc: how data is used. This is what Dumbill refers to as "data and the final mile."

The "final mile" is likely a familiar term to network engineers: it refers to the all-important connectivity between the end-user and the rest of the Internet.

"So it is with data science and analytics within a business," Dumbill said. For data, the "final mile" refers to the capability to properly process data and convey what's really important: information.

Turning data into information (which can then be used to acquire knowledge) is exactly where the data scientist lives, and it's a skill that is still lacking within this burgeoning field.

Data scientists are described by Strata organizers as being talented in engineering, data management, mathematics, and writing. "The art of storytelling and visualization are also important," Dumbill explained.

I suggested to Dumbill that an example might be the work of Hans Rosling, who very effectively uses stunning graphics to convey a wealth of information. Dumbill agreed that this was pretty much the same sort of work, though Rosling was not working with truly massive data sets. Data scientists for big data will be able to create models beyond even the work of Rosling.

 
