Data is eating the software that is eating the world

James Kobielus | July 31, 2017
The data-driven machine learning algorithms that power AI will not only upend programming but also lower the barriers to AI itself

To compound the marginalization of programmers in this new era, we’re likely to see more ML-driven code generation along the lines that I discussed in a recent post. Amazon, Google, Facebook, Microsoft, and other software-based powerhouses have made huge investments in data science, hoping to buoy their fortunes in the post-programming era. They have all amassed growing sets of training data from their ongoing operations. For these reasons, the “Silicon Valley-style” monoliths are confident that they have the resources needed to build, tune, and optimize increasingly innovative AI/ML-based algorithms for every conceivable application.

However, any strategic advantages that these giants gain from these AI/ML assets may be short-lived. Just as data-driven approaches are eroding the foundations of traditional programming, they’re also beginning to nibble at the edges of what highly skilled data scientists do for a living. These trends are even starting to chip away at the economies of scale available to large software companies with deep pockets.


AI and the Goliaths

We’re moving into an era in which anyone can tap into cloud-based resources to cheaply automate the development, deployment, and optimization of innovative AI/ML apps. In a “snake eating its own tail” phenomenon, ML-driven approaches will increasingly automate the creation and optimization of ML models, as I have discussed elsewhere. And, from what we’re seeing in research initiatives such as Stanford’s Snorkel project, ML will also play a growing role in automating the acquisition and labeling of ML training data. That means that, in addition to abundant open-source algorithms, models, code, and data, the next-generation developer will also be able to generate ersatz but good-enough labeled training data on the fly to tune new apps for their intended purposes.

As low-cost, machine-generated training data becomes more widely available, the established software companies’ massive data lakes, in which their developers maintain petabytes of authentic from-the-source training data, may become more of an overhead burden than a strategic asset. Likewise, the complex data-preparation logic needed to use this source data may become a bottleneck that keeps developers from rapidly building, training, and deploying new AI apps.

When any developer can routinely make AI apps just as accurate as Google’s or Facebook’s – but with far less expertise, budget, and training data than the big shots – a new era will have dawned. When we reach that tipping point, the next generation of data-science-powered disruptors will start to eat away at yesteryear’s software startups.

