10 hot data analytics trends — and 5 going cold

Martin Heller | Aug. 11, 2017
Big data, machine learning, data science — the data analytics revolution is evolving rapidly. Keep your BA/BI pros and data scientists ahead of the curve with the latest technologies and strategies for data analysis.

In some cases, one or more legacy systems (which may date back to the 1960s) can run analyses or back up their data only at night, when they are not otherwise in use. In other cases there is no technical reason to run batch analysis, but "that's how we've always done it."

You're better than that, and your management deserves up-to-the-minute data analysis.


Heating up: Microsoft Cognitive Toolkit 2.0

Who: Data scientists

The Microsoft Cognitive Toolkit, also known as CNTK 2.0, is a unified deep-learning toolkit that describes neural networks as a series of computational steps via a directed graph. It has many similarities to TensorFlow and MXNet, although Microsoft claims that CNTK is faster than TensorFlow, especially for recurrent networks; that its inference support is easier to integrate into applications; and that it has efficient built-in data readers that also support distributed learning.
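To make the "series of computational steps via a directed graph" idea concrete, here is a minimal sketch in plain Python. It is not the CNTK API: the Node class and the evaluate function are hypothetical names invented for this illustration, and the graph is a single sigmoid neuron rather than a real network.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """One step in the graph (hypothetical class, not part of CNTK)."""
    name: str
    op: Optional[Callable[..., float]] = None  # None marks an input node
    inputs: Tuple["Node", ...] = ()            # edges pointing at upstream nodes


def evaluate(node: Node, feed: Dict[str, float],
             cache: Optional[Dict[str, float]] = None) -> float:
    """Walk the graph from the output back to the inputs, computing each node once."""
    if cache is None:
        cache = {}
    if node.name in cache:
        return cache[node.name]
    if node.op is None:
        value = feed[node.name]                # input node: read the supplied value
    else:
        args = [evaluate(parent, feed, cache) for parent in node.inputs]
        value = node.op(*args)                 # computational step: apply the operation
    cache[node.name] = value
    return value


# Inputs and parameters of a single neuron: sigmoid(x * w + b)
x, w, b = Node("x"), Node("w"), Node("b")
product = Node("product", lambda a, c: a * c, (x, w))
total = Node("total", lambda a, c: a + c, (product, b))
output = Node("output", lambda z: 1.0 / (1.0 + math.exp(-z)), (total,))

result = evaluate(output, {"x": 2.0, "w": 0.5, "b": -1.0})
print(result)  # 0.5, because sigmoid(2.0 * 0.5 - 1.0) = sigmoid(0)
```

Real toolkits build on the same structure to differentiate the graph automatically and to schedule the steps on GPUs, neither of which this sketch attempts.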

There are currently about 60 samples in the Model Gallery, including most of the contest-winning models of the last decade. The Cognitive Toolkit is the underlying technology for Microsoft Cortana, Skype live translation, Bing, and some Xbox features.


Heating up: Scikit-learn

Who: Data scientists

Scikits are Python-based scientific toolboxes built around SciPy, the Python library for scientific computing. Scikit-learn is an open source project focused on machine learning that is careful to avoid both scope creep and unproven algorithms. On the other hand, it has quite a nice selection of solid algorithms, and it uses Cython (the Python-to-C compiler) for functions that need to be fast, such as inner loops.

Among the areas Scikit-learn does not cover are deep learning, reinforcement learning, graphical models, and sequence prediction. It is defined as being in and for Python, so it doesn't have APIs for other languages. Scikit-learn doesn't support PyPy, the fast just-in-time compiling Python implementation, nor does it support GPU acceleration, which, aside from neural networks, Scikit-learn has little need for.

Scikit-learn earns the highest marks for ease of development among all the machine learning frameworks I've tested. The algorithms work as advertised and documented, the APIs are consistent and well-designed, and there are few "impedance mismatches" between data structures. It's a pleasure to work with a library in which features have been thoroughly fleshed out and bugs thoroughly flushed out.
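As a small illustration of that API consistency, the sketch below trains three unrelated classifiers through the same fit and score calls. The choice of the bundled iris data set, the three estimators, and the split parameters are assumptions made for this example, not something the tests behind this article prescribe.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Small data set that ships with Scikit-learn (example choice)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Three different algorithm families behind one interface
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # same training call for every estimator
    scores[name] = model.score(X_test, y_test)   # mean accuracy on the held-out rows
    print(f"{name}: {scores[name]:.3f}")
```

Swapping in another classifier usually means changing one dictionary entry, and the data stays in the same NumPy arrays throughout.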


Cooling down: Caffe

Who: Data scientists

The once-promising Caffe deep learning project, originally a strong framework for image classification, seems to be stalling. While the framework has strong convolutional networks for image recognition, good support for CUDA GPUs, and decent portability, its models often need excessively large amounts of GPU memory, the software has year-old bugs that haven't been fixed, and its documentation is problematic at best.

Caffe finally reached its 1.0 release mark in April 2017 after more than a year of struggling through buggy release candidates. And yet, as of July 2017, it has over 500 open issues. An outsider might get the impression that the project stalled while the deep learning community moved on to TensorFlow, CNTK and MXNet.

