Under the hood of Cisco’s Tetration Analytics platform

Brandon Butler | June 21, 2016
Apache Spark, Kafka, Druid and more are under the covers.

Cisco’s entrance into the data center analytics market with the introduction of Tetration is the culmination of two years’ worth of wrangling various open source projects and developing proprietary algorithms in the areas of big data, streaming analytics and machine learning.

Tetration is an analytics platform that provides deep visibility into data center and cloud infrastructure operational information. Here’s a description from Network World’s story on Tetration:

The platform, Cisco Tetration Analytics, gathers information from hardware and software sensors and analyzes the information using big data analytics and machine learning to offer IT managers a deeper understanding of their data center resources. The system will dramatically simplify operational reliability, application migrations to SDN and the cloud as well as security monitoring.

But what’s under the hood of Tetration? Below are some of the components used to build the product (by the way, the word tetration is a mathematical term for iterated exponentiation, an operation that quickly produces very large numbers):

Apache Spark

Apache Spark is an engine for large-scale data processing. To understand what Spark is, it’s helpful to understand the basics of Hadoop. Hadoop has two main components: the Hadoop Distributed File System (HDFS), which is the storage layer, and MapReduce, which is the analytics and compute layer. Spark was developed as an alternative to MapReduce: an in-memory cluster processing platform that can deliver up to 100x faster response times than MapReduce for some applications. A key feature of Spark is that programs can load data into the cluster’s memory and query it repeatedly, which makes it well suited to machine learning and artificial intelligence applications that make many passes over the same data.
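The load-once, query-many pattern can be illustrated without Spark itself. What follows is a minimal pure-Python sketch, not Spark code: load_records and InMemoryDataset are hypothetical names invented for this illustration, and the call counter exists only to show that the simulated slow read happens once while several queries run against the copy held in memory.

```python
from typing import Callable, Iterable, List, Optional

load_calls = 0  # counts how often the simulated slow storage read happens


def load_records() -> List[float]:
    """Hypothetical stand-in for an expensive read from disk or HDFS."""
    global load_calls
    load_calls += 1
    return [3.0, 1.5, 4.0, 1.5, 5.0]


class InMemoryDataset:
    """Toy dataset that reads its source once and keeps the result in memory."""

    def __init__(self, loader: Callable[[], Iterable[float]]) -> None:
        self._loader = loader
        self._cache: Optional[List[float]] = None

    def records(self) -> List[float]:
        if self._cache is None:  # only the first access touches storage
            self._cache = list(self._loader())
        return self._cache

    def query(self, fn: Callable[[List[float]], float]) -> float:
        """Run an arbitrary function over the cached records."""
        return fn(self.records())


dataset = InMemoryDataset(load_records)
total = dataset.query(sum)
largest = dataset.query(max)
mean = dataset.query(lambda xs: sum(xs) / len(xs))
```

Three queries run here, but load_records is called only once. An iterative machine learning job that sweeps the same training data many times benefits in the same way.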

Kafka

Kafka is an Apache publish-subscribe messaging platform used in big data analytics programs. Kafka is meant to serve as the “central data backbone” for programs and can handle hundreds of megabytes of reads and writes per second from thousands of clients. “Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of co-ordinated consumers,” according to the Apache Kafka description.

Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact, according to Apache.
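To make the idea of partitioned streams and co-ordinated consumers concrete, here is a rough pure-Python sketch. It is not the Kafka client API: the function names, the four-partition topic and the round-robin assignment are assumptions made for illustration, and CRC32 stands in for the murmur2 hash that Kafka’s default partitioner applies to message keys.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_PARTITIONS = 4  # assumed partition count for this illustration


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition using a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def publish(log: Dict[int, List[Tuple[str, str]]], key: str, value: str) -> int:
    """Append a message to the partition chosen by its key; return that partition."""
    partition = partition_for(key)
    log[partition].append((key, value))
    return partition


def assign_partitions(
    consumers: List[str], num_partitions: int = NUM_PARTITIONS
) -> Dict[str, List[int]]:
    """Spread partitions round-robin over the consumers of one group."""
    assignment: Dict[str, List[int]] = {name: [] for name in consumers}
    for partition in range(num_partitions):
        assignment[consumers[partition % len(consumers)]].append(partition)
    return assignment


topic_log: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
first = publish(topic_log, "host-a", "flow 1")
second = publish(topic_log, "host-a", "flow 2")
publish(topic_log, "host-b", "flow 3")
assignment = assign_partitions(["consumer-1", "consumer-2"])
```

Messages that share a key always land in the same partition, so their order is preserved there, while different partitions can live on different machines and be read by different consumers in parallel.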

Druid

While Cisco used Spark and Kafka for data processing and messaging, Tetration engineers also used Druid as a column-oriented distributed data storage system. “Druid provides low latency (real-time) data ingestion, flexible data exploration, and fast data aggregation,” according to Apache’s Druid description, noting that existing Druid deployments have scaled to managing trillions of events and petabytes of data.
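Why a column-oriented layout helps with fast aggregation can be shown with a toy example. The sketch below is plain Python and does not reflect Druid’s API or storage format: the flow events, the field names and the helpers to_columns and sum_by are all made up for illustration.

```python
from typing import Dict, List

# Hypothetical flow events, stored row by row.
rows = [
    {"minute": 0, "src": "10.0.0.1", "bytes": 120},
    {"minute": 0, "src": "10.0.0.2", "bytes": 300},
    {"minute": 1, "src": "10.0.0.1", "bytes": 80},
    {"minute": 1, "src": "10.0.0.1", "bytes": 50},
]


def to_columns(records: List[dict]) -> Dict[str, list]:
    """Pivot row records into one list per column."""
    return {name: [record[name] for record in records] for name in records[0]}


def sum_by(columns: Dict[str, list], group_col: str, value_col: str) -> Dict[object, int]:
    """Sum value_col grouped by group_col, reading only those two columns."""
    totals: Dict[object, int] = {}
    for group, value in zip(columns[group_col], columns[value_col]):
        totals[group] = totals.get(group, 0) + value
    return totals


columns = to_columns(rows)
bytes_per_src = sum_by(columns, "src", "bytes")
bytes_per_minute = sum_by(columns, "minute", "bytes")
```

Each aggregation scans just the two columns it needs and never touches the rest of the record, which is the property that lets columnar stores answer summary queries quickly over very large event sets.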

Secret Sauce

While these were the major open source components Cisco used to create Tetration, the company also developed customized software to link it all together. “Some critical new components we wrote because there’s no equivalent in the open source domain or they have not been open sourced,” a Cisco spokesperson wrote in an email. The company indicated this was mostly the case in the machine learning aspects of the product. “Naturally we need to keep private the smart algorithms where a lot of the magic and differentiation occur.”

 
