Databricks lays production data pipelines

Thor Olavsrud | April 17, 2017
The new Databricks for Data Engineering edition of the Apache Spark-based cloud platform is optimized for combining SQL, structured streaming, ETL and machine learning workloads running on Spark.

That last feature is extremely important, Ghodsi says.

"It's actually really hard to transition between interactive computations and production pipelines," he says. "I think a lot of people have this mental model that there are two different things you can do: either you're doing interactive analysis or you're building data pipelines. That's not how developers work. While they're developing a data pipeline, they have to explore the data, debug and test to make sure the data pipeline is actually working. During this process, they need interactive analysis."


Moving among modes

And while you want your data pipelines to run without humans in the loop, if you do run into problems, you need to be able to move seamlessly into an interactive mode to debug and further develop them.

"We want to make sure that you can easily and seamlessly move between these two modes," Ghodsi says.

"Databricks' latest developments for data engineering make it exceedingly easy to get started with Spark, providing a platform that is apt as both an integrated development environment and deployment pipeline," Brett Bevers, engineering manager, Data Engineering, at Dollar Shave Club, added in a statement Wednesday. "On our first day using Databricks, we were equipped to grapple with an entirely new class of data challenges."

The new offering is immediately available. Pricing is based on data engineering workloads such as ETL and automated jobs: $0.20 per Databricks Unit, plus the cost of AWS.

