
How to collect and analyze data from 100,000 weather stations

John Brandon | June 17, 2015
If you want to understand what it takes to collect, track and analyze reams of data, just check the weather. There are constant fluctuations, scores of data points and intense interest from all over the planet. Analyze the data correctly and someone in the state of Washington knows whether or not to wear a raincoat. Do it poorly and there might be a massive traffic pileup from people driving too fast on slick roads.

The Weather Company acts as a "clearinghouse" for this data collection, says Koehler. The company monitors the stations and knows exactly how each one works: whether a station is a RainWise product that collects data every second, for example, or a Netatmo station that might not collect as often. 
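Because stations report at different native frequencies, readings have to be normalized before they can be compared. The article doesn't describe The Weather Company's actual pipeline, but a minimal sketch of the idea, assuming hypothetical (timestamp, value) feeds, is to resample every station onto a common time bucket:

```python
from collections import defaultdict

def resample(readings, interval_s=60):
    """Average timestamped readings into fixed-width time buckets.

    readings: list of (unix_timestamp, value) pairs at any native
    frequency. Returns {bucket_start: mean_value}, so a once-per-second
    feed and a sparser feed become directly comparable.
    (Illustrative only; not The Weather Company's actual code.)
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts - ts % interval_s].append(value)
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}
```

With a 60-second bucket, a station reporting every second contributes an average of 60 samples per bucket, while a station reporting every few minutes contributes at most one, yet both land on the same time axis.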

Part of the challenge is in interpreting the data correctly. The Weather Company might look for trends from data collected from multiple phones and stations in the same area. The company has figured out how to compare data sets with varying levels of accuracy and quality and still derive some value, especially in terms of weather trends. All of the collected data is valuable, Koehler says. 
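One common way to blend data sets of varying accuracy, as described above, is to weight each observation by how much the source is trusted. The weighting scheme here is a hypothetical illustration, not a method the article attributes to The Weather Company:

```python
def weighted_mean(observations):
    """Combine observations of varying quality into one estimate.

    observations: list of (value, weight) pairs, where weight reflects
    a station's assumed accuracy (hypothetical scale: 1.0 = fully
    trusted, lower = noisier source).
    """
    total_weight = sum(w for _, w in observations)
    if total_weight == 0:
        raise ValueError("no usable observations")
    return sum(v * w for v, w in observations) / total_weight
```

A trusted station reading of 10.0 (weight 3.0) combined with a noisier reading of 20.0 (weight 1.0) yields 12.5, pulled toward the more reliable source.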

Interestingly, the data sets are typically quite small. In total, Koehler says his company collects a "couple hundred" terabytes from personal weather stations. 

A whole lotta ping

"It's a very chatty environment," he says. "There is a high frequency of ping. So we have to use a very scalable infrastructure, since there are a few hundred devices added every day. And the frequency of the data input continues to rise." 

The Weather Company had previously been using Amazon Web Services for the data collection and processing. At the time of this writing, the company had switched to IBM Cloud, primarily due to costs and presence in the market. 

"IBM Cloud has been growing rapidly, particularly as a resource for large enterprises," says Charles King, a noted IT expert with Pund-IT. "IBM is dedicating significant budget to rolling out a global network of cloud data centers. By partnering with IBM, the Weather Channel will benefit from IBM's global cloud resources [to support its own global network] and should also be able to monetize its assets as part of the [Internet of Things] services IBM is envisioning. 

"If, as many scientists and insurance companies believe, we're heading into a future where extreme weather events become increasingly common, the partnership should be a good deal for both companies and their respective customers," King adds. 

"The increasing use of social and sensor networks are producing significant amounts of high-throughput data available for mining in areas like customer behavior, biological systems and environmental conditions," says Matt Wood, general manager for Data Science at Amazon Web Services. "The critical barrier to big data, which has traditionally been the infrastructure required to collect, compute and collaborate, is now being transformed through the use of cloud computing with AWS." 

In the end, what makes the collection from 100,000 sensors so noteworthy is that it is a major test of cloud infrastructure. King says the data is rich and layered, but fairly consistent and predictable in terms of how often the stations send in reports. Whether the reports are from an airplane, a station in Iceland or a smartphone, the algorithms are ready to help provide a more accurate weather forecast with every single ping.


