When reviewing the 5 steps for transforming your business using data, we highlighted that real-time business intelligence is the first step toward the digitalization of business. How far-reaching (and useful) the insight produced by business intelligence is depends to a great degree on how much input data it was given to start with.
In the early days of business intelligence, data was stored in expensive enterprise data warehouses and data marts. They were expensive because disk storage was pricey, but even more so because license fees for data warehouse databases were proportional to the amount of data stored. Hence the trade-off between cost and value always had to be analyzed when considering new data as candidates for the data warehouse.
Since then, storage has decreased dramatically in price. Available options abound -- from commodity disks in redundant architectures to in-memory drives, from storage appliances to cloud storage services, there are storage options that fit every need and every budget. The advent of Hadoop has broadened this palette, combining highly scalable storage and a processing engine in the same platform. What's more, these new alternatives coexist well with existing infrastructures, with many vendors going to great lengths to provide closely integrated hybrid architectures.
But what are some of the new deposits of data you can (and should) go and tap?
Measurements from sensors
Of course, if your business consists of selling intelligence from connected objects, you already collect sensor data -- but then you are already a digital company. However, many organizations just discard incredibly valuable sensor measurements, using only the ones that signal an actual failure. But if you take measurements collected by the hundreds of sensors built into a car, or by pressure/temperature gauges in a factory, you will get data that helps the mechanic or the production line manager diagnose patterns of failures, optimize performance, or even perform preventive maintenance -- avoiding failures before they even occur.
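To make the preventive maintenance idea concrete, here is a minimal sketch, assuming a simple list of temperature readings and a rolling baseline. The function name `drift_alerts`, the window size, and the threshold are all illustrative assumptions, not a recommended method; a real deployment would tune them to the equipment in question.

```python
from collections import deque
from statistics import mean, stdev


def drift_alerts(readings, window=5, z_limit=3.0):
    """Return the indexes of readings that deviate strongly from the recent baseline.

    Hypothetical helper: 'window' and 'z_limit' are illustrative defaults.
    """
    recent = deque(maxlen=window)  # the last 'window' readings form the baseline
    alerts = []
    for i, value in enumerate(readings):
        if len(recent) == window:
            baseline = mean(recent)
            spread = stdev(recent)
            # Flag the reading if it sits more than z_limit standard deviations
            # away from the baseline (skip when the baseline is perfectly flat).
            if spread > 0 and abs(value - baseline) / spread > z_limit:
                alerts.append(i)
        recent.append(value)
    return alerts


# Made-up gauge readings: the seventh value (index 6) spikes.
temperatures = [70.1, 70.3, 69.9, 70.2, 70.0, 70.1, 74.5, 70.2]
print(drift_alerts(temperatures))  # prints [6]
```

The point is that the spike is flagged from the measurements alone, without waiting for a failure event, which is only possible if the "normal" readings were kept in the first place.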
Logs come in all shapes and use cases: access logs from facility doors, web surfing logs, GPS tracking logs, online intrusion logs -- they all serve a specific purpose, but they also grow very quickly. As a result, most IT organizations discard them after a few days or weeks, losing valuable sources of insight into visitor flow, elevator traffic, seasonal HVAC regulation, etc.
Log data is usually reasonably structured -- tabular or polystructured -- and is fairly straightforward to incorporate in a data lake, where it can easily be used to complement more traditional data warehouse structures.
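As an illustration of how little work it can take to turn log lines into something tabular, here is a minimal sketch. The log format (an ISO timestamp followed by `key=value` pairs), the field names `door`, `badge`, and `result`, and the helper functions are all hypothetical; real door access systems will each have their own layout.

```python
from collections import Counter
from datetime import datetime


def parse_line(line):
    """Split one log line into a record (assumed format: '<ISO timestamp> key=value ...')."""
    timestamp, *pairs = line.split()
    record = dict(pair.split("=", 1) for pair in pairs)
    record["timestamp"] = datetime.fromisoformat(timestamp)
    return record


def hourly_traffic(lines):
    """Count granted entries per (door, hour of day)."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        record = parse_line(line)
        if record.get("result") == "granted":
            counts[(record["door"], record["timestamp"].hour)] += 1
    return counts


# Made-up sample lines in the assumed format.
sample = [
    "2024-03-01T08:15:02 door=lobby badge=1042 result=granted",
    "2024-03-01T08:47:30 door=lobby badge=2210 result=granted",
    "2024-03-01T08:51:11 door=lab badge=2210 result=denied",
    "2024-03-01T09:05:44 door=lab badge=1042 result=granted",
]
traffic = hourly_traffic(sample)
print(traffic[("lobby", 8)])  # prints 2
```

Once the records are in this shape, they can be loaded next to existing warehouse tables and joined on time or location to study visitor flow.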
Unstructured data, such as audio, video, and textual information -- stored on individual computers, on servers, or in the cloud -- can be mined for information. Of course, proper data preparation is needed to ensure correct parsing and tagging of content and concepts.