The Semantic Data Model in Big Data
One key to extracting useful information from unstructured data (audio, video, images, free-form text, events, tweets, wikis, forums and blogs) is to create a semantic data model: a layer that sits on top of your data stores and helps you make sense of everything.
"We have to put data together from disparate sources and make sense of it," said David Saul, chief scientist at State Street, a US financial services provider that serves global institutional investors. "Traditionally, the way in which we've done that and the way in which the industry has done that is we'll take extractions of that data from however many different places and build a repository and produce reports off that repository. That's a time-consuming process and not an extremely flexible one. Every time you make a change, you have to go back and change the data repository."
To make that process more efficient, State Street set out to build a semantic layer that leaves data where it is while adding descriptive information about it.
"We have to deal with a lot of reference information," Saul said. "Reference information can come from different sources. Our customers may call the same thing by two different names. Semantic technology has the ability to indicate those things are in fact the same thing. For instance, someone might call IBM 'IBM' or 'International Business Machines' or 'IBM Corporation' or some other variation. They really are the same thing. By showing that equivalence within the semantic layer, you can indicate they're the same thing."
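The article does not describe how State Street implements this. As a minimal sketch, assuming a simple in-memory model, the equivalence Saul describes can be recorded as "same as" assertions (similar in spirit to OWL's `owl:sameAs`) and resolved with a union-find structure, so that any alias maps to one canonical entity. The `EquivalenceRegistry` class and its methods are hypothetical names used only for illustration.

```python
# Minimal sketch of recording "these names denote the same entity" in a
# semantic layer. The idea mirrors OWL's owl:sameAs; the class and method
# names are hypothetical and not taken from State Street's system.


class EquivalenceRegistry:
    """Groups names that have been asserted to refer to the same entity."""

    def __init__(self) -> None:
        # Maps each name to its parent in a union-find forest.
        self._parent: dict[str, str] = {}

    def _find(self, name: str) -> str:
        # A name seen for the first time is its own representative.
        self._parent.setdefault(name, name)
        root = name
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point each name on the path straight at the root.
        while self._parent[name] != root:
            next_name = self._parent[name]
            self._parent[name] = root
            name = next_name
        return root

    def same_as(self, a: str, b: str) -> None:
        """Assert that a and b refer to the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            # Keep a's representative as the canonical name for the group.
            self._parent[root_b] = root_a

    def canonical(self, name: str) -> str:
        """Return the representative name for the group containing name."""
        return self._find(name)

    def are_same(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)


registry = EquivalenceRegistry()
registry.same_as("IBM", "International Business Machines")
registry.same_as("IBM", "IBM Corporation")
print(registry.are_same("International Business Machines", "IBM Corporation"))  # True
print(registry.canonical("IBM Corporation"))  # IBM
```

Union-find is one convenient choice here because equivalence is transitive: asserting two separate links to "IBM" is enough to make all three spellings interchangeable without listing every pair.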
Another example involves State Street's risk management business.
"If we're trying to pull together a risk profile for all of the exposures we have to a particular entity or geography or whatever, that information is kept in lots of different places: numerical information in databases, unstructured information in documents or spreadsheets," said Saul. "We see that providing a semantic description for these various sources of risk information means we can quickly pull together a consolidated risk profile or an ad hoc request."
He added: "One of the other benefits that we see is that semantic technology, unlike a lot of other things, doesn't mean we have to go back and redo all of our legacy systems and database definitions. It lays on top of that, so it's much less disruptive than another type of technology that would require us to go to a clean slate. We can do it incrementally. Once we've provided a semantic definition for one of these sources, we can add on other definitions from other sources without having to go back and redo the first one."
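As a rough illustration of the risk-profile example, assuming plain in-memory lists stand in for a database table and a spreadsheet, the sketch below gives each source a small semantic description that maps its own field names onto shared terms. It then totals exposure by counterparty or region without copying the data into a new repository. All source names, field names, figures and functions here (`consolidated_exposure`, `register_source`) are hypothetical; a production semantic layer would more likely use RDF descriptions and a query language such as SPARQL.

```python
from collections import defaultdict

# Shared vocabulary terms used by the (hypothetical) semantic layer.
COUNTERPARTY, REGION, EXPOSURE = "counterparty", "region", "exposure_usd"

# Made-up sample data left in each source's native shape: one uses
# database-style column names, the other spreadsheet-style headers.
sources: dict[str, list[dict]] = {
    "loans_db": [
        {"cpty_name": "IBM", "geo": "US", "amt": 1_000_000},
        {"cpty_name": "Acme Ltd", "geo": "UK", "amt": 250_000},
    ],
    "derivatives_sheet": [
        {"Counterparty": "International Business Machines",
         "Region": "US", "Notional (USD)": 400_000},
    ],
}

# Semantic descriptions: for each source, which local field carries which
# shared term. The source rows themselves are never copied or reshaped.
source_mappings: dict[str, dict[str, str]] = {
    "loans_db": {COUNTERPARTY: "cpty_name", REGION: "geo", EXPOSURE: "amt"},
    "derivatives_sheet": {COUNTERPARTY: "Counterparty", REGION: "Region",
                          EXPOSURE: "Notional (USD)"},
}

# Equivalence assertions: alias -> canonical entity name.
aliases = {"International Business Machines": "IBM", "IBM Corporation": "IBM"}


def register_source(name: str, rows: list[dict], mapping: dict[str, str]) -> None:
    """Describe one more source; existing descriptions are left untouched."""
    sources[name] = rows
    source_mappings[name] = mapping


def consolidated_exposure(by: str) -> dict[str, float]:
    """Sum exposure across every described source, grouped by a shared term."""
    totals: dict[str, float] = defaultdict(float)
    for source_name, rows in sources.items():
        mapping = source_mappings[source_name]
        for row in rows:
            key = row[mapping[by]]
            if by == COUNTERPARTY:
                # Resolve aliases so one entity is not counted under two names.
                key = aliases.get(key, key)
            totals[key] += row[mapping[EXPOSURE]]
    return dict(totals)


print(consolidated_exposure(COUNTERPARTY))  # {'IBM': 1400000.0, 'Acme Ltd': 250000.0}
print(consolidated_exposure(REGION))        # {'US': 1400000.0, 'UK': 250000.0}
```

Describing a further source takes one more call to `register_source`; the mappings already in place are not modified, which is the incremental property Saul points to.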