Diffbot is one of a number of companies building such "knowledge graphs" using a variety of technologies, said Dave Schubmehl, an IDC research director who covers content analytics, discovery and cognitive systems. Such technology could be valuable to any business that relies on understanding large amounts of external data, he said via email.
Another company working in this field is IBM, Schubmehl wrote. Last year, IBM purchased two companies to add similar capabilities to its Watson cognitive computing service. One was AlchemyAPI, which builds taxonomies of data assets, and the other was Blekko, which developed software for indexing Web sites.
Some organizations use other technologies to organize and synthesize large sets of otherwise unstructured information, according to Schubmehl. Neo4j and Oracle both offer graph databases, which are well-suited for identifying the connections across large collections of data. Others rely on semantic Web technologies, such as the Sesame Java framework, which is used for converting data into the structured RDF (Resource Description Framework) format.