BLOG: Unstructured data really isn’t

Bradley S. Fordham, PhD | March 12, 2013
All data has structure that allows us to inspect, analyse, transform, and derive value from it.

Is this structure sufficient for the purpose of rendering a visual or audible experience to a Web surfer? Certainly. 

Are the semantics we understand at the level of HTML tags alone sufficient for finding all the students in Mr. Johnson's 3rd-grade science class, even if that information is clearly part of the content of these pages? No. 

Lucky for us, HTML (or more precisely XHTML, since it enforces the syntax more rigorously) is an application of the eXtensible Markup Language (XML), which in turn is a simplified subset of the Standard Generalized Markup Language (SGML). At these higher levels of structure we can achieve deeper levels of semantic understanding. In fact, we can find schemata very similar to what we see in databases. So the data we have been given may well be sufficient for finding Mr. Johnson's 3rd-grade science students, if we can just raise our level of understanding of the structure of the information.
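
To make the point concrete, here is a minimal sketch of what "raising our level of understanding" might look like in Python. It treats a page as XML rather than as display markup. The page itself, the class names ("roster", "student") and the data-teacher, data-grade and data-subject attributes are all invented for illustration; a real page would carry its own conventions, which you would have to discover first.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML page. The class names and data-* attributes are
# assumptions made up for this sketch, not part of any real site.
PAGE = """\
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="roster" data-teacher="Mr. Johnson" data-grade="3" data-subject="science">
      <ul>
        <li class="student">Ada</li>
        <li class="student">Grace</li>
      </ul>
    </div>
    <div class="roster" data-teacher="Ms. Lee" data-grade="3" data-subject="math">
      <ul>
        <li class="student">Alan</li>
      </ul>
    </div>
  </body>
</html>
"""

# XHTML elements live in this namespace, so queries must name it.
XHTML = {"x": "http://www.w3.org/1999/xhtml"}


def find_students(page, teacher, grade, subject):
    """Return the student names listed in the matching roster blocks."""
    root = ET.fromstring(page)
    students = []
    for roster in root.findall(".//x:div[@class='roster']", XHTML):
        key = (
            roster.get("data-teacher"),
            roster.get("data-grade"),
            roster.get("data-subject"),
        )
        if key == (teacher, grade, subject):
            for item in roster.findall(".//x:li[@class='student']", XHTML):
                students.append(item.text)
    return students


print(find_students(PAGE, "Mr. Johnson", "3", "science"))
```

Read as display markup, the page is only nested boxes and bullet points. Read as XML with a known convention, the same markup answers a database-style question.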

In conclusion, the next time someone starts to talk to you about unstructured data, think "balderdash!" quietly - but loudly - to yourself and start asking the right question: is your understanding of the structure that exists sufficient to answer the questions or solve the problems you have with the data at hand? If the answer is initially no, do not give up so quickly. Perhaps you can raise your sights a bit and reach another level of structural understanding that is sufficient for the challenges at hand.

Dr. Bradley Fordham PhD, The (ART+DATA) Institute.


