A mandate can be connected to a common control only if the verbs and nouns in the mandate are related to those in the control. This is where the many variations in terminology across authority documents become a problem. If your common control states, "protect Personally Identifiable Information" and the citation's mandate states, "safeguard private information," you have to prove the terms are connected. In this case, you must show that "protect" and "safeguard" are synonyms and that the nouns "private information" and "Personally Identifiable Information" match.
The Compliance Dictionary uses a combination of Natural Language Processing (NLP), part-of-speech tagging and named entity recognition engines to map citations to common controls in a repeatable, scientific way.
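The matching rule described above can be illustrated with a minimal sketch. It is not the Compliance Dictionary's actual method: it swaps the NLP engines for small hand-written synonym tables, and every name in it (`VERB_SYNONYMS`, `NOUN_SYNONYMS`, `canonical`, `mandate_matches_control`) is hypothetical.

```python
# Hypothetical synonym tables; a real system would derive these from a dictionary.
VERB_SYNONYMS = {
    "protect": {"protect", "safeguard", "secure"},
}
NOUN_SYNONYMS = {
    "personally identifiable information": {
        "personally identifiable information",
        "private information",
        "pii",
    },
}


def canonical(term, table):
    """Return the accepted term whose synonym set contains `term`, or None."""
    needle = term.strip().lower()
    for accepted, variants in table.items():
        if needle in variants:
            return accepted
    return None


def mandate_matches_control(mandate, control):
    """Each argument is a (verb, noun) pair.

    The mandate matches only if its verb and its noun both resolve to the
    same accepted terms as the control's verb and noun.
    """
    m_verb, m_noun = mandate
    c_verb, c_noun = control
    verb = canonical(m_verb, VERB_SYNONYMS)
    noun = canonical(m_noun, NOUN_SYNONYMS)
    return (
        verb is not None
        and noun is not None
        and verb == canonical(c_verb, VERB_SYNONYMS)
        and noun == canonical(c_noun, NOUN_SYNONYMS)
    )


control = ("protect", "Personally Identifiable Information")
print(mandate_matches_control(("safeguard", "private information"), control))
print(mandate_matches_control(("delete", "private information"), control))
```

The first call matches because both the verb and the noun resolve to the control's accepted terms. The second does not, because "delete" is not in the assumed synonym table for "protect".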
"Corralling the horse out of the barn is a fairly accurate way of explaining what the UCF team does with terms of art it finds in published authority documents," Cougias says. "During the citation mapping phase, the team of mappers use a combination of software and processes to scrape each citation for terms that already exist in its dictionary. The process then follows a patented procedure and uses additional patented tools to decipher whether the new terms should stand on their own, or whether they are simply additional non-standard forms of already accepted terms."
At that point, Cougias says, each new term is tagged by the mapper, who then creates a "new term process" derived directly from the citation the term was taken from.
"Think of it, most of the terms you've learned to use in your life, you learned in context and weren't given definitions for them," says Vicki McEwen, the UCF's head lexicographer. "It's the same process we use to scrape authority document citations."
If the authority document's citation doesn't provide enough evidence of a well-formed definition, the rest of the citations are also scoured for recurring usage and additional clues. Various additional dictionaries are then accessed through another patented structure for even more clues to the definition. Finally, if no authoritative sources prove useful, the team performs multiple Google searches before sending the term of art and its newly associated definition to the UCF's lexicographer.
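The order of that search can be sketched as a simple fallback chain. This is only an illustration under stated assumptions: the patented structures are not public, so each stage here is a hypothetical stand-in (a plain dictionary lookup or a stub), and the sample definition is made up.

```python
def resolve_definition(term, sources):
    """Try each (label, lookup) pair in order; return the first definition found."""
    for label, lookup in sources:
        definition = lookup(term)
        if definition:
            return label, definition
    return None, None


# Hypothetical stand-ins for each stage. Each lookup returns a definition
# string, or None when that stage has nothing useful.
citation = {}
rest_of_document = {"data custodian": "party responsible for storing the data"}
external_dictionaries = {}

sources = [
    ("citation", citation.get),
    ("other citations in the document", rest_of_document.get),
    ("external dictionaries", external_dictionaries.get),
    ("web search", lambda term: None),  # stub: no real search is performed
]

label, definition = resolve_definition("data custodian", sources)
print(label, "->", definition)
```

Whatever stage supplies the definition, the article's process still ends with the lexicographer's review; the sketch covers only the search order.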
"Term definitions and making the distinction between allowing a term to stand on its own or be associated as a non-standard derivation of an existing term takes a great deal of effort," McEwen says.
Machine learning keeps definitions clean
Machine learning helps the team establish a good definition and differentiate between genuinely new terms and sloppy nonstandard uses of known terms.
The dictionary tracks exactly which term was pulled from which authority document, and each element is linked, so users can search for terms and see how everything connects. Users can identify which authority documents apply to their organization, and the dictionary will display which mandates apply and where they overlap.
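One way to picture that linkage is a table of records tying each mandate to its authority document and to a common control. The sketch below assumes this layout for illustration only; the document names, control IDs and function names are hypothetical and do not describe the UCF's actual data model.

```python
from collections import defaultdict

# Hypothetical records: (authority document, mandate text, common control ID)
MANDATES = [
    ("Document A", "safeguard private information", "CC-1"),
    ("Document B", "protect Personally Identifiable Information", "CC-1"),
    ("Document B", "retain audit logs", "CC-2"),
]


def controls_for(documents):
    """Group the mandates of the selected documents by common control."""
    by_control = defaultdict(list)
    for doc, mandate, control in MANDATES:
        if doc in documents:
            by_control[control].append((doc, mandate))
    return dict(by_control)


def overlaps(documents):
    """Return the controls that more than one selected document maps to."""
    return {
        control: entries
        for control, entries in controls_for(documents).items()
        if len({doc for doc, _ in entries}) > 1
    }


selected = {"Document A", "Document B"}
for control, entries in overlaps(selected).items():
    print(control, entries)
```

With both documents selected, only the control shared by the two differently worded privacy mandates is reported as an overlap.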