Compliance Dictionary aims for a simpler life

Thor Olavsrud | July 7, 2016
With the assistance of machine learning, the UCF's Compliance Dictionary seeks to simplify the process of creating common controls with a lexicon that maps the connection between terms in authority documents.

A mandate can be connected to a common control only if the verbs and nouns in the mandate are related. And this is where the multiple variations in terms in authority documents become a problem. If your common control states, "protect Personally Identifiable Information" and the citation's mandate states, "safeguard private information," you have to prove the terms are connected. In this case, you must show that "protect" and "safeguard" are synonyms and that the nouns "private information" and "Personally Identifiable Information" match.
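To make the matching rule concrete, here is a minimal Python sketch. It assumes a mandate and a control can each be reduced to a (verb, noun) pair and that a lexicon maps non-standard forms to accepted terms. The tables and function names are hypothetical illustrations, not the Compliance Dictionary's actual data or API.

```python
# Hypothetical lexicon: each non-standard form maps to an accepted term.
# These entries are illustrative only, not taken from the Compliance Dictionary.
VERB_SYNONYMS = {"safeguard": "protect", "secure": "protect"}
NOUN_SYNONYMS = {"private information": "personally identifiable information"}


def canonical(term, lexicon):
    """Return the accepted form of a term, or the lowercased term if unmapped."""
    key = term.strip().lower()
    return lexicon.get(key, key)


def mandate_matches_control(mandate, control):
    """Compare two (verb, noun) pairs after resolving each part to its accepted term."""
    m_verb, m_noun = mandate
    c_verb, c_noun = control
    verbs_match = canonical(m_verb, VERB_SYNONYMS) == canonical(c_verb, VERB_SYNONYMS)
    nouns_match = canonical(m_noun, NOUN_SYNONYMS) == canonical(c_noun, NOUN_SYNONYMS)
    return verbs_match and nouns_match


if __name__ == "__main__":
    mandate = ("safeguard", "private information")
    control = ("protect", "Personally Identifiable Information")
    print(mandate_matches_control(mandate, control))
```

In this sketch a mandate connects to a control only when both the verb and the noun resolve to the same accepted term, which mirrors the requirement that both halves be proven related.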

The Compliance Dictionary uses a combination of Natural Language Processing (NLP), part-of-speech tagging and named entity recognition engines to map citations to common controls in a repeatable, scientific way.

"Corralling the horse out of the barn is a fairly accurate way of explaining what the UCF team does with terms of art it finds in published authority documents," Cougias says. "During the citation mapping phase, the team of mappers use a combination of software and processes to scrape each citation for terms that already exist in its dictionary. The process then follows a patented procedure and uses additional patented tools to decipher whether the new terms should stand on their own, or whether they are simply additional non-standard forms of already accepted terms."

At that point, Cougias says, the mapper tags each new term and then creates a "new term process" derived directly from the citation the term was taken from.

"Think of it, most of the terms you've learned to use in your life, you learned in context and weren't given definitions for them," says Vicki McEwen, the UCF's head lexicographer. "It's the same process we use to scrape authority document citations."

If the authority document's citation doesn't provide enough evidence of a well-formed definition, the rest of the citations are also scoured for recurring usage and additional clues. Various additional dictionaries are then accessed through another patented structure for even more clues to the definition. Finally, if no authoritative sources prove useful, the team performs multiple Google searches before sending the term of art and its newly associated definition to the UCF's lexicographer.
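The escalation described above, from the citation itself to the rest of the document, then to other dictionaries, then to web searches, can be sketched as an ordered fallback chain. The source names and lookup tables below are hypothetical placeholders. The patented structures the team actually uses are not described in enough detail to reproduce.

```python
def find_definition(term, sources):
    """Try each (name, lookup) source in order and return the first hit.

    Returns a (source_name, definition) tuple, or None if every source fails.
    """
    for name, lookup in sources:
        definition = lookup(term)
        if definition:
            return name, definition
    return None


if __name__ == "__main__":
    # Hypothetical stand-ins for each stage; real lookups would be far richer.
    citation = {}
    other_citations = {}
    external_dictionaries = {"term of art": "a word with a specialized meaning in a field"}
    web_search = {}

    sources = [
        ("citation", citation.get),
        ("other citations", other_citations.get),
        ("external dictionaries", external_dictionaries.get),
        ("web search", web_search.get),
    ]
    print(find_definition("term of art", sources))
```

Whatever the chain returns would still be a candidate only, since the article says the term and its definition then go to the lexicographer for review.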

"Term definitions and making the distinction between allowing a term to stand on its own or be associated as a non-standard derivation of an existing term takes a great deal of effort," McEwen says.

Machine learning keeps definitions clean

Machine learning helps the team establish a good definition and differentiate between genuinely new terms and sloppy, nonstandard uses of known terms.

The dictionary tracks exactly which term was pulled from which authority document, and each element is linked, so users can search for terms and see how everything connects. Users can identify which authority documents apply to their organization, and the dictionary will display which mandates apply and where they overlap.
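One way to picture the overlap view is as a grouping of mandates by the common control they map to. The sketch below assumes each mandate is stored with its authority document and a control identifier. The document names, mandates and control IDs are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records: (authority document, mandate text, common control ID).
MANDATES = [
    ("Document A", "safeguard private information", "CC-1"),
    ("Document B", "protect Personally Identifiable Information", "CC-1"),
    ("Document B", "retain audit logs", "CC-2"),
    ("Document C", "retain audit logs", "CC-2"),
]


def overlapping_controls(applicable_docs):
    """Return controls required by more than one of the applicable documents."""
    docs_by_control = defaultdict(set)
    for doc, _mandate, control in MANDATES:
        if doc in applicable_docs:
            docs_by_control[control].add(doc)
    # Keep only controls that two or more applicable documents share.
    return {
        control: sorted(docs)
        for control, docs in docs_by_control.items()
        if len(docs) > 1
    }


if __name__ == "__main__":
    print(overlapping_controls({"Document A", "Document B"}))
```

Under these assumptions, an organization subject to Documents A and B would see one shared control, while the audit-log control would not count as an overlap because only one of its applicable documents requires it.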

 
