Two important developments have come together to make possible the kind of precision medicine research that Penn Medicine's IBI is doing. First, EHRs have become widespread in the past few years: most hospitals and more than 80 percent of physicians now have these systems. Second, the cost of genomic sequencing has dropped to around $1,000 for a complete genome. The cost of partial genome or exome sequencing is less than that. As a result of these trends, the idea of correlating genotypic and phenotypic variants to discover individual responses to diseases and drugs is now feasible.
To perform this kind of research, Penn Medicine has created a specialized "bio-bank" that, so far, has stored about 20,000 genomic samples with patients' permission, says Brian Wells, associate vice president of health technology and academic computing for the healthcare system. A separate center for personalized diagnostics has sequenced tumor genomes for more than 5,000 patients, he notes.
The sheer volume of genomic data is staggering. For example, Penn Medicine has two petabytes of disk space in its high-performance computing cluster, and it plans to expand that, says Wells.
"One researcher told us that in the next few years, he might go from five to 30 petabytes of space related to neuroscience sequencing. So we're prepared to add to that as we need to," he notes.
Challenges for CMIOs and CIOs
The biggest challenges that Hanson faces as Penn Medicine grapples with its big data projects, he says, are the lack of interoperability among EHRs and the need for good, clean, structured data. Currently, Penn has different EHRs in its hospital, ER, ICU and ambulatory practices, but it is moving to a single system. Structured clinical data is harder to deliver, however, because "clinicians tend to document in an unstructured way," he says.
Penn intends to use natural language processing (NLP) to mine unstructured data in EHRs and convert it into structured information, Wells notes. "That's for retrospective analysis rather than clinical decision support, because you can't rely on it 100 percent of the time," he adds.
Current big data methods are adequate for processing the huge flood of genomic data, but bio-informaticians who know how to work with this data are in short supply, Steinhubl says. He predicts that a bottleneck will develop in data processing and storage when healthcare providers begin to review the physiologic data that is expected to flow in from mobile devices and wearable sensors.
Nevertheless, Steinhubl is very excited about the promise of big data in fields like precision medicine and clinical quality improvement. "Eventually, it's going to completely change medicine and the way we treat common chronic conditions," he says.