Archives September 5, 2011

Web 3.0 Data-Mining for Comparative Effectiveness and CDS

by Barry P Chaiken, MD

“Turbulent times” accurately describes the state of the American healthcare system. The list of critical challenges is well known: upward spiraling healthcare costs now approaching 17% of GDP, healthcare payment reform, a shortage of clinical professionals, an aging population, and the economic downturn. While current investments in health information technology (HIT) begin to deliver increased reimbursements to providers, these same at-risk organizations, along with payors, seek better ways to leverage HIT to improve the quality of care and reduce costs.

Although much effort focuses on improvement of clinical workflows, an opportunity exists to transform healthcare delivery by implementing evidence-based clinical decision support at the point of care. Such clinical content delivered effectively within new, efficient clinical workflows directs patients toward evidence-based therapeutic plans that produce desired clinical and financial outcomes. While informaticists work on developing these clinical workflows, the lack of clinical knowledge limits the ability of organizations to leverage HIT in order to personalize therapeutic care plans.

Identifying affordable therapies

Comparative effectiveness research, supported by data mining, allows organizations to identify affordable therapies that enhance patient care. With the implementation of HIT, data warehouses contain petabytes of searchable clinical, outcomes, genomic, and financial data across multiple patient populations. Bringing together this data using sophisticated knowledge analytic tools and domain-specific interfaces allows researchers to discover relationships among multiple variables gleaned from previously unconnected databases.
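
As a rough illustration of what "bringing together" previously unconnected data means in practice, the sketch below merges three small record sets on a shared patient identifier. Everything in it is hypothetical: the field names, the values, and the assumption that a common `patient_id` exists across systems (in real warehouses, record linkage is usually far harder).

```python
from collections import defaultdict

# Hypothetical extracts from three separate systems, keyed by a shared patient ID.
clinical = [
    {"patient_id": "p1", "diagnosis": "type 2 diabetes", "hba1c": 8.1},
    {"patient_id": "p2", "diagnosis": "type 2 diabetes", "hba1c": 6.9},
]
genomic = [
    {"patient_id": "p1", "variant": "variant_A"},
    {"patient_id": "p2", "variant": "variant_B"},
]
claims = [
    {"patient_id": "p1", "annual_cost": 12400},
    {"patient_id": "p2", "annual_cost": 5300},
    {"patient_id": "p3", "annual_cost": 800},
]


def link_records(*sources):
    """Merge rows from every source into one record per patient ID."""
    merged = defaultdict(dict)
    for source in sources:
        for row in source:
            merged[row["patient_id"]].update(row)
    return dict(merged)


linked = link_records(clinical, genomic, claims)
```

Once the rows sit in one record per patient, variables that lived in separate databases (a lab value, a genomic variant, a cost figure) can be examined together. Patients who appear in only one source, such as `p3` here, simply carry fewer fields.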

In turn, this new clinical knowledge enables clinicians to personalize treatment for patients based upon their genetic background by linking it to descriptive patient data and outcomes. Personalized medicine transcends analysis of a population-based cohort by placing the patient within a sub-population that better reflects the expected outcome from a prescribed treatment. Embedding this personalized medicine knowledge within an EMR’s clinical decision support module facilitates the delivery of these evidence-based best practices at the point of care.

Use of protocols by payors

Payor organizations regularly utilize clinical protocols to manage clinical and financial outcomes among their various covered-lives populations. Although some payor organizations simply put up administrative barriers to limit care, most payors employ clinical experts to determine appropriate care. These experts develop protocols that direct care managers responsible for approving the care plans submitted by treating clinicians.

While in the past five years we have seen payors gradually move toward preventive care focused on population health, this shift is now accelerating due to a provision in the Affordable Care Act that prohibits using pre-existing conditions as grounds for denying coverage. Payors’ financial survival depends upon their ability to identify high-risk populations and manage their care efficiently. They can no longer “cherry pick” the lowest risk individuals by denying coverage to those of higher risk.

Predictive modeling has offered payors only a crude method to identify high-risk beneficiaries. The ability to data mine clinical data sets, as noted above, offers payors an entirely new tool to identify high-risk populations more accurately and to customize targeted interventions for them based on clinical, genomic, and other factors.

In addition, as accountable care organizations (ACOs) accept financial responsibility for providing patient care, they too will look to better manage their patient populations to reduce their economic risk.

Example: The diabetic patient

Diabetic patients represent an important target population for at-risk organizations. Preventing hospital admissions, emergency department visits, and cardiovascular complications among this population greatly serves the financial interests of these organizations (e.g., medical-loss ratio goals of payors, outcomes metrics for providers). Pharmaceutical management of these patients, normally limited to a selection of hypoglycemic agents disconnected from the biological characteristics of the patient, delivers a level of care calibrated to the average patient in the population rather than the individual patient.

Through data mining of the now available repositories, organizations can discover treatment plans customized to very small sub-populations of their diabetic patients. The organizations can then develop protocols based upon that new knowledge that deliver better clinical outcomes for patients and better financial outcomes for the organization.
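
To make the idea of sub-population analysis concrete, here is a minimal sketch that groups outcome records by a genomic marker and a therapy, then picks the therapy with the largest average HbA1c reduction in each group. The markers, drug names, and numbers are invented for illustration only, and a real comparative effectiveness study would also need to address sample size, confounding, and statistical significance.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, made-up records: (genomic marker, therapy, change in HbA1c).
records = [
    ("marker_A", "drug_X", -1.4),
    ("marker_A", "drug_X", -1.2),
    ("marker_A", "drug_Y", -0.3),
    ("marker_A", "drug_Y", -0.5),
    ("marker_B", "drug_X", -0.2),
    ("marker_B", "drug_X", -0.4),
    ("marker_B", "drug_Y", -1.1),
    ("marker_B", "drug_Y", -0.9),
]


def mean_outcome_by_group(rows):
    """Average HbA1c change for each (marker, therapy) pair."""
    grouped = defaultdict(list)
    for marker, therapy, delta in rows:
        grouped[(marker, therapy)].append(delta)
    return {key: mean(values) for key, values in grouped.items()}


def best_therapy_per_subpopulation(rows):
    """For each marker, keep the therapy with the most negative mean change."""
    best = {}
    for (marker, therapy), avg in mean_outcome_by_group(rows).items():
        if marker not in best or avg < best[marker][1]:
            best[marker] = (therapy, avg)
    return best


best = best_therapy_per_subpopulation(records)
```

In this invented data the two drugs look nearly equivalent when averaged over the whole population, yet each clearly outperforms the other within one marker group. That is the kind of pattern a protocol calibrated to the average patient would miss.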

This approach is far superior to relying solely on an expert committee to develop treatment protocols, as the expert panel approach is often tainted by acceptance of unsubstantiated conclusions and clinical training bias. Data mining of their own data offers organizations scientifically founded conclusions with a high probability of delivering expected results. These expert committees can then shift to utilizing these data-mining results to develop evidence-based protocols for multiple subpopulations of targeted disease patients.

Semantic web

Wikipedia defines the semantic web (often referred to as Web 3.0) as:

“a web of data that facilitates machines to understand the semantics, or meaning, of information on the World Wide Web. It extends the network of hyperlinked human-readable web pages by inserting machine-readable metadata about pages and how they are related to each other, enabling automated agents to access the Web more intelligently and perform tasks on behalf of users.” (2011)

Effective use of semantic web technology in medical research requires the indexing of the available clinical data sets. These data sets include clinical data taken from EMRs, patient genomic data, existing genomic pharmaceutical databases, curated disease specific peer reviewed research, financial information (e.g., claims), and expert opinion.

Sophisticated software indexes the databases on metadata that “describe” each data point. Although the indexing allows for rapid retrieval of the data, it more importantly builds links among data points based upon the descriptive information contained in the metadata. Discovery of these relationships is impossible without semantic web technology and the ability of computers to utilize it to read and understand metadata. Experts can utilize semantic web technology to query multiple large data sets to explore comparative effectiveness hypotheses. These results then form the basis for evidence-based protocols, specifically targeted at a variety of subpopulations.
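
The indexing and linking step can be sketched in a few lines. The example below is a plain-Python stand-in, not actual semantic web tooling (which would typically rely on standards such as RDF and shared vocabularies). The data-point identifiers and metadata terms are hypothetical. It builds an inverted index from metadata terms to data points, then links any two data points that share a term.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical data points; each carries metadata terms that "describe" it.
data_points = {
    "emr:lab/123": {"source": "EMR", "terms": {"diabetes", "hba1c", "patient:p1"}},
    "genome:var/9": {"source": "genomic", "terms": {"patient:p1", "gene:G1"}},
    "pharma:entry/4": {"source": "pharmacogenomic", "terms": {"gene:G1", "drug_X"}},
    "claims:row/77": {"source": "claims", "terms": {"patient:p1", "drug_X"}},
}


def build_index(points):
    """Inverted index: metadata term -> set of data-point IDs carrying it."""
    index = defaultdict(set)
    for point_id, meta in points.items():
        for term in meta["terms"]:
            index[term].add(point_id)
    return index


def build_links(index):
    """Link every pair of data points that share at least one metadata term."""
    links = defaultdict(set)
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            links[a].add(b)
            links[b].add(a)
    return links


index = build_index(data_points)
links = build_links(index)
```

Here the EMR lab result and the pharmacogenomic entry share no metadata term directly, but both link to the same genomic record, so a query can walk from a patient's lab value to a drug-gene relationship across three formerly separate data sets.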

For the entire history of medical research, investigators posed hypotheses and tested them to see what therapies proved effective. Advances in clinical knowledge grew from frequent comparison of different therapies, with clinicians shifting to those that offered the best results. Comparative effectiveness analysis forms the basis of all medical research. The availability of semantic web technology and newly constructed clinical data sets presents researchers with an extraordinary opportunity to rapidly explore clinical relationships within subpopulations of patients using data formerly unavailable. Perhaps the age of personalized medicine is finally upon us.


  1. Semantic Web. (2011, September 2). In Wikipedia, The Free Encyclopedia. Retrieved 18:12, September 2, 2011, from

Excerpts from “Web 3.0 Data-Mining for Comparative Effectiveness and CDS” published in Patient Safety and Quality Healthcare
