How to Keep Score

by Barry P Chaiken, MD

In this age of Yelp, TripAdvisor, and Amazon, product reviews greatly influence the items consumers purchase. Anyone who regularly buys online learns over time to judge the validity of various product reviews and to apply a personal algorithm that cuts through to meaningful product information while ignoring prefabricated, biased content. These reviews answer product questions, guide consumers in the use of a service, and set overall expectations at a proper level.

In addition to reviews, scoring systems provide a shortcut to evaluating a product or service by assigning a numeric value to the item. For example, wine critics employ a variety of scoring systems to evaluate wine. One system, created by the famous wine critic Robert Parker, uses a scale that ranges from 50 to 100 points. Every wine starts with a base of 50 points. Of the remaining 50 points up for grabs, up to 5 are awarded for color or appearance, up to 15 for aroma and bouquet, up to 20 for flavor and finish, and up to 10 for the potential for improvement, or aging.

The scores try to assign an overall value to the wine generated from the individual scores given to those four criteria. For someone interested in choosing a wine for drinking tonight, a 90-point wine whose score reflects its higher aging potential may be an inferior choice when contrasted with an 88-point wine that is not as age-worthy. If the consumer intends to drink the wine immediately, its age-worthiness does not matter, and the higher-scored wine may even be unpleasant to drink, as such wines often require aging to become palatable.
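The arithmetic behind this comparison can be sketched in a few lines of Python. This is only an illustration of the point allocation described above: the two wines and their criterion scores are invented, and the function names `total_score` and `drink_tonight_score` are hypothetical, not part of any critic's published method.

```python
BASE_POINTS = 50  # every wine starts at 50 on the 50-100 scale

# Maximum points per criterion, as described in the text.
MAX_POINTS = {"color": 5, "aroma": 15, "flavor": 20, "aging": 10}


def total_score(points):
    """Add the four criterion scores to the 50-point base."""
    for name, value in points.items():
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name} must be between 0 and {MAX_POINTS[name]}")
    return BASE_POINTS + sum(points.values())


def drink_tonight_score(points):
    """Same total, but leaving out the aging-potential points."""
    return total_score(points) - points["aging"]


# Hypothetical wines, invented for illustration only.
cellar_wine = {"color": 4, "aroma": 12, "flavor": 15, "aging": 9}
dinner_wine = {"color": 5, "aroma": 13, "flavor": 17, "aging": 3}

print(total_score(cellar_wine), drink_tonight_score(cellar_wine))  # 90 81
print(total_score(dinner_wine), drink_tonight_score(dinner_wine))  # 88 85
```

In this made-up case the 90-point wine outranks the 88-point wine overall, yet once the aging points are set aside the order reverses, which is the situation a buyer shopping for tonight's dinner actually faces.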

Scoring Presents Bias

If you think this scoring system appears misleading and a bit vulnerable to subjectivity, join the parade. Review of the wine literature clearly demonstrates that no human being is able to repeatedly and consistently assign the same score to the same wine evaluated on separate occasions. Yet, wine shops price their wines based on this same unreliable scoring system, and we often purchase wine using this flawed information. Although it is obvious that the difference between a 90-point and an 89-point wine is insignificant, and the higher-scored wine may potentially be inferior depending on our intended use, the price difference to the consumer for the higher-scored wine can easily exceed $30.

Purchasing health information technology is obviously a riskier and more difficult decision than choosing a bottle of red wine for dinner. Clinical system price tags often run into the hundreds of millions of dollars, and administrative systems frequently push past six figures. Many a C-suite career is made or destroyed by these purchasing decisions.

To help with this purchasing decision, organizations often use product ratings provided by self-appointed industry organizations to evaluate enterprise applications. Although often valuable in guiding a purchasing decision, these ratings suffer from the same problems consumers face in purchasing wine, choosing a hotel, or buying a product from Amazon. The rules that define scientifically valid statistics apply equally to determine the value and usefulness of product reviews.

For example, sample size represents one criterion that helps determine the validity and reliability of reviews. We all trust a product review that includes hundreds of individual opinions much more than one that includes only ten responses. This same intuitive rationale applies to the use of statistics in evaluating product reviews.

Beware Those Under 30

One rule of thumb is that a sample size of less than 30 indicates a questionable result. Therefore, the next time you read an enterprise software product review, examine the number of respondents. If it is less than 30, ignore the overall product review and focus on the individual responses. They provide the most valuable information for your decision-making process. Note: practically all of these product reviews evolve from surveys, a relatively unreliable form of data collection fraught with statistical faults, including sampling and response bias.

Many of these product reviews include overall product scores. These scores, subject to similar statistical flaws as wine scoring systems, often deliver even more misleading information than the product reviews themselves. These scores, frequently “calculated out” to three or more digits, ignore the basic concept of significant digits taught in every freshman chemistry class. To refresh, significant digits are the digits in a number that represent the true precision of a measurement. They include all digits except:

    • All leading zeros – (e.g., 0.00045 has only two significant digits)
    • Trailing zeros – (e.g., 4.5 × 10^6 or 4,500,000 both have only 2 significant digits)
    • Spurious digits introduced to imply greater precision than that of the original data – (e.g., 22/17 = 1.3 and not 1.294117647)
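As a small illustration of the rules above, the following Python sketch rounds a number to a chosen count of significant digits. The helper name `round_sig` is hypothetical, and the approach (locating the leading digit with a base-10 logarithm) is just one common way to do it.

```python
from math import floor, log10


def round_sig(value, digits):
    """Round value to the given number of significant digits."""
    if value == 0:
        return 0.0
    # Position of the leading digit: 0 for 1.29, -4 for 0.00045, 6 for 4,500,000.
    exponent = floor(log10(abs(value)))
    return round(value, digits - 1 - exponent)


print(round_sig(22 / 17, 2))  # 1.3
```

Both 22 and 17 carry two significant digits, so the quotient is reported as 1.3; the remaining digits of 1.294117647 add no real information.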

Without doing the calculations here, it is easy to logically conclude that a product review with a sample size of less than 30 should not include a product score with three significant digits. Reporting digits that imply a level of precision that does not exist misleads those relying upon the product review to make important product decisions.
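For readers who would like to see one version of that calculation, here is a rough sketch in Python. Everything in it is an assumption made for illustration: the 25 ratings are invented, the 1-to-10 scale is hypothetical, and the standard error of the mean is used as a simple stand-in for the uncertainty of an average score. Real survey data would also carry the sampling and response biases noted earlier, which this sketch ignores.

```python
from math import sqrt
from statistics import mean, stdev


def score_with_uncertainty(ratings):
    """Return the average rating and its standard error."""
    n = len(ratings)
    avg = mean(ratings)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    std_err = stdev(ratings) / sqrt(n)
    return avg, std_err


# Hypothetical ratings from 25 respondents on a 1-to-10 scale.
ratings = [9, 7, 8, 6, 9, 10, 5, 8, 7, 9, 6, 8, 7,
           10, 8, 4, 9, 7, 8, 6, 9, 8, 7, 5, 9]

avg, std_err = score_with_uncertainty(ratings)
print(f"average = {avg:.2f}, standard error = {std_err:.2f}")
```

With these made-up numbers the average comes out at 7.56, but the standard error is roughly 0.3 points. The digit after the decimal point is already uncertain, so quoting the score to three significant digits claims precision that 25 responses cannot support.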

How to Choose

In “Eyes Wide Open: Purchasing Clinical IT” (2007), I presented several ideas on how to approach the purchase of complex, difficult-to-evaluate health information technology systems.

Senior executives must accept the fact that full investigation of the features and functionality of clinical information technology systems before purchase is nothing less than an impossible task.

No individual or even committee has both the technical expertise and available time to effectively evaluate or deeply review the capabilities of a comprehensive clinical information technology system. Therefore, organizations must base their decisions to purchase systems on factors that function as surrogates for the usefulness and appropriateness of the systems in their institutions.

I also offered the following principles that apply to the purchase of any health information technology product:

    • Start any evaluation with an honest assessment of the organization’s strategic vision and link that vision to the reason for purchasing the software
    • Broadly explore available options without allowing review scores to bias this initiative
    • Seek a strong vendor partner whom you trust
    • Avoid relying on software demonstrations to evaluate features and functionality, an impossible task for these complex software systems
    • Establish pre-implementation metrics to evaluate the purchasing decision

The importance of choosing health information technology software demands the most comprehensive and objective approach to decision making. Interjecting bias anywhere in the process presents a significant threat to making the correct decision for an organization. The complexity of the software makes the decision-making process vulnerable to bias, in the same way that consumers can be misled by focusing solely on scores when purchasing wine, a similarly complex, intimidating, and mysterious process.

Until rating organizations agree to embrace more scientific and defensible methods in building their enterprise software reviews, provider organizations must resist the temptation to base their decision making on these flawed evaluations. These organizations are better served by defining their decision-making process, appropriately utilizing product reviews, and executing on their decision-making plan to evaluate all viable vendor organizations.


  1. Chaiken, B. P. (2007). Eyes wide open: Buying clinical IT. Patient Safety & Quality Healthcare, 4(1), 6-7.

Excerpts from “How to Keep Score” published in Patient Safety and Quality Healthcare
