Diagnostic Testing 101.1: The Importance of Sensitivity, Specificity and Diagnostic Test Accuracy

To have striven, to have made an effort, to have been true to certain ideals — this alone is worth the struggle. We are here to add what we can to, not to get what we can from, life. – William Osler


Diagnostic Medicine

Diagnostic medicine is the process of identifying the condition or disease that a patient has, and ruling out the conditions or diseases the patient does not have, by assessing the patient’s signs, symptoms, and the results of various diagnostic tests.

Diagnostic Test Accuracy

Diagnostic test accuracy is simply the ability of the test to discriminate among alternative states of health (Zweig and Campbell, 1993).

If a test’s results do not differ between alternative states of health, then the test has negligible accuracy; if the results for one state of health do not overlap with those for other states, then the test has perfect accuracy. Most tests’ accuracies fall between these two extremes.

The intrinsic accuracy of a test is measured by comparing the test results to the “true condition status.”

“True condition status” refers to one of two mutually exclusive states. Either a condition is present or it is absent.

We determine true condition status by means of a “gold standard”: a source of information, completely independent of the test under evaluation, that tells us the true condition status of the patient.

Say we want to develop a new rapid test for detecting strep throat. Strep throat is caused by Streptococcus bacteria. Although more common in children and adolescents, it can occur in people of all ages. Strep throat is one of many possible causes of sore throat and pharyngitis. It is contagious and can cause complications such as rheumatic fever and scarlet fever. Treatment with antibiotics can shorten the course of the disease and reduce the risk of complications.

A throat culture is obtained by swabbing the patient’s throat with a cotton swab. The sample is then sent to the lab, where it is cultured. If strep is present, it will grow on the culture medium; the bacteria either grow or they do not. A throat culture is the “gold standard” for diagnosing strep throat. The problem is that the result may take two days to come back.

Sensitivity and Specificity

The two most important measures of diagnostic test accuracy are sensitivity and specificity.     

The probability that a test will be positive in someone with the condition =  Sensitivity

The probability that a test will be negative in someone without the condition = Specificity

For diagnosing strep throat we want our test to be as close as possible to the gold standard in terms of both sensitivity and specificity.

Sensitivity and specificity can be illustrated by a table with two rows and two columns. In this simple decision matrix, the rows summarize the data according to the true condition status of the patients and the columns summarize the test results. The table is called a “count table” because it gives the number of patients in each category. The total numbers of patients with and without the condition are, respectively, n_1 and n_0; the numbers of patients with the condition who test positive and negative are, respectively, s_1 and s_0; and the numbers of patients without the condition who test positive and negative are, respectively, r_1 and r_0.

The total number of patients in the study group, N, is N = s_1 + s_0 + r_1 + r_0, or equivalently N = n_1 + n_0.

The true condition status is symbolized by the variable D, where D = 1 if the condition is present and D= 0 if the condition is absent.

Test results indicating the condition is present are called positive; those indicating the condition is absent are called negative.

Test results are symbolized  by the variable T, where T =1 denotes positive test results and T= 0 denotes negative test results.
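To make the notation concrete, here is a minimal Python sketch that tallies paired values of D and T into the count table. The function name count_table and the eight-patient data set are hypothetical, chosen only for illustration; the cell names follow the convention that s counts patients with the condition and r counts patients without it, with the subscript giving the test result.

```python
from collections import Counter


def count_table(d, t):
    """Tally paired observations into the 2x2 count table.

    d[i] is the true condition status of patient i (1 = present, 0 = absent).
    t[i] is the test result of patient i (1 = positive, 0 = negative).
    """
    if len(d) != len(t):
        raise ValueError("d and t must have the same length")
    cells = Counter(zip(d, t))
    s1, s0 = cells[(1, 1)], cells[(1, 0)]  # condition present: test +, test -
    r1, r0 = cells[(0, 1)], cells[(0, 0)]  # condition absent:  test +, test -
    return {
        "s1": s1, "s0": s0, "r1": r1, "r0": r0,
        "n1": s1 + s0,            # all patients with the condition
        "n0": r1 + r0,            # all patients without the condition
        "N": s1 + s0 + r1 + r0,   # everyone in the study group
    }


# Hypothetical data for eight patients (not from any real study).
D = [1, 1, 1, 0, 0, 0, 0, 1]
T = [1, 1, 0, 0, 0, 1, 0, 1]
table = count_table(D, T)
print(table)
```

Note that N can be obtained either by summing the four cells or by adding the two row totals, which is the identity stated above.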

The sensitivity (Se) of a test is its ability to detect the condition when it is present.

We write sensitivity as Se = P(T = 1 | D = 1), which is read:

“sensitivity (Se) is the probability (P) that the test result is positive (T = 1), given that the condition is present (D = 1).”

Among the n_1 patients with the condition, s_1 test positive; thus, Se = s_1/n_1.

The specificity (Sp) of a test is its ability to exclude the condition in patients without the condition.

We write specificity as Sp = P(T = 0 | D = 0), which is read:

“specificity (Sp) is the probability (P) that the test result is negative (T = 0), given that the condition is absent (D = 0).”

Among the n_0 patients without the condition, r_0 test negative; thus, Sp = r_0/n_0.
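The two ratios can be computed directly from the four cells of the count table. The sketch below is illustrative only: the function names and the counts (100 patients with the condition, 100 without) are assumptions made up for this example, not results from any study.

```python
def sensitivity(s1, s0):
    """Se = s1 / n1: the fraction of patients WITH the condition who test positive."""
    n1 = s1 + s0
    if n1 == 0:
        raise ValueError("no patients with the condition; Se is undefined")
    return s1 / n1


def specificity(r1, r0):
    """Sp = r0 / n0: the fraction of patients WITHOUT the condition who test negative."""
    n0 = r1 + r0
    if n0 == 0:
        raise ValueError("no patients without the condition; Sp is undefined")
    return r0 / n0


# Hypothetical counts for a new rapid strep test compared against throat culture.
se = sensitivity(s1=90, s0=10)   # 90 of 100 culture-positive patients test positive
sp = specificity(r1=5, r0=95)    # 95 of 100 culture-negative patients test negative
print(f"Se = {se:.2f}, Sp = {sp:.2f}")
```

Each measure uses only one row of the table: sensitivity never looks at patients without the condition, and specificity never looks at patients with it.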

False Negative and False Positive Tests

There are consequences associated with all test results.

False Negative Tests:   If a test falsely indicates the absence of a condition in someone who truly has it then treatment can be delayed or not provided.

The consequences of a false negative strep test depend on what we do with it. Serious consequences can arise if we use our new strep test as the sole basis for subsequent decision making. Putting complete trust in a false negative result would mean giving no antibiotics to a patient who has strep, which can lead to continued illness, spread of the disease, and complications that would not have occurred had antibiotics been provided. The patient could potentially develop rheumatic or scarlet fever.

If the new test is negative but a culture was also taken, the false result could delay treatment by a couple of days, but treatment is still provided. The consequences are likely to be minimal. It is highly unlikely that the patient would develop rheumatic or scarlet fever because, although a little later, they are still treated with the proper antibiotics.

False Positive Tests:   If a test falsely indicates the presence of a condition in someone who does not truly have it then unnecessary tests and treatments can occur.  Incorrect treatment and false labeling of patients can also occur.

In the case of a false positive strep test, a patient may undergo a course of antibiotics they do not need. Although the patient may suffer side effects from the antibiotics, the severity and duration of these consequences are minimal.
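To get a feel for how often each kind of error would occur, the following sketch estimates the expected number of false negatives and false positives in a group of patients. It relies on assumptions that are not in the discussion above: the fraction of patients who truly have the condition (called prevalence here) and the specific values of Se and Sp are invented for illustration.

```python
def expected_errors(n_patients, prevalence, se, sp):
    """Expected false negatives and false positives in a tested group.

    prevalence is the assumed fraction of patients who truly have the
    condition; se and sp are the test's sensitivity and specificity.
    """
    for name, value in (("prevalence", prevalence), ("se", se), ("sp", sp)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    with_condition = n_patients * prevalence
    without_condition = n_patients - with_condition
    return {
        # Patients with the condition whom the test misses.
        "false_negatives": with_condition * (1.0 - se),
        # Patients without the condition whom the test flags anyway.
        "false_positives": without_condition * (1.0 - sp),
    }


# Hypothetical scenario: 1000 sore-throat patients, 30% of whom truly have strep,
# tested with a rapid test assumed to have Se = 0.90 and Sp = 0.95.
errors = expected_errors(n_patients=1000, prevalence=0.30, se=0.90, sp=0.95)
print(errors)
```

Under these made-up numbers, roughly 30 patients with strep would be missed and roughly 35 patients without strep would be treated unnecessarily, which is why what we do with a result matters as much as the result itself.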

The importance of diagnostic accuracy in testing is directly proportional to the test’s potential to cause patient consequences and harm.

Diagnostic Medicine uses a patient’s signs, symptoms and the results of various diagnostic tests to arrive at a diagnosis.

In diagnosing strep throat a good clinician will take into account  a number of variables in consideration of a differential diagnosis and base testing and treatment on the preponderance of information supporting or opposing the diagnosis.

For strep throat, the best approach is to use the new test in addition to a throat culture, history, and careful physical exam, and to base the decision to prescribe antibiotics on clinical acumen informed by the overall picture. The test can be considered a piece of the puzzle but does not define it. The risk from a false positive or false negative is therefore minimal, as it is just one data point.

Diagnostic accuracy is necessary if a test is being used as the  basis for further tests and treatment.  If  a test  is  being used as the sole basis for further tests and treatment it needs to be accurate.   If the results of a test can cause significant patient harm or death then it needs to  be either 100% accurate or combined with other highly accurate tests to confirm the diagnosis.
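One way to see why a confirmatory test helps is to work out the accuracy of a two-test rule in which the patient is called positive only when both tests are positive. The sketch below rests on a strong assumption that is not stated above: the two tests err independently of each other given the patient's true condition status. Real tests often violate this, so treat the numbers as an illustration rather than a prediction. The function name and all input values are hypothetical.

```python
def both_positive_rule(se_a, sp_a, se_b, sp_b):
    """Combined Se and Sp when a patient is called positive only if
    test A AND test B are both positive.

    ASSUMPTION: the two tests are conditionally independent given the
    true condition status. This is an illustrative simplification.
    """
    for value in (se_a, sp_a, se_b, sp_b):
        if not 0.0 <= value <= 1.0:
            raise ValueError("sensitivities and specificities must be in [0, 1]")
    # A true case is detected only if both tests detect it.
    se = se_a * se_b
    # A false positive requires both tests to be falsely positive.
    sp = 1.0 - (1.0 - sp_a) * (1.0 - sp_b)
    return se, sp


# Hypothetical screening test followed by a hypothetical confirmatory test.
se, sp = both_positive_rule(se_a=0.90, sp_a=0.90, se_b=0.95, sp_b=0.98)
print(f"combined Se = {se:.3f}, combined Sp = {sp:.3f}")
```

Under the independence assumption, confirmation raises specificity (fewer false positives) at the cost of some sensitivity, which fits the situation where a false positive is the more harmful error.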

The specificity of a test is particularly important, as a false positive can result in unneeded interventions and treatment. Stand-alone tests used in diagnosis and treatment need to be both sensitive and specific. How much diagnostic accuracy a test requires is a function of the consequences of its false negative and false positive results.

Diagnostic Research Methodology

Research to discover the accuracy of a diagnostic test should be straightforward: administer the test to a group of people and see if it works.

The test being tested is the “index test”. Results of the index test are compared with the results of a “gold standard” reference test.

The research question is, “How accurately do index test results predict the (true, gold standard) reference test results?”

Diagnostic test accuracy studies require a sample of subjects who have been given the test under evaluation, some form of scoring of the test’s findings, and a reference or “gold standard” to which the test findings are compared. Examples of gold standards include autopsy reports, surgery findings, and pathology results from biopsies.

The gold standard for a patient’s true disease status may not always be available.    A  brain biopsy could be considered a gold standard for diagnosing Alzheimer’s disease but is neither practical nor humane.

The Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool is a set of fourteen questions that investigate the methodologic quality of scientific studies that quantify diagnostic test performance.

The questions identify research methodologies known to bias the accuracy estimates that studies report.

Multiple factors need to be considered in evaluating the diagnostic accuracy of a test, including diagnostic validation and verification. Is the test testing what it is supposed to be testing for, and are we doing it correctly?

Determining the diagnostic accuracy of a test necessitates a reference standard. The reference standard can be the best available method for establishing the presence or absence of a condition (such as the throat culture for strep throat) or a combination of methods (imaging, neuropsychological testing, clinical exam, etc. in Alzheimer’s disease).

Any test that is going to be used as a basis for decisions that impact other human beings needs to be validated before it is introduced on the market. The literature needs to be reviewed critically, and trials must be designed to produce objective evidence that validates that the test measures what it purports to measure and verifies that the test methodology is correct. Verification that samples are collected, handled, stored, transported and processed correctly is requisite.

Cutoff levels, cross-reactivity and myriad other issues need to be worked out prior to bringing a diagnostic test to market.
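The choice of cutoff level is a good example of why these issues matter. For a test that reports a number, the cutoff decides which results count as positive, and moving it trades sensitivity against specificity. The sketch below uses made-up measurements and a hypothetical rule (a result is positive when the value is at or above the cutoff); none of the values come from a real assay.

```python
def se_sp_at_cutoff(values_with, values_without, cutoff):
    """Sensitivity and specificity when a result is called positive
    if the measured value is >= cutoff (a hypothetical decision rule)."""
    if not values_with or not values_without:
        raise ValueError("both groups must contain at least one measurement")
    s1 = sum(v >= cutoff for v in values_with)     # true positives
    r0 = sum(v < cutoff for v in values_without)   # true negatives
    return s1 / len(values_with), r0 / len(values_without)


# Made-up measurements, grouped by true condition status (gold standard).
with_condition = [4.1, 5.0, 5.6, 6.2, 7.3]
without_condition = [1.2, 2.0, 2.9, 3.8, 4.5]

for cutoff in (3.0, 4.0, 5.0):
    se, sp = se_sp_at_cutoff(with_condition, without_condition, cutoff)
    print(f"cutoff {cutoff}: Se = {se:.1f}, Sp = {sp:.1f}")
```

Because the two groups overlap between 4.1 and 4.5, no cutoff in this toy data set gives perfect sensitivity and perfect specificity at once: a low cutoff catches every case but flags healthy patients, and a high cutoff does the reverse.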

The reliability, validity and accuracy of drug test results need to be known prior to using a test. Specificity and sensitivity must be known prior to using a test on any population.

This should go without saying as to do anything else would be irresponsible and careless.

References

Evidence-based medicine, systematic reviews, and guidelines in interventional pain management: part 7: systematic reviews and meta-analyses of diagnostic accuracy studies. Pain Physician 2009, 12(6):929-963.

Jaeschke R, Guyatt G, Lijmer J: Diagnostic tests. In Users’ guides to the medical literature: a manual for evidence-based clinical practice. Edited by Guyatt G, Rennie D. AMA Press; 2002:121-140.

Lundh A, Gøtzsche PC: Recommendations by Cochrane review groups for assessment of the risk of bias in studies. BMC Med Res Methodol 2008, 8:22. doi:10.1186/1471-2288-8-22

Streiner DL: Diagnosing tests: using and misusing diagnostic and screening tests. J Pers Assess 2003, 81(3):209-219.

Whiting P, Rutjes AW, Reitsma JB, Bossuyt PM, Kleijnen J: The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol 2003, 3:25. http://www.biomedcentral.com/1471-2288/3/25

GCP, good clinical practice; GCLP, good clinical laboratory practice; GLP, good laboratory practice; STARD, standards for reporting of diagnostic accuracy. See Section III, 2.13  From Nature Reviews Microbiology 4,S20–S32(1 December 2006) | doi:10.1038/nrmicro1570

13 thoughts on “Diagnostic Testing 101.1: The Importance of Sensitivity, Specificity and Diagnostic Test Accuracy”

  1. Very informative. Glad you’re educating people about this stuff. A lot of people are becoming suspicious and distrusting of modern medicine for whatever reason. Maybe if they read more things like this they’d not be so prone to believing a bunch of nonsense that doesn’t work. I’m not saying that there aren’t alternative things out there that work, I’m just saying people should never throw modern medicine out the door completely.

  2. It’s hard to believe that psychiatry even qualifies as a field of medicine. I don’t see why its peers allow it to use the term “diagnosis”. Its swagger is directly proportional to its lack of any biological basis.

  4. Very pleased to see your explanation of this Michael, especially in your coverage of the dangers of false positives.
    While you’ve covered it elsewhere in your blog, it bears highlighting that while a “false positive” in clinical medicine can lead to more refined testing before one begins on a costly treatment regimen, a “false positive” in forensic medicine can lead not only to loss of one’s right to practice but in fact to one’s very freedom. And, since the very test that is yielding the false positive is explicitly known to produce such, the likelihood that more people are going to be falsely deprived of their civil rights and their fundamental liberty is concomitantly higher. PHPs and Medical Boards know this and have been complicit with this scam.
    This would be bad enough even if considered alone. But in the white collar licensed professions like medicine, the use of such a test occurs in a setting where the deprivation of the protections afforded by due process is routine. In other words, if the test comes back positive – even though it’s a false positive – you’re guilty until proven innocent. And you’ve got to spend a fortune to prove your innocence while you’re removed from your practice, deprived of making a living, and coerced into a “preferred program” for extended treatment and 5 years of “monitoring” – ironically using the very same test that falsely established your diagnosis! And it will take years to extract yourself from such a bureaucratic entanglement. One thing is certain – you will not come out of this mauling intact. You will be like the increasing numbers of unfortunates who have been set up by a deeply broken judicial system, framed on false evidence, and sent to prison.
    Astoundingly, some Physician Health Programs (PHPs and PHSs and congeners) are using an alcohol usage screening test (the EtG amongst others) that they got approved as a LDT – a laboratory developed test (see elsewhere on this blog). That LDT bypassed the FDA process which requires rigorous testing to establish its sensitivity and specificity, in other words to prevent a test from being introduced into the market which yields too many false positives.
    But here’s the more amazing thing – SAMHSA, the Substance Abuse and Mental Health Services Administration (a division, I believe, of DHHS) actually issued two explicit alerts in both 2006 and 2012 specifically advising against these tests’ usage in the forensic environment. The PHP enterprise is explicitly a forensic enterprise as, by definition of their role, they are conducting “fitness-for-duty” forensic diagnostic psychiatric evaluations on behalf of a professional (here, the medical) licensing board.
    The conclusions are obvious. PHPs are knowingly using tests which produce false positives to incriminate physicians and compel them to enter into their “preferred network” of costly evaluation and lengthy 3-month treatment programs. And they are under state protection in doing so. And the medical licensing boards with which they are affiliated are fully complicit in this crime.
    I am convinced this will turn out to be a scandal equivalent in magnitude to the Annie Dookhan falsified evidence case and the forensic fraud committed by the FBI’s hair and fiber analysis forensic lab (uncovered by FBI whistleblower Fred Whitehurst) and the compounding pharmacy contamination scandal.
    Do you realize how many physicians (and many other medical professionals’) careers have been sabotaged by this fraud? Do you realize how many other professionals are soon going to be subjected to similar “professionals health / employee assistance” programs testing abuses? And then marched into their licensing boards for a kangaroo court? If physicians and other professionals don’t wake up and demand accountability, especially given the worsening prognosis for being provided due process in responding to such contrived findings, it’s going to be too late. Their careers, as unbelievable as it may seem, will be wiped out.
    In a separate essay, I’ll write about the perverse incentives that keep such a system embedded. For now: it feeds the legal “professional license defense” industry; it makes it look like the medical licensing board and their legal department and investigators are “protecting the public;” it lets PHPs keep their lucrative referral pipeline of falsely diagnosed docs flowing to their “preferred programs,” all of which are FSPHP members; and it creates an exceedingly fine profit potential for the drug testing labs, a select number of which the member PHPs also have “preferred relationships” with. (Some treatment programs actually own their own labs!).

  5. Reblogged this on Petrossa's Blog and commented:
    This comment just about nails it down. Any test only tests what is defined in its parameters. Get the parameters wrong and the test is worthless or at least highly questionable. This principle imho lies at the root of how DSM wildly diverges from actual occurrence of described ‘afflictions’ and what those who make their living diagnosing it & ‘curing’ it assume exist.

  6. Isn’t it odd that in a courtroom, testimony from witnesses is limited to what the witness perceived. When I sent an affidavit in for a court hearing, as witness, I was told I couldn’t add opinion nor judgement. In fact, I knew that to do so would only hurt those I was trying to assist. This was a property law case and I was called in to make a statement about the organization (a hospital) that was trying to illegally obtain land by stating it was an educational institution. All this was done based on fact and not opinion. You would think that opinion wouldn’t be desirable in any court situation. However, court shrinks are highly paid for their “opinion” simply because there’s no reliable test for “mental illness.” The experiments done decades ago show that people without mental problems entered a hospital and were kept there under the assumption that they were gravely ill. They were seen only through the distorted lens of phony psych diagnoses, and this clouded the view of every single worker who dealt with the patients. This microcosm shows us that “opinion” is not a reliable test for what is supposedly a medical condition. I don’t think shrinks should be given God-like status as far as their opinions go. An orthopedist can only make a judgment based on real evidence, such as an xray. The bone is broken or it isn’t. A person is dead or alive. For godsakes they’d better get that one right. I like your strep throat analogy and your reasoning as to why an accurate test is vital even if there’s a waiting period.
