I always get confused about the meaning of Specificity and Sensitivity when it comes to interpreting the validity of the numerous physical tests that therapists use in arriving at a diagnosis for patients. (Validity is the ability of a “test” to actually measure what we think it is measuring. For example, if we apply a test designed to identify the people in a room who have a rotator cuff tear, does the test do what we presume it can do, ie identify the people in the room with a rotator cuff tear?) I was always quite skeptical of the percentage figures that papers arrive at when evaluating the value of a certain diagnostic test. Equally, I often felt uneasy when therapists would reel off the fact that the tests they performed were 80% specific and/or sensitive and used this to confidently put forward a clinical diagnosis. This is mainly because these figures have been calculated on a very narrow patient range which may or may not represent the person that sits in front of the therapist. To this end the confidence interval is important: it gives the range within which the true figure plausibly lies, so that in the same kind of population on a different occasion the figure could be as low as the lower end of the confidence interval.
However, here is what I have managed to understand from reading around a bit. Sensitivity is the proportion of people with the “condition of interest” (on whom we are applying a certain test) who will have a positive result. A positive result in a person who actually has the condition is termed a TRUE POSITIVE. Very highly sensitive tests identify a lot of the people who actually have the condition as positive (they seek out TRUE POSITIVES); however, such a test might also identify a lot of people who do not have the condition as positive, and each of these is termed a FALSE POSITIVE. So let's say 100 people come into the clinic, 50 of whom have a tear in their rotator cuff muscle (a muscle in the shoulder) and 50 of whom do not. A very highly sensitive test may indeed identify almost all of the 50 people with an actual cuff tear as POSITIVE (TRUE POSITIVES), but it may also identify a lot of people as positive who do NOT have a tear (FALSE POSITIVES). So we can interpret the test in another way: if the test picks up pretty much all of the people with tears, then a negative result tells us that the person very probably does not have a tear. (This is because there are FEW FALSE NEGATIVES with a highly sensitive test.) So a mnemonic was devised: SNNOUT (SeNsitive tests, when Negative, rule the condition OUT).
Now then, how do we work out a sensitivity value, ie a percentage value? For example, a 100% Sensitive test run on a cohort of 100 people who all have the condition we are testing for would identify all 100 people as TRUE POSITIVE. Another example:
Let’s say the size of the population is 100 and the prevalence of, say, a rotator cuff tear is 30 out of 100, ie 30%. This means that in a certain cohort of patients (let’s say those over the age of 40) 30 out of the 100 have a rotator cuff tear (NOT STRICTLY TRUE, but just an example). We apply a test and get the result below:
We can see that the test managed to identify 24 people as TRUE POSITIVE, but it also produced 14 FALSE POSITIVES, 6 FALSE NEGATIVES and 56 TRUE NEGATIVES. We calculate the test’s SENSITIVITY by looking at how many of the 30 people who actually have a tear (24 TRUE POSITIVES + 6 FALSE NEGATIVES) it was able to identify, ie 24/30 = 0.8. Multiply this by 100 to give us our percentage sensitivity value, ie 80%. So this test is 80% sensitive. It is failing to identify 20% of the people who actually have the condition.
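Since the confidence interval was mentioned at the start, here is a rough sketch of how one might put an interval around that 80% figure. It uses the Wilson score interval, which is my choice for illustration and not necessarily what any particular research paper would use; the function name wilson_interval is made up for this example.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width

# 24 TRUE POSITIVES out of the 30 people who actually have a tear
low, high = wilson_interval(24, 30)
print(f"Sensitivity 80%, 95% CI roughly {low:.0%} to {high:.0%}")
```

With only 30 people with a tear in the sample, the interval is wide (very roughly 63% to 90%), which is exactly why a single "80% sensitive" figure should be read with caution.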
Conversely, the SPECIFICITY of a test is the test’s ability to identify people who DO NOT have the condition we are testing for, ie SPECIFICITY is the test’s ability to identify TRUE NEGATIVES: of the people who DO NOT have the condition, how many are correctly identified. So if 100 people who did not have the condition underwent a 100% SPECIFIC test, all should test negative. HOWEVER, a highly SPECIFIC test may be able to identify most of the people who do NOT have the condition as negative, but in doing so it may well identify a lot of people who do have the condition as negative also (FALSE NEGATIVES). So in contrast to a SENSITIVE test, a highly specific test would have few FALSE POSITIVES. Therefore a SPecific test, when Positive, rules the condition IN (hence the mnemonic SPPIN). So a negative result from a highly SPECIFIC test for a rotator cuff tear does NOT mean that the person has not got a rotator cuff tear because, as stated above, a highly SPECIFIC test identifies the TRUE NEGATIVES but in doing so may also capture a number of FALSE NEGATIVES. BUT if that test came back positive and the test was highly SPECIFIC, then we would be pretty confident that they HAVE got a rotator cuff tear.
SO in the example above, 70 people did NOT have a tear. The test correctly identified 56 of them as TRUE NEGATIVES and wrongly labelled the other 14 as positive (FALSE POSITIVES). So it identified 56/(56+14) = 56/70 = 80%, and the SPECIFICITY of this test is 80%.
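The two calculations so far can be written out as a short sketch. The function and variable names here are my own, chosen just for illustration; the counts are the ones from the example above.

```python
def sensitivity(tp, fn):
    """Proportion of people WITH the condition who test positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of people WITHOUT the condition who test negative."""
    return tn / (tn + fp)

# Counts from the worked example: 100 people, 30 with a tear
tp, fp, fn, tn = 24, 14, 6, 56

sens = sensitivity(tp, fn)   # 24 / 30
spec = specificity(tn, fp)   # 56 / 70
print(f"Sensitivity: {sens:.0%}")
print(f"Specificity: {spec:.0%}")
```

Note that sensitivity only ever looks at the 30 people who have a tear, and specificity only at the 70 who do not.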
There is something called the POSITIVE PREDICTIVE VALUE (PPV). This tells us the chance that a POSITIVE result really is POSITIVE, ie that the person actually has the condition. To calculate this we look at ALL of the positive results (ie the shaded area in the picture). There were 24 TRUE POSITIVES and 14 FALSE POSITIVES, so 24 out of a total of 38 (24+14) positives were correct, ie 63%. This tells us that 63% of the POSITIVE results are correct. Its PPV is 63%.
The NEGATIVE PREDICTIVE VALUE (NPV) looks at the negative results in the chart (the unshaded area in the picture). There were 56 TRUE NEGATIVES and 6 FALSE NEGATIVES, ie 56 out of 62 negative results were correct, which is 90%. So the NEGATIVE PREDICTIVE VALUE is 90%, ie the chance that a NEGATIVE result really is NEGATIVE is 90%.
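As a quick check on the arithmetic, the two predictive values can be computed from the same four counts. This is only a sketch, and predictive_values is a name I have invented for it.

```python
def predictive_values(tp, fp, fn, tn):
    """Return (PPV, NPV) from the four cells of the results table."""
    ppv = tp / (tp + fp)   # correct positives out of all positive results
    npv = tn / (tn + fn)   # correct negatives out of all negative results
    return ppv, npv

ppv, npv = predictive_values(tp=24, fp=14, fn=6, tn=56)
print(f"PPV: {ppv:.1%}, NPV: {npv:.1%}")  # prints "PPV: 63.2%, NPV: 90.3%"
```

Unlike sensitivity and specificity, these two figures are worked out across the positive and negative results, so they mix people with and without the condition.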
This would make sense because we stated from the outset that the prevalence of a rotator cuff tear in this cohort of people was 30%, ie it was LOW. Most people in the cohort do not have a tear, so a NEGATIVE result is more likely to be correct than a POSITIVE one. So we can see that the PPV and the NPV change if the prevalence changes (ie the proportion of people in the cohort you are testing who have the condition). Prevalence is the probability that a person has the condition before we do the test.
This means that we need to take into consideration the prevalence of the condition for which we are testing in the kind of person in front of us. For example, if I have a 20 year old patient in front of me with shoulder pain who tests positive for a rotator cuff tear on a test which is 80% specific and 80% sensitive, I need to remember that the prevalence of a rotator cuff tear in such a young person is actually very small. This raises the question: “How were the Specificity and Sensitivity values arrived at in the first place?” What research paper calculated these figures, and on what kind of population? If they were calculated on a bunch of people over the age of 65 (let’s say), then the values do not really reflect the 20 year old patient sat in front of me when I perform the test.
Another important aspect to remember is how, when the values were calculated, the researchers could be sure that the patient had a rotator cuff tear in the first place. We need to be sure that the patient really had a rotator cuff tear if we are judging whether the test in the research paper is actually capable of telling whether they do or don’t have one. There must be a GOLD STANDARD measurement against which we can compare our test. For example, if we cut open the patient’s shoulder, have a look inside and see a tear with our own eyes, we are then pretty much sure that they DO HAVE a cuff tear, and the GOLD STANDARD against which we are measuring is the “cutting the shoulder open to look inside”.
Here is an example: let’s say the prevalence of a cuff tear is 33 in 100,000 and a rotator cuff tear test has a 94% Sensitivity and a 97% Specificity. So how many TRUE POSITIVES (sensitivity) will it identify? 94/100 x 33 = 31 (rounded) TRUE POSITIVES, leaving 2 FALSE NEGATIVES.
The test has a 97% SPECIFICITY, so how many TRUE NEGATIVES will it identify? Well, we know that there are 100,000 - 33, ie 99967, people who do NOT have a tear. So 97/100 x 99967 = 96968 (rounded) TRUE NEGATIVES will be identified, leaving 99967 - 96968, ie 2999, people without a tear who are wrongly identified as positive (FALSE POSITIVES). The PPV is 31/(31 + 2999) x 100 = 1%, ie only 1% of the total number of positives ARE actually positive, because there are so many FALSE POSITIVES. The NPV is the 96968 TRUE NEGATIVES out of a total of 96968 + 2 negative results (because there were 2 FALSE NEGATIVES), ie 96968/96970 x 100 = 99.998% NPV.
So despite the SPECIFICITY being so high, ie 97%, and despite knowing SPPIN (SPecific tests, when Positive, rule the condition IN), in actual fact the probability that a positive result IS actually positive is only 1%!
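The low-prevalence example above can be reproduced in a few lines. This is just a sketch of the same arithmetic: expected_counts is a hypothetical helper, and rounding to whole people is my assumption to match the figures quoted above.

```python
def expected_counts(population, cases, sens, spec):
    """Expected (TP, FP, FN, TN), rounded to whole people."""
    tp = round(cases * sens)        # people with a tear who test positive
    fn = cases - tp                 # people with a tear who are missed
    non_cases = population - cases
    tn = round(non_cases * spec)    # people without a tear who test negative
    fp = non_cases - tn             # people without a tear who test positive
    return tp, fp, fn, tn

# 33 tears per 100,000 people, 94% sensitivity, 97% specificity
tp, fp, fn, tn = expected_counts(100_000, 33, 0.94, 0.97)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")
print(f"PPV: {ppv:.1%}, NPV: {npv:.3%}")
```

The 2999 FALSE POSITIVES swamp the 31 TRUE POSITIVES, which is where the 1% PPV comes from.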
HOWEVER, we need to take other factors into consideration. Let us say that the person in front of us was over 65, had recently had a fall onto their arm, and the arm was weak. Suddenly this person falls out of the REGULAR cohort of people on whom the Specificity and Sensitivity were calculated. The 1% PPV above no longer means much to us, because the prevalence of a rotator cuff tear in a cohort of people over the age of 65 who have fallen onto the arm and whose arm is weak may be very high, let us say 98% for the sake of the example. Put another way, of 100,000 people over 65 who had fallen onto the arm and whose arm was testing weak, 98,000 may actually have a cuff tear. This dramatically changes the PPV figure.
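To see how dramatically, here is a rough sketch that recalculates the PPV of the same 94% sensitive, 97% specific test at different prevalences. It makes one big assumption that the discussion above warns about: that the sensitivity and specificity stay the same in every cohort. The function name ppv_from_prevalence is mine, and the prevalence figures are the illustrative ones used in this post, not real data.

```python
def ppv_from_prevalence(prevalence, sens, spec):
    """PPV = true positives / all positives, per person tested."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

sens, spec = 0.94, 0.97
cohorts = {
    "33 in 100,000": 33 / 100_000,
    "30% (the over-40s example)": 0.30,
    "98% (over 65, fall, weak arm)": 0.98,
}

results = {}
for label, prevalence in cohorts.items():
    results[label] = ppv_from_prevalence(prevalence, sens, spec)
    print(f"Prevalence {label}: PPV = {results[label]:.1%}")
```

The same positive result goes from being almost certainly wrong in the first cohort to almost certainly right in the last one, purely because of who the test is being applied to.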