"There's a ~~70% chance~~ near certainty the perpetrator was blonde! The defendant is blonde!"
Even before it gets to a jury, I'm worried that it's going to confuse investigators.
Note that there is a subtle but very large difference between "this test correctly identifies X/non-X 70% of the time" and "when this test says the perp is X there's a 70% chance it's correct".
For instance, let's say 5% of people and 5% of criminals in some region are redheads. So out of 1000 criminals, 50 will be redheads and 950 will be non-redheads. A test that's 70% accurate in both directions will end up mistakenly identifying 285 of the non-redheads as redheads, and it will only pick out 35 of the redheads. So when the test says "redhead", there's only a 35/320 = 11% chance that the perp actually is a redhead.
Sure, it's still giving you some information: the conditional probability that the perp is a redhead has gone from 5% to 11% with this information. But as an investigator... what can you actually do with that information? It's nowhere near firm enough to say that you should exclude non-redheads from the investigation. And if it goes the other way and tells you the perp is a non-redhead, there's still a 2% chance that they actually are.
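A quick sanity check of that arithmetic, as a Python sketch (the 1000-criminal population, 5% prevalence and 70% two-way accuracy are the figures from the comment above):

    # Base-rate arithmetic from the comment: 5% prevalence, 70% accuracy
    # in both directions, population of 1000 criminals.
    population = 1000
    prevalence = 0.05        # fraction of criminals who are redheads
    accuracy = 0.70          # P(test is correct), same for both groups

    redheads = population * prevalence               # 50
    non_redheads = population - redheads             # 950

    true_pos = accuracy * redheads                   # 35 redheads flagged "red"
    false_pos = (1 - accuracy) * non_redheads        # 285 non-redheads flagged "red"
    print(true_pos / (true_pos + false_pos))         # P(red | "red") ~= 0.109

    false_neg = (1 - accuracy) * redheads            # 15 redheads flagged "not red"
    true_neg = accuracy * non_redheads               # 665 non-redheads cleared
    print(false_neg / (false_neg + true_neg))        # P(red | "not red") ~= 0.022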
I detect a major problem in this math. You anti-ginger-ist.
also: A test that's 70% accurate in both directions will end up mistakenly identifying 285 of the non-redheads as redheads,
That's not quite how it works. They're not testing "is it a redhead"; it's more "if it says the criminal is a redhead, it has a 70% chance of being right".
Are you sure of that interpretation? The abstract seems ambiguous to me: "the individual-based prediction accuracies employing a prediction-guided approach were 69.5% for blond, 78.5% for brown, 80% for red and 87.5% for black hair colour". There are several different ways I could interpret that:
(1) When we test on a redhead, the test returns "red" X% of the time.
(2) When we attempt to test whether somebody is a redhead, the test returns a correct answer X% of the time.
(2a) When we test on redheads AND when we test on non-redheads, the test returns a correct answer X% of the time.
(3) When the test identifies somebody as a redhead, it has an X% chance of being correct.
If it is referring to #3 then that's a problem in itself, because in general X is not a function of the test alone - it depends on the prior distribution (i.e. population demographics). Apply this test in Ireland and X will be larger than if you apply it in Japan.
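To make that prior-dependence concrete, here is a sketch computing interpretation #3's X for the same hypothetical test under two different priors; the 10% and 0.1% prevalence figures are illustrative assumptions, not measured values:

    # X under interpretation #3 depends on the prior. Same hypothetical
    # test (70% sensitivity and specificity), two assumed prevalences.
    def x_given_positive(prevalence, sensitivity=0.7, specificity=0.7):
        """Bayes: P(redhead | test says 'red')."""
        true_pos = sensitivity * prevalence
        false_pos = (1 - specificity) * (1 - prevalence)
        return true_pos / (true_pos + false_pos)

    print(x_given_positive(0.10))    # "Ireland", assumed 10% redheads: ~0.21
    print(x_given_positive(0.001))   # "Japan", assumed 0.1% redheads: ~0.002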
Even within some standardised population, neither #1 nor #3 is very useful in characterising the accuracy of the test, because each of them is only telling us half the story. Definition #1 is only concerned with (true positives/(true positives + false negatives)), ignoring false positives; definition #3 is only concerned with (true positives/(true positives + false positives)), ignoring false negatives. You can get very high values of X for either of those definitions (but not both at once) simply by setting very lax or very strict acceptance criteria.
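A sketch of that last point, with invented population numbers: a maximally lax criterion drives definition #1 to 100% while #3 collapses, and a maximally strict one does the reverse:

    # Gaming definitions #1 (recall) and #3 (precision) with extreme
    # acceptance criteria. Population numbers are invented for illustration.
    def recall_precision(tp, fp, fn):
        recall = tp / (tp + fn)        # definition #1
        precision = tp / (tp + fp)     # definition #3
        return recall, precision

    redheads, non_redheads = 50, 950

    # Lax criterion: call everyone a redhead.
    print(recall_precision(tp=redheads, fp=non_redheads, fn=0))   # (1.0, 0.05)

    # Strict criterion: say "red" only on one unmistakable sample.
    print(recall_precision(tp=1, fp=0, fn=redheads - 1))          # (0.02, 1.0)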
Actually, the test identifies that they have a gene that all redheads have (and can be reasonably sure that's not a side effect of mechanical error in the PCR process). The 30% error rate is due to it being hard to ascertain (and by hard, I mean NP-hard) that the person 100% did not also have a gene that, along with the gene for red-headedness, caused the person to be brunette or brown-haired, due to the physical procedure of obtaining DNA samples (degradation in the sample leading to partial samplings of the suspect's DNA). So if you find a definite marker for being ginger or blonde, they MIGHT be ginger or blonde; if you find a marker for black hair, THEY WILL NOT be ginger or blonde. Hence why the probability differs by hair colour.
Of course it's a bit silly given that a lot of forensic DNA samples will come from hair samples – in which case the ability to assess hair colour improves dramatically!
Yes, but what exactly does "x% error rate" mean for that finding? There are four possible outcomes (true positive, false positive, true negative, false negative) and several different ways one can define "error rate" from those four variables. While I'm not a geneticist, it seems to me that the useful ones will all be dependent on the demographics of your population.
Let's suppose we define error rate as "the percentage of identified 'redheads' who are not actually redheads", the definition WK was using above. Split our population into groups A (redheads), B (non-redheads with a non-activated red-headed gene*), C (non-redheads with no red-headed gene) with a, b, c representing the proportion of each of those groups in the population - or at least, among the population of criminals who leave DNA evidence.
Assume for the sake of argument that the test gives correct results for groups A and C 100% of the time** and gives incorrect results with probability p when applied to group B. So the outcomes are:
a = correctly-identified redheads
bp = non-activated mistakenly identified as redheads
b(1-p) = non-activated correctly identified as non-redheads
c = correctly identified as having no red-headed genes
Then under the definition given above, the "error rate" in identifying redheads is bp/(bp + a).
This depends on the population. If you're investigating a murder at the Red-Headed League, where a=1 and b=c=0, the error rate works out at 0/(0+1) = 0%. But if you're investigating a population where prevalence of the red-headed gene is very low, say a=0.0001 and b=0.02, then the error rate is 0.02p/(0.02p + 0.0001) = p/(p + 0.005), which is close to 100% unless p is very small; in between those two populations it varies (a short sketch evaluating this follows the footnotes).
*simplifying here, since by my understanding the genetics of hair colour is a bit fuzzier than a straightforward recessive/dominant model. And even with a perfect sample, genetics alone can't completely predict hair colour; my mother started out as a redhead, but by the time she was forty her hair had darkened effectively to black, and anybody looking for her as a "redhead" would have been badly misled. OTOH, mine has stayed roughly the same shade so far.
**if you drop this assumption, the working gets more involved but the conclusion is still the same: the error rate shifts with population properties.
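Evaluating that bp/(bp + a) error rate for the two example populations above; p is left as a free parameter, and the 0.5 below is just an arbitrary illustrative value:

    # Error rate bp/(bp + a): fraction of identified "redheads" who are
    # not actually redheads. p = P(test misreads a group-B sample).
    def error_rate(a, b, p):
        return (b * p) / (b * p + a)

    p = 0.5                                     # arbitrary example value
    print(error_rate(a=1.0, b=0.0, p=p))        # Red-Headed League: 0.0
    print(error_rate(a=0.0001, b=0.02, p=p))    # rare-gene population: ~0.990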
Ah, I'm glad they said this: "Of course, that may or may not be helpful, as a person's outward appearance may not necessarily reflect their origin." Because that's what I immediately thought when I saw it could predict their "origin".