Law, Probability and Risk Advance Access originally published online on January 31, 2007
Law, Probability and Risk 2006 5(2):159-165; doi:10.1093/lpr/mgl017
© The Author [2007]. Published by Oxford University Press. All rights reserved.
Case comment
United States v. Copeland, 369 F. Supp. 2d 275 (E.D.N.Y. 2005): quantification of the proof beyond reasonable doubt standard

School of Mathematics and Statistics, University of New South Wales, Sydney 2052, Australia
Email: j.franklin{at}unsw.edu.au
Received on 21 September 2006. Accepted on 30 October 2006.
There are many reasons for objecting to quantifying the proof beyond reasonable doubt standard of criminal law as a percentage probability. They are divided into ethical and policy reasons, on the one hand, and reasons arising from the nature of logical probabilities, on the other. It is argued that these reasons are substantial and suggest that the criminal standard of proof should not be given a precise number. But those reasons do not rule out a minimal imprecise number. Well above 80% is suggested as a standard, implying that any attempt by a prosecutor or jury to take the proof beyond reasonable doubt standard to be 80% or less should be ruled out as a matter of law.
Keywords: evidence standard; proof beyond reasonable doubt; quantification; logical probability
Objections to the quantification of the criminal standard of proof come from two directions. From the direction of policy, ethics and psychology, the problems raised include the following:
- There may be different standards appropriate to different cases, e.g. a higher standard where the punishment is heavier.
- The jury is properly left to decide the standard in the light of the facts of the particular case.
- Since there is in fact considerable disagreement as to the correct numerical value of the standard, attempts to standardize it will create only confusion, evasions and a façade of uniformity where there is no true consensus.
- The majesty of the law and its powers of deterrence would be ill-served, if the law were forced to admit the truth about the number of false convictions it allows and the number of criminals it allows to go free.
Quite different objections arise from certain more conceptual problems about the nature of probability:
- Some probabilities may be inherently incapable of being given a precise number.
- Evidence suitable for conviction should be substantial or weighty, and a numerical probability expresses only the balance between favourable and unfavourable reasons, not whether those reasons are substantial.
- A numerical standard will tend to draw attention to evidence that is quantified and logically relevant but legally inadmissible, such as proportions in reference classes containing the defendant.
These conceptual problems have rarely been clearly distinguished from the more commonly discussed policy and ethical problems. They are therefore developed at greater length here. It is concluded, however, that though all these arguments have some force, it is still desirable to introduce some minimal quantification into the reasonable doubt standard. In particular, any probability less than 0.8 should be declared less than proof beyond reasonable doubt in all circumstances.
Although betting odds, biases of dice and relative frequencies in populations are inherently numerical, that is not obviously so with the logical probabilities that concern the relation of evidence to hypothesis. According to the classic exposition of Keynes' Treatise on Probability, the relation of evidence to hypothesis, in cases such as proof beyond reasonable doubt in law or the evaluation of scientific theories in the light of experimental evidence, is a logical matter, a kind of partial implication.1 Certainly, there are cases where it is very natural to attach a precise number to that relation. For example, if the hypothesis is "This swan is black" and the (sole relevant) evidence is "15% of swans are black", then it is natural to attach a precise numerical probability to the relation between the two, namely,
$$P(\text{This swan is black} \mid \text{15\% of swans are black}) = 0.15.$$
At the other extreme, it is unnatural to attach any number, precise or otherwise, to
[equation missing in source]
The evidence has no logical relation to the hypothesis: there is no partial implication, hence no number expressing it. If one insists on numbers, one might admit that
[equation missing in source]
More typical cases of evidence evaluation, Keynes thought, lay between these two extremes of maximally precise and maximally imprecise probabilities.2 Such cases of imprecision can arise even when the evidence is itself numerical. For example, it may be reasonable to conclude that
[equation missing in source]
On the other hand, as Peter Tillers points out,3 quantification with imprecise numbers is still quantification, and so arguments that there is no precise number to be attached to a standard of proof do not carry over to arguments against imprecise but still numerical quantification. If there are reasons against choosing any one precise number such as 0.95 for the criminal standard of proof, that does not rule out an imprecise level such as considerably above 0.8 as a requirement for adherence to that standard.
A second problem arises from Keynes' problem of the weight of evidence.4 A probability P(h|e) expresses the balance of the reasons in evidence e for and against hypothesis h. But that balance may be a balance between few and light reasons or between many and solid reasons. The matter is most easily appreciated when P(h|e) is a half, since it is easy to find cases where either few or many reasons for and against a conclusion balance. Keynes asks about the difference between
[two equations missing in source]
Both probabilities are 1/2, but the second is based on the balance of much more evidence. It has greater weight. The concept appears in a minimal way in the burden of production of some appreciable amount of evidence that is required for a civil case to begin.5 But it is more evident in civil cases that involve decision on very little evidence. Since the civil standard of proof is 1/2, or the preponderance of evidence or balance of probabilities, realistic cases have arisen where decisions have had to be made on the balance of very small amounts of evidence, i.e. on probabilities of low weight. A celebrated instance is the Australian case TNT Management Pty Ltd v. Brooks. The plaintiff's husband was one of the two drivers killed in a head-on collision on a straight road. There were no witnesses and almost no further relevant evidence, and hence a symmetry in the evidence with respect to each driver. The legal situation required a decision as to whether, on the balance of probabilities, the other driver was negligent (irrespective of any possible negligence on the part of the plaintiff's husband). It was argued, using the following diagram, that on the balance of probabilities the other driver was negligent.6

- AN: Plaintiff's husband alone negligent
- BN: Defendant's driver alone negligent
- AN & BN: Both drivers negligent
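The symmetry argument can be made explicit with a short sketch. It rests on two assumptions that are made here for illustration and are not taken from the judgment: that at least one driver was negligent, so that the three regions of the diagram exhaust the possibilities, and that the symmetry of the evidence makes the two "alone" regions equally probable. The function name and the parameter `p_both` are hypothetical.

```python
from fractions import Fraction

def prob_defendant_negligent(p_both):
    """P(BN) under the stated illustrative assumptions.

    Assumptions (not from the judgment):
    - at least one driver was negligent, so
      P(AN alone) + P(BN alone) + P(both) = 1;
    - the evidence is symmetric, so P(AN alone) = P(BN alone).
    """
    p_both = Fraction(p_both)
    if not 0 <= p_both <= 1:
        raise ValueError("p_both must lie in [0, 1]")
    p_alone = (1 - p_both) / 2   # each 'alone' region, by symmetry
    return p_alone + p_both      # BN alone, or both negligent

# Try a few hypothetical values for the probability that both were negligent.
for p_both in (Fraction(0), Fraction(1, 10), Fraction(1, 3)):
    print(p_both, prob_defendant_negligent(p_both))
```

Under these assumptions the result is (1 + p_both)/2, which exceeds 1/2 whenever there is any positive probability that both drivers were negligent.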
Such cases are alarming, mainly because of the lack of robustness of the low-weight probabilities to new evidence. A small amount of further evidence would substantially change the probability. Therefore, the decision has some random element in it, arising from the randomness of the evidence that the court has available. Nevertheless, in civil cases it is arguable that this is the best one can do: a decision must be forced, and the best one available on the evidence is the correct one, even if the evidence is scanty.
The use of probabilities of low weight in criminal cases is more worrying.7 A probability of guilt of 0.9 reached through balancing a small amount of evidence is different from a probability of 0.9 based on a mass of evidence, because the chance discovery of a new minor piece of evidence could well reduce the first to 0.7 but is unlikely to do so for the second. One might therefore be rationally less willing to condemn a defendant to a heavy sentence on a probability of 0.9 of low weight than on a probability of 0.9 of high weight. The purely qualitative language of beyond reasonable doubt could be argued to mean a doubt that is both large enough in probability and of sufficient weight to rely on. Likewise, the merely fanciful doubts that will not sway the jury from its conviction of guilt may be doubts that are not merely low in probability but low in weight, i.e. not based on any solid evidence but only on mere logical possibility or an ingenious imagination.
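The difference between low and high weight can be illustrated with a toy relative-frequency model. This is an assumption made purely for illustration: nothing here suggests that a probability of guilt is actually computed by counting equally weighted items of evidence, and the counts below are invented.

```python
from fractions import Fraction

def balance(favourable, unfavourable):
    """Toy probability: share of equally weighted items that favour guilt."""
    return Fraction(favourable, favourable + unfavourable)

def after_new_evidence(favourable, unfavourable, new_unfavourable):
    """The same balance after further unfavourable items turn up."""
    return balance(favourable, unfavourable + new_unfavourable)

low_weight = (9, 1)        # 0.9 reached on ten items (hypothetical)
high_weight = (900, 100)   # 0.9 reached on a thousand items (hypothetical)

for label, (f, u) in (("low weight", low_weight), ("high weight", high_weight)):
    before = balance(f, u)
    after = after_new_evidence(f, u, 3)   # three new unfavourable items
    print(f"{label}: {float(before):.3f} -> {float(after):.3f}")
```

In this sketch three new unfavourable items move the low-weight probability from 0.9 to 9/13, just under 0.7, while the high-weight probability stays just under 0.9.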
It is true that there is some necessary connection between high probabilities and high weight, in that it seems to be impossible to obtain extreme probabilities (those near 0 or 1) without some considerable weight. For example, if the probabilities are based purely on relative frequencies in some reference class, then a large reference class (and hence high weight) is needed to obtain a probability close to 0 or 1. Since
[equation missing in source]
Again, these questions on the relation of high probability and high weight have no bearing on whether low numerical probabilities such as those less than 0.8 should be ruled out as failing to satisfy the standard of beyond reasonable doubt. If a probability on the evidence is less than 0.8, then whether it is of high or low weight makes little difference. Conviction on that probability carries a high risk of condemning the innocent.
This brings us to the third conceptual problem with quantification of the reasonable doubt standard, the fact that the probabilities most usually and easily quantified, those arising from a proportion in a reference class like
[equation missing in source]
There is a special problem with frequencies in the reference class of which every defendant is a member, namely the class of defendants. Jurors have beliefs about this class, and widely differing beliefs. In the survey of Saunders on the opinions of 130 numerate adults on the level of probability equivalent to proof beyond reasonable doubt, all but two of the responses lay between 50% and 99.99999% inclusive. The two outliers both explained their opinion by their beliefs about defendants generally. One argued for 30%, on the grounds that defendants are generally guilty and so the standard for releasing accused murderers should be high; in his homeland (Nigeria), he thought, a presumption that a defendant was innocent would not be prudent. The other outlier, an African-American, believed defendants were often victims of police conspiracies and so demanded 100% certainty for conviction.10
Other frequencies in reference classes that could be but are not allowed to be considered as relevant include recidivism rates. Traditional justifications of high standards for proof beyond reasonable doubt along the lines of it is better to allow 10 (or 100) criminals to go free than to condemn one innocent invite cost-benefit analyses that could only be conducted with close attention to relative frequencies. The cost of allowing criminals to go free depends on the chance of re-offending by those criminals. Recidivism rates are possibly sex-specific, race-specific and crime-specific and are certainly age-specific.11 There is no prospect whatever that the presentation of such statistics will be permitted in court to influence a jury's setting of its standard of proof.
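The link between such maxims and a numerical standard can be sketched with a simple expected-loss comparison. This is a textbook decision-theoretic simplification and not an analysis given in this article: it assumes that a false conviction is `ratio` times as costly as a false acquittal, that correct verdicts cost nothing, and it ignores recidivism and every other factor mentioned above. The function name is hypothetical.

```python
from fractions import Fraction

def conviction_threshold(ratio):
    """Probability of guilt above which convicting has the lower expected loss.

    Assumes a false conviction costs `ratio` units and a false acquittal 1 unit.
    With probability of guilt p, convicting loses (1 - p) * ratio in expectation
    and acquitting loses p * 1, so convicting is better when
    p > ratio / (ratio + 1).
    """
    ratio = Fraction(ratio)
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return ratio / (ratio + 1)

# Thresholds implied by a few hypothetical cost ratios.
for ratio in (4, 10, 100):
    print(ratio, float(conviction_threshold(ratio)))
```

On this toy model the 10-to-1 maxim corresponds to a threshold of 10/11 (about 0.91) and the 100-to-1 maxim to 100/101 (about 0.99), while a threshold of 0.8 would correspond to a ratio of only 4 to 1.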
The reasons for the inadmissibility of all such reference-class evidence may be either psychological or ethical: either psychological claims as to its prejudicial effect on juries (i.e. claims that as a matter of psychological fact it tends to be overweighted and lead to wrong decisions) or ethical reasons on the injustice of condemning someone on the basis of the acts of others. In either case, it is to some degree problematic for the quantification of the standard of proof beyond reasonable doubt that the evidence that would most naturally lead to numbers is inadmissible, and therefore that admissible evidence is almost always qualitative. That takes us back to the first problem above, that it may be impossible in principle to assign a precise or even an imprecise number to some probabilities based on such evidence.
Again, none of this reasoning tends to rule out placing a floor of 0.8 on the criminal standard of proof. If the probability of guilt on the admissible evidence can be assigned a number and if that number (precise or otherwise) is not substantially greater than 0.8, then it carries a high risk of condemning the innocent.
Therefore none of the conceptual problems with probability constitutes a reason against setting a minimum of 0.8 on the standard of proof beyond reasonable doubt. Nor do any of the ethical or psychological reasons mentioned at the beginning of the article. As these have been discussed at length, brief comments will suffice here. Undoubtedly, there are some reasons for insisting on an exceptionally high standard of proof in capital and other grave cases,12 but the fact that some standards may be less stringent than others is no reason to relax standards at the bottom end. Jurors and the judges instructing them may need to be left with some discretion and flexibility,13 but some discretion is not arbitrary discretion; their discretion is already constrained by the requirement that proof beyond reasonable doubt is a much higher probability than the preponderance of evidence, so some further restriction to eliminate the observed wide variation in reported standards should be acceptable in principle. It is true that attempts at constraint may be less than totally successful in their effects on real juries,14 but given the extreme present confusion revealed by surveys, any confusion resulting from a simple demand that the criminal standard should be more than 0.8 is likely to be much less than the current confusion, where a façade of uniform language hides an unacceptably large variation in numbers. In any case, the evidence suggests, as far as it goes, that jurors can understand quantified standards better than unquantified ones and act more uniformly as a result.15 The public's respect for the law will probably not be diminished any further by any unpleasant truths that may come to light about the numbers of false convictions or of acquitted criminals; in any case, such figures cannot be inferred from the standard of proof because the base rates in the class of defendants are unknown.
The main argument in favour of a quantitative constraint on the beyond reasonable doubt standard is the gross divergence of opinions as to the numerical meaning of the standard. The survey results are clear.16 The consequences are also clear: there are many real juries in which a majority of jurors believe that a probability of 70% satisfies the standard of proof beyond reasonable doubt. That is a cause for alarm to all potential defendants, that is, to everyone. It is a travesty of the commitment to consistency on which the law prides itself in other areas.
A first step towards justice would be to rule out numerical probabilities that are clearly unreasonable. If a jury asks a judge for guidance on whether a 75% probability is sufficient, the law should be unambiguous in its answer. The answer should be no.
An appropriate numerical standard to choose as an absolute minimum follows from Judge Weinstein's suggestion in Copeland III (United States v. Copeland, 369 F. Supp. 2d 275 (E.D.N.Y. 2005)) of 20% for a "reasonable probability", and hence of 80% for its inverse or complement, "clear, unequivocal and convincing evidence".17 The case was a civil one involving the serious consequence of deportation. Since proof beyond reasonable doubt is well above clear, unequivocal and convincing evidence, it follows that proof beyond reasonable doubt means well above a probability of 0.8. Any suggestion from a jury that 0.8 or less is adequate can be ruled out, while the qualification "well above" will avoid any suggestions that something just above 0.8 is in fact adequate, and will not obstruct any later attempts to quantify the standard more exactly.
Notes
1 KEYNES, J. M. (1921) Treatise on Probability. London, Macmillan, c. 1.
3 TILLERS, P., Law, Probability and Risk, 5, 2006 (in press).
4 KEYNES, Treatise, c. 6; COHEN, L. J. (1985) Twelve questions about Keynes' concept of weight. British Journal for the Philosophy of Science, 37, 263; JAYNES, E. T. (2003) Probability Theory: The Logic of Science. Cambridge, Cambridge University Press, c. 18; Similar ideas originally in PEIRCE, C. S. (1878) The probability of induction. Popular Science Monthly, 12, 705; A sixteenth century anticipation in FRANKLIN, J. (2001) The Science of Conjecture: Evidence and Probability before Pascal. Baltimore, Johns Hopkins University Press, 76–79.
5 FLEMING, J. & HAZARD, G. C. (1985) Civil Procedure, 3rd edn. Boston, Little Brown, § 7.7.
6 T.N.T. Management v. Brooks (1979) 23 A.L.R. 345, discussed in EGGLESTON, R. (1983) Evidence, Proof and Probability, 2nd edn. London, Weidenfeld and Nicolson, p. 184 and in FRANKLIN, J. (2001) Resurrecting logical probability. Erkenntnis, 55, 277.
7 DAVIDSON, B. & PARGETTER, R. (1987) Guilt beyond a reasonable doubt. Australasian Journal of Philosophy, 65, 182; Clarifications in DUNHAM, N. J. & BIRMINGHAM, R. L. (1989) On legal proof. Australasian Journal of Philosophy, 67, 479.
8 EGGLESTON, pp. 59–60, 88–89.
9 Though possibly sometimes relevant to sentencing: TILLERS, P. (2005) If wishes were horses: discursive comments on attempts to prevent individuals from being unfairly burdened by their reference classes. Law, Probability & Risk, 4, 33; COLYVAN, M., REGAN, H. & FERSON, S. (2001) Is it a crime to belong to a reference class? Journal of Political Philosophy, 9, 166.
10 SAUNDERS, H. D. (2005) Quantifying Reasonable Doubt: A Proposed Solution to an Equal Protection Problem. ExpressO Preprint Series, paper 881 (http://law.bepress.com/expresso/eps/881).
11 For example HANSON, R. K. (2002) Recidivism and age: follow-up data from 4,673 sexual offenders. Journal of Interpersonal Violence, 17, 1046.
12 Reasons for and against reviewed in LILLQUIST, E. (2005) Absolute certainty and the death penalty. American Criminal Law Review, 42, 45.
13 Various reasons in STOFFELMAYR, E. & DIAMOND, S. S. (2000) The conflict between precision and flexibility in explaining "beyond a reasonable doubt." Psychology, Public Policy, and Law, 6, 769.
14 Warnings in STRAWN, D. U. & BUCHANAN, R. W. (1976) Jury confusion: a threat to justice. Judicature, 59, 478.
15 KAGEHIRO, D. K. (1990) Defining the standard of proof in jury instructions. Psychological Science, 1, 194.
16 SIMON, R. J. & MAHAN, L. (1971) Quantifying burdens of proof: a view from the bench, the jury, and the classroom. Law and Society Review, 5, 319; KAGEHIRO, D. K. & STANTON, W. C. (1985) Legal vs. quantified definitions of standards of proof. Law and Human Behavior, 9, 159; SAUNDERS, op. cit.; HOROWITZ, I. A. & KIRKPATRICK, L. C. (1996) A concept in search of a definition: the effects of reasonable doubt instructions on certainty of guilt standards and jury verdicts. Law and Human Behavior, 20, 655; HOROWITZ, I. A. (1997) Reasonable doubt instructions: commonsense justice and standard of proof. Psychology, Public Policy and Law, 3, 285.
17 As discussed in TILLERS, previous article.