Over at The Green Bag, Judge Richard Posner published “What Is Obviously Wrong With the Federal Judiciary, Yet Eminently Curable, Part I.” The article is quintessential Posner: concise, expansive, forceful, and packed with good and bad ideas with minimal supporting citations.
Back in July 2014, I wrote a post about the misuse of “statistical significance” by defendants and courts trying to apply the Daubert standard to scientific evidence. As I wrote,
It’s true that researchers typically use statistical formulas to calculate a “95% confidence interval” — or, as they say in the jargon of statistics, “p < 0.05” — but this isn’t really a scientifically-derived standard. There’s no natural law or empirical evidence which tells us that “95%” is the right number to pick to call something “statistically significant.” The number “1 in 20” was pulled out of thin air decades ago by the statistician and biologist Ronald Fisher as part of his “combined probability test.” Fisher was a brilliant scientist, but he was also a eugenicist and an inveterate pipe-smoker who refused to believe that smoking causes cancer. Never underestimate the human factor in the practice of statistics and epidemiology.
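The quoted passage links the "95% confidence interval" to "p < 0.05," and the two conventions are in fact duals of one another: a 95% interval excludes the null value exactly when the corresponding two-sided test yields p < 0.05. A minimal sketch, using only the standard library and a hypothetical effect estimate and standard error (none of these numbers come from any study discussed here):

```python
import math

Z_95 = 1.959964  # standard-normal quantile giving 95% coverage

def ci_95(estimate, se):
    """95% confidence interval under a normal approximation."""
    return (estimate - Z_95 * se, estimate + Z_95 * se)

def two_sided_p(estimate, se):
    """Two-sided p-value for testing a null effect of zero."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical study result: estimate 2.1, standard error 1.0.
lo, hi = ci_95(2.1, 1.0)
p = two_sided_p(2.1, 1.0)
# The interval (about 0.14 to 4.06) excludes zero, and p is about
# 0.036 -- below 0.05 -- so the two criteria agree by construction.
```

The point is that the "95%" in the interval and the "0.05" in the threshold are the same convention seen from two sides; neither is any less arbitrary than the other.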
(Links omitted; they’re still in the original post.) As expected, defense lawyers criticized my post.
Last week, the American Statistical Association published its very first “policy statement” on “a specific matter of statistical practice,” making clear that tossing around the term “statistical significance” is a “considerable distortion of the scientific process”:
Practices that reduce data analysis or scientific inference to mechanical “bright-line” rules (such as “p < 0.05”) for justifying scientific claims or conclusions can lead to erroneous beliefs and poor decision-making. A conclusion does not immediately become “true” on one side of the divide and “false” on the other. Researchers should bring many contextual factors into play to derive scientific inferences, including the design of a study, the quality of the measurements, the external evidence for the phenomenon under study, and the validity of assumptions that underlie the data analysis. Pragmatic considerations often require binary, “yes-no” decisions, but this does not mean that p-values alone can ensure that a decision is correct or incorrect. The widespread use of “statistical significance” (generally interpreted as “p ≤ 0.05”) as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process.
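The ASA's point about bright-line rules can be made concrete with a short sketch: two studies producing nearly identical evidence can land on opposite sides of the 0.05 cutoff. This uses only the standard library (normal-approximation p-values via the error function), and the z-statistics are hypothetical numbers chosen to straddle the threshold:

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a z-statistic under a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

z_a = 1.95  # hypothetical study A
z_b = 1.97  # hypothetical study B, essentially the same evidence
p_a = two_sided_p(z_a)  # about 0.051 -- "not significant"
p_b = two_sided_p(z_b)  # about 0.049 -- "significant"
```

Under a mechanical p < 0.05 rule, study B "proves" its claim and study A does not, even though no scientist would say the two results meaningfully differ. That is the "erroneous beliefs and poor decision-making" the ASA statement warns about.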
Update: It’s worth pointing out that, a year and a half after Dr. Anick Bérard’s testimony was precluded as “unreliable,” she published in the Journal of the American Medical Association, using many of the same methods the court deemed unacceptable.
Back in 2012, I wrote: “Scientific evidence is one of those rare areas of law upon which every lawyer agrees: we are all certain that everyone else is wrong.”
There have been some missteps in the law’s use of scientific proof as evidence in civil litigation — like when the Supreme Court affirmed a trial court holding in Kumho Tire Co. v. Carmichael, 526 U.S. 137 (1999), that an engineer with a Master’s in Mechanical Engineering who had worked in tire design and failure testing at Michelin was nonetheless incompetent to testify about tire failures — but, by and large, the standard articulated in Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579 (1993), makes sense. Courts review an expert’s methods, rather than their conclusions, to ensure that the expert’s testimony has an appropriate scientific basis.
To go with the baseball metaphors so often (and wrongly) used in the law: when it comes to Daubert, the judge isn’t an umpire calling balls and strikes; the judge is more like a league official checking to make sure the players are using regulation equipment. Mere disagreements about the science itself, and about the expert’s conclusions, are for the jury to resolve in the courtroom.
In practice, though, the Daubert standard runs into problems when courts erroneously decide factual disputes about methodology and conclusions, issues that are better left to cross-examination of the experts at trial. Consider the June 27, 2014 opinion in the Zoloft birth defects multidistrict litigation, which struck the testimony of plaintiffs’ “perinatal pharmacoepidemiologist,” Dr. Anick Bérard. Dr. Bérard holds a Ph.D. in Epidemiology and Biostatistics from McGill University, teaches at the Université de Montréal, and has conducted research on the effects of antidepressants on human fetal development. She was going to opine that “Zoloft, when used at therapeutic dose levels during human pregnancy, is capable of causing a range of birth defects (i.e., is a teratogen),” an opinion based upon her review of a variety of studies showing a correlation between SSRI use and birth defects. The court had multiple grounds for striking the opinion, but a key issue relating to statistics jumped out at me.…