## Induction Rules

## Part B: Numbers Games

### Lies, Damned Lies, and Statistics

We ended Part A with the limitations of anecdotal evidence. That leads us to statistics.

Disraeli is famously credited with saying that "there are three kinds of lies: lies, damned lies and statistics"; but that is fair comment only on the *misuse* of statistics. Statistics is a valid and useful tool for discovering or proving causal links, especially where phenomena are complex and multiple causal factors are involved. As we saw when analysing anecdotal evidence, we need a method to determine whether an apparent connection between events has any meaning. Statistics provides such a method: it comprises procedures and mathematical formulae for determining (given certain assumptions) the probability that observed relationships have not occurred merely by chance.

However, there are many things to beware of when dealing with statistics.

#### Validity

Is the method used *valid*? As noted above, statistical formulae depend on certain assumptions, such as the variations following a "normal" (bell-curve) distribution. Different statistical tools are needed for different types of variation, sample sizes and methods of data collection. Using an inappropriate tool can give false results.

#### (In)significance

"Statistically significant" just means that there is a low probability that a correlation is due to chance: it doesn't mean that the linkage is significant in the sense of "important". For example, if with a huge sample size we get a statistically significant but tiny real difference (say a variation in height of 0.1%), it is not very interesting or "significant" in the sense of something we need to care about. The questions to ask are "is the magnitude of the effect significant compared to all the other things that influence it?", and "is the effect actually important?"
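The gap between "statistically significant" and "important" can be made concrete with a quick calculation. This is a minimal sketch using hypothetical figures: two groups of a million people whose mean heights differ by 0.1% (about 1.7 mm), against a typical spread of roughly 7 cm between individuals.

```python
import math

# Hypothetical figures: a 0.1% difference in mean height between two
# groups of a million people, with the usual ~7 cm person-to-person spread.
n = 1_000_000
mean_diff_cm = 170 * 0.001     # 0.17 cm difference between group means
sd_cm = 7.0                    # standard deviation of individual heights

# Standard error of the difference between two sample means
se = sd_cm * math.sqrt(2 / n)
z = mean_diff_cm / se
print(round(z, 1))             # enormous z-score: wildly "significant"

# Yet compared with ordinary variation between people, the effect is tiny:
print(round(mean_diff_cm / sd_cm, 3))   # a few hundredths of a standard deviation
```

With a big enough sample, the z-score can be driven as high as you like while the effect itself stays trivially small; the p-value answers "is this real?", never "does this matter?".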

#### Coincidence

*Isolated studies* have to be treated with extreme caution. Statistical significance is generally expressed as the probability that the results are due to chance, and a conventional cut-off level is a probability of 5%. That is, results are usually called statistically significant if the probability that they happened by chance is less than or equal to one in twenty.

Unfortunately, the corollary is that one in twenty experiments with statistical significance at the 5% level are mere coincidences! On average, if you believe every such test you will be wrong one in twenty times. More generally, statistical results significant at a probability of *1/n* will happen purely by chance once in every *n* studies, on average. And there are so many thousands of statistical studies done each year that we can expect there to be false positives in some that are significant to 0.1% or less. Thus, further studies are *critical:* *all* studies based on statistics must be treated with caution until and unless the probability level is reduced to a very low level.
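The one-in-twenty corollary is easy to verify by simulation. The sketch below runs thousands of experiments in which there is genuinely *no* effect (both groups are fair-coin flips) and counts how often a conventional two-sided test at the 5% level declares "significance" anyway; the figures and test details are illustrative, not from the text.

```python
import random

random.seed(0)

def null_false_positive_rate(n=100, trials=10_000):
    """Run `trials` experiments with NO real effect: compare two groups of
    `n` fair-coin flips with a two-sided z-test at the 5% level, and
    return the fraction that are (falsely) declared significant."""
    false_positives = 0
    for _ in range(trials):
        a = sum(random.random() < 0.5 for _ in range(n))
        b = sum(random.random() < 0.5 for _ in range(n))
        # normal approximation for the difference of two proportions
        se = (2 * 0.5 * 0.5 / n) ** 0.5
        z = abs(a / n - b / n) / se
        if z > 1.96:          # the conventional 5% cut-off
            false_positives += 1
    return false_positives / trials

rate = null_false_positive_rate()
print(round(rate, 3))   # close to 0.05: about one experiment in twenty
```

Every one of those "significant" results is a pure coincidence, which is exactly why an isolated study at the 5% level proves very little.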

A related issue concerns "shotgun" studies that look at many variables. For example, people looking for side effects of drugs or environmental conditions often include every illness they can get figures on, in order to increase their chance of catching any effects. But consider the danger here: if you look at 100 diseases, by chance alone you would expect to see five of them significant at the 5% level and one at the 1% level, with a one in ten chance of finding one significant at the 0.1% level – *all meaning nothing*. You might have noticed how frequently some strange and rare illness is linked to some supposed risk factor or location, and wondered why they looked at that particular illness. Often – they didn't. They looked at everything they could get figures for, and a few "passed" the test of statistical significance. So that is a further caution: *if a study looks at multiple possible links, the required probability for any one of them has to be divided by the total number looked at.* On principle, finding a few things significant to 5% in a study of 100 different conditions is *expected* and therefore *means nothing.* The most it can mean is "this particular relation is worth a further look to see if we can confirm it."
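The arithmetic behind the shotgun-study warning, and the standard fix (dividing the cut-off by the number of tests, known as the Bonferroni correction), can be sketched as follows; the assumption of independent tests is a simplification.

```python
# Expected chance findings in a "shotgun" study of many variables,
# assuming the tests are independent.

n_tests = 100          # e.g. 100 diseases examined in one study
alpha = 0.05           # conventional 5% significance cut-off

# Expected number of tests that "pass" by chance alone
expected_false = n_tests * alpha
print(expected_false)                  # 5 chance "links" expected

# Probability that AT LEAST one test passes purely by chance
p_at_least_one = 1 - (1 - alpha) ** n_tests
print(round(p_at_least_one, 3))        # near certainty

# Bonferroni correction: divide the required probability by the
# number of links looked at, as the text recommends.
corrected_alpha = alpha / n_tests
print(corrected_alpha)                 # 0.0005
```

Under the corrected cut-off, the expected number of chance findings across the whole study drops back to about 0.05, restoring the meaning the 5% level was supposed to have.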

A similar consideration applies to "clustering" of illnesses in particular locations. If you look at 100 locations, chances are 5 of them will show something that is significant at the 5% level yet means nothing. It is the nature of probability that purely by chance, some places will have an unusual frequency of certain illnesses. So pointing to such a "cluster" means nothing until proper investigations have been made.
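The same point can be shown by simulating 100 towns that all share exactly the same true illness rate, and counting how many nonetheless look like "clusters" under a naive 5% test. The figures (10 expected cases per town, a cut-off of 16) are hypothetical; 16 or more cases is roughly the one-sided 5% tail of a Poisson distribution with mean 10.

```python
import math
import random

random.seed(1)

def poisson(lam):
    """Sample from a Poisson distribution (Knuth's method, stdlib only)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

n_towns = 100
expected_cases = 10   # every town has the SAME true rate
threshold = 16        # ~5% one-sided cut-off for Poisson(10)

clusters = sum(poisson(expected_cases) >= threshold for _ in range(n_towns))
print(clusters)       # typically a handful of "clusters" that mean nothing
```

Several towns will usually exceed the threshold by chance alone, even though no town is actually any riskier than another: exactly the meaningless "clusters" the text describes.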

#### Biological Relevance

It often happens that a statistically significant effect, e.g. an increase in cancers, is linked to a particular chemical – but the level of chemical used in the study is thousands of times greater than any reasonable level of exposure in the real world.

The usual justification given for this is that a high exposure was needed to cause enough problems to measure; and that one can extrapolate back by assuming a linear response (i.e. proportional to the dose). For example, if 1000 units causes one cancer in 100 animals, then 1 unit is predicted to cause one cancer in 100,000. However this assumption cannot be accepted without independent evidence that the extrapolation is valid. The simple proof of this is that there are many chemicals and other factors (e.g. molybdenum, copper, iodine, Vitamin A, water and sunlight) which are toxic or carcinogenic at excessive levels but actually *necessary* at lower levels.
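The linear (dose-proportional) extrapolation described above is simple arithmetic, which is part of its appeal; the sketch below uses the hypothetical figures from the text, and the comment flags where the assumption breaks down.

```python
# Linear (no-threshold) extrapolation from a high-dose animal study,
# using the hypothetical figures from the text.

high_dose = 1000                 # units given in the study
observed_risk = 1 / 100          # one cancer per 100 animals at that dose

# Linear model: risk assumed proportional to dose
risk_per_unit = observed_risk / high_dose
low_dose = 1
predicted_risk = risk_per_unit * low_dose
print(predicted_risk)            # one cancer in 100,000 at 1 unit

# CAUTION: this is only valid if the dose-response really is linear.
# For substances with a threshold, or that are necessary at low doses
# (iodine, copper, Vitamin A...), low-dose risk may be zero or the
# substance may even be beneficial, and the linear prediction is wrong.
```

The point is not that the arithmetic is hard but that the linearity assumption does all the work, and it needs independent evidence of its own.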

#### Correlation versus Causation

If A and B are linked, does A cause B, B cause A, or are they both caused by something else? The last possibility is one of the reasons why statistical studies must go to great lengths to make the comparison groups equivalent in every respect except the one being examined. This is especially a problem when the effects are small, because then the causal agent, whatever it is, must have a subtle action.

#### Trends in Results

As Richard Feynman noted in *The Meaning of It All*, a real phenomenon will stand out more and more from the background as further experiments are done. Conversely, if the apparent effect keeps on shrinking, it is probably unreal. Feynman gave the example of ESP studies. In the early years, quite spectacular results were achieved, but as experiments were refined to weed out fraud, subtle clues, and other problems identified by critics, the improvements in scores over chance shrank steadily from 200% to 2%. As Feynman wrote, as the experiments got better they disproved all their own previous experiments, down to a residual effect so low as to be suspect itself.

#### Summary

There are a number of critical pieces of information needed to evaluate any statistical study, and these are almost always omitted from press reports. The most critical are: What was the actual probability level measured? A level of 0.001% is far more likely to be real than one of 5%. And what is the probability that this result itself was achieved purely by chance (which will depend on the sample size, the number of candidate associations looked at in this study, and the number of related studies carried out)? These items of information tend to be left out to avoid making it "too complicated for the public": at the expense of making the report completely meaningless to the public or anyone else.

### The Smoking Gun

The link between smoking and lung cancer is a good example of the process required to turn a statistical link into real knowledge.

Statistical studies indicated a link between smoking and lung cancer. Further statistical studies strengthened the link, and eliminated other possible causal links such as socio-economic factors. This and the fact that smoking precedes cancer indicated a causal link. However, clearly not a strong causal link: many non-smokers get lung cancer, many smokers never get cancer, and those who do usually succumb only after decades of smoking. Scientific study found that cancers are caused by the chance accumulation of genetic damage in cells, leading ultimately to uncontrolled, invasive cell multiplication; and that cigarette smoke contains a cocktail of mutagens that cause such damage.

Thus the combination of statistical studies and scientific investigation produced a consistent and convincing explanation of how smoking "causes" (in the sense of increasing the risk of) cancer. Consistent, confirmed statistical studies proved the link, and scientific studies showed both why the link exists, and why smoking increases the chance of contracting cancer without being a necessary or sufficient cause.

### Risk Analysis

Since a common use of statistical studies is to identify agents of harm, a related issue is an appreciation of *relative risk*. Nothing in life is risk-free: neither doing things, nor refraining from them. Both exercise and lack of exercise have their own risks. Even smoking has subjective benefits.

The question is not *is there any risk?* The question is, *is the risk worth worrying about?* The latter is actually made of three parts: is it more or less risky than the alternatives; is it more or less risky than all the other things you do *without* worrying; and is the risk greater than the benefit?

For example, some decades ago the artificial sweetener saccharin was alleged to be cancer-causing after animal studies with huge doses. However it was calculated that even if the dubious extrapolation to normal doses was true, the resulting "risk" was less than that of driving to the shop to buy a drink in the first place, or of crossing the road to buy a saccharin-free drink at another shop. Not to mention the alternative risks of drinks full of *sugar*.

Similarly, natural foods contain a wide range of carcinogens and toxins – yet lettuce isn't banned. True, eating is quite risky: but less risky than the alternative!

When combined with the principle noted previously on how to evaluate human evidence, this leads to a principle of evaluating studies of risk which I think should be enshrined as a maxim:

If a study labels something as a risk, but does not put that risk in the proper context of alternative, background and accepted other risks: then it is fundamentally flawed by failing the tests of honesty and/or demonstrated reasoning ability.

I encourage all editors of scientific and medical journals – and newspapers – to return such studies to the authors with a firm note to fix that deficiency.