Hypothesis of no difference

When it is suspected that treatment A may be superior to treatment B and the truth is sought, it is convenient to start with the proposition that the treatments are equally effective — the 'no difference' hypothesis (null hypothesis). After two groups of patients have been treated and it has been found that improvement has occurred more often with one treatment than with the other, it is necessary to decide how likely it is that this difference is due to a real superiority of one treatment over the other. To make this decision we need to understand two major concepts, statistical significance and confidence intervals.


A statistical significance test19 (e.g. Student's 't' test, the chi-square test) will tell how often an observed difference, or a greater one, would occur due to chance (random influences) if there is, in reality, no difference between the treatments. Where the test shows that such a difference would occur only five times if the experiment were repeated 100 times, this is often taken as sufficient evidence that the null hypothesis is unlikely to be true. The conclusion is therefore that there is (probably) a real difference between the treatments. This level of probability is generally expressed in therapeutic trials as 'the difference was statistically significant', 'significant at the 5% level' or 'P = 0.05' (P = probability based on chance alone). Statistical significance simply means that the result is unlikely to have occurred if there is no genuine treatment difference, i.e. there probably is a difference.

If the analysis reveals that the observed difference, or greater, would occur only once if the experiment were repeated 100 times, the results are generally said to be 'statistically highly significant', or 'significant at the 1% level' or 'P = 0.01'.
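As an illustration of how such a P value is obtained, the sketch below applies the chi-square test to a 2 x 2 table of improved and not improved patients. The trial counts are hypothetical, the function name is ours, and no continuity correction is applied; it is a minimal example under those assumptions rather than a recommended analysis.

```python
import math


def chi_square_2x2(improved_a, total_a, improved_b, total_b):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Returns (chi2, p), where p is the probability of a difference at least
    this large arising by chance if the treatments are truly equivalent.
    """
    a, b = improved_a, total_a - improved_a  # treatment A: improved / not improved
    c, d = improved_b, total_b - improved_b  # treatment B: improved / not improved
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column of the table must be non-empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, the upper tail of the chi-square
    # distribution equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical trial: 30 of 50 patients improve on A, 18 of 50 improve on B.
chi2, p = chi_square_2x2(30, 50, 18, 50)
print(f"chi-square = {chi2:.2f}, P = {p:.3f}")
```

With these invented counts the P value falls between 0.01 and 0.05, so the result would be reported as significant at the 5% level but not at the 1% level.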

Confidence intervals. The problem with the P value is that it conveys no information on the amount of the differences observed or on the range of possible differences between treatments. A result that a drug produces a uniform 2% reduction in heart rate may well be statistically significant but it is clinically meaningless. What doctors are interested to know is the size of the difference, and what degree of assurance, or confidence, they may have in the precision (reproducibility) of this estimate. To obtain this it is necessary to calculate a confidence interval (see Figs 4.1 and 4.2).20

A confidence interval expresses a range of values that contains the true value with 95% (or other chosen percentage) certainty. The range may be broad, indicating uncertainty, or narrow, indicating (relative) certainty. A wide confidence interval occurs when numbers are small or the differences observed are variable, and it points to a lack of information, whether the difference is statistically significant or not; it is a warning against placing much weight on, or confidence in, the results of small or variable studies. Confidence intervals are extremely helpful in interpretation, particularly of small studies, as they show the degree of uncertainty related to a result. Their use in conjunction with nonsignificant results may be especially enlightening.21 A finding of 'not statistically significant' can be interpreted as meaning there is no clinically useful difference only if the confidence intervals for the results are also stated in the report and are narrow. If the confidence intervals are wide, a real difference may be missed in a trial with a small number of subjects, i.e. absence of evidence that there is a difference is not the same as showing that there is no difference. Small numbers of patients inevitably give low precision and low power to detect differences.

19 Altman D et al 1983 British Medical Journal 286:1489.

20 Gardner M J, Altman D G 1986 British Medical Journal 292:746.
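The effect of study size on the width of a confidence interval can be seen in a short calculation. The sketch below uses the simple normal-approximation (Wald) interval for a difference between two proportions; the choice of that method, the function name and the patient counts are all assumptions made for illustration, and other interval methods may be preferable for very small samples.

```python
import math
from statistics import NormalDist


def diff_confidence_interval(improved_a, total_a, improved_b, total_b, level=0.95):
    """Wald confidence interval for the difference in improvement rates (A minus B).

    Returns (difference, lower limit, upper limit).
    """
    p_a = improved_a / total_a
    p_b = improved_b / total_b
    diff = p_a - p_b
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / total_a + p_b * (1 - p_b) / total_b)
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical trials with the same observed rates (60% v. 40%) but different sizes.
small = diff_confidence_interval(12, 20, 8, 20)
large = diff_confidence_interval(120, 200, 80, 200)

for label, (diff, lower, upper) in (("20 per group", small), ("200 per group", large)):
    print(f"{label}: difference {diff:.0%}, 95% CI {lower:.1%} to {upper:.1%}")
```

Both invented trials observe the same 20% difference, but the interval for the small trial is wide and includes zero, whereas the interval for the large trial is narrow and excludes it.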

The above discussion provides us with information on the likelihood of falling into one of the two principal kinds of error in therapeutic experiments, for the hypothesis that there is no difference between treatments may either be accepted incorrectly or rejected incorrectly.

Type I error (α) is the finding of a difference between treatments when in reality they do not differ, i.e. rejecting the null hypothesis incorrectly. Investigators decide the degree of this error which they are prepared to tolerate on a scale in which 0 indicates complete rejection of the null hypothesis and 1 indicates its complete acceptance; clearly the level for α must be set near to 0. This is the same as the significance level of the statistical test used to detect a difference between treatments. Thus α = 0.05 (or P = 0.05) indicates that the investigators will accept a 5% chance that an observed difference is not a real difference.
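The meaning of a 5% Type I error rate can be checked by simulation: if many trials are run in which the two treatments are truly identical, about one in twenty should still come out 'significant' at P < 0.05. The sketch below does this with a two-sided normal-approximation test for two proportions; the improvement rate, group size, number of simulated trials and random seed are arbitrary assumptions, and the function names are ours.

```python
import math
import random
from statistics import NormalDist


def two_proportion_p_value(x_a, x_b, n):
    """Two-sided P value for a difference in proportions, with n patients per group."""
    pooled = (x_a + x_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return 1.0  # all or none improved in both groups: no evidence of a difference
    z = abs(x_a - x_b) / n / se
    return 2 * (1 - NormalDist().cdf(z))


def false_positive_rate(true_rate=0.5, n=100, trials=2000, alpha=0.05, seed=1):
    """Fraction of simulated null trials that are declared significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Both groups share the same true improvement rate, so the null hypothesis holds.
        x_a = sum(rng.random() < true_rate for _ in range(n))
        x_b = sum(rng.random() < true_rate for _ in range(n))
        if two_proportion_p_value(x_a, x_b, n) < alpha:
            hits += 1
    return hits / trials


rate = false_positive_rate()
print(f"Proportion of null trials with P < 0.05: {rate:.3f}")
```

The printed proportion should lie close to 0.05, varying a little with the seed and the number of simulated trials.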

Type II error (β) is the finding of no difference between treatments when in reality they do differ, i.e. accepting the null hypothesis incorrectly. The probability of this error is often given wider limits, e.g. β = 0.1-0.2, which indicates that the investigators are willing to accept a 10-20% chance of missing a real effect. Conversely, the power of the study (1 - β) is the probability of avoiding this error and detecting a real difference, in this case 80-90%.

21 Altman D G et al 1983 British Medical Journal 286:1489.

It is up to the investigators to decide the target difference22 and what probability level (for either type of error) they will accept if they are to use the result as a guide to action.

Plainly, trials should be devised to have adequate precision and power, both of which are consequences of the size of the study. It is also necessary to make an estimate of the likely size of the difference between treatments, i.e. the target difference. Adequate power is often defined as giving an 80-90% chance of detecting (at 1-5% statistical significance, P = 0.01-0.05) the defined useful target difference (say 15%). It is rarely worth starting a trial that has less than a 50% chance of achieving the set objective, because the power of the trial is too low; such small trials, published without any statement of power or confidence intervals attached to estimates, reveal only their inadequacy.
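The relation between study size, target difference and power can be sketched with the usual normal-approximation formulae for comparing two proportions. The improvement rates of 65% and 50% (a 15% target difference), the function names and the choice of approximation are assumptions for illustration only; a real trial would normally be sized with the help of a statistician.

```python
import math
from statistics import NormalDist

_norm = NormalDist()


def power_two_proportions(p_a, p_b, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test to detect rates p_a v. p_b."""
    se = math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / n_per_group)
    z_alpha = _norm.inv_cdf(1 - alpha / 2)
    # Probability that the observed difference exceeds the significance threshold
    # (the negligible chance of significance in the wrong direction is ignored).
    return _norm.cdf(abs(p_a - p_b) / se - z_alpha)


def sample_size_two_proportions(p_a, p_b, alpha=0.05, power=0.8):
    """Approximate number of patients per group needed for the requested power."""
    z_alpha = _norm.inv_cdf(1 - alpha / 2)
    z_beta = _norm.inv_cdf(power)
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_a - p_b) ** 2)


# Hypothetical target: 65% improve on A v. 50% on B, tested at the 5% level.
n_80 = sample_size_two_proportions(0.65, 0.50, power=0.8)
n_90 = sample_size_two_proportions(0.65, 0.50, power=0.9)
small_trial_power = power_two_proportions(0.65, 0.50, n_per_group=40)

print(f"Patients per group for 80% power: {n_80}")
print(f"Patients per group for 90% power: {n_90}")
print(f"Power with only 40 per group: {small_trial_power:.0%}")
```

Under these assumptions well over 100 patients per group are needed for 80% power, and a trial with 40 per group has less than a 50% chance of detecting the target difference.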
