
[Fig. 4.2 x-axis label: difference between treatments/standard deviation (based on a two-sided test at the 0.05 level)]

Fig. 4.2 Power curves — an illustrative method of defining the number of subjects required in a given study. In practice the actual number would be calculated from standard equations. In this example the curves are constructed for 16, 40, 100 and 250 subjects per group in a two-limb comparative trial. The graphs can provide three pieces of information: (1) The number of subjects that need to be studied, given the power of the trial and the difference expected between the two treatments. (2) The power of a trial, given the number of subjects included and the difference expected. (3) The difference that can be detected between two groups of subjects of given number, with varying degrees of power. (With permission from: Baber N, Smith RN, Griffin JP, O'Grady J, D'Arcy (eds) 1998 Textbook of pharmaceutical medicine, 3rd edn. Belfast: Queen's University of Belfast Press.)

It will be intuitively obvious that a small detectable difference between two treatment groups, a large variability in the measurement of the primary endpoint, a stringent significance level (low P value) or a large power requirement all act to increase the required sample size. Figure 4.2 gives a graphical representation of how the power of a clinical trial relates to values of the clinically relevant standardised difference for varying numbers of trial subjects (shown by the individual curves). It is clear that the larger the number of subjects in a trial, the smaller is the difference that can be detected for any given power value.
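The shape of the curves in Figure 4.2 can be reproduced with the usual normal-approximation power formula for comparing two group means. This is a minimal sketch, not the book's own calculation: it assumes equal group sizes, a known standard deviation, and a two-sided test at the 0.05 level (critical value 1.96); the function names are illustrative.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def power_two_group(delta: float, n_per_group: int,
                    z_alpha: float = 1.959964) -> float:
    """Approximate power of a two-sided test comparing two group means.

    delta is the standardised difference (difference between treatments /
    standard deviation), i.e. the x-axis of Fig. 4.2. Normal approximation,
    equal group sizes assumed.
    """
    return normal_cdf(delta * math.sqrt(n_per_group / 2.0) - z_alpha)


# Larger groups detect a given standardised difference with greater power,
# mirroring the family of curves for 16, 40, 100 and 250 subjects per group.
for n in (16, 40, 100, 250):
    print(n, round(power_two_group(0.5, n), 2))
```

For a standardised difference of 0.5, power rises steeply as the group size grows, which is the graphical message of the figure.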

The aim of any clinical trial is to have small Type I and II errors and consequently sufficient power to detect a difference between treatments, if one exists. Of the four factors that determine sample size, the power and significance level are chosen to suit the level of risk felt to be appropriate; the magnitude of the effect can be estimated from previous experience with drugs of the same or similar action; the variability of the measurements is often known from published experiments on the primary endpoint, with or without drug. These data will, however, not be available for novel substances in a new class, and frequently the sample size in the early phase of development is chosen on a more arbitrary basis. As an example, a trial designed to detect, at the 5% level of statistical significance and with 80% power, that a treatment raises a cure rate from 75% to 85% would require about 500 patients in total.
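The worked example can be checked with the standard normal-approximation sample-size formula for comparing two proportions. A sketch under stated assumptions: equal group sizes, two-sided 5% significance (z = 1.96) and 80% power (z = 0.8416); other formulas (e.g. with continuity correction) give slightly different numbers.

```python
import math


def n_per_group(p1: float, p2: float,
                z_alpha: float = 1.959964,   # two-sided 5% significance
                z_beta: float = 0.841621) -> int:  # 80% power
    """Approximate subjects needed per group to detect a change in
    proportion from p1 to p2 (normal approximation, equal group sizes)."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)


# Cure rate raised from 75% to 85%: roughly 250 per group, ~500 in total.
print(n_per_group(0.75, 0.85))
```

The result (just under 250 per group) matches the figure of about 500 patients quoted in the text.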

### Fixed-sample size and sequential designs

Defining when a clinical trial should end is not as simple as it first appears. In the standard clinical trial the end is defined by the passage of all of the recruited subjects through the complete design. But it is the results, and decisions based on those results, that matter, not the number of subjects. The result of the trial may be that one treatment is superior to another or that there is no difference between them. Such trials are of fixed-sample size: patients are in fact recruited sequentially, but the results are analysed at a fixed time-point. The results of this type of trial may be disappointing if they just miss the agreed and accepted level of significance.

It is not legitimate, having just failed to reach the agreed level (say, P = 0.05), to take in a few more patients in the hope that they will bring the P value down to 0.05 or less, for this deliberately prevents chance and the treatment from being the sole factors involved in the outcome, as they should be.

An alternative (or addition) to repeating the fixed-sample size trial is to use a sequential design in which the trial is run until a useful result is reached.26 These adaptive designs, in which decisions are taken on the basis of the results to date, can assess results on a continuous basis as the data for each subject become available or, more commonly, on successive groups of subjects (group sequential design). The essential feature of these designs is that the trial is terminated when a predetermined result is attained, not when the investigator looking at the results thinks it appropriate. Reviewing results on a continuous or interim basis requires formal interim analysis, and there are specific statistical methods for handling the data, which need to be agreed in advance. Group sequential designs are especially successful in large long-term trials of mortality or major non-fatal endpoints, when safety must be monitored closely.

Interim analyses can reduce the power of statistical significance tests to a serious degree if they are scheduled to occur more than, say, about four times in a trial. Such sequential designs recognise the reality of medical practice and provide a reasonable balance between statistical, medical and ethical needs. Expert statistical advice is essential when undertaking such trials; poorly designed and executed studies cannot be salvaged after the event.
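The statistical cost of repeated looks at the data can be illustrated with a small Monte Carlo sketch. The setup below is an assumption for illustration only (repeated two-sided z-tests on accumulating null data, each at a nominal 0.05 level; function names are hypothetical): the overall chance of a spurious "significant" result grows well beyond 5% as the number of looks increases, which is why group sequential designs must use adjusted stopping boundaries, at some cost in power.

```python
import math
import random


def overall_alpha(looks: int = 5, block: int = 20,
                  sims: int = 2000, seed: int = 1) -> float:
    """Estimate the overall Type I error when null data are tested at a
    nominal two-sided 0.05 level after each of `looks` interim analyses.

    Each look adds `block` standard-normal observations (no true effect),
    then applies a z-test to all data so far; the trial 'stops for
    significance' at the first crossing.
    """
    z_crit = 1.959964  # two-sided 5% critical value
    random.seed(seed)
    rejections = 0
    for _ in range(sims):
        total, n = 0.0, 0
        for _ in range(looks):
            for _ in range(block):
                total += random.gauss(0.0, 1.0)
                n += 1
            z = total / math.sqrt(n)  # z-statistic for mean of n obs
            if abs(z) > z_crit:
                rejections += 1
                break
    return rejections / sims


# One look keeps the error near 5%; five looks roughly triples it.
print(overall_alpha(looks=1))
print(overall_alpha(looks=5))
```

This is the inflation that formal interim-analysis methods (agreed in advance, as the text stresses) are designed to control.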