Volume 1
Number 5
Late Spring, 2005

Tutorial: An Introduction to Clinical Trials

Part II of II: Statistics and Experimental Design

The Second Installment of NextGen's "Tutorials in Medicine" Series

Part One of this tutorial, which appeared in the last issue, addressed the types of clinical trials and outlined the phases a new drug passes through to obtain FDA approval. Part Two introduces the principles of study design and statistical analysis that determine the validity and reliability of a clinical trial's findings.

Randomized trials are generally considered to be the most reliable clinical trials. These trials assign subjects to different treatment groups via a process of randomization such that neither the subjects nor the doctors know which group a subject will be assigned to in advance, thus eliminating conscious bias. This process creates treatment groups that are similar "on average," helping to prevent imbalance between the treatment groups with respect to other potentially important variables. This validates statistical treatment comparisons and helps to isolate the treatment effect. However, randomization does have disadvantages, as many subjects or doctors may be unwilling to participate in an experiment that determines patient treatment by chance.
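As a minimal sketch of the idea, the Python snippet below assigns each subject to an arm by an independent, equal-probability draw. The function name, the arm labels, and the seed are illustrative assumptions, not part of any real trial system; actual trials typically use validated randomization software and often use blocked schemes instead.

```python
import random

def simple_randomize(subject_ids, arms=("treatment", "placebo"), seed=None):
    """Assign each subject to an arm independently, with equal probability.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # a seeded generator makes the list reproducible
    return {sid: rng.choice(arms) for sid in subject_ids}

# Twenty hypothetical subjects, numbered 1 through 20.
assignments = simple_randomize(range(1, 21), seed=2005)
for sid, arm in assignments.items():
    print(sid, arm)
```

Because each draw is independent, the two arms will not necessarily end up the same size, which is one reason the groups are described as similar only "on average."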

Dynamic randomization still assigns subjects through a random mechanism, but it adjusts the probability that a given subject is assigned to a particular group based on that subject's characteristics and the characteristics of the subjects who have already been randomized. This method seeks to create a better balance between the treatment groups as enrollment proceeds. In experimental designs involving stratification, researchers identify particular subject characteristics that may influence the trial outcome, such as age or gender, and balance these known variables between the otherwise randomized treatment groups. This helps prevent these potentially confounding variables from producing a biased estimate of treatment effect.
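One simplified way to picture dynamic randomization is a "minimization" rule: for each new subject, count how many previously assigned subjects in each arm share that subject's characteristics, then favor the arm with fewer matches. The sketch below assumes two arms, two hypothetical balancing factors (sex and age group), and a favoring probability of 0.8; all of these names and values are illustrative assumptions rather than a prescribed method.

```python
import random

def minimization_assign(subject, assigned, arms=("A", "B"), p_favor=0.8, rng=None):
    """Pick an arm for `subject`, favoring the arm that improves balance.

    subject  -- dict of characteristic -> level, e.g. {"sex": "F", "age": "<50"}
    assigned -- list of (subject_dict, arm) pairs already randomized
    p_favor  -- probability of choosing the balance-improving arm (assumed value)
    """
    rng = rng or random.Random()
    # Score each arm: number of earlier subjects in that arm who share
    # a characteristic level with the new subject.
    scores = {}
    for arm in arms:
        scores[arm] = sum(
            1
            for prev, prev_arm in assigned
            if prev_arm == arm
            for factor, level in subject.items()
            if prev.get(factor) == level
        )
    lowest = min(scores.values())
    best = [arm for arm in arms if scores[arm] == lowest]
    if len(best) == len(arms):
        return rng.choice(arms)  # arms are equally balanced: pure chance
    favored = rng.choice(best)
    if rng.random() < p_favor:
        return favored
    return rng.choice([arm for arm in arms if arm != favored])

# Enroll a few hypothetical subjects one at a time.
rng = random.Random(5)
enrolled = []
for person in [{"sex": "F", "age": "<50"}, {"sex": "F", "age": "50+"},
               {"sex": "M", "age": "<50"}, {"sex": "F", "age": "<50"}]:
    arm = minimization_assign(person, enrolled, rng=rng)
    enrolled.append((person, arm))
    print(person, "->", arm)
```

Keeping `p_favor` below 1 preserves an element of chance, so the next assignment cannot be predicted with certainty.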

Blinding also helps to avoid bias. Blinded studies, unlike open-label studies (in which treatment assignment is known), hide treatment group assignment from participants in the study. Single blinding hides this information only from the treated subject, for example by making all of the administered treatments appear the same. If subjects know which treatment they are receiving, then they may be more likely to react to the treatment in certain ways ("knowing" they have gotten better from taking an active drug rather than a placebo, for instance). Double blinding hides the assignment from both the subject and the investigator. For instance, if all treatments administered by the doctor are coded rather than labeled, only the statistician knows which subject is receiving which treatment. Doctors may treat or perceive a subject differently if they know into which treatment category the subject has been placed. Doctors may also have financial or intellectual interests in the success of a particular experimental treatment. While blinding makes a clinical trial much more rigorous, it may sometimes be infeasible. For example, if a treatment produces an extremely obvious reaction, the physician or subject may realize that the subject must be in an active treatment group.
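The coding step can be sketched as follows: each subject's treatment is replaced by an arbitrary kit code, and the key linking codes to treatments is kept apart from the clinical staff. The kit-code format, the function name, and the example assignments are hypothetical and chosen only for illustration.

```python
import random

def blind_assignments(assignments, seed=None):
    """Replace arm names with opaque kit codes.

    Returns (labels, key):
      labels -- subject_id -> kit code, the only thing the doctor sees
      key    -- kit code -> arm, held by the statistician
    """
    rng = random.Random(seed)
    # Draw distinct four-digit numbers so no two kits share a code.
    numbers = rng.sample(range(1000, 10000), len(assignments))
    labels, key = {}, {}
    for (subject_id, arm), number in zip(assignments.items(), numbers):
        kit = f"KIT-{number}"
        labels[subject_id] = kit
        key[kit] = arm
    return labels, key

# Hypothetical unblinded assignment list.
unblinded = {1: "active", 2: "placebo", 3: "placebo", 4: "active"}
labels, key = blind_assignments(unblinded, seed=11)
print(labels)  # codes only; reveals nothing about the arm
```

The codes carry no information about the arm, so unblinding requires access to `key`.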

While sound trial design minimizes conscious bias and makes treatment groups as comparable as possible, statistical analyses investigate whether the results of the trial provide sufficient evidence for the efficacy and safety of a treatment. A statistical analysis plan (SAP) serves as the guiding framework for these analyses, and should be in place before analysis begins and ideally before the trial itself begins.

A Data Monitoring Committee (DMC) may oversee the trial and monitor its experimental design, statistical analyses, and the safety of the trial participants. These DMCs protect patients and trial integrity.

A single-arm pilot study administers a drug to a single group of subjects to see whether their condition improves. This improvement is often measured against their condition before the trial using a change from baseline. A relatively simple 1-sample t-test may be used to analyze whether these changes are statistically significant. The results are usually reported along with a p-value, the probability of observing a change at least as large as the one seen if the treatment actually had no effect; a small p-value suggests that the change is unlikely to be due to chance alone. Another statistical endpoint, used instead of change from baseline, is the cure rate or response rate: the analysis examines the proportion of the group cured and whether this proportion is high enough to justify claims about the drug's effectiveness. Due to their simple design, single-arm pilot studies have several disadvantages. Because subjects are compared only to their own earlier condition, this design cannot conclude that improvements are necessarily due to treatment, as it can be argued that patients would have improved even if left untreated. Conversely, a real benefit may be obscured if no improvement is observed during the study but the subjects would have gotten worse over the same period if left untreated. Additionally, because all subjects are receiving an active treatment, the placebo effect may cause them or their doctors to believe they are getting better, which can inflate the symptom scores on which the analysis is based if those scores are measured on a subjective scale. For this reason, objective measures are preferred.
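To make the 1-sample t-test concrete, the sketch below computes the t statistic for a set of changes from baseline and compares it with a critical value from a t table. The ten change scores are invented for illustration, and the critical value 2.262 is the standard two-sided 5% cutoff for 9 degrees of freedom; a statistics package would normally report an exact p-value instead.

```python
import math
import statistics

def one_sample_t(changes, null_mean=0.0):
    """Return (t statistic, degrees of freedom) for H0: mean change == null_mean."""
    n = len(changes)
    mean_change = statistics.mean(changes)
    std_error = statistics.stdev(changes) / math.sqrt(n)  # sample SD / sqrt(n)
    return (mean_change - null_mean) / std_error, n - 1

# Hypothetical improvement scores (follow-up minus baseline) for 10 subjects.
changes = [2, 4, 1, 3, 5, 2, 3, 4, 1, 5]
t_stat, df = one_sample_t(changes)

T_CRITICAL_DF9 = 2.262  # two-sided 5% critical value for 9 degrees of freedom
significant = abs(t_stat) > T_CRITICAL_DF9
print(f"t = {t_stat:.3f}, df = {df}, significant at 5%: {significant}")
```

A significant result here says only that the average change differs from zero; as the paragraph above notes, it cannot show that the treatment caused the change.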

An alternative is the two-arm placebo-controlled study, which administers the experimental treatment of interest to one group and a placebo, an inactive treatment, to the other. Because this design divides the subjects into two groups, randomization, stratification, and blinding may be used to reduce potential bias and confounding. The two groups may then be compared with respect to either a change from baseline or the response rate. Change from baseline is a continuous variable (degree of improvement ranges along a continuum), so these results are analyzed using a confidence interval for the difference in mean change between groups; if zero is within the confidence interval (meaning it is possible that there was actually no difference between the groups), then the results do not conclusively show any difference between treatments. Response is a binary variable (either response or no response), and a confidence interval is calculated for the difference between response rates for the two groups. One problem with placebo-controlled studies is that if an existing treatment is known to be effective, then assigning subjects to placebos may be unethical.
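For the binary case, one common large-sample approach is the normal-approximation (Wald) interval for a difference in proportions, sketched below. The response counts are invented, and the Wald interval is only one of several available methods; it is used here as an assumption because it is the simplest to write down.

```python
import math
from statistics import NormalDist

def diff_in_proportions_ci(resp_treat, n_treat, resp_ctrl, n_ctrl, level=0.95):
    """Wald confidence interval for (treatment rate - control rate)."""
    p_t = resp_treat / n_treat
    p_c = resp_ctrl / n_ctrl
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    std_error = math.sqrt(p_t * (1 - p_t) / n_treat + p_c * (1 - p_c) / n_ctrl)
    diff = p_t - p_c
    return diff - z * std_error, diff + z * std_error

# Hypothetical trial: 60 of 100 respond on treatment, 40 of 100 on placebo.
low, high = diff_in_proportions_ci(60, 100, 40, 100)
excludes_zero = low > 0 or high < 0
print(f"95% CI for difference: ({low:.3f}, {high:.3f}); excludes zero: {excludes_zero}")
```

If the interval contained zero, the data would be consistent with no difference between the two response rates.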

For a drug to obtain FDA approval, it doesn't necessarily need to be proven more effective than existing drugs; it only needs to be proven at least as good as the current standard of care (SOC) normally used to treat the condition. Another experimental design, the two-arm non-inferiority study, replaces the placebo (inactive treatment) arm of a placebo-controlled study with an active control, usually an SOC. Researchers must define the practical equivalence or non-inferiority margin (the clinically tolerable amount of difference) and compare it to the confidence interval for the difference between groups. If the entire interval lies within the tolerable range, so that even the least favorable plausible difference does not exceed the margin, the new drug is considered "no worse" than the SOC drug.
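The comparison with the margin can be sketched as a check on the lower confidence bound for the difference (new drug minus SOC). The sketch assumes a response-rate endpoint where higher is better, a normal-approximation (Wald) interval, and invented counts and margins; in a real trial the margin and the interval method would be fixed in advance in the statistical analysis plan.

```python
import math
from statistics import NormalDist

def noninferiority_check(resp_new, n_new, resp_soc, n_soc, margin, level=0.95):
    """Return (lower, upper, is_noninferior) for the difference new - SOC.

    Non-inferiority is declared when the lower bound stays above -margin,
    i.e. the new drug is worse by less than the tolerable amount.
    """
    p_new = resp_new / n_new
    p_soc = resp_soc / n_soc
    z = NormalDist().inv_cdf(0.5 + level / 2)
    std_error = math.sqrt(p_new * (1 - p_new) / n_new + p_soc * (1 - p_soc) / n_soc)
    diff = p_new - p_soc
    lower, upper = diff - z * std_error, diff + z * std_error
    return lower, upper, lower > -margin

# Hypothetical data: 82/100 respond on the new drug, 80/100 on the SOC.
lower, upper, ok_wide = noninferiority_check(82, 100, 80, 100, margin=0.10)
_, _, ok_narrow = noninferiority_check(82, 100, 80, 100, margin=0.05)
print(f"CI: ({lower:.3f}, {upper:.3f})")
print("non-inferior with 10-point margin:", ok_wide)
print("non-inferior with 5-point margin:", ok_narrow)
```

The same data can support non-inferiority under a generous margin and fail under a strict one, which is why the margin must be justified clinically before the results are known.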

Miya Bernson is an Associate Editor of the Next Generation and a member of the Harvard College Class of 2006. Scott Evans, Ph.D. is a Research Scientist at the Center for Biostatistics in AIDS Research and the Department of Biostatistics of the Harvard School of Public Health where he focuses on clinical trials research.