By Dartington SRU
Posted on Thursday 02nd April, 2015

How to draw broader conclusions from randomised trials

Randomised controlled trials are the best way to discover the effects of a programme. But there’s a catch. RCTs show only how the programme worked with the people who agreed to participate in the trial – and those people may be a very different mix from the population for whom the programme is really intended. Here, researchers discuss a statistical solution to the problem.

Randomised controlled trials (RCTs) are the “gold standard” for programme evaluations. Such an experiment, when conducted well, gives an accurate measure of how well a programme works – right?

Yes and no. A good randomised trial does demonstrate the effect of the intervention, but only for the participants in the trial. If these participants are similar to the programme’s target population, then well and good: the trial’s results are “generalisable” to the target population.

However, this is often not the case. Parents who sign up for an RCT of a parenting programme, or schools that willingly join a trial to test an innovation, may be unusual in many ways. The results of a trial with them may not apply, say, to a wider roll-out of the same programme.

Trying to recruit trial participants who are more similar to the intended target population is one solution. But RCTs are time-consuming and expensive. It’s not always practical to run a new trial, especially when policy or clinical decisions have to be made on a tight timescale.

So is there a way to get more information out of existing RCTs? What if there were a way to generalise from an evaluation of a single sample to other populations? A US-based research team may have found one. Using statistical adjustments, they estimated what the results of a school-based behavioural intervention would be if it were applied to a state-wide general elementary school population.

The researchers, based at Johns Hopkins and the University of Virginia, wanted to explore ways to make trial results “more applicable to policy and clinical questions.”

They argue that statistical weighting procedures, applied to data from randomised trials, can be used to generalise trial results to different populations.

Taking data from a trial of a positive behavioural interventions and supports (PBIS) programme conducted in 37 schools in Maryland, the researchers attempted to generalise the findings to estimate the likely impact of PBIS across all 717 elementary schools in the state.

They used “weights” similar to those used by survey researchers. Trial schools that resembled a large share of the Maryland school population on observed characteristics were given large weights; those that resembled only a small share were given small weights.
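To make the idea concrete, here is a minimal sketch in Python of how such weights might be computed. It illustrates the general logic – fit a model of trial participation on the shared covariates, then weight trial units by the inverse odds of participating – rather than the authors’ exact procedure; the column names and the generalisability_weights function are hypothetical.

```python
# A minimal sketch of generalisability weighting, assuming a hypothetical
# DataFrame of all 717 schools with columns for the shared covariates
# ("attendance", "size", "reading", "math") and an "in_trial" flag that
# is 1 for the 37 trial schools and 0 otherwise.
import pandas as pd
from sklearn.linear_model import LogisticRegression

COVARIATES = ["attendance", "size", "reading", "math"]

def generalisability_weights(schools: pd.DataFrame) -> pd.Series:
    """Weight each trial school by how typical it is of the population.

    A model of trial membership is fitted on the shared covariates; each
    trial school is then weighted by the odds of *not* being in the trial,
    so schools that resemble a large share of the population get large
    weights and atypical trial schools get small ones.
    """
    model = LogisticRegression(max_iter=1000)
    model.fit(schools[COVARIATES], schools["in_trial"])
    p = model.predict_proba(schools[COVARIATES])[:, 1]  # P(in trial | X)
    weights = pd.Series(0.0, index=schools.index)
    trial = schools["in_trial"] == 1
    weights[trial] = (1 - p[trial]) / p[trial]  # inverse-odds weights
    # Normalise so the trial schools' weights sum to their count.
    weights[trial] *= trial.sum() / weights[trial].sum()
    return weights
```

The weighted outcomes from the trial (for example, a weighted difference in means between intervention and comparison schools) then stand in for the effect across all 717 schools, on the assumption that the measured covariates capture the relevant differences between the trial schools and the rest.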

The team found that the estimated effects of PBIS if implemented in every school in the state were comparable to the original effects found in the trial.

The study shows how policymakers and researchers can take existing data from an RCT and predict how the intervention will likely perform in their own target population. It also carries a significant implication for funders: it suggests that much can be inferred about the effects of interventions on different populations without the expense of a new trial.

Weighting trial results to match a different population is a much better approach than hoping or assuming that trial results will still apply in a new situation. However, there are at least three substantial limitations of the approach, as the authors point out.

First, the sample can be weighted only by variables that have been measured both in the trial and in the general population. In the PBIS trial, for instance, it was possible to calculate weights based on the schools’ attendance rates, size, and reading and maths test scores – since these numbers were available for both the 37 schools in the trial and the 717 schools in the state.

However, other important differences between schools – such as a school principal’s support for PBIS – were not measured across the general population. If other Maryland principals are much less enthusiastic about PBIS than those in the trial, for instance, the programme might be less effective at the state level than the weighting procedure predicts.

Second, the weighting procedure doesn’t take account of treatment heterogeneity. Most programmes have different effects on different participants – they work well for some but not for others – and RCTs are often too small to discover why these differences occur. This limitation in the RCT also applies to the estimates for the target population.

Third, the weighting procedure may not produce good estimates when the trial sample and the general population differ substantially. In that case a few atypical trial subjects end up carrying very large weights, so the estimates rest heavily on a handful of cases and become unstable.

The study paves the way for follow-up discussion and research. The ability to speak with evidence about the effects of an intervention on a specific target population, without the need to conduct a full-blown RCT, could be an incentive for policymakers to invest in more research. The discussion also highlights the need for those conducting RCTs to think carefully about their sample, since its composition will determine whether and how the findings can be generalised.


Stuart, E. A., Bradshaw, C. P., & Leaf, P. J. (2014). Assessing the generalizability of randomized trial results to target populations. Prevention Science. doi:10.1007/s11121-014-0513-z
