Standards of Evidence

By Dartington SRU
Posted on Thursday 2nd January 2014

Standards, standards, everywhere

We have standards in many walks of life – for hygiene in restaurants, for quality in manufacturing, for proficiency amongst professionals. People or organisations meeting the standards are known to be ‘sound’ or good. Equally, such standards serve as a guide to others who aspire to reach that level of quality.

In the children's services field we are seeing several sets of standards emerge – specifically, standards of evidence. This comes on the back of a renewed commitment to evidence informing policy and practice.

Those involved know that not every intervention claiming to improve children’s lives is based on robust evidence of impact, and that by no means all of those that do should be recommended for implementation. So this trend is definitely to be welcomed: we need standards. Otherwise, how does a commissioner choose among the plethora of interventions claiming to reduce bullying? Or how will a parent know if the parenting course at their local children’s centre is worth the paper the advert is printed on? Or how can a teacher differentiate between robust methods of teaching socio-emotional skills and the latest wheeze of a discredited quack doctor?

In the USA, there are already over 20 databases of evidence-based programmes, all of which are supported by standards of evidence of varying degrees of detail. Although the UK has come late to the game, several options are available here, and more are on the way, notably from the government’s 'What Works' evidence centres charged with ensuring that evidence is at the heart of decision-making.

So what exists? A comprehensive overview is not possible here, but the following gives a flavour.

The Government’s Magenta Book sets a broad direction of travel but is not intended for application to specific questions. The National Institute for Health and Care Excellence (NICE), a What Works centre, draws conclusions about impact based on evidence ranging from systematic reviews or meta-analyses to expert opinion and agreed good practice.

The National Academy for Parenting Practitioners has standards that cover a reasonably wide spectrum of evidence of impact, ranking evaluations from a simple pre-post test without a control group to the use of two or more randomised controlled trials (RCTs), but its focus is limited to parenting programmes. The Institute for Effective Education’s Best Evidence Encyclopaedia takes a similar approach to education programmes. The Education Endowment Foundation, another What Works centre, applies standards that focus mainly on the number and quality of meta-analyses with a view to identifying effective teaching and learning practices.

Organisations like the Centre for Excellence and Outcomes (C4EO) and the Centre for Analysis of Youth Transitions (CAYT) focus on rating the wide variety of innovation in children and youth services, and are therefore less concerned with robust impact evaluations of well-established programmes (while accepting their importance).

At the Social Research Unit, we co-authored the standards of evidence that underpin the Greater London Authority’s Project Oracle and the review by Graham Allen MP of early intervention. In a slightly adapted form they now underpin Investing in Children and Blueprints for Healthy Youth Development. Developed in close collaboration with an international group of experts, they require a high standard and can be applied to all aspects of children’s health and development.

Plurality is welcome. Our colleague and collaborator on Investing in Children at the Washington State Institute for Public Policy, Steve Aos, calls himself an 'independent investment advisor', but rather than providing counsel on stocks and shares to private investors he explains the likely costs and benefits of competing policy approaches and programmes to politicians and decision-makers. Just as there are many investment advisors in the stock market from which to choose, he feels that there should be many options in the public sector.

At the same time, too much choice can be confusing, especially for the busy practitioner or an interested layperson, who may find that everything meets at least one set of standards but nothing meets all of them. As Samuel Taylor Coleridge’s The Rime of the Ancient Mariner suggests, quantity is nothing without quality: ‘Water, water, everywhere, Nor any drop to drink’. Too many standards may be as bad as no standards at all.

So, which standards are best? It all depends on the question, which, in turn, is partly about subject and partly about function. If the focus is parenting programmes, for example, the National Academy’s list is a good place to start, but for education programmes the Best Evidence Encyclopaedia is more suitable. If the intention is to identify the best interventions today then Blueprints arguably sets the highest standard, but if it is to find innovations that may be worth nurturing so that they can become the best interventions of tomorrow, then C4EO or CAYT is a better bet. What matters is that each option has clearly articulated and defensible standards.

Dynamism is essential. It is unhelpful to think of standards as fixed and unchanging. Take the Social Research Unit’s standards. Some critics say they are too low – for example, because they don’t insist that evaluations are independent of the programme developer. Others say they are unreasonably high and exclude a lot of interventions that deserve wider use. Our view is that they are necessarily discriminating – it would be nonsense if everything met the standards – but also realistic – it would be equally nonsensical if nothing met them. And, critically, we anticipate them getting higher as more interventions come onto the market and as more and higher-quality evaluations are conducted.

By our standards, for instance, an intervention can currently be approved on the basis of one good RCT (it is a sobering reflection on the state of the field that this is considered ‘high’). In the future we expect greater weight to be given to, say, whether a study was independent of the developer, and the length of follow-up beyond the end of the intervention. Just as comforts once enjoyed only by business class flyers are now the norm in standard class, so standards now reserved for the ‘Best’ interventions will become the new ‘Good enough’. Moreover, each year will bring new methods, new measures and new understandings about the frailties of what is now considered ‘robust’ evaluation. The standards will therefore need to become more nuanced as well as tougher.

We still depend on human judgement. Standards generally comprise clearly articulated dimensions – in our case intervention specificity, system readiness, evaluation quality and impact – underpinned by criteria that can be rated by skilled researchers and practitioners. But few cases are clear-cut. What happens when the sample size is on the margins of having sufficient statistical power to demonstrate impact, but the study is otherwise robust? Or when several evaluations undertaken in collaboration with the programme developer produce positive results, but one study without the programme developer’s involvement indicates negative results? In theory it is possible to code all of these eventualities but, in practice, sound, skilled human judgement adds more value.

A mother of all standards?

All of this speaks to the need for diversity allied to rigour, and for strong collaboration and sharing among those helping to articulate better standards of evidence.

At the Social Research Unit, for example, we know that the success of our high standards of evidence depends on the success of organisations like C4EO and CAYT in promoting good innovation: we simply will not be able to grow the number of evidence-based interventions unless more people embark on the journey from innovation to proven impact. We would also like to think that, by encouraging children’s services innovators to aspire to meet our high standards in the future, we will help C4EO and CAYT in their work.

Inevitably, if spats occur, there will be pressure to bring all of this work together in a ‘mother of all standards’, an idea that we know from experience would be unmanageable even if it were in any way desirable. Far better to spend that energy charting how existing standards relate to one another and getting better at communicating those standards to interested audiences. When you research washing machines or cameras on Which? you can be confident that a product with a ‘Best Buy’ kitemark is good quality and reasonably priced. You can also read the small print to see what tests were applied and how the product fared in them. This level of transparency, with its elegant combination of simplicity and detail, is something to aspire to in our field.

Finally, humility must run through everything we do. We know so little in our field that we are not in a position to speak with absolute authority. There are benefits to drawing successive lines in the sands of changing policy and practice, learning as we go, continually getting better at the difficult task of improving the health and development of our children.

This blog closes with some suggested ‘rules’, or principles, that might help those of us involved in developing and applying standards of evidence:

1. Standards can and should have different functions.
2. Plurality is good, but too many standards may be as bad as no standards at all.
3. Standards must be discriminating: on given criteria some interventions are better than others, and we shouldn’t try to claim that everything is good.
4. Standards must be achievable: there is no point in setting the bar too high and saying that nothing reaches it.
5. Standards must go up over time: what we think is high now should be considered normal – or even low – in, say, 2024.
6. Standards need to be communicated well: we need kitemarks, but we also need readable small print.
7. Standards still need human judgement: it is not possible to reduce everything to a checklist.

Nick Axford
