Focus on Alternative and Complementary Therapies
www.pharmpress.com/fact
Focus Alternat Complement Ther © 2005 Pharmaceutical Press
Focus Altern Complement Ther 2002; 7: 337–40
Edzard Ernst, Adrian White
Close observers of the BMJ letters column on homoeopathy may have noticed that David Reilly recently introduced the concept of the ‘double positive’ study.1 It epitomises what many in complementary and alternative medicine (CAM) feel about much of CAM research and is therefore worthy of consideration.
A ‘double positive’ study is a clinical trial in which improvements are noted in both the experimental and the control group (Figure 1). Such results are often reported in CAM and in other fields.2 The argument is that if patients in both groups improve, both treatments must be effective.
Figure 1. Diagrammatic representation of clinical trial results in which both groups improved.
But what if one of the treatments was a placebo? The logic, we are told, is the same. As long as there is an improvement in both groups and no significant difference between them, the study is ‘double positive’, meaning the experimental treatment is effective. This logic seems pragmatic and patient-orientated; what patients (and clinicians) want to achieve most of all is clinical improvement. Questions relating to how this improvement came about are secondary, even unimportant. Reilly argues this case perfectly: if we reject the experimental therapy because it is no different from placebo, ‘we risk the paradox of throwing out treatments with efficacy exactly because they also show high effectiveness – when the real problem is that our studies lack the power to see the baby of efficacy in its bath water of context/non-specific/placebo effectiveness’.1 Others have had similar thoughts.3
As clinicians, we can intuitively follow these arguments, but they might be seriously challenged by others who have an input into health care. Most scientists would raise several questions about this use of the evidence. How do we know that we are observing any therapeutic effects at all in such trials? The change could well be the result of other factors (e.g. the natural history of the condition).4 Even if the effect is due to the ‘context’ (including the therapeutic relationship), the trial was not designed to test context effects. A controlled clinical trial is typically designed to detect differences between the two treatments. In adopting the conclusion that ‘both groups improved’, one actually violates this protocol and makes a comparison between time points x and y (Figure 1), a within-group comparison that controlled clinical trials are not usually designed to allow. In addition, is it permissible to extrapolate the context effects that occurred in a clinical trial to normal clinical practice, where the context is likely to be very different? One is only justified in extrapolating what the study was designed to measure (i.e. usually treatment effects). In particular, it would be wrong to extend the context effects of a very atypical form of ‘isopathy’ to the practice of homoeopathy in general. The ‘baby and the bath water’ argument (that the large effectiveness of the treatment obscures a small level of specific efficacy) is interesting and appears robust, but scientists would say that it was dealt with in the sample size calculation: the study was designed to detect a clinically useful effect. Finally, the study was not designed as an ‘equivalence’ study and should not be interpreted as one. This may appear to be a mere technicality, but it is an important and often misunderstood problem.
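The distinction between a within-group comparison (time point x versus time point y) and the between-group comparison a trial is designed for can be made concrete with a small simulation. The Python sketch below models a hypothetical two-arm trial in which the experimental treatment has no specific effect at all (`specific_effect=0.0`), but every patient improves for reasons shared by both arms, such as natural history or context. All function names, parameters and numbers are illustrative assumptions, not data from any trial discussed here. Both arms show a clear mean improvement, while the between-group difference, which is the quantity the trial was built to estimate, is centred on zero.

```python
import random
import statistics


def simulate_trial(n_per_arm=100, shared_improvement=10.0,
                   specific_effect=0.0, sd=8.0, seed=1):
    """Return change scores (positive = improvement) for a hypothetical
    experimental arm and placebo arm.

    shared_improvement: mean improvement occurring in BOTH arms regardless of
    allocation (e.g. natural history, context effects). Assumed value.
    specific_effect: extra improvement caused only by the experimental
    treatment. Set to zero here, i.e. no specific efficacy.
    """
    rng = random.Random(seed)
    experimental = [rng.gauss(shared_improvement + specific_effect, sd)
                    for _ in range(n_per_arm)]
    placebo = [rng.gauss(shared_improvement, sd) for _ in range(n_per_arm)]
    return experimental, placebo


experimental, placebo = simulate_trial()

# Within-group comparisons (time x vs time y): both arms appear to "improve".
within_exp = statistics.mean(experimental)
within_pla = statistics.mean(placebo)

# Between-group comparison: the question the trial was designed to answer.
between = within_exp - within_pla
se = (statistics.variance(experimental) / len(experimental)
      + statistics.variance(placebo) / len(placebo)) ** 0.5
ci = (between - 1.96 * se, between + 1.96 * se)

print(f"mean improvement, experimental arm: {within_exp:.1f}")
print(f"mean improvement, placebo arm:      {within_pla:.1f}")
print(f"between-group difference: {between:.1f} "
      f"(approx. 95% CI {ci[0]:.1f} to {ci[1]:.1f})")
```

Both within-group means would look like strong evidence of benefit if read in isolation, yet by construction the treatment does nothing specific. This is why the authors argue that such a result cannot be read as efficacy.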
Health economists might ask why, if the experimental treatment is not superior to placebo, clinicians should use it at all, given that it is presumably associated with costs and perhaps adverse effects. Why not simply use placebo? Would this not be cheaper and safer? A further question would be whether any other treatment exists that has been shown to work better than placebo for the condition in question. If so, one should administer that therapy rather than either the placebo or the experimental treatment under test.
Those who organise the health service (and those who finance it, i.e. the taxpayers) would argue that standards in medicine ought to be the same for all types of medicine. If the concept of the ‘double positive’ trial were adopted in CAM, it would thus also be applicable to conventional medicine. Imagine a placebo-controlled drug trial in which a new analgesic yields a 30% pain reduction that does not differ from the effect of placebo. Those who favour the ‘double positive’ nomenclature would need to concede that the manufacturer could sell the new drug on the basis of such findings. Many of us would probably feel uncomfortable with this scenario.
Ethicists might also have serious problems with the use, without fully informing the patient, of any treatment that the best evidence shows to be no different from placebo. The principle of patient autonomy demands that patients have the right to know as much as the practitioner about a treatment before accepting it. A clinician who, without full discussion, administered a therapy that they believed to be effective when the best evidence does not support this view would be considered highly paternalistic.
Obviously, clinicians and others will continue to differ in their interpretations of clinical trials. This creates a tension that can and should stimulate fruitful debate. We consider ourselves ex-clinicians, clinical scientists, taxpayers and sometimes patients. We therefore sympathise with the clinician's view but, in the final analysis, find the arguments of science compelling: trials should be interpreted in accordance with their design. Studies designed to compare an experimental CAM therapy with placebo that show improvement in both groups but no difference between them2,5 do not demonstrate efficacy/effectiveness.
David Reilly
I am a clinician, a scientist and a patient. In all three roles I look to science to help me understand the phenomenology of human care. However, it is just one piece of understanding and, in turn, trials are smaller pieces that we use to build the jigsaw of an ‘evidence mosaic’, no more, no less.1 Each piece develops meaning in its context, not in isolation; why else the endless debate and uncertainty that surround them? The BMJ chose to take one trial and, on the basis of a negative interpretation of its result, run a front-cover headline saying that the test treatment was a ‘waste of time’.2 This was pseudoscience. The reader (though not the media listener or the reader of abstracts alone) can look at the trial itself (and note that both groups showed subjective and objective improvement) and at the context of similar studies (showing positive effects of treatment greater than placebo), and see that the situation is not as cut-and-dried as the BMJ would have us believe. The trial was just one piece in a complex picture. In raising what I called ‘the double positive paradox’3 I was concerned about two possibilities in medicine (orthodox or CAM) around the issue of ‘negative trials’: (1) that effective treatments may be rejected because of underpowered studies, and (2) that the challenges posed by effective ‘context effects’ are not being addressed.
So I join Ernst and White in the pursuit of rigorous science and clarity, and I am grateful that their team has made significant contributions to sharpening up the CAM debate. If both groups improve in a trial then, like them, I do not support reflex interpretations such as ‘both treatments must be effective’, ‘the experimental treatment is effective’ or ‘how this improvement came about is secondary or unimportant’. However, by mirrored logic, I do not think that if both groups improve I can conclude immediately that neither treatment is effective, that the experimental treatment is ineffective, or that the underlying mechanisms have been proven. Like Ernst and White, I do not think the idea I am suggesting should be ‘adopted in CAM’ in the sense that a ‘double positive’ trial is then used to peddle nonsense, be it CAM or orthodox nonsense. I never knowingly prescribe a placebo and believe it is usually unethical to do so, and I am against the argument that says ‘if the customer is happy, then all is well’: customers can be made happy with deceit, woolly thinking, toxic nostrums and ineffective soothing. There is too much medicine in general, and too much iatrogenesis (and too much CAMiatrogenesis).4
While these are interesting general discussion points, we might all, Ernst and White included, generate less heat and more light if we focus on the core scientific idea here. When both groups behave similarly in a study (a negative outcome), let us then ask a second-order question: did neither group improve (a ‘double negative’ study), or did both groups improve (a ‘double positive’)? To say of the latter that ‘scientists would say that it was dealt with in the sample size calculation’ is like saying that because a car had a speedometer, the driver must have driven safely. The Lewith et al. study that triggered this debate did not have an adequate sample size calculation; how could it, when it was a pilot of a new design in a new context with an unknown context effect size?2 This pilot then turned up what appears to be a very strong context effect, and a post hoc power calculation now needs to be done to design the next study. A warning has already been raised about a confounding high ‘zeal factor’ in such a context.5
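Both sides appeal to the sample size calculation, so it may help to see what one does. The Python sketch below uses the standard normal-approximation formula for comparing two means, n per arm = 2(z₁₋α/₂ + z₁₋β)²σ²/Δ², where Δ is the between-group difference to be detected and σ is the common standard deviation. The effect size, standard deviation and pilot size are hypothetical values chosen for illustration only. They are not taken from Lewith et al. or from any other study cited here. Feeding a pilot's estimates of σ and of the likely specific effect into a formula of this kind is one way to plan the next study.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect a true mean difference
    `delta` between two arms (two-sided test, common SD `sd`,
    normal approximation)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


def approx_power(delta, sd, n, alpha=0.05):
    """Approximate power of a two-sided test with `n` patients per arm
    (ignores the negligible chance of significance in the wrong direction)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    se = sd * sqrt(2 / n)
    return Z.cdf(abs(delta) / se - z_alpha)


# Hypothetical: a specific effect of 5 points on a scale with SD 10.
print(n_per_group(delta=5, sd=10))                   # → 63 per arm for 80% power
print(round(approx_power(delta=5, sd=10, n=20), 2))  # → 0.35 for 20 per arm
```

Under these assumed numbers, a 20-per-arm study would miss a genuine specific effect of this size about two times in three. This is the kind of underpowering Reilly has in mind. Ernst and White's reply is that a properly powered trial will already have been sized to detect a clinically useful effect.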
As Figure 2 suggests, in the first circumstance, a double negative, the argument may be over: the bath water is cold. In the second circumstance, a double positive, something may be happening (the water seems to be warm or hot). I am not saying that we should jump in the bath just because it is warm. Rather, this raises different third-order questions, such as the need for a ‘no treatment’ arm or a larger study with greater power to see whether there is a baby of specific efficacy in this warm bath of successful ‘non-specific/context/placebo’ effects.
Figure 2. Decision tree for interpreting clinical trial results.
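Figure 2 itself is not reproduced here. The Python sketch below is one possible reading of the branches described in the text: first check the between-group difference, and only if it is null go on to ask the second-order question about within-group improvement. It assumes that each result is summarised as a 95% confidence interval on change from baseline, with positive values meaning improvement. `Interval` and `classify` are hypothetical names, and the cut-offs are an illustrative assumption, not Reilly's specification. As Ernst and White stress, a ‘double positive’ label in itself says nothing about what caused the improvement.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """A confidence interval; for changes, positive values mean improvement."""
    low: float
    high: float

    def excludes_zero(self) -> bool:
        return self.low > 0 or self.high < 0


def classify(between: Interval, change_exp: Interval,
             change_ctrl: Interval) -> str:
    """Hypothetical reading of the decision tree described in the text."""
    # First-order question: did the arms differ? (what the trial was designed for)
    if between.excludes_zero():
        return "between-group difference: interpret as a conventional trial result"
    # Second-order question: with no difference, did both, neither or one improve?
    improved_exp = change_exp.low > 0
    improved_ctrl = change_ctrl.low > 0
    if improved_exp and improved_ctrl:
        return ("double positive: unproven by trial; consider a no-treatment "
                "arm or a larger, better-powered study")
    if not improved_exp and not improved_ctrl:
        return "double negative: no improvement in either group"
    return "mixed: only one group improved within itself"


print(classify(Interval(-2, 3), Interval(6, 12), Interval(5, 11)))
```

The ordering reflects the point on which both sides agree: the between-group comparison is primary, and the within-group pattern only shapes which further questions, or which next study, are worth pursuing.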
Thus, I agree that trials should be interpreted according to their design, but if that design is underpowered, let us tackle that problem. We need caution here because, in the meantime, the clinical care of patients continues in the face of the uncertainty from trials. Other elements of the evidence mosaic are therefore important, such as the clinical participants’ experiences and judgements and the quality of the encounter.6 Academic analysis does not exist in a cultural vacuum, and ‘scientific’ pronouncements, never mind tabloid analysis such as the BMJ headline, can do harm when misinterpreted. Statements that there ‘is no evidence of effect’ for a treatment are being interpreted in the same way by commissioning bodies whether the underlying data come from ‘double positive’ trials (where the verdict should be ‘unproven by trial’) or from double negative trials, where lack of effect is proven. The former says that patients appear to be benefiting and that further study should be considered, be it of specific and/or non-specific effects; the latter can say that there is no benefit.6 Let’s get away from drawing black-and-white conclusions (however comforting or empowering) from grey data.