Trial By Error: Quartet of Trials Reveals Limitations of CBT for "Medically Unexplained Symptoms"

By David Tuller, DrPH

A year ago, I wrote a post about how the biopsychosocial ideological brigades had completed a trifecta of major studies that investigated cognitive behavior therapy (CBT) for a variety of so-called “medically unexplained symptoms” (MUS). As a group, the studies demonstrated the overall ineffectiveness of CBT as a treatment for this category of disorders, despite herculean efforts to spin the results the other way. MUS is usually defined to include chronic fatigue syndrome, irritable bowel syndrome, fibromyalgia, functional neurological disorders, and other conditions for which pathophysiological mechanisms have not been identified.

These three trials were: the PACE trial for chronic fatigue syndrome, which also tested graded exercise therapy; the ACTIB trial for irritable bowel syndrome; and the CODES trial for psychogenic non-epileptic seizures, a form of functional neurological disorder also called dissociative seizures. Now a fourth major trial can be added to the group. PRINCE Secondary is a recently published study of CBT to treat so-called “persistent physical symptoms” (PPS), another term for MUS. I blogged last month about how the investigators tried to present the trial as a success despite null results for the primary outcome.

These trials were the biggest in their domains to date. In each case, the investigators published trial protocols, which cited evidence from earlier research supporting their approach and generally presented the proposed study as the essential next step in obtaining authoritative information for guiding clinical care and/or public health policy. All four were centered at leading UK universities, PACE at Queen Mary University of London and the other three at King’s College London. And they were all based on the same unproven biopsychosocial hypothesis, that symptoms were perpetuated by unhelpful cognitions, excessive focusing on somatic sensations, and related psychological factors. The CBT interventions were said to be designed specifically to target the faulty perceptions presumed to be perpetuating the symptoms in each condition.

What is remarkable about this quartet of trials is that the evidence is pretty clear: CBT did not work as intended in any of them. But you wouldn’t necessarily know that from the papers, which all claimed success to one degree or another. In none of the studies did the poor findings lead the investigators to raise serious questions about the appropriateness of the interventions or the robustness of the theoretical assumptions behind them. Instead, all four studies engaged in methodological and/or statistical gymnastics to avoid having to conclude or at least consider the possibility that CBT might not be an effective intervention after all.

There is significant overlap among the investigators in these four studies, as there is among the strategies deployed to downplay and/or obscure the exceedingly modest or null results. The studies also share an overall design problem: they are unblinded to participants and rely on subjective outcomes for their claims of success. Therapeutic interventions are understandably difficult if not impossible to blind, which makes it all the more imperative for investigators to include some kind of objective measurements, not to mention adequate control groups. No less an authority than the Journal of Psychosomatic Research has noted in a recent commentary that subjective outcomes in studies in which blinding is not rigorous are at increased risk of bias. Any such findings must therefore be interpreted with that limitation in mind.

(For more on the MUS mess overall, read Goodelf’s series of posts on the Opposing MEGA blog, starting with this one. Also, Dr Keith Geraghty has taken a hard look at the CBT model of chronic fatigue syndrome and found it wanting; I was a co-author on this paper.)

**********

Turning Bad Results into Positive News

Here is a brief summary of how investigators in these studies managed to turn unappealing results into positive news:

*In PACE, the investigators weakened their assessment methods with rampant outcome-switching, enabling them to report modestly attractive results in favor of their interventions. They ignored or dismissed the null or minimal results of their own objective measures. Subsequent re-analyses based on data that were released under court order revealed that the results per the methods outlined in the protocol were either null or easily within what would be expected by bias inherent in the study design. At long-term follow-up, the authors prioritized “within-group” comparisons over the “between-group” comparisons that are of interest in a clinical trial, another form of outcome-switching. A commentary accompanying the publication of the initial results, written by colleagues of the investigators, amplified the questionable claims of success.

*The ACTIB trial tested both telephone CBT and web-based CBT. For the latter, results for both of the study’s two co-primary outcomes, a symptom severity scale and a work and social adjustment scale, were below the identified thresholds for clinical significance. The investigators did not highlight this information. (The telephone CBT fared a bit better across the board.) King’s College London, which touted the findings in a press release, has licensed the web-based CBT program to a private company. The company has received approval to market the program in the US and UK. It will presumably be promoted as “evidence-based.” At least two of the investigators have declared a financial interest in the program. At long-term follow-up, the results for web-based CBT were even worse on both measures.

*The CODES trial tested CBT for psychogenic non-epileptic seizures, also called dissociative seizures. At 12 months, the trial reported null results for its primary outcome; in fact, it found a non-statistically significant trend toward greater seizure reduction in the non-intervention group. The investigators touted instead several of their secondary outcomes, most of them vague and non-specific to the condition.* [Correction: In this sentence, I wrote initially that they touted “a minority of” their secondary outcomes. In fact, nine out of 16 secondary outcomes had statistically significant findings, although only five were statistically significant after correcting for multiple comparisons.] After years of promoting seizure reduction as the appropriate primary outcome, they argued unconvincingly in CODES that other outcomes besides seizure reduction were likely more important to patients. In an accompanying commentary, a friendly colleague defended CBT based on his own clinical experiences and reinforced the trial’s effort to downplay the null results from the primary outcome. King’s College London issued a press release that hailed the study’s findings but did not mention that the primary outcome had null results.

*The PRINCE Secondary trial found no benefits at 12 months for its CBT intervention on the primary outcome measure, the same work and social adjustment scale used in the ACTIB trial. In fact, the entire confidence interval for the results on this scale fell under the threshold for clinical significance for that measure. The conclusion of the abstract did not mention the null results of the primary outcome but focused on very modest improvements in a minority of the secondary outcomes. The investigators suggested that these reported benefits, which would have disappeared if subjected to the standard conservative method of correcting for multiple comparisons, indicated “preliminary evidence” that CBT might be “helpful.” This conclusion ignored the fact that the trial was designed to provide definitive and actionable evidence related to efficacy, not more “preliminary” data requiring further research.

A review of this collection of trials indicates that all of them have benefited from one or more of the following strategies:

*outcome-switching
*claiming success based on secondary outcomes
*ignoring/dismissing objective findings
*prioritizing results that are not corrected for multiple comparisons
*misinterpreting minimal or null results at long-term follow-up
*being accompanied by a friendly commentary that amplifies the claims of success
*being promoted in a positive light by university public relations departments
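
For readers unfamiliar with why correcting for multiple comparisons matters, here is a minimal sketch in Python. It assumes the "standard conservative method" is something like the Bonferroni correction, which divides the significance threshold by the number of outcomes tested; the papers themselves may have used a different procedure. The p-values are hypothetical and are not taken from any of the trials discussed here.

```python
def bonferroni(p_values, alpha=0.05):
    """Return True for each p-value that stays significant after Bonferroni.

    Bonferroni is an assumed stand-in for the correction method; it compares
    each p-value against alpha divided by the number of tests performed.
    """
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical p-values for six secondary outcomes (illustrative only,
# NOT data from PACE, ACTIB, CODES, or PRINCE Secondary).
hypothetical_p = [0.004, 0.012, 0.03, 0.04, 0.20, 0.45]

uncorrected = [p <= 0.05 for p in hypothetical_p]
corrected = bonferroni(hypothetical_p)

print(sum(uncorrected), "of", len(hypothetical_p), "significant before correction")
print(sum(corrected), "of", len(hypothetical_p), "significant after correction")
```

With these made-up numbers, four outcomes clear the usual 0.05 bar on their own, but only one survives once the bar is tightened to account for six tests. The more secondary outcomes a trial measures, the more likely it is that a few will look significant by chance alone, which is why uncorrected secondary findings are weak grounds for claiming success.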

Looking beyond the UK, a 2017 study from a team of Dutch investigators could fit in with this group: a large trial of CBT for prolonged fatigue after an acute bout of Q fever, a bacterial infection. (The trial also investigated a regimen of antibiotics.) This fatigue would indisputably fall into the MUS category; the investigators cited CBT’s purported success in treating CFS as the rationale for the trial. Modest reported benefits for CBT at the end of the treatment wore off by the one-year follow-up. A robust critique of the trial, which was published last year, outlined serious methodological issues and undermined any claims of CBT’s effectiveness. (More on this interesting trial later.)
