Trial By Error: Update on BMJ’s CBT-Music Therapy Study (h/t Steinkopf and Tack)

By David Tuller, DrPH

I have written multiple posts this year about a Norwegian study of cognitive behavior therapy plus music therapy as a treatment for chronic fatigue after acute EBV infection (aka mononucleosis and glandular fever). The study, published in April by BMJ Paediatrics Open, was rife with methodological and ethical flaws. It should not have been accepted in the first place.

Among many issues, the investigators presented their research as a feasibility study seeking data to justify funding for a fully powered trial. In reality, it was designed as a fully powered trial, but it failed to meet recruitment goals and produced terrible results. It also included an outcome measure–post-exertional malaise–that was not included in the trial registration and protocol. How it ended up in the trial was not explained.

Norwegian patient advocate Nina Steinkopf blogged about the problems with the study in early May. Not long after, the journal posted a response to the study from another patient advocate, Michiel Tack, who also highlighted multiple serious lapses. At that point, I started doing one of the things I do pretty well: trying to ensure the issue got more attention by blogging about it and sending letters, along with colleagues or on my own, to the journal and BMJ editors.

The false description of the research as a feasibility study was only one of many points of concern that several colleagues and I raised in a May 31st letter to the journal and Dr Fiona Godlee, BMJ’s editorial director. I followed up with further letters in efforts to nudge the process along. BMJ responded that it was working to address the various issues, yet it placed no advisory notice on the problematic paper in the meantime.

On October 21st, the excellent blog Retraction Watch reported that the journal had retracted the study and was replacing it with a new version–an approach to dealing with problematic papers that has gained popularity in recent years. Around the same time, I received a letter from BMJ’s research integrity coordinator about this resolution of the matter. After reading the replacement paper and the retraction notice, I expressed my displeasure at what I considered to be BMJ’s abrogation of its obligations. I posted that exchange here.

The retraction notice thanked Michiel Tack but did not present an accurate picture of what went wrong. According to the journal, “We identified a mistake in the editorial process which led to this misrepresentation of the research that was undertaken.” No further explanation was provided. Yet a review of the relevant available documentation contradicts this characterization of what occurred. If other information confirms BMJ Paediatrics Open’s account of a “mistake,” the journal should make it available. The retraction notice did not mention the PEM problem and other concerns raised about the original paper, nor did the replacement address many of them.

Again: BMJ’s claim about what occurred is not consistent with the documentation available. The peer reviews posted with the now-retracted version of the paper were not ambiguous. One of the reviewers acknowledged not having reviewed the paper; he noted that he did not read “beyond the abstract.” (BMJ didn’t mention this breakdown of the peer review process in the retraction notice.) The other peer reviewer was confused about the draft of the paper and asked a straightforward question: Was this research designed as a feasibility study or as a fully powered trial? The peer reviewer wanted the investigators to clarify.

The peer reviewer posed a binary question. One of the two available answers was true, and the other was false. Given these facts, the retraction notice’s reference to a “mistake” on the editorial side doesn’t make sense. This study was designed as a fully powered trial, although it ended up being underpowered. Instead of explaining that, the investigators chose to respond to the peer reviewer’s question by declaring the opposite–that it was designed as a feasibility study.

So what was the editorial “mistake”? Did a BMJ editor advise the investigators to pretend their underpowered trial was a feasibility study when it wasn’t? If that is the case, then surely BMJ has an obligation to make that public. At this point, the opaque retraction notice reads more like an attempt to whitewash bad behavior on the part of the investigators. This sneaky strategy certainly makes it easier for BMJ Paediatrics Open to avoid having to press the investigators to explain or justify their apparent decision to misrepresent their work–behavior that arguably meets standard definitions of research misconduct.

The investigators’ new version of the paper is still trash and it still stinks, for multiple reasons–including some that marred the first version. But at least it does not misrepresent itself as a feasibility study. However, in violation of its own stated policy, BMJ Paediatrics Open is refusing to post the peer reviews for this replacement version–even as it tells me in its latest letter (see below) that readers can be “reassured” about the robustness of its peer review process.


My latest letter from BMJ’s research integrity coordinator

As mentioned above, I responded with some displeasure to the first letter I received from BMJ informing me of the retraction and replacement. Below is my next (and I guess final) letter about this matter from the research integrity coordinator, received on November 12th.

Dear Dr Tuller

Thank you for your email.  

As you will know, there is controversy among publishers around when to publish expressions of concern.  The retract and republish approach is recommended by COPE’s Retraction Guidelines and in this case, we felt that it would be better to undertake these simultaneously.  We did not feel there was an urgent patient safety issue which required immediate retraction of the paper.

The Editor in Chief, Imti Choonara, would welcome a brief e-response to the article making your point about the outcome of post-exertional malaise and we will ask the authors to respond.  We have satisfied ourselves that this does not represent research misconduct.  

You will see from our previous response to you that this new submission has undergone full editorial and peer review. We are not going to post the peer reviews on this paper.   The peer review process relies on trust and readers may be reassured that we have learned from our previous errors and ensured our processes are more robust.

We now consider this matter closed and we will not engage in any further correspondence regarding the editorial process.  Should you wish to respond in the journal regarding the post-exertional malaise outcome, however, you would be welcome.

Kind regards,

Simone Ragavooloo
BMJ Research Integrity Team
On behalf of Fiona Godlee, Editorial Director

From my perspective, BMJ’s actions in this matter represent the opposite of research integrity. I will make this point to the research integrity coordinator when I get the time to write back to her, as well as the editor-in-chief of BMJ Paediatrics Open and Dr Godlee, BMJ’s editorial director.


Steinkopf and Tack respond to “retract and replace”

In the meantime, both Nina Steinkopf and Michiel Tack have submitted responses to the new publication in BMJ Paediatrics Open. Neither response has been posted to date, for unknown reasons. Steinkopf has blogged about the situation here. “Unfortunately, in the republished version of the paper, most of the issues raised remain unresolved,” she writes.

With Tack’s permission, I am posting below the comment he submitted to the journal in response to the new paper. Remember, the journal thanked him in its retraction notice for bringing attention to important concerns.


This is a resubmitted version of a retracted paper and many issues remain unresolved

Inconsistencies in the retraction notice

It is unfortunate that the BMJ Paediatrics Open website doesn’t make clear that this is an amended, republished version of a paper about which multiple methodological issues were raised [1] and that, eventually, was retracted [2]. The publication history, for example, does not mention the retracted version. Only at the very end of the article under “Provenance and peer review” is a link provided to the previous version. I suspect few readers will notice this. Hopefully, BMJ Paediatrics Open will place a more visible notification at the top of the page, for example the one it has put above the retraction notice [3].

Because this is a new publication that received its own DOI, the criticism of the retracted paper [1, 4] is no longer visible. This is unfortunate because, as I will try to clarify in this comment, many issues that were raised have not been addressed in this republished version.

In the retraction notice, BMJ Paediatrics Open explains that the retracted paper was misrepresented as a feasibility trial due to a mistake in the editorial process and that this was not due to error on behalf of the authors. As I will explain below, this is an implausible explanation given the information that is currently available. The review history shows that one of the reviewers was confused by how the authors originally presented their results. She stated: “I struggle to understand from the aims of the study and the way the study is described whether this was intended as a feasibility study – i.e. to look at feasibility (can this be done?), acceptability (how do participants experience it?) and to give some indication of potential effect sizes to power a future larger scale trial, or whether this was intended as a fully powered trial. Throughout, I think this needs to be clarified for the reader and interpretations/conclusions drawn in light of what the aim was.”

To this, the authors responded: “Thank you. We agree – this study should be regarded a feasibility study, and the manuscript has been rephrased accordingly.” So it seems that the authors explicitly stated that this study should be regarded as a feasibility study when this was not the case. This information conflicts with the editorial statement that the authors were not in error. I think more information should be shared to clarify this contradiction.

It is unfortunate that the peer review history of this republished version has not been made public. On its website, BMJ Paediatrics Open claims it is committed to open peer review and states “as part of this commitment we make the peer review history of every article we publish publicly available.” It is unclear why the peer review of this republished version became an exception to this commitment. In the peer review of the retracted version, one reviewer admitted that he hadn’t read beyond the abstract. Considering the problems surrounding this work, it seems particularly important that readers are able to read through the review process and understand the decisions taken by the journal and authors.

The paper doesn’t explain that the study was designed to test large effect sizes

A notable aspect of this study is that the number of participants the authors tried to recruit (n = 60) is much lower than the number the power analysis suggested was needed to detect a difference of moderate effect size (n = 120). In other words, the lack of power was not only due to recruitment problems or drop-outs; the study was designed as such.

The statistical analysis plan [5] shows that the study was powered to test a large effect size, namely 0.8 times the estimated standard deviation of the primary outcome measures (approximately 2000 steps per day). The authors justified this by saying that since “CBT alone is documented to have a moderate effect size in CFS/ME, only a substantial effect size is of direct clinical interest. Analogously, only a substantial treatment effect is of interest regarding markers of pathophysiology.” [5] In the protocol, they also argued that “the FITNET study suggests that larger treatment effects might be assumed in adolescent CFS/ME patients as compared to adults.” [6]
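For readers who want to see roughly where those sample-size figures come from, here is a sketch of the standard two-sample calculation. The significance level (0.05, two-sided) and power (80%) are my illustrative assumptions, not figures taken from the trial documents:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided comparison of two
    means with standardized effect size d (normal approximation)."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)

print(n_per_group(0.8))  # 25 per group -> roughly 50-60 in total
print(n_per_group(0.5))  # 63 per group -> roughly 120-130 in total
```

Under these assumptions, a large effect (d = 0.8) needs about half as many participants as a moderate one (d = 0.5), which matches the gap between the roughly 60 participants the authors sought and the roughly 120 needed for a moderate effect.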

In this republished version of the manuscript, however, nothing suggests that the authors anticipated the intervention to produce a large effect size and that the study was designed to test this hypothesis. The fact that the authors previously agreed to frame this study as a feasibility trial (when this was clearly not the case) suggests there was a willingness to disregard that the intervention failed to provide the anticipated effect sizes. Even though the republished version avoids the term ‘feasibility trial’, the issues remain the same.

There is little data to support the conclusion that the intervention is “feasible and acceptable”

As the intervention failed to provide the large effect sizes anticipated in the statistical analysis plan and protocol, the retracted paper focused on other aspects than the efficacy of the intervention, such as feasibility and acceptability. The same is true for this republished version, which highlights that “combined CBT and music therapy is feasible and acceptable in adolescent postinfectious CF.”

The data do not support this conclusion. More than half of the eligible individuals (n=48) did not consent to participation. Of those who started the program, 6 out of 21 (28%) dropped out, compared to only 4.5% in the control group. The claim that the intervention is acceptable seems to be based only on the high attendance of the 15 patients who started and stayed in the intervention arm of the study.

The authors also point to a lack of statistically significant differences in adverse events between the intervention and control group. But given that the trial was “strongly underpowered” to test the efficacy of the intervention, the same reasoning would apply to adverse events as well.

It should also be noted that the trial registration [7] lists approximately 20 different outcome measures for this trial, and with the exception of adverse effects, none of these focused on the acceptability of the intervention. Therefore, the main conclusion of this republished paper remains unwarranted.

The outcome measure for post-exertional malaise was added post-hoc

In the discussion section, the authors state: “we observed a concurrent tendency of improvement of many symptom scores, including fatigue and postexertional malaise, in the intervention group.” The authors fail to mention that approximately 20 outcome measures were registered for this study but that the outcome ‘postexertional malaise’ was added post-hoc: it was not listed as an outcome in the protocol [6], statistical analysis plan [5], or trial registration [7].

The tendencies towards improvement were all quite small. The biggest difference found was for the primary outcome measure, where patients in the intervention group did worse than those in the control group. Patients in the intervention group had a mean of 6198 steps per day post-treatment, 2059 steps lower than in the control group. The text of the paper does not mention that this difference reached statistical significance in the per-protocol analysis.

The reported recovery rates risk misleading readers

Instead, the results section highlights a trend towards a higher recovery rate in the intervention group. The authors defined recovery as a score lower than 4 points on the Chalder Fatigue Scale using a dichotomous scoring method (range 0-11 points). A score of 4 or higher on the Chalder Fatigue Scale, however, was already used as an inclusion criterion. This means that participants could be classified as recovered as a result of reporting an improvement of just 1 point on the Chalder Fatigue Scale.
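The scoring issue can be illustrated with made-up responses (hypothetical, not trial data). The Chalder scale has 11 items, each answered 0-3; bimodal scoring collapses each item to 0 or 1, so a single one-point shift on one item can move a participant from meeting the entry criterion (score 4) to counting as “recovered” (score 3):

```python
# Chalder Fatigue Scale: 11 items, each answered 0-3
# ("less than usual" ... "much more than usual").
# Bimodal scoring: each item collapsed to 0/0/1/1 (range 0-11).
# Likert scoring: raw sum of the items (range 0-33).

def bimodal(responses):
    return sum(1 for r in responses if r >= 2)

def likert(responses):
    return sum(responses)

baseline = [2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1]  # bimodal 4: meets the >=4 entry criterion
post     = [2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1]  # one item shifts by a single point

print(bimodal(baseline), bimodal(post))  # 4 3 -> now below the 'recovery' threshold
print(likert(baseline), likert(post))    # 15 14 -> barely changed on the 0-33 scale
```

On the 0-33 scoring the change is trivial (15 to 14), yet on the dichotomous scoring the participant crosses the recovery threshold.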

It should also be noted that the Chalder Fatigue Scale does not assess the intensity or impact of fatigue. Instead, it assesses whether participants experience fatigue-related symptoms such as having “problems starting things” or finding it “more difficult to find the right word” more than usual. Consequently, it seems inappropriate to use the term “recovery rate” for the percentage of participants who score lower than the threshold of 4 points on the Chalder Fatigue Scale. Post-treatment there was no difference in the percentage of patients meeting this 4-point threshold between the intervention and control group.

The intention-to-treat analysis of the Chalder Fatigue Scale ordinal scoring (range 0-33) was also reported, and this showed little difference between the two groups. A plausible explanation for what the authors describe as “a trend towards higher recovery rate in the intervention group” is the high drop-out rate. At follow-up, only 13 patients were in the intervention group. The analysis of recovery rates highlighted by the authors does not take into account the 8 persons who were in the intervention group but were lost to follow-up. There is little reason to suggest that more patients in the intervention group recovered than in the control group. It is unfortunate that the authors have used this term in their manuscript.


[1] Tuller D. Trial By Error: More on that Norwegian CBT/Music Therapy Study. Virology Blog. May 16, 2020.

[2] Marcus A. BMJ journal retracts, replaces study on chronic fatigue in children. Retraction Watch. October 21, 2020.

[3] BMJ Paediatrics Open. Retraction: Cognitive–behavioural therapy combined with music therapy for chronic fatigue following Epstein-Barr virus infection in adolescents: a feasibility study.

[4] Tack M. Inaccuracy in reporting CEBA part II. BMJ Paediatrics Open.

[5] Statistical analysis plan – CEBA part 2. …/bmjpo-2020-000797supp002_data_supplement.pdf

[6] Akershus University Hospital. Research Protocol – processing. Mental training for chronic fatigue syndrome (CFS/ME) following EBV infection in adolescents: a randomised controlled trial. Available from: og ungdomsklinikken/Paedia/Forskningsprotokoll – behandling.pdf

[7] ClinicalTrials.gov Identifier: NCT02499302. Available from:
