Trial By Error: Dutch Team Offers “Dog-Ate-My-Data” Excuses for Not Reporting Null Objective Findings

By David Tuller, DrPH

Two months ago, Clinical Infectious Diseases (CID), a high-impact journal, published a study called “Efficacy of Cognitive-Behavioral Therapy Targeting Severe Fatigue Following Coronavirus Disease 2019: Results of a Randomized Controlled Trial.” The study, nicknamed ReCOVer and conducted in the Netherlands, purported to provide the “first evidence for the positive effect of CBT in patients with severe post–COVID-19 fatigue.” The study received widespread attention and has been highlighted credulously in social media and news articles, including a recent article in Slate.

The 114 subjects were randomized to receive either a 17-week course of CBT, called Fit after COVID, or “care as usual” (CAU). When this study was announced a few years ago, I noted that it was an unblinded study relying on subjective outcomes—a recipe for generating a significant amount of bias. In other words, it was essentially designed to produce positive findings. And that is what happened.

The primary outcome was the mean difference in self-reported fatigue at the end of treatment and six months later on an instrument called the Checklist Individual Strength. The fatigue subscale of the CIS includes eight questions, each of which can be rated from 1 to 7; the final scores range from 8 to 56*, with higher scores indicating greater fatigue. The difference in the means between the two groups was 9.3 points and 8.4 points, respectively, at the end of therapy and six months later. At that point, the mean score was 31.5 in the CBT group and 39.9 in the CAU group. [*In the sentence above, I originally wrote 52, not 56; I apologize for the mathematical error.]

As with so much of the research from the CBT ideological brigades, the results were less than meets the eye; it is impossible to take them at face value. It should not be hard to understand that if you offer one group supportive and compassionate attention from a sympathetic therapist over a period of several months, they are more likely to provide positive answers on questionnaires than members of a group not receiving the intervention. Even more so if you prime them by assuring them repeatedly that CBT has been shown to work for such conditions, as is frequently the case in this sort of research.

In other words, the findings are not a surprise. As I wrote in a recent post, the research documented that unblinded studies relying on subjective outcomes yield positive results. These results have been over-hyped from Amsterdam to Harvard Medical School, even though the self-reported benefits were pretty modest—well within the range that might be expected from bias alone.

The authors noted that six points is considered a clinically significant difference on the CIS fatigue scale. So the 8.4-point difference between the two groups at six months only marginally exceeded that six-point threshold. Nonetheless, the authors were able to claim that more people from the CBT group no longer had “severe” fatigue. That’s because a score of 35 was designated as the threshold between severe and less severe fatigue. While it is true that more in the CBT group met that threshold, the 3.5-point difference between the mean of 31.5 and the 35-point threshold would not be considered clinically significant. Nor would the 4.9-point difference between that threshold and the mean score of 39.9 for the CAU group.
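For readers who want to check the arithmetic, here is a minimal Python sketch that uses only the figures quoted in this post. The constant and function names are illustrative labels, not anything taken from the trial's analysis, and the calculation works from the published group means rather than from raw data.

```python
# Figures quoted in the post; none of this comes from the trial's raw data.
ITEMS = 8                        # questions on the CIS fatigue subscale
MIN_PER_ITEM, MAX_PER_ITEM = 1, 7
CLINICALLY_SIGNIFICANT = 6.0     # points, as noted by the study authors
SEVERE_THRESHOLD = 35.0          # cut-off between severe and less severe fatigue
MEAN_CBT = 31.5                  # mean score at six months, CBT group
MEAN_CAU = 39.9                  # mean score at six months, care-as-usual group


def cis_fatigue_summary():
    """Return the score range and the gaps discussed in the text."""
    gap = round(MEAN_CAU - MEAN_CBT, 1)
    return {
        "score_range": (ITEMS * MIN_PER_ITEM, ITEMS * MAX_PER_ITEM),
        "between_group_gap": gap,
        "gap_above_significance": round(gap - CLINICALLY_SIGNIFICANT, 1),
        "cbt_below_threshold": round(SEVERE_THRESHOLD - MEAN_CBT, 1),
        "cau_above_threshold": round(MEAN_CAU - SEVERE_THRESHOLD, 1),
    }


if __name__ == "__main__":
    for name, value in cis_fatigue_summary().items():
        print(f"{name}: {value}")
```

Both group means sit less than six points from the 35-point cut-off, which is why crossing that cut-off says little on its own about a clinically meaningful change.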

So to repeat the obvious: These apparent benefits from CBT are exceedingly modest no matter how they are parsed.

Two cogent published responses, here and here, are well worth reading.

A curious omission from the published report raised some eyebrows. The trial protocol listed the primary outcome, several secondary outcomes, and a category called “other study outcomes.” Among the latter was the sole objective measure: actigraphy. Actigraphy involves wearing devices that precisely measure physical movement over a period of time. In this case, participants wore these devices for 14 days at baseline and for 14 days at the end of therapy. According to the protocol, “The actigraph has been shown to be a reliable and valid instrument for the assessment of physical activity.”

The lack of any mention of these data in the article suggested that the results for that measure were likely poor. In the authors’ response to the published comments, they acknowledged this to be the case. As they wrote:

“Proposed alternative outcomes, like physical activity assessed with actigraphy or physical fitness are no[t] reliable markers of fatigue, and are also influenced by the perception of patients and subjectively experienced symptoms. Research showed that a substantial number of patients with severe fatigue do not have deviant physical activity levels. This was also found in our sample, i.e. 81% of participants had a fluctuating active activity pattern, and only 19% had a low active activity pattern. A reduction of fatigue will not necessarily lead to increased levels of objective physical activity or vice versa. Also, reduced fatigue levels do not necessarily concur with improved aerobic capacity. In our study there was no significant difference between the conditions in the increase in physical activity assessed with actigraphy.”

For good measure, one of the co-authors, Dr Chantal Rovers, added another excuse in a Twitter thread: “You always have more data than fits within the word limit of medical journals. Primary and secondary outcome measures are included. The plan was to publish the other results in a separate article, as is very common.”


Deconstructing bogus excuses for failing to provide objective data

Do any of these rationalizations hold water? No. Let’s look at each in turn.

*Previous studies have found that actigraphy results do not correspond with patients’ self-reports about their fatigue levels. Therefore, they’re irrelevant.

It is true that previous studies from these and other authors have found that positive reports on subjective measures of fatigue are not matched by corresponding increases in physical activity, as measured objectively by actometers worn over a period of time. The apparent conclusion drawn by these researchers—that the actigraphy results can be dismissed as not related to fatigue—is ridiculous and clearly self-serving.

The Dutch investigators have perfected this strategy of including objective measures and then not reporting them until long after they have already received attention for their papers highlighting the positive findings on subjective measures. This happened in three trials in the 2000s of CBT for what they called CFS. Years later, the investigators published the null actigraphy findings from all three papers and concluded—conveniently—that reduction in fatigue was not mediated by increases in physical activity. They pulled the same stunt more recently with a study on Q-fever.

In other words, they enthusiastically accepted the accuracy of subjective reports of improvements and dismissed the significance of the indisputable fact that patients did not engage in more physical activity, even though they stated that the goal of the treatment is to reduce the disability associated with the condition. This discrepancy has now morphed into the blunt assertion (the dogma, really) that objective measures of physical activity bear no relationship to the construct of fatigue. All that matters is the self-report; whether people actually do more is irrelevant as long as they say that they are less fatigued. It is hard to know how to respond to such an absurd, self-serving argument except to point out that it is absurd and self-serving.

*Most patients had fluctuating activity levels rather than continuously low activity levels, so the actigraphy readings couldn’t really show any improvement in these patients.

First, let’s note that, to justify leaving out key findings, they are offering information on levels of physical activity at baseline that they did not present in the study itself. We have no idea what they mean by “fluctuating activity levels” (presumably these were based on the baseline actigraphy readings), and we are asked to take their word based on data that have not been peer-reviewed and that we haven’t seen.

Beyond that methodological complication, the argument is odd: Unless people are active 24 hours a day, it is nonsensical to argue that they couldn’t do more and that actigraphy would not provide salient information. And this point also overlooks the obvious converse—that patients might do much worse in one or both study arms. If patients are at a high level of activity and suffer relapses either with or without CBT, the actigraphy would likely document that. These authors are so convinced by their own theorizing that they appear not to grasp that measures are designed to capture declines in health status as well as improvements.

*The actigraphy was not a primary or secondary outcome, so we didn’t need to report all the findings in this first paper.

Come on! Really?? This is perhaps my favorite bogus justification. The null results on this objective outcome inevitably raise questions about the validity and reliability of the subjective primary and secondary measures. Investigators have an obligation, enshrined in research ethics codes, to provide all salient data and not to hide information that would raise questions about or alter interpretations of their findings. It is impossible to argue with a straight face—although the Dutch team has tried; perhaps they lack a sense of humor and irony—that an objective measure of function like actigraphy is meaningless even when the results contradict subjective reports.

If the actigraphy results had been terrific, if they’d shown that patients increased their activity levels, does anyone seriously believe that the investigators would have withheld the data from the first study report on the grounds that it was not a primary or secondary outcome?

In any event, it was the investigators themselves who decided not to make their one objective measure a primary or secondary outcome, presumably because past experience demonstrated that the findings would probably contradict the subjective claims. So to cite this questionable decision as the reason to intentionally withhold these findings is disingenuous in the extreme. It takes an enormous amount of chutzpah—as in the classic tale of someone who kills their parents and then pleads for mercy as an orphan.

Moreover, if activity levels are “fluctuating” and the actigraphy data collected for 14 days are therefore questionable, why should we pay attention to subjective reports of fatigue from eight items on a single questionnaire? I don’t get it. Under their argument about fluctuating levels, all the study data should be considered irrelevant and meaningless.

What about the argument that they just had too much data for one paper and so understandably had to set some aside for future publications? This position takes a self-evident fact of scientific research and twists it to justify not reporting important data. Investigators always want to squeeze more publications out of a single study—that’s fine. What’s not fine is choosing to leave out information that raises questions about your conclusion. The Dutch investigators don’t see it that way, of course. Since they believe the actigraphy findings have no value in assessing fatigue in the first place, they see no problems with waiting till some future date to publish them.

In short, these responses are not serious. They are self-serving deflections; whether or not the authors actually believe them, they have exposed themselves as unqualified and too ethically challenged to engage in any research at all. It is hard to grasp how legitimate investigators could engage in such specious reasoning.

Dr Daniel Griffin, an infectious disease specialist in the New York City area and a regular on the popular podcast This Week in Virology (hosted by Vincent Racaniello, a microbiology professor at Columbia who is also the host of Virology Blog), agreed that the decision to not report the objective findings from actigraphy cannot be justified. Here’s what he had to say:

“The criticism is warranted. My biggest thing is being open and honest and not ‘hiding’ or ignoring data that fails to support one’s agenda. As we are seeing, a person may report they feel improved. But when we see no increase in actual activity, that is important information to share.”

15 thoughts on “Trial By Error: Dutch Team Offers “Dog-Ate-My-Data” Excuses for Not Reporting Null Objective Findings”

  1. David – many thanks for explaining just how flawed this research is.

    Today, perhaps more than ever, we are reminded of how all the things that David is covering on ME, FND, Long Covid and MUS are inextricably tied together by health politics. See this paper, which was published online yesterday. It just happens to be in the same journal to which the letter in David’s last blog on FND was addressed. A coincidence? I strongly suspect not. We must ask ourselves: why are a number of the co-authors of the new paper FND ‘experts’/researchers who (as far as I know anyway) have little or nothing to do with ME? Are they all running scared now, or something? If so, hitting out at NICE strikes me as rather desperate.

  2. An excellent take down and I wholeheartedly agree with this as well:

    “Whether or not the authors actually believe them, they have exposed themselves as unqualified and too ethically challenged to engage in any research at all.”

  3. I feel angry that this data was not included. It is showing the impact that ME has in terms of fluctuating symptoms. Could deeper analysis have shown the best baseline levels to help people manage activity to avoid crashes? Could they have shown the impact and duration of PEM? There is a whole treasure trove of information that is valuable insight into ME, but it obviously shows the researchers do not care about patients; they are only concerned with their product. I still don’t understand how so much research money is still being wasted.

  4. Patients have a right to be angry, I think. They deserve the best (and the whole truth) from medical science, not mediocrity and substandard tripe.

  5. This magnificent scientific work performed by Knoop et al has been awarded a 308,000 euro grant by ZonMw, the Dutch organisation that awards grants for health and healthcare research.

    We asked the program managers for biomedical research ME/CFS and Covid-19 at ZonMw how the study proposal could have been judged to be highly relevant and of very good quality, when an open trial like this, in which the researchers and those studied know which treatment is used, with only subjective outcome measures is known to carry a very high risk of bias, so that objective outcomes must also be measured. Knowing what treatment is being given to a patient can influence their response to treatment.

    The same organisation recently awarded a 4.4 million euro grant for biomedical research into ME/CFS to a professor in psychosomatics who considers CFS to be a functional syndrome in which several MUS come together.
    The so-called patients in this ME/CFS Lines consortium have been found on the basis of questionnaires. Most of them were not diagnosed by an experienced and competent physician, although that was a specific requirement.
    The prevalence of ME/CFS in ME/CFS Lines is 1.74%, which is absurd.

  6. Thanks, Lou. As you know, that project is also on my radar. I will get to it soon.

  7. That Wainwright Woman

    One wonders why they even included actigraph measurements at all. Could it be that the IRB, or whatever pre-award entity, required such a measurement before they would approve this study?

  8. Mike Fraumeni

    Perhaps CBT will just fade away:
    “But for traditional CBT to survive all these new challenges, proponents must strive to produce better research, and this may require the modification of some of the approach’s central tenets. Ultimately, CBT will need to conform to this emerging science in order to retain its strong foothold, or the approach may be destined to fade the way of former giants such as psychoanalysis over the upcoming decades.”

  9. Mike Fraumeni

    A review of “CBT: The Cognitive Behavioural Tsunami – Managerialism, Politics, and the Corruptions of Science” by the above author, Farhad Dalal:

    …”Much of this supported the Resilience Programme used for every soldier in the US Army with the aim of reducing the incidence of post-traumatic stress (PTSD) disorder after traumatic experience of the battlefield. By avoiding to take a look at the causes of depression, CBT is doing away with meaning. Issues like these became a part of what Dalal identifies as Psy-Wars. Much of this rather intellectual PsyWwar has to do with the fact that behaviourism believes in an animal-equals-human equation. In other words, the behaviourist . . . recognises no dividing line between man and brute. Essentially, we human beings are no different from an ordinary lab rat. In behaviourism, there is no room for psychology, only what is on the outside of the body and what is visible behaviour counts. Next came the cognitive challenge arguing that human beings are more than mechanical bodies . . . bolting cognition onto behaviourism. Still, the focus remained on external and measurable, faced with a massive influx of PTSD in the wake of the Vietnam War.” …

  10. In the department of similar self-serving junk treatments from Dutch scientists, read this paper. The entirety of gender affirming care is scientifically beyond flawed. It is, arguably, fraud. Cherry picked data (oops a kid died, let’s not mention it), misrepresentation, highly questionable measures, complete lack of controls, and complete inability to confirm results in follow-up studies.

    This medical malpractice is being enshrined in law and education as the only acceptable way for children to be treated—with parental consent specifically excluded. This campaign is implemented by well-meaning, duped schoolteachers who have NOT read the basis of what they are told.

    Garbage science to further garbage careers, raining hell down on innocent people. I ran across this paper and what is going on is sociopathic. This needs widespread publicity. How can it not be a major focus?

    I have to ask here. Is psychiatry broken? Has psychiatry become a haven for self-serving, self-aggrandizing, profiteering sociopaths? I am really asking this question. The evidence is beyond suggestive.

  11. Mike Fraumeni

    Enjoy this interview featuring Farhad Dalal. It would be even better if David was part of this interview.

    “Political deception and the CBT tsunami” – Ivan Tyrrell and Farhad Dalal | Human Givens Podcast
