By David Tuller, DrPH
Last year, King’s College London’s professor of cognitive behaviour therapy, Trudie Chalder, published another one of her extremely incompetent papers. This one is so statistically challenged as to be truly mind-boggling, even by Professor Chalder’s extremely low standards. A team of purportedly expert researchers has mangled descriptions of their own data so badly that the paper is rendered literally incomprehensible. When it was published, Mark Vink and Keith Geraghty tweeted about it, respectively, here and here.
My friend and colleague Brian Hughes, a psychology professor at the National University of Ireland, Galway, and I wrote a letter to the journal, Occupational Medicine, explaining the innumeracy of the paper. Given that it is such a godawful mess, we suggested that the paper should be retracted. This was an aspirational request. And of course, that hasn’t happened.
This week, the journal published both our letter and the authors’ response–which is as innumerate as the paper itself. They suggest the mistakes are not disqualifying but maintain that their language in places is simply “less precise than it could have been.” This explanation is ridiculous. The phrasing they have used throughout the paper is statistically wrong—not just “imprecise.” They did suggest they are “happy to correct” the purportedly “less precise” wording. Gee, thanks, Trudie!
Brian has written a post about this bogus response on his blog, The Science Bit. So I’m not going to bother doing that. I’ll just post our letter again, along with their response, so that interested readers can see for themselves how these investigators have mangled things. These events also speak poorly for the journal’s peer-review processes, which failed spectacularly in this case, and for the judgment of the editor, Steven Nimmo, a specialist in occupational medicine at the Plymouth Hospitals NHS Trust.
And policy-makers in the UK pay attention to these people? Really? Wow.
Our letter to Occupational Medicine
Occupational Medicine recently published a paper from Stevelink et al. called ‘Chronic fatigue syndrome and occupational status: a retrospective longitudinal study’. Unfortunately, the paper features major technical and methodological errors that warrant urgent editorial attention.
To recap: The study started with 508 participants. The primary outcome was occupational status. Many participants had dropped out by follow-up—only 316, or 62%, provided follow-up data. Of those 316, 88% reported no change in employment status. As a group, the participants experienced either no changes or only insignificant ones in a range of secondary outcomes, including fatigue and physical function. The poor follow-up scores on fatigue and physical function alone indicate that the group remained, collectively, severely disabled after treatment.
In several sections of the paper, the authors’ description of their own statistical findings is incorrect. They make a recurring elementary error in their presentation of percentages. The authors repeatedly use the construction ‘X% of patients who did Y at baseline’ when they should have used the construction ‘X% of all 316 patients (i.e. those who provided follow-up data)’. This recurring error involving the core findings undermines the merit and integrity of the entire paper.
For example, in the Abstract, the authors state that ‘53% of patients who were working [at baseline] remained in employment [at follow-up]’. This is not accurate. Their own data (Table 2) show that 185 patients (i.e. 167 + 18) were working at baseline, and that 167 patients were working at both time points. In other words, the proportion working continuously was in fact 90% (i.e. 167 out of 185). The ‘53%’ that the authors refer to is the percentage of the sample who were employed at both time points (i.e. 167 out of 316), which is an entirely different subset. They have either misunderstood the percentage they were writing about, or they have misstated their own finding by linking it to the wrong percentage.
This error is carried over into the section on ‘Key Learning Lessons’, where the authors state that ‘Over half of the patients who were working at baseline were able to remain in work over the follow-up period…’ While 90% is certainly ‘over half’, it seems clear that this phrasing is again incorrectly referring to the 53% subset.
The same error is made with the other key findings. For example, the Abstract states that ‘Of the patients who were not working at baseline, 9% had returned to work at follow-up’. But as above, this is incorrect. A total of 131 patients (i.e. 104 + 27) were recorded as ‘not employed’ at baseline and 27 were recorded as not working at baseline but as working at follow-up. This is 21%, not 9%. Once again, the authors appear to misunderstand their own findings. The ‘9%’ they refer to is a percentage of the sample of 316; it is not, as they have it, a percentage of that subset of the sample who were initially unemployed. This erroneous ‘9%’ conclusion appears as well in the ‘Key Learning Lessons’ and in the Discussion.
And again, the authors state in the Abstract that ‘of those working at baseline, 6% were unable to continue to work at follow-up’, a claim they repeat in the section on ‘Key Learning Lessons’ and in the Discussion. This statement too is wrong. Once more, the authors mistakenly interpret a percentage of the sample of 316 as if it were a percentage of a targeted subset. In this case, they think they are referring to a percentage of patients working at baseline, but they are actually referring to a percentage of the full group that provided follow-up data.
The authors present the raw frequency data in Table 2. Readers can see for themselves how their sample of 316 patients is cross-tabulated into four subsets of interest (i.e. ‘working at baseline and follow-up’; ‘not working at baseline and follow-up’; ‘dropped out of work at follow-up’; ‘returned to work at follow-up’). From Table 2, it is clear that the prose provided in the body of the paper is at odds with the actual data.
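The denominator confusion described above can be checked directly from the four Table 2 counts quoted in this letter (167, 18, 104 and 27). A minimal sketch, using only those figures, shows how each disputed percentage arises:

```python
# Cross-tabulated counts from Table 2 of Stevelink et al.,
# as quoted in the letter (n = 316 patients with follow-up data).
working_both = 167       # working at baseline and at follow-up
stopped_work = 18        # working at baseline, not working at follow-up
not_working_both = 104   # not working at either time point
returned_to_work = 27    # not working at baseline, working at follow-up

n_followed_up = working_both + stopped_work + not_working_both + returned_to_work  # 316
working_baseline = working_both + stopped_work              # 185
not_working_baseline = not_working_both + returned_to_work  # 131

# The paper's '53%' uses the full follow-up sample as denominator...
share_of_sample = 100 * working_both / n_followed_up        # ~53%
# ...but 'of patients who were working at baseline' implies this denominator:
share_of_workers = 100 * working_both / working_baseline    # ~90%

# Likewise, 27 returnees out of 131 initially not working is ~21%, not 9%;
# the paper's '9%' is 27 out of the full sample of 316.
returned_share = 100 * returned_to_work / not_working_baseline
```

Each sentence in the paper names the subset denominator while reporting the whole-sample percentage, which is exactly the mismatch the code makes explicit.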
It is undeniable that the text of this paper is replete with elementary technical errors, as described. Inevitably, the narrative is distorted by the authors’ failure to understand and correctly explain their own findings. It is unclear to us how these basic and self-evident errors were not picked up during peer review. Although we don’t know the identities of the peer reviewers, we speculate that groupthink and confirmation bias will have played their part. After all, it is generally reasonable for peer reviewers to presume that authors have understood their own computations.
There are several other features of this paper that cause concern. These include the following:
- The authors state that they evaluated participants using guidance from the UK’s National Institute for Health and Care Excellence (NICE). (Presumably they are referring to the 2007 NICE guidance, not the revision published in October 2021.) But the reference for this statement is a 1991 paper that outlines the so-called ‘Oxford criteria’, a case definition that differs significantly from the 2007 NICE guidance. Moreover, in a paper about the same participant cohort previously published by Occupational Medicine—‘Factors associated with work status in chronic fatigue syndrome’—the authors state explicitly that these patients were diagnosed using the Oxford criteria. This inconsistency is non-trivial, because the differences between these two diagnostic approaches have substantive implications for how the findings should be interpreted. The authors’ confusion over the matter is hard to comprehend and raises fundamental questions about the validity of their research.
- According to Table 1, there were either no changes or no meaningful changes in average scores for fatigue, physical function and multiple other secondary outcomes between the preliminary sample of 508 and the final follow-up sample of 316. The authors themselves acknowledge that the patients who dropped out before follow-up were likely to have had poorer health than those who remained. Therefore, the fact that Table 1 presents combined averages for the entire preliminary sample—i.e. combined averages for patients who dropped out and those who did not—muddies the waters. Presenting combined baseline scores for all patients will mask any declines that occurred for these variables in the subset who were followed up. It would have been far more appropriate to have isolated and presented the baseline data for the 316 followed up patients alone. Doing so would have reflected the authors’ research question more correctly, as well as enabling readers to make their own like-with-like comparisons.
- Finally, the authors state that ‘Studies into CFS have placed little emphasis on occupational outcomes, including return to work after illness’. However, they conspicuously fail to mention the PACE trial, a high-profile large-scale British study of interventions for CFS. The PACE trial included employment status as one of four objective outcomes, with the data showing that the interventions used—the same ones as in the Occupational Medicine study—have no effect on occupational outcomes. This previous finding is so salient to the present paper that it is especially curious the authors have chosen to omit it. The omission is all the more disquieting given that the corresponding author of the paper was a lead investigator on the PACE trial itself.
Authors of research papers have an obligation to cite seminal findings from prior studies that have direct implications for the target research question. Not doing so—especially where there is overlapping authorship—falls far short of the common standards expected in scientific reporting.
Even putting these additional matters aside, the technical errors that undermine this paper’s reporting of percentages render its key conclusions meaningless. The sentences used to describe the findings are simply incorrect, and the entire thrust of the paper’s narrative is thereby contaminated. We believe that allowing the authors to publish a correction to these sentences would create only further confusion.
We therefore call on the journal to retract the paper.
1. Stevelink SAM, Mark KM, Fear NT, Hotopf M, Chalder T. Chronic fatigue syndrome and occupational status: a retrospective longitudinal study. Occup Med (Lond). 2021. doi:10.1093/occmed/kqab170.
Chalder et al respond with further nonsensical ramblings
Thank you for asking us to respond to Professor Brian Hughes and Dr David Tuller’s comments on our paper Chronic fatigue syndrome and occupational status: a retrospective cohort study. We have carefully considered their comments and checked our analyses. Whilst we accept there is a need for some clarification on certain points, we stand by our data and the paper’s key findings.
Comment: Incorrect description of statistical findings, using ‘X% of patients who did Y at baseline’ when they should have used the construction ‘X% of all 316 patients (i.e. those who provided follow-up data)’.
Response: We made clear at the start of the results section that we only included the 316 patients in our analyses for whom we have baseline and follow-up data on our outcome of interest, namely employment status. We also stated this clearly in the methods section of the abstract.
We also reiterate in paragraph 2 of the results that ‘119/316 (38%) reported that they were on sick leave from their job at baseline’.
We therefore disagree with the criticism.
Comment: Incorrect description of statistical findings. For example, in the Abstract, the authors state that ‘53% of patients who were working [at baseline] remained in employment [at follow-up]’. This is not accurate. Their own data (Table 2) show that 185 patients (i.e. 167 + 18) were working at baseline, and that 167 patients were working at both time points. In other words, the proportion working continuously was in fact 90% (i.e. 167 out of 185). The ‘53%’ that the authors refer to is the percentage of the sample who were employed at both time points (i.e. 167 out of 316), which is an entirely different subset.
Response: We reported on the proportion of patients affected by CFS in relation to our main outcome of interest, namely changes in work status over the course of follow-up. These proportions were calculated based on the overall sample and shown in Table 2. However, we agree that the wording used in the abstract was less precise than it could have been. The 53% was derived from the overall sample when looking across all the categories of our main outcome of interest (remained in employment when considering the total sample) but as pointed out, it would have been better to amend the wording or alternatively use all patients who were working at baseline (n = 185) as the denominator. We are happy to correct this.
Comment: The authors indicate that the patients fulfilled the NICE criteria for CFS, whereas in their previous paper, also published in Occupational Medicine, it was suggested they fulfilled the Oxford criteria. This inconsistency is non-trivial, because the differences between these two diagnostic approaches have substantive implications for how the findings should be interpreted.
Response: We apologise for this error. We should have said all patients fulfilled NICE criteria for CFS. A proportion also fulfilled Oxford criteria. We would be happy to write a correction in the manuscript.
Comment: According to Table 1, there were either no changes or no meaningful changes in average scores for fatigue, physical function and multiple other secondary outcomes between the preliminary sample of 508 and the final follow-up sample of 316. The authors themselves acknowledge that the patients who dropped out before follow-up were likely to have had poorer health than those who remained. Therefore, the fact that Table 1 presents combined averages for the entire preliminary sample—i.e. combined averages for patients who dropped out and those who did not—muddies the waters. Presenting combined baseline scores for all patients will mask any declines that occurred for these variables in the subset who were followed up.
Response: The aim of this paper was to explore changes in work status from baseline to follow-up among the 316 patients we had employment status for on both occasions. Table 1 describes the baseline outcomes for those in the entire baseline sample compared with the subgroup who were successfully followed up. The data in Table 1 shows that for all data collected at baseline, data were comparable, except for occupational status, which we discussed in the results. We also commented on this in the discussion and added a warning about the interpretation of the results. There is nothing incorrect in our presentation of results in Table 1.
Another way to present these results is to compare the baseline scores for those who had baseline only data versus those who had baseline and follow-up data (see Table 1, available as Supplementary data at Occupational Medicine Online). These findings suggest that there were no differences between these groups with regards to gender, age, marital status and only a borderline significant difference in education (P = 0.043). Most importantly, no difference was found between the two groups regarding whether participants reported their work to be physically demanding, emotionally demanding, depression/anxiety symptoms, work and social adjustment scale scores, fatigue severity or any of their responses on the cognitive and behavioural responses questionnaire, except for catastrophizing thoughts. However, patients were more likely to be lost to follow-up if they reported poorer physical functioning at baseline and were affected by CFS for a longer duration (average 5.8 years in the lost to follow-up group, versus 4.8 years in the group who provided data on both occasions).
The results provided in Table 2 address the aim of the current paper and the insights derived from Table 1 (available as Supplementary data at Occupational Medicine Online) do not change the interpretation of our findings.
Comment: The authors state that ‘Studies into CFS have placed little emphasis on occupational outcomes, including return to work after illness’. However, they conspicuously fail to mention the PACE trial, a high-profile large-scale British study of interventions for CFS. The PACE trial included employment status as one of four objective outcomes, with the data showing that the interventions used—the same ones as in the Occupational Medicine study—have no effect on occupational outcomes. The omission is all the more disquieting given that the corresponding author of the paper was a lead investigator on the PACE trial itself.
Response: The aim of the current study was to explore factors associated with changes in work status over time among patients affected by CFS and did not look at the effectiveness of treatment. Treatment was not specifically targeting work outcomes and effectiveness was out of scope for the current paper. We did not reference the paper previously as we did not think it was appropriate.
We noted that the authors suggested the article should be retracted. We stand by the results described in our paper; however, we would like the opportunity to correct the issue of definition of cases and are happy to add whatever clarifications on the other points that the editors see fit.
8 thoughts on “Trial By Error: An Innumerate Response from Chalder to Hughes-Tuller Comments on Bogus Data Analysis”
I think that you need to be emotionally resilient to read anything on cfs by Chalder. I keep feeling libelled by her and chums.
I’m not really surprised that a retraction hasn’t been forthcoming. The Editor’s own 2015 piece on MUS (including CFS) in the same journal looks like it needs correcting to me – https://academic.oup.com/occmed/article/65/2/92/1489477 .
CT: may I ask what needs correcting in the Editor’s paper on MUS?
Lady Shambles commented:
CT: may I ask what needs correcting in the Editor’s paper on MUS?
Yes, you may, Lady Shambles!
In the first sentence of the 3rd paragraph of his article – https://academic.oup.com/occmed/article/65/2/92/1489477 – Nimmo provided data that, if I’m not mistaken, comes from this Nimnuan et al 2001 paper – https://www.sciencedirect.com/science/article/abs/pii/S0022399901002239?via%3Dihub. He didn’t reference this paper though, instead he referenced a ‘personal communication’ for the data, so I clearly can’t be 100% sure. (It would be some coincidence though if it was a different study, because the data matches exactly.)

He appears to indicate that the follow-up period of the patients that the data related to was 12 months but, from memory, the follow-up period for the Nimnuan et al study was only 3 months, a lot less time for the investigation that he suggests was carried out. That might sound like a minor issue, but if you read the sentence then I think you may agree with me that it isn’t. My interpretation of what he wrote was that the patients had likely been thoroughly investigated over a considerable period of time – a year.

He didn’t mention that the study (if it was indeed the Nimnuan et al study he was referring to) had limitations, including the quite short follow-up time and the sample size. Also, the misdiagnosis rates in the Nimnuan et al study were very high, but he makes no mention of that. (He may not have been aware of this if he hadn’t read widely on MUS and come across this paper based on the same study – https://academic.oup.com/qjmed/article/93/1/21/1588375?login=false.) There are other issues with his article too, but the follow-up time period is the one that I think needs correcting (if it is wrong).
Thanks CT. It does sound like that needs correcting. I’d love to know what the other issues about that paper are too though.
“[We] did not look at the effectiveness of treatment”
A stunning admission that their work is just marketing and effectiveness doesn’t matter. No one is supposed to actually *read* the research paper, or even the abstract. We are only supposed to read the headline of the university press release and marvel at the brilliance of the investigators.
Scientific journals are just like TV and radio. The content is merely filler between the ads. If the content is sometimes accurate and useful, that is a fortunate side effect.
Lady Shambles commented:
“Thanks CT. It does sound like that needs correcting. I’d love to know what the other issues about that paper are too though.”
Well here’s one thing about Nimmo’s MUS paper – https://academic.oup.com/occmed/article/65/2/92/1489477 – that struck me.
The closing reference (6) of paragraph 3 of his paper on MUS is a study – https://pubmed.ncbi.nlm.nih.gov/3942467/ – of 41 patients who were diagnosed with “somatization disorder” using specific criteria. Assuming that these patients were correctly diagnosed, I can’t see how they can represent patients with medically unexplained symptoms as Nimmo defined them in his opening paragraph, but they may possibly represent a very small subset of them.

The Bermingham et al paper – https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2939455/ – (the findings of which Nimmo discussed in paragraph 1), distinguished between patients with somatization disorder (let’s call that SD) and those with “subthreshold” somatization (let’s call that SS). From that paper, subthreshold patients made up a much larger group than those with the more severe somatization disorder: in primary care, 23.4% with SS versus 1.2% with SD; in outpatient settings, 16% with SS versus 1% with SD; in A and E, 3.8% with SS versus 0.4% with SD.

So for Nimmo to be consistent I think he should have warned his readers that the patients being discussed in the paper he referenced (his reference 6) were at the extreme end of the “somatization spectrum” (assuming that such a thing exists!) and not representative of MUS as he was defining it. But there is no such warning, rather he appears to have written about the findings of that study without qualification, as if they applied to all MUS patients as defined by him.
To qualify my previous comment, I suppose it’s possible that Nimmo wasn’t referring to the findings of the study described in that paper (his reference 6) about patients with somatization disorder but perhaps to other statement/s made in that paper about patients with a broader definition of MUS that did fit with his definition. I’m unable to access the full paper to check out if there were such statements in it that match with what he wrote, but I would hope that primary sources of information would be used in any authoritative article on MUS, (as I had assumed), not secondary ones.