Adventures in correcting the (semi-)scientific record

Photo shared by Michiel1972 via Wikimedia Commons

One of the blogs I follow is Retraction Watch, which documents the world of quality control in scientific research – pre-publication peer review (and its abuses); post-publication peer review in fora such as research blogs; retractions and corrections by journals; and plagiarism and fraud. The large majority of cases they report on are drawn from the “hard sciences”. From time to time, a case pops up from the humanities as well, and it’s not outrageous to ask – who cares anyway? Well, I do.

I’m one of those humanistic researchers who likes to imagine that I do something resembling science. One of the most frustrating things about humanistic research that can’t stand up to scrutiny is the feeling that it doesn’t matter, anyway; nobody cares about this stuff but us. Does that make me some starry-eyed idealist? No, I just don’t like sloppy work. And when I see it, it makes me look bad too, a humanistic guilt by association. Several of the posts on this blog can be seen as post-publication peer review, and during the past year I had my own experience with attempting to correct the (semi-)scientific record.

Last year I read an article by Prof. Hilary Nesi in the Journal of English for Academic Purposes (JEAP) entitled “Laughter in university lectures”. It contained an obvious error in the word count of the Corpus of British Academic Spoken English (BASE), which led to erroneous claims about the frequency of laughter in this linguistic database. The natural response, again, might be: who cares? Several people should care, because the author, two peer reviewers, and the journal editors apparently didn’t look very carefully at the figures reported in two of the tables in the paper. I decided to start with the author.
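Why does a word count matter so much? A standardised frequency divides the number of laughs by the number of words, so an inflated denominator shrinks the rate in direct proportion. Here is a minimal sketch of per-10,000-word normalisation. The two word totals are the ones discussed in this post (2,646,920 reported for the 160 lectures; just over 1.6 million for all of BASE, rounded down to 1,600,000 here). The laugh count and the base of 10,000 words are hypothetical, chosen only for illustration.

```python
def per_n_words(count: int, total_words: int, base: int = 10_000) -> float:
    """Standardised frequency: occurrences per `base` words of text."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return count / total_words * base


REPORTED_LECTURE_WORDS = 2_646_920    # total reported for the 160 lectures
WHOLE_CORPUS_UPPER_BOUND = 1_600_000  # "just over 1.6 million" words in all of BASE, rounded

laughs = 1_000  # hypothetical laugh count, for illustration only

rate_reported = per_n_words(laughs, REPORTED_LECTURE_WORDS)
rate_bound = per_n_words(laughs, WHOLE_CORPUS_UPPER_BOUND)

# The laugh count cancels out: the ratio depends only on the two denominators.
print(f"{rate_reported:.2f} vs {rate_bound:.2f} laughs per 10,000 words")
print(f"ratio: {rate_reported / rate_bound:.3f}")
```

The 160 lectures can’t contain more words than the whole corpus they belong to. So whatever the number of laughs, a rate computed over 2,646,920 words comes out at roughly 60% or less of the rate over the true lecture word count.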

The search begins: looking for someone who cares

On April 1, 2013, I sent the following email to the correspondence address of the article:

Dear Prof. Nesi,

I’ve been reading your article entitled “Laughter in university lectures”, and I’m hoping you can help me to understand Tables 2 & 3 (page 83 of JEAP 11). The total words for the 160 lectures are listed as 2,646,920, when there are just over 1.6 million words in BASE. I thought you might have counted XML tags as words, but I can’t reproduce this figure in WordSmith or AntConc.

I would appreciate your help in understanding this, as it also affects the findings on standardised frequencies of laughter in the lectures.

Best regards,
Ray Carey

If anyone could answer this question, it would be Nesi, as she is both the sole author of the study and one of the compilers of the BASE corpus. After waiting three months for a reply and receiving none, I decided to reproduce her study. The BASE corpus is freely available to download, so I obtained the XML version of the corpus and got to work. As it became evident that Nesi would not reply, I published a research blog post in July 2013 entitled “Laughter in academic talk: Brits, Yanks & ELF compared”.
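The email floats one hypothesis for the inflated total: counting XML tags as words. Here is a minimal sketch of how much that can inflate a count. It compares whitespace tokens in the text content only against a naive count of the raw file. The fragment and its tag names (`u`, `event`) are hypothetical and not BASE’s actual markup. Concordancers such as WordSmith and AntConc also tokenise differently from a plain whitespace split. This only illustrates the hypothesis; it isn’t the code I used for the blog post or the reader response.

```python
import xml.etree.ElementTree as ET

# Toy transcript fragment; tag and attribute names are hypothetical, not BASE's schema.
SAMPLE = """<text>
<u who="lecturer">so that is the first point</u>
<event desc="laughter"/>
<u who="lecturer">and now the second</u>
</text>"""


def count_text_words(xml_string: str) -> int:
    """Count whitespace-separated tokens in the text content only, ignoring markup."""
    root = ET.fromstring(xml_string)
    return len(" ".join(root.itertext()).split())


def count_raw_tokens(xml_string: str) -> int:
    """Naive count treating every whitespace-separated chunk, tags included, as a word."""
    return len(xml_string.split())


print(count_text_words(SAMPLE), count_raw_tokens(SAMPLE))
```

In this toy fragment the naive count is over half as large again as the text-only count, because fragments such as `<u`, `who="lecturer">so` and `desc="laughter"/>` are all counted as words.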

At this point I considered contacting Nesi again to inform her of the blog post, but it already seemed apparent that I would have to keep looking for someone who cares. The Journal of English for Academic Purposes (JEAP) is the most prestigious journal of this field, published by Elsevier with a Source Normalized Impact per Paper (SNIP) value of 1.880. But before contacting the journal, I had to consider the possibility that this was a bad idea. Nesi sits on the editorial board of the journal, and one of the co-editors of the journal, Prof. Paul Thompson, is her close colleague, having worked with Nesi on compiling the BASE corpus. When I contacted Thompson a week after publishing the blog post, I assumed it would likely result in my never getting published in JEAP.

A pleasant surprise: getting published in JEAP

I received a prompt and cordial reply from Thompson, who, to my surprise, invited me to submit a reader response to JEAP based on the contents of the blog post. He was an absolute professional throughout the process, and he was friendly to me when I met him at the ICAME 35 conference in May. After I submitted the reader response and discovered a problem in the Python code I wrote for the blog and response, he was kind enough to let me withdraw the paper and resubmit after I had researched and resolved the issue.

My submission, “A closer look at laughter in academic talk: A reader response”, was published in vol. 14 of JEAP in June 2014. As Thompson had expressed his wish to resolve the issue in a public forum, he also invited Nesi to respond. Her two-paragraph forum response appeared in the following issue of JEAP (vol. 15, Sept. 2014). At last, an explanation of what went wrong – or not. Though she opens with “I am very grateful to Ray Carey (Carey, 2014) for correcting my calculations for the distribution of laughter in the BASE corpus” (Nesi 2014: 48), there is no attempt to clarify the original figures or the origin of the errors, but instead a defense of the integrity of the original article:

Fortunately, my study centred on the reasons why lecturers provoke laughter, rather than on the variation between British and American lecturing styles; the quantitative differences in laughter distribution were only reported briefly and do not feature in the abstract or the highlight list.

(Nesi 2014: 48)

However, the quantitative claim I’ve shown to be overstated does still appear in the highlight list (“Laughter is more frequent in BASE [British] lectures than in MICASE [American] lectures”), and the “brief” report of the quantitative findings takes up roughly one full page of the paper, not including the two tables already mentioned. Moreover, unjustified interpretive claims were made on the basis of the miscalculations, and these will remain in the paper. I doubt this response would be sufficient in the sciences, but hey – we’re not really scientists, anyway.

Lastly, my reproduction of Nesi’s study revealed well-formedness problems in a file in the XML version of the BASE corpus, which required manual correction. I pointed this out in the reader response as well, and in her reply Nesi states that “I understand that the problematic BASE seminar file Carey identified has now been replaced in the Oxford Text Archive by a validated version” (2014: 48). But after obtaining the BASE corpus from the Oxford Text Archive on Nov. 24, 2014, I found the same file with the same problems (sssem006.xml), and the corpus still cannot be parsed as well-formed XML without making manual corrections to this and one other file (lssem006.xml, with character encoding problems).
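To check a downloaded copy yourself, here is a minimal sketch of a well-formedness check using only Python’s standard library. It tries to parse every `.xml` file in a folder and reports each one the parser rejects, with the parser’s error message and position. The folder name `BASE_xml` is a hypothetical default; point it at wherever the corpus was unpacked. A file whose bytes don’t match its declared character encoding is rejected too, since the parser treats invalid bytes as a well-formedness error.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def find_malformed(corpus_dir):
    """Return (file name, parser error) for every .xml file that fails to parse."""
    failures = []
    for path in sorted(Path(corpus_dir).glob("*.xml")):
        try:
            ET.parse(path)
        except ET.ParseError as err:
            failures.append((path.name, str(err)))
    return failures


if __name__ == "__main__":
    # "BASE_xml" is a hypothetical folder name; pass the real path as an argument.
    corpus_dir = sys.argv[1] if len(sys.argv) > 1 else "BASE_xml"
    for name, message in find_malformed(corpus_dir):
        print(f"{name}: {message}")
```

An empty report means every file parsed as well-formed XML. It says nothing about whether the markup is valid against the corpus DTD, which would need a separate validating parser.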

Open science: yours for the low, low price of $94.50

Click on the image to jump to my reader response (behind paywall). It presents similar content to that included in the original blog post, but stripped of all humor, sarcasm, and other tell-tale signs of human authorship.

So now that all this has been sorted out in a public forum, the original article and its unjustified claims remain on record for all to read and cite. At the time of writing, Nesi (2012) has been cited 16 times according to Google Scholar. She is a big name in English for Academic Purposes (EAP) circles, so this number will certainly grow. I am not a big name in anyone’s circles, and since nothing in the original article or its Elsevier webpage refers or links to my reader response, it will likely languish with me in obscurity, tucked into the last pages of the journal.

The ironic thing about Prof. Thompson’s desire for discussion in a public forum is that this blog is a more public forum than JEAP. Anyone can read and comment here, and the Python code I used in my reproduction of Nesi’s study is available as well. It’s true that publication in JEAP will bring my critique to the “right” people, who might even care. According to Elsevier’s Article Usage Dashboard, my published reader response has been viewed 279 times, more pageviews than the original blog post has received on this humble blog. So in that respect, it was a modest success.

Yet the public forum of JEAP is not quite public, unless you belong to an educational institution that subscribes to Elsevier’s journals. If not, then these articles – the original uncorrected article, my reader response, and Nesi’s two-paragraph reply – will each cost you $31.50, for a decidedly non-public total of $94.50. This is for a discussion of how often people laugh in academic speech events. To be honest, it’s all a bit embarrassing, and perhaps best kept behind a paywall. In conclusion, here’s my version of the famous “If a tree falls in the forest” riddle: If humanistic research were a joke, and nobody cared, would it even be funny?

References

Carey, R. (2014) A closer look at laughter in academic talk: A reader response. Journal of English for Academic Purposes, 14, 118-123. DOI: 10.1016/j.jeap.2014.03.001.

Nesi, H. (2012) Laughter in university lectures. Journal of English for Academic Purposes, 11 (2), 79-89. DOI: 10.1016/j.jeap.2011.12.003.

Nesi, H. (2014) A closer look at laughter in academic talk: A response to Carey (2014). Journal of English for Academic Purposes, 15, 48-49. DOI: 10.1016/j.jeap.2014.07.001.

5 thoughts on “Adventures in correcting the (semi-)scientific record”

  1. geoffjordan says:

    All those concerned with doing good research owe you our thanks for your on-going attempts to expose this example of the arrogance of power. Well done.

  2. kentclizbe says:

    Keep at it. Your efforts do matter.

  3. T K says:

    Great read, keep it up!

  4. Alex says:

    Enjoyed reading your blog, I had read both the original article and your query plus the response and must admit I was slightly bemused by the side-stepping. It would have been much simpler to just admit an error surely? Certainly, as a result, it made me somewhat more suspicious of the original article (which is a shame perhaps?)

    • Ray Carey says:

      Hi Alex, I don’t think retractions/corrections are very common in the humanities, so I think this was handled as ideally as possible — the problems get addressed in a public forum and everyone gets to save as much face as possible. It has lowered my view of my own field, which for me is the real shame. But hey, I got another line for the list of publications and the game continues.