On the other side: variations in organising chunks in ELF

Variations in organising chunks aren’t that common, but they do tend to stand out.
When working with ELF data – English used as a lingua franca between second/foreign-language speakers – one of the things that stands out is slight variation in conventional chunks of language. A formulaic chunk like “as a matter of fact” might be realised as “as the matter of fact”, or you could hear “now that you mention it” spoken as “now that you say it”. There’s no sense in calling them errors, since the variants won’t cause miscommunication, they resemble their conventional counterparts in both function and form, and the less-preferred variant is likely found elsewhere. It’s just not the English native-speaker preference.

These variations are interesting linguistically and they tend to stand out impressionistically for researchers, but I’ve wondered how often these variations actually occur in ELF – both in frequency and in their distribution relative to conventional forms. It’s not an easy question to answer. Many of these formulaic chunks of language occur infrequently, so finding a couple of variants doesn’t really tell you much. The example above of “now that you say it” occurs twice in the million-word ELFA corpus, with just one instance of the conventional form. By contrast, “as the matter of fact” is found in ELFA 21 times compared to just eight occurrences of the expected chunk, but only two speakers account for those 21 instances.

We can see from these examples that a formulaic chunk that rarely shows up won’t reveal much about how often variation occurs among ELF users, across speech events, in different times and places. To find out more, I wanted to start with the highest frequency chunks I could find. These are described by Linear Unit Grammar as organising chunks, the recurring and relatively fixed chunks we use to structure our speech and writing, like “on the other hand”. Using the corpus freeware AntConc, I looked at the most frequent 3-, 4- and 5-word clusters (aka n-grams) in the ELFA corpus of spoken academic ELF.
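For readers who have not worked with clusters before, here is a minimal Python sketch of what counting n-grams involves. It is not AntConc's implementation: the tokenisation rule and the sample sentence are assumptions made up for illustration, and a real run would also need to decide whether clusters may span utterance boundaries.

```python
import re
from collections import Counter


def ngram_counts(text, n):
    """Count every run of n consecutive word tokens in text."""
    # Assumed tokenisation: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    # zip over n staggered copies of the token list yields each n-gram once.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Invented sample, not taken from the ELFA corpus.
sample = "on the other hand yes but on the other hand no"
counts = ngram_counts(sample, 4)
top_cluster, top_freq = counts.most_common(1)[0]
```

Running the same function with n set to 3, 4 and 5 over a whole corpus would give the kind of ranked cluster lists described above.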

Laughter in academic talk: Brits, Yanks & ELF compared

Original article (behind paywall): Nesi, Hilary (2012) Laughter in university lectures. Journal of English for Academic Purposes, 11(2). 79-89.

Update 30.12.2013: this updated post reflects improvements to the Python scripts used to generate the token counts. Links to the improved scripts are available in the footnotes. Minor changes to the token counts and frequencies have been made in the tables and text, but the main content of the post remains unchanged.

When I was blogging earlier on the frequencies of laughter in academic ELF (English as a lingua franca), I came across an article by Prof. Hilary Nesi, a compiler of the BASE corpus – the Corpus of British Academic Spoken English. She provides a qualitative analysis of the types and functions of laughter episodes in lectures from the BASE corpus and she concludes with the uncontroversial advice that British lecturers might want to adjust their use of humor when lecturing for an international audience.

I’ve waited until now to blog on Nesi’s article, since it contains obvious statistical errors that I wanted to research further. When I say obvious, I mean obvious – she cites the word count of the BASE lecture subcorpus as 2,646,920 words, when the official count of the entire corpus is only 1,644,942 words (cited in the same article). Nesi uses this oddly inflated word count to compute the standardised frequencies of laughter in lectures, which are therefore artificially low. Being naturally curious, I emailed Prof. Nesi in April to ask if she could clarify the situation, and naturally I received no reply.
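To make the effect of the inflated denominator concrete, here is a small Python sketch of the arithmetic. The two word counts are the ones cited above; the laughter count is a hypothetical placeholder, not a figure from the article, and normalising per 1,000 words is an assumed convention.

```python
def per_thousand(count, total_words):
    """Standardised frequency: occurrences per 1,000 words."""
    return count / total_words * 1000


CITED_LECTURE_WORDS = 2_646_920    # lecture subcorpus count cited in the article
OFFICIAL_CORPUS_WORDS = 1_644_942  # official count for the entire BASE corpus

laughs = 1000  # hypothetical laughter count, for illustration only

# Rate obtained with the inflated word count.
rate_cited = per_thousand(laughs, CITED_LECTURE_WORDS)

# The lecture subcorpus cannot be larger than the whole corpus, so the
# rate computed over the whole corpus is a floor for the true lecture rate.
rate_floor = per_thousand(laughs, OFFICIAL_CORPUS_WORDS)

# How far the inflated denominator pushes the rate down, at minimum.
understatement = CITED_LECTURE_WORDS / OFFICIAL_CORPUS_WORDS
```

Whatever the real laughter count is, the ratio between the two denominators is about 1.6, so a frequency standardised on the cited word count understates the true lecture rate by at least that factor.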

To be fair, everyone makes mistakes and the quantitative findings don’t really affect her qualitative analysis. But this was published in a major peer-reviewed journal, the Journal of English for Academic Purposes. When a statistical error this basic can get past a senior researcher, two peer reviewers, and an editorial staff, it gives this junior researcher a fairly discouraging picture of academic rigor in the humanities. I might just be the first person on earth to look carefully at Nesi’s tables.

When in doubt, do it yourself

The thing that makes corpus research almost seem like real science is reproducibility – as with real experimental results, another researcher can take a linguistic corpus and try to reproduce a study’s findings. So, I downloaded the BASE corpus in XML format and set out to reproduce Nesi’s figures. She also uses the XML version of BASE, but only to search for laughter tags using the WordSmith Tools application. My first theory was that she had generated a word count for the lectures without excluding the XML markup, but even this approach didn’t reach her inflated word count.
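The difference between the two counting approaches can be sketched in a few lines of Python. This is an illustration only, not the script behind the counts in this post: the XML snippet, its tag names and attributes are invented rather than taken from the BASE schema, and the word-matching rule is an assumption.

```python
import re
import xml.etree.ElementTree as ET

# Invented snippet; the element and attribute names are hypothetical.
SAMPLE = (
    '<text><u who="nm0001">so this is <vocal desc="laugh"/>'
    " the first point</u></text>"
)


def count_words(s):
    """Count word-like tokens (assumed rule: alphanumeric runs, optional apostrophe part)."""
    return len(re.findall(r"\w+(?:'\w+)?", s))


root = ET.fromstring(SAMPLE)

# itertext() yields only the text between tags, so markup is excluded.
words_without_markup = count_words(" ".join(root.itertext()))

# Counting over the raw string treats tag and attribute names as words.
words_with_markup = count_words(SAMPLE)
```

Counting over the raw file inflates the total because every tag name and attribute value is tokenised as a word, which is the effect the first theory was testing.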



And so on, or something like that: vague expressions in academic ELF

Another lakeside view of our heavenly Finnish summer. © Nina Valtavirta

An important part of academic argumentation is not what you say, but how you say it. It’s one thing to make a bold claim, and another to “soften” it by adding expressions like “or something like that”, “more or less”, or “in a way”. These recurring chunks aren’t merely filler – they convey important interactive information. Vague expressions, or VEs, “express the speaker’s uncertainty or personal attitude towards the proposition and indicate for example solidarity” (Metsä-Ketelä 2012: 264).

Earlier research has expressed concern about non-native speakers’ learning and use of vague expressions, with the danger of sounding “blunt” or “pedantic” if these VEs are underused. In a recent paper by ELFA project member Maria Metsä-Ketelä, these concerns were investigated in the ELFA corpus of spoken academic ELF (English as a lingua franca). How are these vague chunks employed by second-language users in interaction with each other, and how do these findings compare to similar native-speaker data?

OI chunks: organising interaction

You’ll notice that vague expressions like “and so on” and “in a sense” function as units – they’re fixed chunks of language that typically don’t vary in form. From a Linear Unit Grammar (LUG) point of view, these are OI chunks (Organising Interaction) which can be used by a speaker to qualify her stance on the main content of an utterance. As Maria points out, the vague expressions in her study serve to intentionally add imprecision. They also have two other important traits:

  • VEs do not contribute to the propositional content of an utterance, or the message itself (the M chunks in LUG)
  • VEs “are supplementary, that is, they could be omitted from the utterance without compromising its syntactic structure” (Metsä-Ketelä 2012: 265).



Getting serious about laughter in academic talk

Academic discourse is serious business. Lectures are delivered, conference presentations are discussed, great thoughts hang in the air like disembodied spirits. It’s not the kind of environment you’d expect to find a lot of laughter and joking. And yet, we academics can’t seem to stop laughing.

The frozen Baltic

The Baltic Sea is still frozen in February. We’re anxiously awaiting the sun.
© Nina Valtavirta

The ELFA project had our February meeting on Thu., 21.2, and MA student Jani Ahtiainen gave a talk on laughter in spoken academic discourse. He’s doing his master’s research on terms of address in the ELFA corpus, an area often connected to culture-specific norms and expectations. Likewise, the occurrence of humor and laughter might be influenced by culture as well.

Jani based his discussion on a 2006 article by David Lee that looked at occurrences of laughter in MICASE (Michigan Corpus of Academic Spoken English). The idea behind the article is that foreign students must struggle with the profound subtlety of American humor, so we should study laughter in MICASE to help these hapless foreigners cope. These research motivations are quite different from those we have in the ELF field, but the question of laughter in academic ELF is still relevant.



