Tag Archives: ELFA corpus

Language users or learners? Lexical evidence from spoken ELF

Click the image to jump to the article (behind paywall): Kao, S. & Wang, W. (2014) Lexical and organizational features in novice and experienced ELF presentations. Journal of English as a Lingua Franca, 3(1), 49-79. DOI: 10.1515/jelf-2014-0003.

One of the key distinctions made in research on English as a lingua franca (ELF) is the difference between language users and learners. ELF data is typically approached from the viewpoint of second language use instead of second language acquisition. Rather than seeing non-native English speakers as perennially deficient pursuers of “native-like” proficiency, ELF researchers start from the position that non-native English is principally English in use – English serves as a vehicular language for doing stuff, and especially for professional life in international domains like academia.

These issues are explored in a study published in a recent issue of the Journal of English as a Lingua Franca. Shin-Mei Kao and Wen-Chun Wang take up the user/learner distinction by investigating “lexical and organizational patterns in the presentations made by speakers of different ELF proficiency and experience levels” (Kao & Wang 2014: 54). To do this, they perform a lexical analysis of academic presentations from three groups: novice students who can be considered language learners, academic experts using English as a lingua franca, and academic experts who are also English language specialists.

The three datasets are from the following sources:

  • novices/language learners – 43 student presentations in an English for Academic Purposes (EAP) course held at National Cheng Kung University, Taiwan. The students are a mix of Taiwanese and international students, most of them from engineering. Presentations ranged from 2 to 5 minutes each, with an average of 360 words.
  • ELFA corpus – 30 conference presentations from the Corpus of English as a Lingua Franca in Academic Settings (ELFA), given by 49 academic experts (mostly aged 31-50) from 20 different first-language backgrounds, none of them native English speakers. On average, each presentation lasts 21 minutes and contains 2,568 words.
  • John Swales Conference Corpus – 23 conference presentations from the JSCC, recorded at a conference in Michigan celebrating Swales’ retirement. The 28 presenters are all academic experts and English-language specialists from 13 different first-language backgrounds, including an unknown number of English native speakers. The presentations average 3,007 words, and only monologues are included, to match the ELFA data.
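
Because the three groups differ so much in presentation length, their totals differ a great deal in size. A rough sketch of that arithmetic, multiplying each group's presentation count by its reported average length. The averages are rounded, so these totals are approximations of my own, not figures reported by Kao and Wang.

```python
# (number of presentations, average words per presentation), as listed above
GROUPS = {
    "EAP novices": (43, 360),
    "ELFA": (30, 2568),
    "JSCC": (23, 3007),
}


def approx_size(n_presentations: int, avg_words: int) -> int:
    """Approximate subcorpus size in words from a count and a rounded average."""
    return n_presentations * avg_words


for name, (n, avg) in GROUPS.items():
    # Prints e.g. "ELFA: ~77,040 words"
    print(f"{name}: ~{approx_size(n, avg):,} words")
```

The novice data come to roughly a fifth the size of either expert subcorpus, so comparisons across the groups would need to rest on normalised (per-N-words) frequencies rather than raw counts.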

Keep reading…


What do we mean by “I mean”?

Click image to jump to Fernández-Polo, F. J. (2014) The role of I mean in conference presentations by ELF speakers. English for Specific Purposes 34, 58-67. (behind paywall)

When analysing spoken English, it doesn’t take long to encounter discourse markers, single words or phrases that speakers commonly use to mark their stance or organise their message. Common discourse markers include well, now, you know and i mean. In the April 2014 issue of English for Specific Purposes, Francisco Javier Fernández-Polo examines the discourse marker i mean in conference presentations included in the ELFA corpus. This subcorpus includes 34 conference presentations in English by speakers of 21 different first languages. Recorded at universities in Finland, the data consist of naturally occurring English used as a lingua franca (ELF) in academic settings.

Fernández-Polo’s study is qualitative, involving a close analysis of a small number of cases to determine the functions of i mean in context. There are only 56 occurrences of i mean in this conference presentation subcorpus (94,314 words), and Fernández-Polo includes 48 of them in his analysis. He classifies these into four categories: correcting mistakes and dysfluencies; enhancing clarity and explicitness; organising text; and marking certainty and salience (see Table 1 below). Examples of each are discussed in turn.

A striking finding from the paper concerns the wide inter-speaker variation in the use of i mean. Fewer than half of the 34 presenters use i mean even once, with a single speaker producing 20% of the occurrences and five speakers contributing two thirds of all hits. To see whether a different distribution might be found in comparable English as a native language (ENL) data, Fernández-Polo consulted the monologic lectures in the American MICASE corpus. He found that i mean occurs in the MICASE lectures with the same standardised frequency (5 per 10,000 words) and with similar inter-speaker variation: one speaker in MICASE produced 27% of occurrences, and 14 speakers produced 60% of hits. The use of a given discourse marker thus appears to depend heavily on individual speakers’ preferences or habits. Keep reading…
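
The kind of concentration described here can be expressed as the share of all hits produced by the k most prolific speakers. A minimal sketch; the per-speaker counts and speaker IDs below are invented for illustration and are not Fernández-Polo's data.

```python
def top_k_share(counts_by_speaker: dict[str, int], k: int) -> float:
    """Fraction of all occurrences produced by the k most prolific speakers."""
    total = sum(counts_by_speaker.values())
    if total == 0:
        raise ValueError("no occurrences to share out")
    top = sorted(counts_by_speaker.values(), reverse=True)[:k]
    return sum(top) / total


def speakers_using(counts_by_speaker: dict[str, int]) -> int:
    """Number of speakers with at least one occurrence."""
    return sum(1 for c in counts_by_speaker.values() if c > 0)


# Hypothetical counts of one discourse marker per speaker (made-up IDs).
counts = {"A": 6, "B": 5, "C": 4, "D": 3, "E": 2, "F": 2,
          "G": 2, "H": 2, "I": 2, "J": 2, "K": 0, "L": 0}

print(round(top_k_share(counts, 1), 2))  # 0.2
print(round(top_k_share(counts, 5), 2))  # 0.67
print(speakers_using(counts), "of", len(counts), "speakers")  # 10 of 12 speakers
```

Speakers with zero hits are kept in the table on purpose, so that "how many presenters use the marker at all" can be read off the same data.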


On the other side: variations in organising chunks in ELF

Variations in organising chunks aren't that common, but they do tend to stand out.
Source: Livio Bourbon via The Telegraph

When working with ELF data – English used as a lingua franca between second/foreign-language speakers – one of the things that stands out is slight variation in conventional chunks of language. A formulaic chunk like as a matter of fact might be realised as as the matter of fact, or you might hear now that you mention it spoken as now that you say it. There’s no sense in calling these errors: the variants won’t cause miscommunication, they resemble their conventional counterparts in both function and form, and the less-preferred variant is likely found elsewhere. It’s just not the English native-speaker preference.

These variations are interesting linguistically and they tend to stand out impressionistically for researchers, but I’ve wondered how often these variations actually occur in ELF, both in frequency and in their distribution relative to conventional forms. It’s not an easy question to answer. Many of these formulaic chunks occur infrequently, so finding a couple of variants doesn’t really tell you much. The example above, now that you say it, occurs twice in the million-word ELFA corpus, with just one instance of the conventional form. Similarly, as the matter of fact is found in ELFA 21 times, compared with just eight occurrences of the expected chunk, but only two speakers account for those 21 instances.
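
One way to see why a handful of hits says little is to put a confidence interval around the variant's share of all occurrences. The sketch below uses the Wilson score interval, a standard choice for small samples; it is my own illustration, not a method from the post, and it assumes every occurrence is independent, which is exactly what fails when two speakers produce all 21 variants.

```python
import math


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for the proportion k/n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# now that you say it: 2 variant vs 1 conventional occurrence in ELFA
lo, hi = wilson_interval(2, 3)
print(f"variant share 2/3: {lo:.2f} to {hi:.2f}")    # 0.21 to 0.94
# as the matter of fact: 21 variant vs 8 conventional occurrences
lo, hi = wilson_interval(21, 29)
print(f"variant share 21/29: {lo:.2f} to {hi:.2f}")  # 0.54 to 0.85
```

With 2 variants out of 3, the plausible variant share runs from about a fifth to over nine tenths, so the data cannot say whether the variant is rare or dominant. The 21/29 interval looks tighter only because it treats clustered hits from two speakers as independent observations.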

We can see from these examples that a formulaic chunk that rarely shows up won’t reveal much about how often variation occurs among ELF users, across speech events, in different times and places. To find out more, I wanted to start with the highest-frequency chunks I could find. Linear Unit Grammar describes these as organising chunks: the recurring and relatively fixed chunks we use to structure our speech and writing, like on the other hand. Using the freeware corpus tool AntConc, I looked at the most frequent 3-, 4- and 5-word clusters (aka n-grams) in the ELFA corpus of spoken academic ELF. Keep reading…
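
AntConc does this through its cluster and n-gram tools in a GUI. For readers who prefer a script, the same idea can be sketched in a few lines of Python. The tokenizer is a simplifying assumption (lowercased runs of letters and apostrophes) and will not reproduce AntConc's token definition or its counts exactly; the sample sentence is invented.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters/apostrophes (a rough tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def top_clusters(tokens: list[str], n: int, k: int = 5) -> list[tuple[str, int]]:
    """Return the k most frequent n-word clusters with their counts."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    # most_common orders ties by first occurrence
    return [(" ".join(g), c) for g, c in grams.most_common(k)]


sample = ("On the other hand, we have this, and on the other hand "
          "we have that.")
tokens = tokenize(sample)
for n in (3, 4, 5):
    print(n, top_clusters(tokens, n, k=3))
```

On a real corpus the clusters would be counted over the whole token stream, and a frequency cut-off (and ideally a minimum number of speakers or files) would separate genuine organising chunks from one-off repetitions.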

Laughter in academic talk: Brits, Yanks & ELF compared

Click to jump to the original article (behind paywall): Nesi, Hilary (2012) Laughter in university lectures. Journal of English for Academic Purposes, 11(2), 79-89.

Update 30.12.2013: this updated post reflects improvements to the Python scripts used to generate the token counts. Links to the improved scripts are available in the footnotes. Minor changes to the token counts and frequencies have been made in the tables and text, but the main content of the post remains unchanged.

When I was earlier blogging on the frequencies of laughter in academic ELF (English as a lingua franca), I came across an article by Prof. Hilary Nesi, a compiler of the BASE corpus – the Corpus of British Academic Spoken English. She provides a qualitative analysis of the types and functions of laughter episodes in lectures from the BASE corpus and she concludes with the uncontroversial advice that British lecturers might want to adjust their use of humor when lecturing for an international audience.

I’ve waited until now to blog on Nesi’s article, since it contains obvious statistical errors that I wanted to research further. When I say obvious, I mean obvious – she cites the word count of the BASE lecture subcorpus as 2,646,920 words, when the official count of the entire corpus is only 1,644,942 words (cited in the same article). Nesi uses this oddly inflated word count to compute the standardised frequencies of laughter in lectures, which are therefore artificially low. Being naturally curious, I emailed Prof. Nesi in April to ask if she could clarify the situation, and naturally I received no reply.
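
The direction of the error follows from how a normalised frequency is computed: the same count divided by a larger word total gives a smaller rate. Since the lecture subcorpus cannot be bigger than the whole corpus, dividing by the official 1,644,942-word total gives a lower bound on the true lecture rate, so the inflated denominator understates the frequency by a factor of at least 2,646,920 / 1,644,942 ≈ 1.61, whatever the laughter count. A sketch; the laughter count is a placeholder, not a figure from Nesi's article.

```python
def per_n_words(count: int, words: int, base: int = 10_000) -> float:
    """Normalised frequency: occurrences per `base` running words."""
    return count / words * base


REPORTED_LECTURE_WORDS = 2_646_920  # lecture subcorpus size as cited in the article
WHOLE_CORPUS_WORDS = 1_644_942      # official size of the entire BASE corpus

laughs = 500  # placeholder count, for illustration only

reported_rate = per_n_words(laughs, REPORTED_LECTURE_WORDS)
# The lecture subcorpus is at most the whole corpus, so dividing by the
# whole-corpus size gives a lower bound on the true lecture rate.
lower_bound_rate = per_n_words(laughs, WHOLE_CORPUS_WORDS)

understatement = lower_bound_rate / reported_rate
print(f"understated by a factor of at least {understatement:.2f}")  # 1.61
```

The factor does not depend on the placeholder count at all: it is simply the ratio of the two word totals.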

To be fair, everyone makes mistakes and the quantitative findings don’t really affect her qualitative analysis. But this was published in a major peer-reviewed journal, the Journal of English for Academic Purposes. When a statistical error this basic can get past a senior researcher, two peer reviewers, and an editorial staff, it gives this junior researcher a fairly discouraging picture of academic rigor in the humanities. I might just be the first person on earth to look carefully at Nesi’s tables.

When in doubt, do it yourself

The thing that makes corpus research almost seem like real science is reproducibility – like with real experimental results, another researcher can take a linguistic corpus and try to reproduce a study’s findings. So, I downloaded the BASE corpus in XML format and set out to reproduce Nesi’s figures. She also uses the XML version of BASE, but only to search for laughter tags using the WordSmith Tools application. My first theory was that she had generated a word count for the lectures without excluding the XML markup, but even this approach didn’t reach her inflated word count.
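
The theory being tested is easy to illustrate: a whitespace word count over the raw XML counts tag fragments and attribute strings as "words", while a count over the text content does not. A sketch with Python's standard xml.etree.ElementTree; the mini-transcript and its element and attribute names (u, who, event, desc) are hypothetical and do not reproduce the actual BASE markup scheme.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-transcript; names are illustrative, not the BASE schema.
SAMPLE = """<transcript>
  <u who="nm0001">so this is the first point <event desc="laughter"/> okay</u>
  <u who="sf0002">right I see</u>
</transcript>"""


def naive_word_count(raw: str) -> int:
    """Whitespace-split the raw file, markup included."""
    return len(raw.split())


def text_word_count(raw: str) -> int:
    """Count whitespace-separated tokens in the text content only."""
    root = ET.fromstring(raw)
    return sum(len(chunk.split()) for chunk in root.itertext())


def count_events(raw: str, desc: str) -> int:
    """Count <event> elements whose desc attribute matches (hypothetical tag)."""
    root = ET.fromstring(raw)
    return sum(1 for ev in root.iter("event") if ev.get("desc") == desc)


print(naive_word_count(SAMPLE))               # 16
print(text_word_count(SAMPLE))                # 10
print(count_events(SAMPLE, "laughter"))       # 1
```

How much inflation the markup-inclusive count produces on real files depends on how dense the markup is.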

Keep reading…

And so on, or something like that: vague expressions in academic ELF

Another lakeside view of our heavenly Finnish summer.
© Nina Valtavirta

An important part of academic argumentation is not what you say, but how you say it. It’s one thing to make a bold claim, and another to “soften” it by adding expressions like or something like that, more or less, or in a way. These recurring chunks aren’t merely filler – they convey important interactive information. Vague expressions, or VEs, “express the speaker’s uncertainty or personal attitude towards the proposition and indicate for example solidarity” (Metsä-Ketelä 2012: 264).

Earlier research has raised concerns about non-native speakers’ learning and use of vague expressions, warning that speakers who underuse VEs risk sounding “blunt” or “pedantic”. In a recent paper, ELFA project member Maria Metsä-Ketelä investigated these concerns in the ELFA corpus of spoken academic ELF (English as a lingua franca). How are these vague chunks employed by second-language users in interaction with each other, and how do these findings compare with similar native-speaker data?

OI chunks: organising interaction

You’ll notice that vague expressions like and so on and in a sense function as units – they’re fixed chunks of language that typically don’t vary in form. From a Linear Unit Grammar (LUG) point of view, these are OI chunks (Organising Interaction) which can be used by a speaker to qualify her stance on the main content of an utterance. As Maria points out, the vague expressions in her study serve to intentionally add imprecision. They also have two other important traits:

  • VEs do not contribute to the propositional content of an utterance, or the message itself (the M chunks in LUG)
  • VEs “are supplementary, that is, they could be omitted from the utterance without compromising its syntactic structure” (Metsä-Ketelä 2012: 265).

Keep reading…

Creativity and color in academic ELF

If you believe the myth that ELF is "colorless" English, then Spock from Star Trek should be the prototypical ELF user.
Screenshot borrowed from Wikipedia

I was recently addressing some common folk linguistic myths about English, especially the English used as a lingua franca (ELF) between its non-native speakers. One of these myths concerns “color”, or more often than not, “colour”, since it seems the British “owners” of English are the ones most preoccupied with this trait. More specifically, you hear the charge of “colourless” English directed toward ELF speakers. You might come to imagine a roomful of expressionless Vulcans exchanging robotic strings of linguistic data. You can’t be human in a foreign language, can you?

Believe it or not, ELF users somehow manage to be fully human in English, even in academic settings. I’ve already blogged about the distribution of laughter in the ELFA corpus of spoken academic ELF, and there doesn’t seem to be a big difference in the frequency of laughter between ELFA and equivalent native-speaker data (the MICASE corpus), or between ELF speakers from different first-language backgrounds. So there must be some “colour” in there somewhere.

Valeria Franceschi of the University of Verona was a visiting PhD student in Helsinki last year, and she investigated these questions in our academic ELF data. Her findings were just published in the Journal of English as a Lingua Franca, and they confirm what has already been known for some time in ELF research – when the sneering critics of “colourless” English are out of the room, ELF speakers don’t hesitate to use idiomatic and metaphoric language, borrow images from their own linguacultures, and create new metaphors on-line (see esp. the work of Marie-Luise Pitzl on metaphoric language in ELF).

Keep reading…

What’s so funny? More laughter in academic talk

Even real scientists like to laugh.
Photo by Ruth Orkin
Source: artnet.tumblr.com

Is it possible to fully experience humor when using a foreign language? This varies from person to person (you probably know someone with no sense of humor in any language), and maybe also from culture to culture. There’s a lot of culture-specific humor, so that even native speakers of the same language from different cultural backgrounds (e.g. Brits and Americans) are susceptible to misunderstandings when a joke is missed or a metaphor lacks a cultural reference.

Much intercultural research, even on academic talk, takes this monolithic approach: Culture A does it this way, Culture B does it that way, and when Culture A goes to Culture B to study, there are going to be problems. But lingua franca interaction adds more variables, especially when English is spoken by second-language users outside an English-speaking country. What then?

In an earlier post I presented some data from the Corpus of English as a Lingua Franca in Academic Settings (ELFA corpus), which I compared to similar spoken data from the U.S. When looking at the broad, corpus-wide frequency of laughter in the two corpora, there was no striking difference between the native and non-native speaker data. A laugh occurs 2-3 times per 1,000 words in each corpus, and laughter is concentrated in similarly interactive events like seminar discussions. Keep reading…
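
Checking whether laughter clusters in interactive events means normalising within each event type rather than across the whole corpus. A sketch that pools word and laughter counts per event type before dividing; the event records are invented for illustration and are not ELFA or MICASE figures.

```python
from collections import defaultdict

# Hypothetical (event type, words, laughs) records, invented for illustration.
EVENTS = [
    ("seminar", 8000, 40),
    ("seminar", 6000, 20),
    ("lecture", 12000, 12),
    ("conference presentation", 9000, 9),
]


def laughs_per_1000_by_type(events: list[tuple[str, int, int]]) -> dict[str, float]:
    """Pool words and laughs within each event type, then normalise per 1,000 words."""
    words: dict[str, int] = defaultdict(int)
    laughs: dict[str, int] = defaultdict(int)
    for kind, n_words, n_laughs in events:
        words[kind] += n_words
        laughs[kind] += n_laughs
    return {kind: laughs[kind] / words[kind] * 1000 for kind in words}


for kind, rate in laughs_per_1000_by_type(EVENTS).items():
    print(f"{kind}: {rate:.1f} laughs per 1,000 words")
```

Pooling before dividing weights each event by its length; averaging per-event rates instead would let a short, laugh-filled event count as much as a long lecture.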

Getting serious about laughter in academic talk

Academic discourse is serious business. Lectures are delivered, conference presentations are discussed, great thoughts hang in the air like disembodied spirits. It’s not the kind of environment where you’d expect to find a lot of laughter and joking. And yet, we academics can’t seem to stop laughing.

The frozen Baltic

The Baltic Sea is still frozen in February. We’re anxiously awaiting the sun.
© Nina Valtavirta

The ELFA project held its February meeting on Thursday, 21 February, and MA student Jani Ahtiainen gave a talk on laughter in spoken academic discourse. He’s doing his master’s research on terms of address in the ELFA corpus, an area often connected to culture-specific norms and expectations. The occurrence of humor and laughter might likewise be influenced by culture.

Jani based his discussion on a 2006 article by David Lee that looked at occurrences of laughter in MICASE (Michigan Corpus of Academic Spoken English). The idea behind the article is that foreign students must struggle with the profound subtlety of American humor, so we should study laughter in MICASE to help these hapless foreigners cope. These research motivations are quite different from those in the ELF field, but the question of laughter in academic ELF is still relevant.

Keep reading…