Category Archives: Original findings

Needles in a haystack: questioning the “fluidity” of ELF

As I’ve argued earlier on this blog, the claims of “fluidity”, “diversity”, and “innovation” found in English as a lingua franca (ELF) research are sometimes overstated. It’s so diverse that even ordinary diversity won’t do – it’s “super-diversity” now. It could very well be ultra-mega-diversity-squared, but the question of the prominence of these presumably innovative features is a quantitative one. More specifically, it’s a question of how frequently any variant forms occur in naturally occurring ELF interaction, relative to the conventional forms. One of my shameless nerd hobbies is writing little Python programs to query corpora, and several of these mini-studies have appeared on this blog. I especially enjoy working with the VOICE corpus, which is great because 1) it contains a million words of unelicited ELF interaction; 2) it’s ready for processing as well-formed XML; and 3) it has been meticulously part-of-speech (POS) tagged for both the form and function of each word in the corpus.

The value of this double form-function tag is that it reveals every token in the corpus where a word like fluently, which is formally recognisable as an adverb, functions in a different way, for example as an adjective: i think you are very fluently in english. This example of fluently from VOICE has a form tag of RB (adverb), but a function tag of JJ (adjective) to reflect that fluently seems to be serving in an adjectival function. This kind of form-function variation in ELF is presumably prominent enough that it necessitates this double tagging to adequately describe the fluidity. The VOICE team was kind enough to carry out this formidable task involving manual inspection of all million words. Now that this resource is in place (and freely available), the instances of these form-function mismatches can be easily found, counted, and viewed in context.
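To make the idea of a form-function mismatch concrete, here is a minimal Python sketch of how such tokens could be pulled out of tagged XML. The element and attribute names (`w`, `form`, `func`) are hypothetical and are not the actual VOICE schema, and apart from the RB/JJ pair on fluently the tag values are only illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: NOT the real VOICE XML schema, just an illustration
# of a token carrying separate form and function tags.
SAMPLE = """<u>
  <w form="PRP" func="PRP">you</w>
  <w form="VBP" func="VBP">are</w>
  <w form="RB" func="RB">very</w>
  <w form="RB" func="JJ">fluently</w>
  <w form="IN" func="IN">in</w>
  <w form="NN" func="NN">english</w>
</u>"""


def find_mismatches(xml_text):
    """Return (word, form_tag, function_tag) for tokens whose two tags differ."""
    root = ET.fromstring(xml_text)
    return [
        (w.text, w.get("form"), w.get("func"))
        for w in root.iter("w")
        if w.get("form") != w.get("func")
    ]


def count_tokens(xml_text):
    """Count all word tokens, to relate mismatches to the total."""
    return sum(1 for _ in ET.fromstring(xml_text).iter("w"))


mismatches = find_mismatches(SAMPLE)
total = count_tokens(SAMPLE)
print(mismatches)
print(f"{len(mismatches)} of {total} tokens have mismatched tags")
```

The point of counting the total alongside the mismatches is that the prominence of variant forms only means something relative to the conventional ones.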

I’ve wondered for some time how often these variant form-function tokens occur overall, in relation to their conventional forms. My interest was renewed by the recent paper by VOICE project researcher Ruth Osimk-Teasdale in the Journal of English as a Lingua Franca. One of the main workers on the VOICE POS-tagging project, she investigates word class shifts in VOICE. She narrows her data to double form-function tags that reflect a shift of category across word classes (like from adverb to adjective). These inter-categorical word class shifts therefore exclude variations within a word class, like singular nouns which are treated as plural. She focuses on items like fluently above, where word class conversion occurs without any change to the form of the word itself.

The assignment of these form-function tags, and the analysis of them, is directly linked to the fluidity of ELF: Keep reading…


Whose English? A window into written academic ELF

A February sunset in Helsinki.
© Nina Valtavirta

Though it’s easy to see that English has become the lingua franca of academia, it’s not always clear how widespread it is within a specific institution. Moreover, it’s not always clear whose English we’re talking about – while English is increasingly used as a lingua franca (ELF) between non-native English speakers and authors, English as a native language (ENL) is still in the mix. In an internationally oriented university such as here in Helsinki, how much of a presence does English have? And are we talking about ENL – native-speaker varieties such as in the US/UK – or English as a lingua franca (ELF)?

Any number of approaches could be put to this question, but our recent work on the WrELFA corpus of written academic ELF offers an intriguing look into language use within the University of Helsinki. We just finished compiling a subcorpus of preliminary examiners’ statements – the written evaluations by senior academics of newly submitted PhD theses. In Finland, PhD candidates must first submit their theses to obtain permission for a public defence. Typically two examiners evaluate the work and either grant or deny the permission to defend it.

These examiners’ statements are intriguing data for two reasons. First, they comprise a high-stakes academic genre that is part of the public examination as well as a demonstration of the author’s expertise. Second, they offer a unique source of written academic ELF. The examiners are often non-native English users who are writing statements to be read by Finnish students and faculty members. There are no native English gatekeepers in the writing process – as there are, for instance, in academic publishing – but ENL authors are also active in submitting evaluations. In short, it’s an unregulated window into linguistic practices within and across academic fields and faculties. Keep reading…

In search of wild diversity: a closer look at 3rd-person zero marking in ELF

The late Australian naturalist Steve Irwin (aka the Crocodile Hunter) had an infectious love of wild diversity.
Source: Sydney Morning Herald

One of my most-read posts has been on the frequencies of 3rd-person singular present verb forms (he says, she says) in English spoken as a lingua franca (ELF). When looking at English used primarily between non-native speakers of English, is there a greater likelihood of finding the unmarked “zero” form of 3rd-person singular present – he say, she say? This so-called “dropping” of 3rd-person -s has been promoted as the emerging “default option” in ELF interaction, most notably by Martin Dewey (Dewey 2007; Cogo & Dewey 2006).

My previous post questioned the quality of Dewey’s data, much of which is elicited data from English classroom settings (i.e. not naturally occurring ELF interaction). In addition, his database of 60,000 words is far too small to warrant the sweeping generalisations he proposes, and I offered counterfindings from the better-compiled, one-million-word VOICE corpus (Vienna-Oxford International Corpus of English). The VOICE team recently released a part-of-speech (POS) tagged version of the corpus with double POS-tags showing each word’s function and form, allowing quick calculations of 3rd-person -s vs. zero distributions.

While Dewey reports that 52% of the 3rd-person singular present verbs in his data appear without the -s morpheme (these are for main verbs, not auxiliaries), there is no support for this “emerging default option” in VOICE. After excluding all the forms of high-frequency be and have, the 5335 remaining verbs functioning as 3rd-person present singular verbs (tagged fVVZ) include only 310 cases of 3rd-person zero – just under 6% of the total. How could this be so different from Dewey’s findings? His small, unrepresentative database is the likely cause, but there must be more to the story.
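The proportion above is easy to check by hand. This small Python sketch only redoes the arithmetic with the two counts given in the text; the variable names are mine.

```python
# Counts reported in the text for VOICE (forms of be and have excluded).
total_3sg_verbs = 5335   # verbs functioning as 3rd-person singular present
zero_forms = 310         # of these, cases of 3rd-person zero

share = zero_forms / total_3sg_verbs
print(f"3rd-person zero share in VOICE: {share:.1%}")

# Dewey's reported share, for comparison.
dewey_share = 0.52
print(f"Difference from Dewey's figure: {dewey_share - share:.1%}")
```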

This post goes deeper into the findings on what I’ll now refer to as “3rd-person zero” in the VOICE corpus of naturally occurring ELF interactions – do specific individuals, speech events, or speakers of certain first languages produce the 3rd-person zero form more often than others?

In search of wild diversity

There are ELF researchers who seem to start their studies determined to hack their way through a wild linguistic jungle of unexplored diversity, like a Crocodile Hunter for linguists. It’s true that diversity is prominent in ELF talk and it’s more fun to study than homogeneity, but not finding wild diversity where it was expected is also a significant finding. So what else can we say about these 310 cases of 3rd-person zero found in VOICE? Keep reading…


Laughter in academic talk: Brits, Yanks & ELF compared

Click to jump to the original article (behind paywall): Nesi, Hilary (2012) Laughter in university lectures. Journal of English for Academic Purposes, 11(2). 79-89.

Update 30.12.2013: this updated post reflects improvements to the Python scripts used to generate the token counts. Links to the improved scripts are available in the footnotes. Minor changes to the token counts and frequencies have been made in the tables and text, but the main content of the post remains unchanged.

When I was earlier blogging on the frequencies of laughter in academic ELF (English as a lingua franca), I came across an article by Prof. Hilary Nesi, a compiler of the BASE corpus – the Corpus of British Academic Spoken English. She provides a qualitative analysis of the types and functions of laughter episodes in lectures from the BASE corpus and she concludes with the uncontroversial advice that British lecturers might want to adjust their use of humor when lecturing for an international audience.

I’ve waited until now to blog on Nesi’s article, since it contains obvious statistical errors that I wanted to research further. When I say obvious, I mean obvious – she cites the word count of the BASE lecture subcorpus as 2,646,920 words, when the official count of the entire corpus is only 1,644,942 words (cited in the same article). Nesi uses this oddly inflated word count to compute the standardised frequencies of laughter in lectures, which are therefore artificially low. Being naturally curious, I emailed Prof. Nesi in April to ask if she could clarify the situation, and naturally I received no reply.
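To see why an inflated word count pushes a standardised frequency down, consider this minimal Python sketch. The two word counts come from the article as cited above; the laughter count is a made-up placeholder, since the effect does not depend on its value. The official total is for the entire corpus, so it only gives an upper bound on the size of the lecture subcorpus.

```python
def per_thousand(count, words):
    """Standardised frequency: occurrences per 1,000 words."""
    return count / words * 1000


CITED_LECTURE_WORDS = 2_646_920   # word count cited for the lecture subcorpus
OFFICIAL_CORPUS_WORDS = 1_644_942  # official count for the entire corpus

laugh_count = 1000  # hypothetical placeholder, NOT a figure from the article

reported = per_thousand(laugh_count, CITED_LECTURE_WORDS)
upper_bound_basis = per_thousand(laugh_count, OFFICIAL_CORPUS_WORDS)

# The ratio between the two rates is the same for any laughter count.
ratio = upper_bound_basis / reported
print(f"Rate with cited count:    {reported:.3f} per 1,000 words")
print(f"Rate with official count: {upper_bound_basis:.3f} per 1,000 words")
print(f"Ratio: {ratio:.2f}")
```

Because the lectures cannot contain more words than the whole corpus, a rate computed on the cited count understates the frequency by a factor of at least about 1.6, whatever the number of laughs.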

To be fair, everyone makes mistakes and the quantitative findings don’t really affect her qualitative analysis. But this was published in a major peer-reviewed journal, the Journal of English for Academic Purposes. When a statistical error this basic can get past a senior researcher, two peer reviewers, and an editorial staff, it gives this junior researcher a fairly discouraging picture of academic rigor in the humanities. I might just be the first person on earth to look carefully at Nesi’s tables.

When in doubt, do it yourself

The thing that makes corpus research almost seem like real science is reproducibility – like with real experimental results, another researcher can take a linguistic corpus and try to reproduce a study’s findings. So, I downloaded the BASE corpus in XML format and set out to reproduce Nesi’s figures. She also uses the XML version of BASE, but only to search for laughter tags using the WordSmith Tools application. My first theory was that she had generated a word count for the lectures without excluding the XML markup, but even this approach didn’t reach her inflated word count.
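The theory about uncounted markup can be illustrated with a short Python sketch. The XML snippet and its element names are hypothetical stand-ins, not the actual BASE markup, and whitespace splitting is only one of several ways to count words.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcript fragment: NOT the real BASE schema.
SAMPLE = (
    '<lecture id="x"><u who="nm0001">okay '
    '<event desc="laughter"/> let us begin</u></lecture>'
)


def count_text_only(xml_text):
    """Count whitespace-separated words after stripping all markup."""
    root = ET.fromstring(xml_text)
    return len("".join(root.itertext()).split())


def count_raw(xml_text):
    """Naive count on the raw file, so tags and attributes inflate the total."""
    return len(xml_text.split())


text_words = count_text_only(SAMPLE)
raw_words = count_raw(SAMPLE)
print(f"text only: {text_words} words, raw file: {raw_words} 'words'")
```

On heavily annotated transcripts the gap between the two counts grows with the density of the markup, which is why this was a plausible first suspect.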

Keep reading…

In defense of good data: the question of third-person singular –s

There’s a special place in heaven called “Midsummer in Finland”. This is a recent sunset viewed at the Saimaa in eastern Finland.
© Nina Valtavirta

In the early days of ELF research, it was sometimes claimed that English used as a lingua franca (ELF) between its second language speakers might be a separate and unique variety of English. No one seems to want to defend this claim any longer, and more emphasis is placed on the inherent complexity and fluidity of these lingua franca encounters. Yet, despite this distance from explicit claims of variety status, there is still the tendency for ELF researchers to treat ELF as a bounded object.

This is the argument developed by Janus Mortensen in the latest issue of the Journal of English as a Lingua Franca. He discusses a tendency in ELF research to treat ELF as a language system alongside English as a native language (ENL), in effect reifying ELF or treating it as a bounded object. As a result of this reification, ELF is “turned into a bounded object that can be delimited and characterized in terms of specific properties”, including properties of a formal linguistic nature (Mortensen 2013: 30).

One such linguistic property that Mortensen discusses is the marking of 3rd-person singular verbs in present simple tense: she studies in the university. This so-called 3rd-person singular –s morpheme is an anomaly of the English verb system (I study, you study, we study, and they study, but she studies), and some varieties of English regularise this feature: she study in the university. This “dropping” of the 3rd-person –s (also referred to as 3rd-person zero) has been proposed as a prominent feature of ELF talk since the early 2000s, and it is precisely this notion of a broadly claimed “ELF variant” that Mortensen objects to.

“Emerging as the default option”

As recently as 2012, ELF researchers Alessia Cogo and Martin Dewey have made the claim that “at least in certain types of ELF settings, 3rd person zero appears to be emerging as the default option in informal naturally occurring communications” (Cogo & Dewey 2012: 49). Keep reading…


What’s so funny? More laughter in academic talk

Even real scientists like to laugh.
Photo by Ruth Orkin

Is it possible to fully experience humor when using a foreign language? This varies from person to person (you probably know someone with no sense of humor in any language), and maybe also from culture to culture. There’s a lot of culture-specific humor, so that even native speakers of the same language from different cultural backgrounds (e.g. Brits and Americans) are susceptible to misunderstandings when a joke is missed or a metaphor lacks a cultural reference.

Much intercultural research, even on academic talk, takes this monolithic approach – Culture A does it this way, Culture B does it that way, and when Culture A goes to Culture B to study, there are going to be problems. But lingua franca interaction adds additional variables, especially when English is spoken by second-language users outside of an English-speaking country. What then?

In an earlier post I presented some data from the Corpus of English as a Lingua Franca in Academic Settings (ELFA corpus), which I compared to similar spoken data from the U.S. When looking at the broad, corpus-wide frequency of laughter in the two corpora, there was no striking difference between the native and non-native speaker data. A laugh occurs 2-3 times per 1,000 words in each corpus, and laughter is concentrated in similarly interactive events like seminar discussions. Keep reading…

Getting serious about laughter in academic talk

Academic discourse is serious business. Lectures are delivered, conference presentations are discussed, great thoughts hang in the air like disembodied spirits. It’s not the kind of environment you’d expect to find a lot of laughter and joking. And yet, we academics can’t seem to stop laughing.

The frozen Baltic

The Baltic Sea is still frozen in February. We’re anxiously awaiting the sun.
© Nina Valtavirta

The ELFA project had our February meeting on Thu., 21.2, and MA student Jani Ahtiainen gave a talk on laughter in spoken academic discourse. He’s doing his master’s research on terms of address in the ELFA corpus, an area often connected to culture-specific norms and expectations. Likewise, the occurrence of humor and laughter might be influenced by culture as well.

Jani based his discussion on a 2006 article by David Lee that looked at occurrences of laughter in MICASE (Michigan Corpus of Academic Spoken English). The idea behind the article is that foreign students must struggle with the profound subtlety of American humor, so we should study laughter in MICASE to help these hapless foreigners cope. These research motivations are quite different from ours in the ELF field, but the question of laughter in academic ELF is still relevant.

Keep reading…
