Category Archives: Research blogging

In search of wild diversity: a closer look at 3rd-person zero marking in ELF

The late Australian naturalist Steve Irwin (aka the Crocodile Hunter) had an infectious love of wild diversity.
Source: Sydney Morning Herald

One of my most-read posts has been on the frequencies of 3rd-person singular present verb forms (he says, she says) in English spoken as a lingua franca (ELF). When looking at English used primarily between non-native speakers of English, is there a greater likelihood of finding the unmarked “zero” form of 3rd-person singular present – he say, she say? This so-called “dropping” of 3rd-person -s has been promoted as the emerging “default option” in ELF interaction, most notably by Martin Dewey (Dewey 2007; Cogo & Dewey 2006).

My previous post questioned the quality of Dewey’s data, much of which is elicited data from English classroom settings (i.e. not naturally occurring ELF interaction). In addition, his database of 60,000 words is far too small to warrant the sweeping generalisations he proposes, and I offered counterfindings from the better compiled, one-million-word VOICE corpus (Vienna-Oxford International Corpus of English). The VOICE team recently released a part-of-speech (POS) tagged version of the corpus with double POS-tags showing each word’s function and form, allowing quick calculations of 3rd-person -s vs. zero distributions.

While Dewey reports that 52% of the 3rd-person singular present verbs in his data appear without the -s morpheme (these are for main verbs, not auxiliaries), there is no support for this “emerging default option” in VOICE. After excluding all the forms of high-frequency be and have, the 5335 remaining verbs functioning as 3rd-person present singular verbs (tagged fVVZ) include only 310 cases of 3rd-person zero – just under 6% of the total. How could this be so different from Dewey’s findings? His small, unrepresentative database is the likely cause, but there must be more to the story.
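The percentage above is easy to check by hand. Here is a minimal sketch in Python using only the counts quoted in this post; the variable and function names are my own and not part of any VOICE tooling.

```python
# Counts quoted in the post for VOICE (main verbs only, be/have excluded).
total_3sg_verbs = 5335   # verbs functioning as 3rd-person singular present (fVVZ)
zero_marked = 310        # of those, the cases without the -s morpheme


def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


voice_zero_rate = percentage(zero_marked, total_3sg_verbs)
dewey_zero_rate = 52.0   # the rate Dewey reports for his own data

print(f"VOICE 3rd-person zero: {voice_zero_rate:.1f}%")
print(f"Dewey's reported rate: {dewey_zero_rate:.1f}%")
```

The VOICE rate comes out at roughly 5.8%, which is the "just under 6%" cited above.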

This post goes deeper into the findings on what I’ll now refer to as “3rd-person zero” in the VOICE corpus of naturally occurring ELF interactions – do specific individuals, speech events, or speakers of certain first languages produce the 3rd-person zero form more often than others?

In search of wild diversity

There are ELF researchers who seem to start their studies determined to hack their way through a wild linguistic jungle of unexplored diversity, like a Crocodile Hunter for linguists. It’s true that diversity is prominent in ELF talk and it’s more fun to study than homogeneity, but not finding wild diversity where it was expected is also a significant finding. So what else can we say about these 310 cases of 3rd-person zero found in VOICE?


“I am going to looks like stupid”: language commentary & correction in spoken ELF

Enjoy it while it lasts, folks. Fall is just around the corner…
© Nina Valtavirta

When I introduced the PhD thesis of ELFA project member Niina Hynninen (read the intro here), I outlined some considerations for studying language regulation when English is spoken as a lingua franca (ELF). The norms of acceptable English in ELF settings are not self-evident – certainly the norms of “correctness” in relation to native-speaker standards are present, but the range of acceptability might be broader than this. The answer must be found in ELF interaction itself. What do ELF speakers actually do in their real-time negotiation of “living norms”?

In the introductory post, I reviewed Niina’s interactive ELF data drawn from academic study events and the range of interactive features she examined as expressions of language regulation. In this post, I go deeper into her data and findings on four areas of language regulation. First, what kind of overt comments do ELF users make on the quality of their/others’ English? Second, what kind of explicit corrections are made in their talk? Then I move on to more subtle forms of language regulation: instances where reformulations are embedded in a second speaker’s utterance (“embedded repairs”) and reformulations involving third-party intervention (“mediation”).

Commenting on English

In the 20 hours of interactive data that Niina analysed, it was rare to find comments on another speaker’s English. It was typically students who commented on their own language. Most of this commentary was found in the student group work events as expressions of insecurity: “I’m not that good in English”, “I don’t know if the word is correct in English”, “my English is not very good I know”. Despite this uncertainty, Niina found that these comments on language quality were not accompanied by signs of communication trouble or misunderstanding. Instead, they seemed to serve as disclaimers, even though there were no signals of unacceptability from other participants.

In praise of Finnish innovation: a tribute to Ossi Ihalainen

Prof. Ossi Ihalainen (1941-1993), the man who makes my research possible.
Source: VARIENG website

When my misguided and unconventional life crash-landed in the University of Helsinki’s English department, I had no idea I was in the midst of a world-class center of linguistic research. It took less than a year to figure it out, and before I had finished my bachelor’s degree, I was working as a research assistant. Now as a PhD student in the same department, I understand very well that I’m part of a proud tradition of innovators in English linguistics.

Earlier this year, I was awarded a three-year grant from the Finnish Cultural Foundation (Suomen Kulttuurirahasto) to pursue my research in Linear Unit Grammar (LUG) and English as a lingua franca (ELF). On August 1, the foundation dispensed the first installment of my grant from the Ossi Ihalainen trust, and I want to take a moment to honor the man who supports my work, some 20 years after his passing. Oddly enough, we have a few things in common.

Ihalainen (right) making field recordings of interviews with speakers of the Somerset dialect.
Source: VARIENG website

Prof. Ossi Ihalainen of the University of Helsinki’s English department was a pioneer in the use of computers to research linguistic corpora. This was during the 1980s, when mainframe computers were needed to perform searches and the programming language of the day was still FORTRAN. He was also an expert in British dialectology, having conducted extensive fieldwork in the 1970s on the dialects of the Somerset region of southwest England. His crowning achievement was a hundred-page chapter on English dialects for the Cambridge History of the English Language.

Prof. Ihalainen died before his time at the age of 51. He fought leukemia for over a year before his death in 1993, and, in an act of academic heroism, he labored through his illness to complete the above-mentioned chapter on English dialects. At the time of his death, his will established a trust to support linguistic research in Finland. And still in 2013, that’s exactly what he’s doing, and I do indeed thank him for that.

Laughter in academic talk: Brits, Yanks & ELF compared

Click to jump to the original article (behind paywall): Nesi, Hilary (2012) Laughter in university lectures. Journal of English for Academic Purposes, 11(2). 79-89.

Update 30.12.2013: this updated post reflects improvements to the Python scripts used to generate the token counts. Links to the improved scripts are available in the footnotes. Minor changes to the token counts and frequencies have been made in the tables and text, but the main content of the post remains unchanged.

When I was earlier blogging on the frequencies of laughter in academic ELF (English as a lingua franca), I came across an article by Prof. Hilary Nesi, a compiler of the BASE corpus – the Corpus of British Academic Spoken English. She provides a qualitative analysis of the types and functions of laughter episodes in lectures from the BASE corpus and she concludes with the uncontroversial advice that British lecturers might want to adjust their use of humor when lecturing for an international audience.

I’ve waited until now to blog on Nesi’s article, since it contains obvious statistical errors that I wanted to research further. When I say obvious, I mean obvious – she cites the word count of the BASE lecture subcorpus as 2,646,920 words, when the official count of the entire corpus is only 1,644,942 words (cited in the same article). Nesi uses this oddly inflated word count to compute the standardised frequencies of laughter in lectures, which are therefore artificially low. Being naturally curious, I emailed Prof. Nesi in April to ask if she could clarify the situation, and naturally I received no reply.
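To see why an inflated word count pushes the standardised frequencies down, here is a minimal sketch in Python. The two word counts are the ones quoted above. The laughter count is a hypothetical placeholder (I'm not reproducing Nesi's figures here), and the per-1,000-words basis is likewise only an assumption for illustration.

```python
REPORTED_LECTURE_WORDS = 2_646_920  # word count Nesi cites for the lecture subcorpus
OFFICIAL_CORPUS_WORDS = 1_644_942   # official word count of the entire BASE corpus


def per_thousand_words(count: int, words: int) -> float:
    """Standardise a raw count to a frequency per 1,000 words."""
    return 1000 * count / words


# HYPOTHETICAL number of laughter episodes, for illustration only.
hypothetical_laughs = 1000

with_inflated_count = per_thousand_words(hypothetical_laughs, REPORTED_LECTURE_WORDS)
with_whole_corpus_count = per_thousand_words(hypothetical_laughs, OFFICIAL_CORPUS_WORDS)

# The lectures cannot contain more words than the whole corpus, so the true
# frequency is at least the whole-corpus figure, whatever the raw count is.
understatement_factor = REPORTED_LECTURE_WORDS / OFFICIAL_CORPUS_WORDS

print(f"per 1,000 words (inflated count):     {with_inflated_count:.3f}")
print(f"per 1,000 words (whole-corpus count): {with_whole_corpus_count:.3f}")
print(f"understated by a factor of at least:  {understatement_factor:.2f}")
```

Whatever the real number of laughter episodes, dividing by the larger figure understates the frequency by a factor of at least about 1.6.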

To be fair, everyone makes mistakes and the quantitative findings don’t really affect her qualitative analysis. But this was published in a major peer-reviewed journal, the Journal of English for Academic Purposes. When a statistical error this basic can get past a senior researcher, two peer reviewers, and an editorial staff, it gives this junior researcher a fairly discouraging picture of academic rigor in the humanities. I might just be the first person on earth to look carefully at Nesi’s tables.

When in doubt, do it yourself

The thing that makes corpus research almost seem like real science is reproducibility – like with real experimental results, another researcher can take a linguistic corpus and try to reproduce a study’s findings. So, I downloaded the BASE corpus in XML format and set out to reproduce Nesi’s figures. She also uses the XML version of BASE, but only to search for laughter tags using the WordSmith Tools application. My first theory was that she had generated a word count for the lectures without excluding the XML markup, but even this approach didn’t reach her inflated word count.
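The difference between counting words with and without the XML markup can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample string and its element names (`text`, `u`, `vocal`) are hypothetical and are not taken from the actual BASE files, and this is not one of the scripts linked in the footnotes.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL sample; the real BASE markup may use different elements.
SAMPLE = (
    '<text><u who="nm0001">okay <vocal desc="laughter"/> '
    'so today we look at verbs</u></text>'
)


def count_words_without_markup(xml_string: str) -> int:
    """Count whitespace-separated tokens in the text content only."""
    root = ET.fromstring(xml_string)
    text = "".join(root.itertext())
    return len(text.split())


def count_words_with_markup(xml_string: str) -> int:
    """Naive count that treats tags and attributes as if they were words."""
    return len(xml_string.split())


def count_laughter_tags(xml_string: str) -> int:
    """Count the (hypothetical) laughter elements in the sample markup."""
    root = ET.fromstring(xml_string)
    return len(root.findall(".//vocal[@desc='laughter']"))


print(count_words_without_markup(SAMPLE))
print(count_words_with_markup(SAMPLE))
print(count_laughter_tags(SAMPLE))
```

The naive count is always the larger of the two, since every tag and attribute gets counted as a "word".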


And so on, or something like that: vague expressions in academic ELF

Another lakeside view of our heavenly Finnish summer.
© Nina Valtavirta

An important part of academic argumentation is not what you say, but how you say it. It’s one thing to make a bold claim, and another to “soften” it by adding expressions like or something like that, more or less, or in a way. These recurring chunks aren’t merely filler – they convey important interactive information. Vague expressions, or VEs, “express the speaker’s uncertainty or personal attitude towards the proposition and indicate for example solidarity” (Metsä-Ketelä 2012: 264).

Earlier research has expressed concern about non-native speakers’ learning and use of vague expressions, with the danger of sounding “blunt” or “pedantic” if these VEs are underused. In a recent paper by ELFA project member Maria Metsä-Ketelä, these concerns were investigated in the ELFA corpus of spoken academic ELF (English as a lingua franca). How are these vague chunks employed by second-language users in interaction with each other, and how do these findings compare to similar native-speaker data?

OI chunks: organising interaction

You’ll notice that vague expressions like and so on and in a sense function as units – they’re fixed chunks of language that typically don’t vary in form. From a Linear Unit Grammar (LUG) point of view, these are OI chunks (Organising Interaction) which can be used by a speaker to qualify her stance on the main content of an utterance. As Maria points out, the vague expressions in her study serve to intentionally add imprecision. They also have two other important traits:

  • VEs do not contribute to the propositional content of an utterance, or the message itself (the M chunks in LUG)
  • VEs “are supplementary, that is, they could be omitted from the utterance without compromising its syntactic structure” (Metsä-Ketelä 2012: 265).


Language regulation in academic ELF interaction

This is the first in a series of posts on the recently defended doctoral dissertation of Niina Hynninen. Click on the image for a link to the dissertation’s full text.

When English is used as a lingua franca (ELF) between second-language speakers, there is still the question of what is normative – what is acceptable English in a lingua franca setting, and in a group of speakers with diverse backgrounds, which linguistic norms can be said to shape the interaction? These are questions that go beyond talk about “good English” or who can claim ownership of a language. This involves what ELF users themselves believe about acceptable English, and also what they actually do in interaction.

This construction of “living norms”, or norms that are co-constructed in interaction, was a topic of Niina Hynninen’s PhD dissertation, Language Regulation in English as a Lingua Franca: Exploring language-regulatory practices in academic spoken discourse. Language regulation is a broad concept covering the many ways we orient to what is appropriate language, from prescriptive grammatical rules to the ways we correct ourselves and others in interaction. If standardised rules are seen as “top-down” regulators of language, then Niina’s research focused on the “bottom-up” language regulation that is enacted by real people in authentic lingua franca interaction.

The construction of living norms in interaction might well incorporate these prescriptive norms, but not necessarily. Niina clarifies the distinction as follows:

Living, or non-codified, norms emerge as a result of acceptability negotiation in interaction, whereas prescriptive, or codified, norms arise as a consequence of linguistic description and codification. What is crucial, however, is that codified norms are not treated as relevant at the outset, but rather only to the extent that they are maintained and accepted in interaction.

(Hynninen 2013: 22)

In other words, prescriptive norms become living norms when they are realised in interaction. But this also highlights the gap between belief and behavior, which may not correspond in practice. What ELF users believe about acceptable English and how they actually negotiate acceptable ELF in interaction are two questions that must be studied separately. Niina also points out the need to distinguish between beliefs and normative expectations in specific contexts, including academic discourse. This is not so much a question of “correctness” as how to function appropriately or according to expectations within a community of practice.


In defense of good data: the question of third-person singular –s

There’s a special place in heaven called “Midsummer in Finland”. This is a recent sunset viewed at the Saimaa in eastern Finland.
© Nina Valtavirta

In the early days of ELF research, it was sometimes claimed that English used as a lingua franca (ELF) between its second language speakers might be a separate and unique variety of English. No one seems to want to defend this claim any longer, and more emphasis is placed on the inherent complexity and fluidity of these lingua franca encounters. Yet, despite this distance from explicit claims of variety status, there is still the tendency for ELF researchers to treat ELF as a bounded object.

This is the argument developed by Janus Mortensen in the latest issue of the Journal of English as a Lingua Franca. He discusses a tendency in ELF research to treat ELF as a language system alongside English as a native language (ENL), in effect reifying ELF or treating it as a bounded object. As a result of this reification, ELF is “turned into a bounded object that can be delimited and characterized in terms of specific properties”, including properties of a formal linguistic nature (Mortensen 2013: 30).

One such linguistic property that Mortensen discusses is the marking of 3rd-person singular verbs in present simple tense: she studies in the university. This so-called 3rd-person singular –s morpheme is an anomaly of the English verb system (I study, you study, we study, and they study, but she studies), and some varieties of English regularise this feature: she study in the university. This “dropping” of the 3rd-person –s (also referred to as 3rd-person zero) has been proposed as a prominent feature of ELF talk since the early 2000s, and it is precisely this notion of a broadly claimed “ELF variant” that Mortensen objects to.

“Emerging as the default option”

As recently as 2012, ELF researchers Alessia Cogo and Martin Dewey claimed that “at least in certain types of ELF settings, 3rd person zero appears to be emerging as the default option in informal naturally occurring communications” (Cogo & Dewey 2012: 49).


Research blogging as an academic genre

Mauranen, A. (2013) Hybridism, edutainment, and doubt: Science blogging finding its feet. Nordic Journal of English Studies, 12(1). Click abstract for full text.

Research blogging has become an object of research in its own right, and one area of interest for linguists is research blogging as an academic genre and means for communicating scientific knowledge. ELFA project director Anna Mauranen recently published an article on this linguistic aspect of research blogging in the Nordic Journal of English Studies. As a pilot study for the WrELFA corpus (Written English as a Lingua Franca in Academic Settings), her research focused on two well-established blogs and especially their comment threads, where ongoing scientific controversies (the Higgs boson and arsenic-consuming bacteria) were being discussed.

As I described in an earlier post (Blogging about blogging about blogging), I’ve been collecting samples from research blogs for the WrELFA corpus. This has familiarised me with the blogging conventions of 35 researchers who use English as a second/foreign language. In the process of compiling over 250,000 words of research blogs and comments (so far), I’ve gotten a bird’s-eye view of blogging as a scientific genre. For this post, I hope to add a few thoughts to Anna’s more in-depth study on two blogs over a longer period of time.

Individuals & communities

In her review of earlier research on blogging, Anna cites the broad distinction between thematic and personal blogs, stating “Clearly, it is the ‘thematic’ – or non-personal – type that bears the most relevance to science blogging” (Mauranen 2013: 11). This raises an interesting question, though, about how much research blogging actually bridges these two broad blog types. In other words, where does the science end and the scientist begin?

Fluent chunks 2: How to label your chunks

Photo by Alan Chia via Wikimedia Commons

Most people recognise that we don’t speak in “sentences”. Still, speech is analysed and described using the concepts of sentence grammars, even when these writing-based systems must be bent and stretched to fit the speech. Or is it the other way around – isn’t it cheating to “clean up” naturally occurring speech so it fits into a sentence grammar?

In a previous post I introduced Linear Unit Grammar, or LUG, a chunk-based approach to analysing spoken and written text. That post walked through the linear, word-by-word process of chunking up a string of transcribed speech by placing intuitively directed chunk boundaries. The discussion focused on this short extract from an academic conference in the ELFA corpus. When asked about her experience with students in Brazil, the speaker responded:

er i c- i i so i i went to portugal er i live in portugal er for 13 years so i er my experience with brazilian students is is a long way @@ okay a long time ago (note: @@ = laughter and er is like uh in the US style)

How do you divide this into a well-formed constituency tree? The short answer is you don’t, and neither do speakers in actual interaction. LUG analysis attempts to mirror the real-time, linear processing of language as multi-word chunks, regardless of “grammaticality”.

Creativity and color in academic ELF

If you believe the myth that ELF is “colorless” English, then Spock from Star Trek should be the prototypical ELF user.
Screenshot borrowed from Wikipedia

I was recently addressing some common folk linguistic myths about English, especially the English used as a lingua franca (ELF) between its non-native speakers. One of these myths concerns “color”, or more often than not, “colour”, since it seems the British “owners” of English are the ones most preoccupied with this trait. More specifically, you hear the charge of “colourless” English directed toward ELF speakers. You might come to think there was an expressionless room of Vulcans exchanging robotic strings of linguistic data. You can’t be human in a foreign language, can you?

Believe it or not, ELF users somehow manage to be fully human in English, even in academic settings. I’ve already blogged about the distribution of laughter in the ELFA corpus of spoken academic ELF, and the frequency of laughter doesn’t seem to differ much from equivalent native-speaker data (MICASE corpus), or between ELF speakers from different first-language backgrounds. So you’ve got to conclude that there must be some “colour” in there somewhere.

Valeria Franceschi of the University of Verona was a visiting PhD student in Helsinki last year, and she investigated these questions in our academic ELF data. Her findings were just published in the Journal of English as a Lingua Franca, and they confirm what has already been known for some time in ELF research – when the sneering critics of “colourless” English are out of the room, ELF speakers don’t hesitate to use idiomatic and metaphoric language, borrow images from their own linguacultures, and create new metaphors on-line (see esp. the work of Marie-Luise Pitzl on metaphoric language in ELF).
