WrELFA 2015: the written corpus of academic ELF


During the past two years that I’ve kept up this blog, I’ve been working on compiling the first corpus of written ELF (English as a lingua franca) for Anna Mauranen’s ELFA project. I started loitering around her group shortly after the ELFA corpus of spoken academic ELF was completed, and a written corpus was already being discussed. A couple of years later, with the proper mix of time, money, and research assistants, we launched the WrELFA corpus project – the Written Corpus of English as a Lingua Franca in Academic Settings. And now we can announce that the WrELFA corpus compilation is complete.

I’ve been blogging about this work-in-progress over the past couple years, so I don’t need to repeat it all here. There are three text types included in WrELFA, each of which invites its own investigation. These three components are:

  1. Academic research blogs – this subcorpus is drawn from 40 different blogs maintained by second-language users of English and totals 372,000 words (see this and this post).
  2. PhD examiner reports – 330 evaluations by senior academics with 33 different first languages (402,000 words). I’ve discussed this data in depth in this post.
  3. SciELF corpus – a collaborative, stand-alone corpus of 150 unedited research papers by academics from 10 first-language backgrounds. Partners from 12 universities contributed texts to the 759,000 total words (see this post).

Taken together, these three components total just over 1.5 million words of text with a rough binary division between the natural sciences (55% of words) and disciplines in social sciences and humanities (45% of words). For more detailed information on the make-up of the corpus, see the ELFA project homepage, where I’ve recently done a major update of the WrELFA corpus pages, with documentation of the corpus components, compilation principles, and authors’ L1 distributions.
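As a quick sanity check on the figures above, here is a minimal Python sketch that adds up the three component word counts quoted in the list and works out each component's share of the whole. The variable names are purely illustrative, and the inputs are the rounded counts given in this post, not exact corpus statistics.

```python
# Rounded word counts per WrELFA component, as quoted in this post.
components = {
    "Academic research blogs": 372_000,
    "PhD examiner reports": 402_000,
    "SciELF corpus": 759_000,
}

# Total size of the corpus in words.
total_words = sum(components.values())

# Each component's share of the total, in percent, rounded to one decimal.
shares = {
    name: round(100 * words / total_words, 1)
    for name, words in components.items()
}

print(f"Total: {total_words:,} words")
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The rounded counts sum to 1,533,000 words, which is where the "just over 1.5 million" figure comes from. Note that these shares are by text type; the 55%/45% split mentioned above is by discipline and can't be derived from these three numbers.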

Whose English? A window into written academic ELF

A February sunset in Helsinki.
Though it’s easy to see that English has become the lingua franca of academia, it’s not always clear how widespread it is within a specific institution. Moreover, it’s not always clear whose English we’re talking about – while English is increasingly used as a lingua franca (ELF) between non-native English speakers and authors, English as a native language (ENL) is still in the mix. In an internationally oriented university such as here in Helsinki, how much of a presence does English have? And are we talking about ENL – native-speaker varieties such as in the US/UK – or English as a lingua franca (ELF)?

Any number of approaches could be taken to this question, but our recent work on the WrELFA corpus of written academic ELF offers an intriguing look into language use within the University of Helsinki. We just finished compiling a subcorpus of preliminary examiners’ statements – the written evaluations by senior academics of newly submitted PhD theses. In Finland, PhD candidates must first submit their theses to obtain permission for a public defence. Typically two examiners evaluate the work and either grant or deny the permission to defend it.

These examiners’ statements are intriguing data for two reasons. First, they comprise a high-stakes academic genre that is part of the public examination as well as a demonstration of the author’s expertise. Second, they offer a unique source of written academic ELF. The examiners are often non-native English users who are writing statements to be read by Finnish students and faculty members. There are no native-English gatekeepers in the writing process – as there are, for instance, in academic publishing – but ENL authors are also active in submitting evaluations. In short, it’s an unregulated window into linguistic practices within and across academic fields and faculties.

WrELFA corpus progress report: 500k words

This little fellow wants Dionysus’ grapes. From the Capitoline Museums in Rome.
Within English as a lingua franca (ELF) research, there’s growing interest in describing written ELF. Up to now, ELF data has almost exclusively been drawn from spoken interaction, which is where a lingua franca gets used in the first place. But the use of English as a second/foreign language extends into the written mode as well, and this may also be directed to an international audience. In globalised networks such as academia, examples of English used as a written lingua franca aren’t hard to find. Like other high-stakes domains of ELF, an academic career involves producing English texts that are used to evaluate the author’s professional competence.

Alongside the growth in ELF research has been growing awareness of a power imbalance in academic publishing – journals concentrated in the US & UK typically place a perfect imitation of “native-like” English as a basic criterion for being published. This goes beyond just “correct grammar” and extends into idiomatic usage, phraseological choices, and rhetorical style. So while there’s no dispute that non-native users of English as a lingua franca far outnumber the native English speakers of the world, academic journals tip the balance of power in favor of English native speakers. In short, “good English” is equated with “native-like English”.

This is a question of interest to descriptive ELF research. How does “good English” written by educated professionals who speak a first language other than English differ from the mythologised “native-like English”? This question and the issues surrounding it are persuasively developed by David Owen (2011) in an article on academic publishing and language revision. In his work doing language revision in a Spanish university, he observes that papers rejected on linguistic grounds are often “formally impeccable”, and he presents a series of extracts to illustrate this “correct” vs. “native-like” distinction. In the end, he calls for descriptive ELF research that could clarify this timely question. What does good written ELF look like?

WrELFA: a corpus of written ELF in academia

Late in the same year that Owen’s article was published, Anna Mauranen tasked me with starting compilation of the Corpus of Written English as a Lingua Franca in Academic Settings (WrELFA corpus), which she had been talking about for some time. The million-word ELFA corpus of spoken academic ELF interaction was completed in 2008, and a written companion was a natural development. I’ve been working on this project ever since, and with help from research assistant Jani Ahtiainen, this summer we reached 500,000 words of processed WrELFA text. At this halfway mark to our million-word goal, I thought I’d give an update on our progress.

Research blogging as an academic genre

Mauranen, A. (2013) Hybridism, edutainment, and doubt: Science blogging finding its feet. Nordic Journal of English Studies, 12(1).

Research blogging has become an object of research in its own right, and one area of interest for linguists is research blogging as an academic genre and means for communicating scientific knowledge. ELFA project director Anna Mauranen recently published an article on this linguistic aspect of research blogging in the Nordic Journal of English Studies. As a pilot study for the WrELFA corpus (Written English as a Lingua Franca in Academic Settings), her research focused on two well-established blogs and especially their comment threads, where ongoing scientific controversies (the Higgs boson and arsenic-consuming bacteria) were being discussed.

As I described in an earlier post (Blogging about blogging about blogging), I’ve been collecting samples from research blogs for the WrELFA corpus. This has familiarised me with the blogging conventions of 35 researchers who use English as a second/foreign language. In the process of compiling over 250,000 words of research blogs and comments (so far), I’ve gotten a bird’s-eye view of blogging as a scientific genre. For this post, I hope to add a few thoughts to Anna’s more in-depth study on two blogs over a longer period of time.

Individuals & communities

In her review of earlier research on blogging, Anna cites the broad distinction between thematic and personal blogs, stating “Clearly, it is the ‘thematic’ – or non-personal – type that bears the most relevance to science blogging” (Mauranen 2013: 11). This raises an interesting question, though, about how much research blogging actually bridges these two broad blog types. In other words, where does the science end and the scientist begin?

Blogging about blogging about blogging

One of the base assumptions of ELF (English as a Lingua Franca) research is that the English spoken between non-native speakers should be studied and understood in its own right. Lately, interest has also grown in written ELF, where English is the lingua franca of written interaction. To complement the spoken academic ELF in the ELFA corpus, we’ve started work on a new database of written academic ELF – the WrELFA corpus.

There are four main criteria we look for in a written academic ELF text:

  1. it is an instance of second-language use (SLU), not taken from a language learning environment
  2. it is authentic and naturally occurring, not elicited for research purposes
  3. it is ‘high stakes’ in the sense of its academic importance
  4. it has been written without native-English intervention

These four points are a good description of the academic research blog. Graduate students and experienced professors alike represent their research online. Unlike in the US/UK-dominated world of peer-reviewed journals, research bloggers can represent themselves in their professional lingua franca directly to the public and their peers, without linguistic barriers set up by outsiders.

