Whose English? A window into written academic ELF

A February sunset in Helsinki.
© Nina Valtavirta

Though it’s easy to see that English has become the lingua franca of academia, it’s not always clear how widespread it is within a specific institution. Moreover, it’s not always clear whose English we’re talking about – while English is increasingly used as a lingua franca (ELF) between non-native English speakers and authors, English as a native language (ENL) is still in the mix. In an internationally oriented university such as here in Helsinki, how much of a presence does English have? And are we talking about ENL – native-speaker varieties such as in the US/UK – or English as a lingua franca (ELF)?

Any number of approaches could be taken to this question, but our recent work on the WrELFA corpus of written academic ELF offers an intriguing look into language use within the University of Helsinki. We just finished compiling a subcorpus of preliminary examiners’ statements – the written evaluations by senior academics of newly submitted PhD theses. In Finland, PhD candidates must first submit their theses to obtain permission for a public defence. Typically two examiners evaluate the work and either grant or deny permission to defend it.

These examiners’ statements are intriguing data for two reasons. First, they represent a high-stakes academic genre that is part of the public examination as well as a demonstration of the author’s expertise. Second, they offer a unique source of written academic ELF. The examiners are often non-native English users who are writing statements to be read by Finnish students and faculty members. There are no native English gatekeepers in the writing process – as there are, for instance, in academic publishing – but ENL authors are also active in submitting evaluations. In short, it’s an unregulated window into linguistic practices within and across academic fields and faculties.

For the past several months I’ve been working with Prof. Anna Mauranen and research assistant Ruut Kosonen on compiling a corpus of these English-language examiners’ statements submitted in 2011 & 2012 to six University of Helsinki faculties. We finished this task last month, ending up with 402,135 words of text (the WrELFA corpus overall has passed 800,000 words since my last update). During that process, Ruut compiled figures from each of the faculties on how many examiners’ statements were submitted and what language they were written in. This post looks at where English stands in the examination process of one of the top research universities in Europe.

Which language? Findings from the six faculties

For comparison purposes, we were seeking a broad balance between natural sciences and humanities. We thus limited our search to three faculties in each of these general groupings:

Natural Sciences:

  • Faculty of Science (Mathematics & natural science in Finnish)
  • Faculty of Medicine
  • Faculty of Agriculture & Forestry

Humanities:

  • Faculty of Arts
  • Faculty of Social Sciences
  • Faculty of Behavioural Science

Figure 1. Distribution of preliminary examiners’ statements by the language of the text, for all six faculties, 2011-12.

Altogether, these six faculties generated 1311 preliminary examiners’ statements during 2011-12, with 67% of these coming from the natural sciences (n=886). Almost all of the statements were written in Finnish (n=743) or English (n=531). Of the 26 statements written in Swedish (Finland’s second official language), only two came from the natural sciences. In fact, the Faculty of Arts appears to be the lone preserver of multilingualism, with 20 statements written in Swedish along with four in French, three in Russian, three in Spanish, and one in Italian. See Figure 1 for the proportional distribution of languages.
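
The proportions in Figure 1 follow directly from these counts. As a quick sanity check, here is a minimal Python sketch that recomputes them. The counts are the ones reported above; the names `counts` and `shares` are illustrative only and not part of any WrELFA tooling.

```python
# Counts of preliminary examiners' statements by language, 2011-12,
# as reported in the paragraph above.
counts = {
    "Finnish": 743,
    "English": 531,
    "Swedish": 26,
    "French": 4,
    "Russian": 3,
    "Spanish": 3,
    "Italian": 1,
}

def shares(counts):
    """Return each language's share of all statements, in percent."""
    total = sum(counts.values())
    return {lang: 100 * n / total for lang, n in counts.items()}

total = sum(counts.values())
pct = shares(counts)

print(f"Total statements: {total}")
for lang, p in pct.items():
    print(f"{lang}: {p:.1f}%")
```

The per-language counts add up to the reported total of 1311, which is a useful check that no language has been left out of the tally.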

With 57% of the examiners’ statements written in Finnish, it would seem that the status of Finnish is secure in the examination system. However, the picture changes when we separate these figures by faculty. Figure 2 below shows the raw counts of the statements written in Finnish (blue), English (red), and Swedish (green) across the six faculties. As you can see, the strength of Finnish can be largely attributed to the strength of the Finnish medical research tradition – an incredible 449 Finnish-language examiners’ statements were submitted to the Faculty of Medicine in 2011-12:


Figure 2. Raw counts of the number of preliminary examiners’ statements written in Finnish (blue), English (red), and Swedish (green) in 2011-12.

Across the other five faculties, however, the picture is more mixed. All 43 statements in Behavioural Science were written in English, and the Finnish-English mix is about 50-50 in the Arts and Social Sciences. English-language statements predominate in the other two natural science faculties. Overall, Finnish is well represented, but English has a strong presence with 40% of all statements across faculties. This leads to the next question – whose English are we talking about?

Whose English in a lingua franca world?

Before including any examiners’ statement in the corpus, we had to obtain permission from the authors as well as their reported first language(s). At first we were only collecting statements from non-native English users, but later we added authors who use English as a native language (ENL) as well. Thus, we requested permissions for 524 statements written in English (the other seven of the 531 had no identification), ending up with 330 in the corpus (63% of those requested). The following data on authors’ first languages are thus drawn mainly from the authors themselves. Where this information was not obtained, we estimated native/non-native status by whether the author was academically based in an ENL country or not. These figures are therefore close, but likely not perfect.


Figure 3. Distribution of English-language examiners’ statements based on authors’ use of English as a lingua franca (ELF) or English as a native language (ENL). Pardon the Pac-Man reference.

Among these 524 English-language statements, an estimated 376 were written by non-native English writers for Finnish faculty members, constituting an English as a lingua franca (ELF) interaction. As can be seen in Figure 3, this makes up 72% of these examiners’ statements across all faculties. With only 148 statements written by authors using English as a native language (ENL), we see a local example of a global trend. As English becomes established as a global lingua franca, its native speakers increasingly fall into the minority of users of the language.

A more striking picture emerges when these language figures are separated by faculty. Figure 4 is a set of stacked columns showing the percentage of examiners’ statements written by ELF (blue) and ENL (red) authors. In the natural science faculties, ELF authors account for an average of 82% of English-language statements, compared with 57% in the more humanistic faculties:


Figure 4. Distribution of English-language examiners’ statements by ELF (blue) & ENL (red) authors, separated by faculty.

These findings appear to support a widely held perception that English has long been established as the lingua franca of the natural sciences. With such large majorities of non-native English authors, these figures do suggest an increasingly heterogeneous linguistic landscape within the English of the natural sciences. On the other hand, some of these differences may simply reflect the academic networks within these faculties. Whatever the case, the difference in the raw counts of ELF/ENL authors between the natural science and humanistic faculties is statistically significant (two-tailed Fisher’s exact test, p<0.0001).
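
For readers curious about what such a test involves, here is a minimal sketch of a two-tailed Fisher’s exact test on a 2×2 table in plain Python. The per-group raw counts are not listed in this post, so the example runs on a small textbook-style toy table rather than the corpus figures. The function name `fisher_exact_two_tailed` is an assumption made for illustration, and this is not the script used to produce the result above.

```python
from fractions import Fraction
from math import comb

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    row and column totals that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Probability of x in the top-left cell, given fixed margins.
        return Fraction(comb(row1, x) * comb(row2, col1 - x), denom)

    p_obs = prob(a)
    lowest = max(0, col1 - row2)   # smallest feasible top-left value
    highest = min(row1, col1)      # largest feasible top-left value
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1)) if p <= p_obs
    )
    return float(p_value)

# Toy table for illustration only (NOT the corpus counts):
# rows = two groups, columns = two outcomes.
p = fisher_exact_two_tailed(3, 1, 1, 3)
print(f"p = {p:.4f}")  # prints p = 0.4857
```

With the actual ELF/ENL counts for the two faculty groups in place of the toy table, the same function would return the p-value for the difference reported above. Exact fractions are used internally so that ties between equally likely tables are compared without floating-point error.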

Whose ELF? Inside the examiners’ statement corpus

In the end, we received permission to include 330 English-language examiners’ statements in the corpus (402,135 words). We ended up with a fairly even split between the natural science faculties (183,679 words, 46%) and the humanistic faculties (218,456 words, 54%). As for authors’ first languages, a total of 34 first languages were reported, with 14 of these first languages (including English) making up 90% of words in the corpus (see Figure 5 below). The proportion of ENL authors is 29% of words, about the same as their proportion in the number of English-language statements overall (Figure 3 above).


Figure 5. Distribution of all 34 first languages reported by authors in the preliminary examiners’ statement corpus.

After ENL authors, native speakers of Finnish, German, Swedish, and French are the best-represented authors in the corpus, together making up 41% of words. Authors with nine more first languages (see Figure 5) range from one to four percent of the corpus, totaling 20% of words. The final 10% of words are contributed by authors with 20 different first language backgrounds. This picture reflects the changing face of global English overall – while ENL users are hardly as marginal as they are portrayed in some ELF research, they appear to be in the minority in ELF contexts such as this.
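
The breakdown above can be checked for internal consistency with a few lines of Python. This is only a sketch using the rounded percentages given in the text; the `tiers` structure and its labels are illustrative.

```python
# Tiers of first-language groups as described in the text:
# (label, number of first languages, rounded share of corpus words in %)
tiers = [
    ("English (ENL)", 1, 29),
    ("Finnish, German, Swedish, French", 4, 41),
    ("nine further first languages", 9, 20),
    ("remaining first languages", 20, 10),
]

total_languages = sum(n for _, n, _ in tiers)
total_share = sum(share for _, _, share in tiers)

# The first three tiers are the 14 best-represented first languages.
top14_languages = sum(n for _, n, _ in tiers[:3])
top14_share = sum(share for _, _, share in tiers[:3])

print(total_languages, total_share)   # prints 34 100
print(top14_languages, top14_share)   # prints 14 90
```

The tiers account for all 34 reported first languages, and the 14 best-represented ones cover 90% of the words, matching the figures given for the corpus.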

As interesting as these findings may be, they’re just the beginning of our work. Having answered the question of “Whose English is it?”, we next have to ask, “What kind of English is it?” This descriptive question is what drives the corpus compilation, and this work is already underway. Project assistant Ruut has begun her MA thesis research on evaluative language in the examiners’ statement corpus. In the meantime, WrELFA corpus compilation continues with the last major part of the corpus, SciELF – a collection of scientific articles that are written by ELF users and have not undergone language checking. More to come!

Acknowledgments

This subcorpus of preliminary examiners’ statements has taken a lot of work and resources. Essential funding has been provided first through the Global English (GlobE) consortium, and now through the Changing English (ChangE) consortium, both of which have been funded by the Academy of Finland.

Prof. Mauranen, who just finished a term as dean of the Faculty of Arts, has taken the lead on gaining access to the data from various faculties and sending out those 524 permission requests (thanks!). The biggest job of processing, proofreading, regularising, and anonymising those 400,000 words themselves has been carried out by ELFA project assistants Jani Ahtiainen, Ruut Kosonen, and Ray Carey (that’s me). With the outstanding contributions from Jani and Ruut, my work has moved away from processing texts toward managing the XML master corpus and the scripts for converting the data into several useful formats.

4 thoughts on “Whose English? A window into written academic ELF”

  1. Thoroughly fascinating.
    One thing not mentioned in your detailed account however, is whether you asked the 524 who gave permission for their work to be included in the corpus whether their theses or papers were revised or edited by an English native speaker prior to submission or publication. This may have a slight influence on the EFL samples. I say slight, because the cost of having one’s entire thesis revised is often out of reach for those who have written them.
    Only last night I was revising a 500-word abstract in English of a doctoral thesis for a Portuguese native. The final product bears little resemblance to the original received, and now looks and sounds like formal English as written by an English native…

    • Ray Carey says:

      Hi Allison, it’s a good question. Just to clarify, the doctoral theses aren’t included in the database, just the examiners’ statements. These are relatively short texts (about 1200 words each by the corpus average, many less than 1000 words), and we sought them out specifically because they are a somewhat impermanent text type that is not expected to undergo language revision.

      The authenticity of the texts (in the sense of being the author’s own) is underscored by the regularisation process. Typographical errors and simple misspellings are annotated in the XML corpus with both the original and regularised (i.e. “correct”) spelling. In addition, while proofreading we tagged unexpected or unconventional items with <sic> tags, letting the reader know that this is indeed a faithful representation of the original text.

      These sic-tagged items would be things like “the <sic>important</sic> of this dissertation’s research” or “the discussion does not really <sic>going</sic> into much depth”. These examples are actually from English native-speaker statements, and they appear throughout the corpus.

      This made me curious, so I had to run a count. Typos or misspellings are found at least once in 44% of the ELF texts (104 out of 236 texts), and in 34% of the ENL texts (32 out of 94). The sic tag is even more common, appearing in 71% of ELF texts and 57% of ENL texts. The examiners’ statements thus appear to be thoughtful, but written in a hurry.

      To me, the suggestion that the authors were working without so much as an automatic spell-checker makes the texts that much more impressive. Overall I’d characterise it as high-level academic English — good English that might not meet “native-like” expectations.

      Best,
      Ray
