Language users or learners? Lexical evidence from spoken ELF

Click the image to jump to the article (behind paywall): Kao, S. & Wang, W. (2014) Lexical and organizational features in novice and experienced ELF presentations. Journal of English as a Lingua Franca, 3(1), 49-79. DOI: 10.1515/jelf-2014-0003.

One of the key distinctions made in research on English as a lingua franca (ELF) is the difference between language users and learners. ELF data is typically approached from the viewpoint of second language use instead of second language acquisition. Rather than seeing non-native English speakers as perennially deficient pursuers of “native-like” proficiency, ELF researchers start from the position that non-native English is principally English in use – English serves as a vehicular language for doing stuff, and especially for professional life in international domains like academia.

These issues are explored in a study in a recent issue of the Journal of English as a Lingua Franca. Shin-Mei Kao and Wen-Chun Wang take up the user/learner distinction by investigating “lexical and organizational patterns in the presentations made by speakers of different ELF proficiency and experience levels” (Kao & Wang 2014: 54). To do this, they perform a lexical analysis of academic presentations from three different groups – novice students who can be considered as language learners, academic experts using English as a lingua franca, and academic experts who are also English language specialists.

The three datasets are from the following sources:

  • novices/language learners – 43 student presentations in an English for Academic Purposes (EAP) course held at National Cheng Kung University, Taiwan. Students are a mixture of Taiwanese and international students, with most students from the field of engineering. Presentations ranged between 2-5 minutes each, with an average of 360 words.
  • ELFA corpus – 30 conference presentations from the Corpus of English as a Lingua Franca in Academic Settings (ELFA). The 49 presenters are academic experts (mostly between the ages of 31-50) with 20 different first-language backgrounds (and no native English speakers). Presentations average 21 minutes and 2568 words each.
  • John Swales Conference Corpus – 23 conference presentations from the JSCC, recorded at a conference in Michigan celebrating Swales’ retirement. The 28 presenters are all academic experts and English-language specialists from 13 different first-language backgrounds, including an unknown number of English native speakers. The presentations average 3007 words, and only monologues are included to match the ELFA data.

Two possible problems should be pointed out here. First, the authors’ claim that JSCC constitutes an “ELF corpus” is debatable. It’s clearly from an international conference, but without specific information on the language backgrounds of the presenters or participants, it’s hard to argue that talks recorded at an English studies conference in the US were not given primarily in an English as a native language (ENL) setting. In any case, this data is distinctive for representing English language specialists, regardless of the scholars’ first languages.

The more serious problem is the disparity between the length of the student presentations and the expert presentations. While the ELFA and JSCC data consist of typical 20-30 minute conference presentations, the 2-5 minute student presentations make some comparisons impossible. In particular, any comparison of the number of unique lexical or organisational items (i.e. lexical types) between data of such different lengths will inevitably be skewed – there are far fewer opportunities to use a broad range of lexical items in a 5-minute talk, which will also inevitably be organised differently from a conference talk. For this reason, I only discuss here particular findings related to lexical richness.

Lexical Richness Profiles: automated lexical analysis

The research question I consider here concerns the lexical richness features that distinguish expert from novice presentations. As described by Kao & Wang (2014), lexical richness is divided into three different measurements – lexical variation (the type-token ratio), lexical density (proportion of content words), and lexical sophistication (proportion of high-frequency vs. advanced tokens). Table 1 from Kao & Wang (2014: 60) outlines these indices:

The three indicators used to measure lexical richness. Note that K1 is the first 1,000 most frequent words of English; K2 is the next 1,000 most frequent words of English; and AWL is the Academic Word List.
Source: Kao & Wang (2014: 60)

To obtain these figures, the authors first pruned the data to remove fillers, repetitions, repairs and fragments. Then, the transcriptions were fed into an online tool, VocabProfile (2013), which is based on Laufer and Nation’s Lexical Frequency Profiler. The VocabProfile tool is part of the Compleat Lexical Tutor website maintained by Université du Québec à Montréal. The site offers numerous analytical tools that you can experiment with freely online.
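
The pruning step can be pictured with a rough sketch like the one below. It is not the authors' procedure: the filler inventory `FILLERS` and the function `prune` are invented for illustration, and only fillers and immediate word repetitions are handled (repairs and fragments would need manual annotation or transcription markup).

```python
import re

# Hypothetical filler inventory; the paper's actual pruning rules are not reproduced here.
FILLERS = {"er", "erm", "uh", "um", "mm"}

def prune(transcript: str) -> list[str]:
    """Lowercase, tokenize, drop fillers, and collapse immediate word repetitions."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", transcript.lower())
    pruned: list[str] = []
    for tok in tokens:
        if tok in FILLERS:
            continue  # skip hesitation markers
        if pruned and pruned[-1] == tok:
            continue  # collapse "the the" into a single "the"
        pruned.append(tok)
    return pruned

print(prune("Er so the the results um show that that it's er clear"))
# ['so', 'the', 'results', 'show', 'that', "it's", 'clear']
```

Joining the pruned tokens back into running text gives the kind of cleaned input a profiler would then receive. Note that this crude rule also collapses legitimate doubles such as "that that", which is one reason real pruning is usually checked by hand.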

Learners & users: evidence from lexical profiles

The first finding coming out of these measurements is a broad similarity between the datasets. The initial measurement of lexical variation (aka type-token ratio) is the total number of unique lexical items (types) divided by the total number of words (tokens). The two expert corpora yielded identical findings, with both the ELFA and JSCC conference presentations having a lexical variation ratio of 0.25. The novice presentations had a higher lexical variation ratio of 0.42, but as noted by the authors, this difference is attributed to the much shorter lengths of the student presentations. Turning to lexical density, all three groups showed similar results. For this figure, the number of content words (excluding grammatical or function words) is divided by the total words. The lexical density score was 0.46 for the students, 0.48 for ELFA presenters, and 0.50 for JSCC presenters.
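
The two ratios described above, and the reason shorter texts tend to score higher on lexical variation, can be illustrated with a minimal sketch. The function-word list and the example sentence are assumptions made up for this illustration; a real analysis would rely on a full word list or a part-of-speech tagger.

```python
# Tiny, hypothetical function-word list for illustration only.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and",
                  "is", "are", "we", "this", "that", "it", "for", "with"}

def lexical_variation(tokens):
    """Type-token ratio: unique word forms divided by total words."""
    return len(set(tokens)) / len(tokens)

def lexical_density(tokens):
    """Share of tokens that are content words (not in FUNCTION_WORDS)."""
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)

short_talk = "we present the results of this study and the method".split()
long_talk = short_talk * 5  # same vocabulary, five times the length

print(lexical_variation(short_talk))  # 0.9
print(lexical_variation(long_talk))   # 0.18
print(lexical_density(short_talk))    # 0.4
print(lexical_density(long_talk))     # 0.4
```

Repeating a text verbatim is an extreme case, since real speakers keep introducing some new words, but it shows the direction of the effect: tokens accumulate faster than types, so the type-token ratio falls as a text grows, while lexical density is unaffected by length alone.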

The interesting differences between the datasets show up in the measurements of lexical sophistication. This involves categorising each word into 1) the 1,000 most frequent words in English (K1); 2) the next 1,000 most frequent words in English (K2); 3) the Academic Word List, containing 570 headwords and roughly 3000 words in total, all of which are less frequent than the K1-K2 vocabulary; and 4) all other “off-list” words. The results of this categorisation are shown in Figure 2 of Kao & Wang (2014: 65):

Coverage of four lexical levels over the three sets of data. “Novices” (red) are the student presenters, “researchers” (green) are from the ELFA corpus, and “linguists” (blue) are from the JSCC.
Source: Kao & Wang (2014: 65). Color added by the blogger via the magic of MS Paint.
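
The four-way categorisation behind these coverage figures can be sketched roughly as follows. The three word sets are toy stand-ins invented for the example, not the real K1, K2 or AWL lists, and the sketch matches exact word forms, whereas frequency profilers generally work with word families.

```python
from collections import Counter

# Toy stand-ins for the real lists (hypothetical and far from complete).
K1 = {"the", "of", "we", "this", "show", "that", "is", "in", "and", "use"}
K2 = {"tool", "improve", "measure"}
AWL = {"analysis", "data", "method", "significant", "research"}

def band(token):
    """Return the first list that contains the token, else 'off-list'."""
    if token in K1:
        return "K1"
    if token in K2:
        return "K2"
    if token in AWL:
        return "AWL"
    return "off-list"

def profile(tokens):
    """Percentage of tokens falling into each of the four bands."""
    counts = Counter(band(t) for t in tokens)
    total = len(tokens)
    return {name: 100 * counts[name] / total
            for name in ("K1", "K2", "AWL", "off-list")}

tokens = "we use this tool and the data show robust analysis".split()
print(profile(tokens))
# {'K1': 60.0, 'K2': 10.0, 'AWL': 20.0, 'off-list': 10.0}
```

Each token is counted once per occurrence, so the result is token coverage of the kind plotted in the figure, not a count of distinct words per band.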

As can be seen, the K1 and K2 vocabulary are distributed similarly between the three sets of data, together covering 90% of tokens in the average novice presentation, 85% in the ELFA corpus data, and 84% in JSCC. The biggest difference is found with items on the Academic Word List (AWL). The two groups of experts show a similar use of AWL items, which cover 7.25% of tokens in the average ELFA presentation and 7.87% of tokens in JSCC. Here the difference in the student presentations is clear, where AWL items cover only 2.53% of tokens for the average novice presenter. The proportion of off-list words is again similar for all three groups, though the authors argue that qualitative analysis shows the experts use “more advanced and specific words to describe events, actions, and concepts” (Kao & Wang 2014: 65).

This distinction in the sophistication of off-list words, along with the “massive use of AWL” (ibid.), is claimed to be the differentiating factor between the expert presenters and the student novices. It would be nice to test these figures for statistical significance as well, but the figures reported in the paper (and above) are not cumulative for each dataset; they are already averaged over the number of presentations. I’m no statistician, but my understanding of the few tests I’m familiar with is that raw figures are needed, before any averaging or normalisation by dataset size is carried out, and these figures aren’t available in the paper.
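
One way to see why averaged percentages are hard to work back from is sketched below with invented counts (they are not data from the paper). When presentations differ in length, the mean of the per-presentation percentages is not the same as the percentage computed over the pooled tokens, and count-based tests need the latter kind of raw totals.

```python
# Hypothetical (awl_tokens, total_tokens) pairs for three talks; NOT data from the paper.
talks = [(10, 100), (30, 1000), (90, 2000)]

# Percentage of AWL tokens computed separately for each talk, then averaged.
per_talk = [100 * awl / total for awl, total in talks]
mean_of_percentages = sum(per_talk) / len(per_talk)

# Percentage computed once over all tokens pooled together.
pooled = 100 * sum(awl for awl, _ in talks) / sum(total for _, total in talks)

print(round(mean_of_percentages, 2))  # 5.83
print(round(pooled, 2))               # 4.19
```

In this toy case the short talk with a high AWL share pulls the averaged figure up, even though it contributes few tokens overall. Without the per-presentation token counts, the pooled totals cannot be recovered from the averages alone.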

No one is born an expert

Though not discussed here, the authors go on to compare organisational vocabulary found in the three datasets. In the end, they sum up their observations as follows:

The most striking finding of the study is the very similar lexical and organizational patterns used by the two expert groups in their presentations, despite their characteristic differences in native languages, cultures, and academic disciplines.

(Kao & Wang 2014: 70)

Thus, the primary distinction was not between the English-language specialists and the non-native English speakers in the ELFA corpus and student data. Instead, the lexical profiles of the expert ELF users in the ELFA corpus closely matched those of the English-language specialists at a US linguistics conference. I would have expected to find some differences between people who study English for a living and those who use English as an academic lingua franca for field-specific communication, but they’re not borne out in the lexical profiles. Instead, it would appear that when we’re looking at ELF use in professional domains like academia, the expert-novice distinction is more relevant than that of native or non-native speaker. No one acquires expertise – or the ability to express it – by virtue of birth or first language alone.


Kao, Shin-Mei & Wang, Wen-Chun (2014) Lexical and organizational features in novice and experienced ELF presentations. Journal of English as a Lingua Franca, 3(1), 49-79. DOI: 10.1515/jelf-2014-0003.


2 thoughts on “Language users or learners? Lexical evidence from spoken ELF”

  1. eflnotes says:

    hi Ray

    i wonder how much different (or not) results would have been if a spoken academic word list was used? e.g. say from Michigan Corpus of Academic Spoken English (MICASE) and/or British Academic Spoken English (BASE)

    do they discuss this in paper?


    • Ray Carey says:

      That’s a good point, and I don’t think they mentioned (or I didn’t notice) the fact that all their word lists are based on written academic English. There are some word lists on the BASE website, but I don’t know if something like AWL has been developed for spoken English… if not, then why not?
