Tag Archives: VOICE corpus

Needles in a haystack: questioning the “fluidity” of ELF

As I’ve argued earlier on this blog, the claims of “fluidity”, “diversity”, and “innovation” found in English as a lingua franca (ELF) research are sometimes overstated. It’s so diverse that even ordinary diversity won’t do – it’s “super-diversity” now. It could very well be ultra-mega-diversity-squared, but the question of the prominence of these presumably innovative features is a quantitative one. More specifically, it’s a question of how frequently any variant forms occur in naturally occurring ELF interaction, relative to the conventional forms. One of my shameless nerd hobbies is writing little Python programs to query corpora, and several of these mini-studies have appeared on this blog. I especially enjoy working with the VOICE corpus, which is great because 1) it contains a million words of unelicited ELF interaction; 2) it’s ready for processing as well-formed XML; and 3) it has been meticulously part-of-speech (POS) tagged for both the form and function of each word in the corpus.

The value of this double form-function tag is that it reveals every token in the corpus where a word like fluently, which is formally recognisable as an adverb, functions differently, for example as an adjective: i think you are very fluently in english. This example of fluently from VOICE has a form tag of RB (adverb), but a function tag of JJ (adjective) to reflect that fluently seems to be serving an adjectival function. This kind of form-function variation in ELF is presumably prominent enough that double tagging is needed to describe the fluidity adequately. The VOICE team was kind enough to carry out this formidable task, which involved manual inspection of all million words. Now that this resource is in place (and freely available), the instances of these form-function mismatches can be easily found, counted, and viewed in context.
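To give a rough idea of what such a query might look like, here is a minimal Python sketch. The markup is an assumption made for illustration only: the `<w>` elements and their `form` and `function` attributes are hypothetical and do not reproduce the actual VOICE XML schema, and every tag other than RB and JJ on fluently is a placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, NOT the real VOICE schema: each <w> element is
# assumed to carry one "form" tag and one "function" tag.
# Only the RB/JJ pair on "fluently" comes from the post; the other
# tags are illustrative placeholders.
SAMPLE = """
<u who="S1">
  <w form="PP" function="PP">i</w>
  <w form="VVP" function="VVP">think</w>
  <w form="PP" function="PP">you</w>
  <w form="VBP" function="VBP">are</w>
  <w form="RB" function="RB">very</w>
  <w form="RB" function="JJ">fluently</w>
  <w form="IN" function="IN">in</w>
  <w form="NN" function="NN">english</w>
</u>
"""


def find_mismatches(xml_text):
    """Return (word, form_tag, function_tag) for tokens whose tags differ."""
    root = ET.fromstring(xml_text)
    hits = []
    for w in root.iter("w"):
        form, function = w.get("form"), w.get("function")
        if form != function:
            hits.append((w.text, form, function))
    return hits


mismatches = find_mismatches(SAMPLE)
total_tokens = len(list(ET.fromstring(SAMPLE).iter("w")))
print(mismatches, f"{len(mismatches)} of {total_tokens} tokens")
```

Counting the mismatches against the total number of tokens is what turns an impression of fluidity into a relative frequency.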

I’ve wondered for some time how often these variant form-function tokens occur overall, in relation to their conventional forms. My interest was renewed by the recent paper by VOICE project researcher Ruth Osimk-Teasdale in the Journal of English as a Lingua Franca. One of the main workers on the VOICE POS-tagging project, she investigates word class shifts in VOICE. She narrows her data to double form-function tags that reflect a shift of category across word classes (like from adverb to adjective). These inter-categorical word class shifts therefore exclude variations within a word class, like singular nouns which are treated as plural. She focuses on items like fluently above, where word class conversion occurs without any change to the form of the word itself.

The assignment of these form-function tags – and the analysis of them – is directly linked to the fluidity of ELF: Keep reading…


On the other side: variations in organising chunks in ELF

Variations in organising chunks aren’t that common, but they do tend to stand out.
Source: Livio Bourbon via The Telegraph

When working with ELF data – English used as a lingua franca between second/foreign-language speakers – one of the things that stands out is the slight variation in conventional chunks of language. A formulaic chunk like as a matter of fact might be realised as as the matter of fact, or you could hear now that you mention it spoken as now that you say it. There’s no sense in calling them errors, since the variants won’t cause miscommunication, they resemble their conventional counterparts in both function and form, and the less-preferred variant is likely found elsewhere. It’s just not the English native-speaker preference.

These variations are interesting linguistically and they tend to stand out impressionistically for researchers, but I’ve wondered how often these variations actually occur in ELF – both in raw frequency and in their distribution relative to conventional forms. It’s not an easy question to answer. Many of these formulaic chunks of language occur infrequently, so finding a couple of variants doesn’t really tell you much. The example above of now that you say it occurs twice in the million-word ELFA corpus, with just one instance of the conventional form. Conversely, as the matter of fact is found in ELFA 21 times compared to just eight occurrences of the expected chunk, but only two speakers account for those 21 instances.

We can see from these examples that a formulaic chunk that rarely shows up won’t reveal much about how often variation occurs among ELF users, across speech events, in different times and places. To find out more, I wanted to start with the highest frequency chunks I could find. These are described by Linear Unit Grammar as organising chunks, the recurring and relatively fixed chunks we use to structure our speech and writing, like on the other hand. Using the corpus freeware AntConc, I looked at the most frequent 3-, 4- and 5-word clusters (aka n-grams) in the ELFA corpus of spoken academic ELF. Keep reading…

In search of wild diversity: a closer look at 3rd-person zero marking in ELF

The late Australian naturalist Steve Irwin (aka the Crocodile Hunter) had an infectious love of wild diversity.
Source: Sydney Morning Herald

One of my most-read posts has been on the frequencies of 3rd-person singular present verb forms (he says, she says) in English spoken as a lingua franca (ELF). When looking at English used primarily between non-native speakers of English, is there a greater likelihood of finding the unmarked “zero” form of 3rd-person singular present – he say, she say? This so-called “dropping” of 3rd-person -s has been promoted as the emerging “default option” in ELF interaction, most notably by Martin Dewey (Dewey 2007; Cogo & Dewey 2006).

My previous post questioned the quality of Dewey’s data, much of which is elicited data from English classroom settings (i.e. not naturally occurring ELF interaction). In addition, his database of 60,000 words is far too small to warrant the sweeping generalisations he proposes, and I offered counterfindings from the better compiled, one-million-word VOICE corpus (Vienna-Oxford International Corpus of English). The VOICE team recently released a part-of-speech (POS) tagged version of the corpus with double POS-tags showing each word’s function and form, allowing quick calculations of 3rd-person -s vs. zero distributions.

While Dewey reports that 52% of the 3rd-person singular present verbs in his data appear without the -s morpheme (these are for main verbs, not auxiliaries), there is no support for this “emerging default option” in VOICE. After excluding all the forms of high-frequency be and have, the 5335 remaining verbs functioning as 3rd-person present singular verbs (tagged fVVZ) include only 310 cases of 3rd-person zero – just under 6% of the total. How could this be so different from Dewey’s findings? His small, unrepresentative database is the likely cause, but there must be more to the story.
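The VOICE proportion can be checked with a few lines of Python. This is only a sketch of the arithmetic using the two counts given above; the variable and function names are my own.

```python
# Counts reported in this post for the POS-tagged VOICE corpus,
# after excluding all forms of be and have.
zero_marked = 310   # tokens of 3rd-person zero
total_fvvz = 5335   # verbs functioning as 3rd-person singular present (fVVZ)


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


voice_zero_pct = share(zero_marked, total_fvvz)
print(f"{voice_zero_pct:.1f}% zero marking in VOICE, versus the 52% Dewey reports")
```

This prints a figure of 5.8%, which is the “just under 6%” cited above.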

This post goes deeper into the findings on what I’ll now refer to as “3rd-person zero” in the VOICE corpus of naturally occurring ELF interactions – do specific individuals, speech events, or speakers of certain first languages produce the 3rd-person zero form more often than others?

In search of wild diversity

There are ELF researchers who seem to start their studies determined to hack their way through a wild linguistic jungle of unexplored diversity, like a Crocodile Hunter for linguists. It’s true that diversity is prominent in ELF talk and it’s more fun to study than homogeneity, but not finding wild diversity where it was expected is also a significant finding. So what else can we say about these 310 cases of 3rd-person zero found in VOICE? Keep reading…


In defense of good data: the question of third-person singular –s

There’s a special place in heaven called “Midsummer in Finland”. This is a recent sunset viewed at the Saimaa in eastern Finland.
© Nina Valtavirta

In the early days of ELF research, it was sometimes claimed that English used as a lingua franca (ELF) between its second language speakers might be a separate and unique variety of English. No one seems to want to defend this claim any longer, and more emphasis is placed on the inherent complexity and fluidity of these lingua franca encounters. Yet, despite this distance from explicit claims of variety status, there is still a tendency for ELF researchers to treat ELF as a bounded object.

This is the argument developed by Janus Mortensen in the latest issue of the Journal of English as a Lingua Franca. He discusses a tendency in ELF research to treat ELF as a language system alongside English as a native language (ENL), in effect reifying ELF or treating it as a bounded object. As a result of this reification, ELF is “turned into a bounded object that can be delimited and characterized in terms of specific properties”, including properties of a formal linguistic nature (Mortensen 2013: 30).

One such linguistic property that Mortensen discusses is the marking of 3rd-person singular verbs in present simple tense: she studies in the university. This so-called 3rd-person singular –s morpheme is an anomaly of the English verb system (I study, you study, we study, and they study, but she studies), and some varieties of English regularise this feature: she study in the university. This “dropping” of the 3rd-person –s (also referred to as 3rd-person zero) has been proposed as a prominent feature of ELF talk since the early 2000s, and it is precisely this notion of a broadly claimed “ELF variant” that Mortensen objects to.

“Emerging as the default option”

As recently as 2012, ELF researchers Alessia Cogo and Martin Dewey made the claim that “at least in certain types of ELF settings, 3rd person zero appears to be emerging as the default option in informal naturally occurring communications” (Cogo & Dewey 2012: 49). Keep reading…