One of my most-read posts has been on the frequencies of 3rd-person singular present verb forms (he says, she says) in English spoken as a lingua franca (ELF). When looking at English used primarily between non-native speakers of English, is there a greater likelihood of finding the unmarked “zero” form of 3rd-person singular present – he say, she say? This so-called “dropping” of 3rd-person -s has been promoted as the emerging “default option” in ELF interaction, most notably by Martin Dewey (Dewey 2007; Cogo & Dewey 2006).
My previous post questioned the quality of Dewey’s data, much of which is elicited data from English classroom settings (i.e. not naturally occurring ELF interaction). In addition, his database of 60,000 words is far too small to warrant the sweeping generalisations he proposes, and I offered counterfindings from the better compiled, one-million-word VOICE corpus (Vienna-Oxford International Corpus of English). The VOICE team recently released a part-of-speech (POS) tagged version of the corpus with double POS-tags showing each word’s function and form, allowing quick calculations of 3rd-person -s vs. zero distributions.
While Dewey reports that 52% of the 3rd-person singular present verbs in his data appear without the -s morpheme (these are for main verbs, not auxiliaries), there is no support for this “emerging default option” in VOICE. After excluding all the forms of high-frequency be and have, the 5335 remaining verbs functioning as 3rd-person present singular verbs (tagged fVVZ) include only 310 cases of 3rd-person zero – just under 6% of the total. How could this be so different from Dewey’s findings? His small, unrepresentative database is the likely cause, but there must be more to the story.
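The headline rate is simply zero forms divided by all fVVZ verbs once be and have are set aside. The following minimal sketch is not the author's script. The `Instance` record, its `lemma`/`form` fields, and the lemma-based exclusion of be and have are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Instance(NamedTuple):
    """One verb with a 3rd-person singular present function tag (fVVZ)."""
    lemma: str  # base form, e.g. "say"
    form: str   # "s" for the -s form, "zero" for the unmarked form


# Hypothetical lemma-based exclusion of the high-frequency verbs.
EXCLUDED_LEMMAS = {"be", "have"}


def zero_rate(instances: Iterable[Instance]) -> tuple[int, int, float]:
    """Return (zero count, total count, zero %) after dropping be and have."""
    kept = [i for i in instances if i.lemma not in EXCLUDED_LEMMAS]
    zero = sum(1 for i in kept if i.form == "zero")
    total = len(kept)
    pct = 100 * zero / total if total else 0.0
    return zero, total, pct


# The corpus-wide figures reported above: 310 zero forms out of 5335.
print(f"{100 * 310 / 5335:.1f}%")  # 5.8%
```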
This post goes deeper into the findings on what I’ll now refer to as “3rd-person zero” in the VOICE corpus of naturally occurring ELF interactions – do specific individuals, speech events, or speakers of certain first languages produce the 3rd-person zero form more often than others?
In search of wild diversity
There are ELF researchers who seem to start their studies determined to hack their way through a wild linguistic jungle of unexplored diversity, like a Crocodile Hunter for linguists. It’s true that diversity is prominent in ELF talk and more fun to study than homogeneity, but not finding wild diversity where it was expected is also a significant finding. So what else can we say about these 310 cases of 3rd-person zero found in VOICE? A rate of variation of 6% of 3rd-person forms is clearly not dominant. But is it widespread?¹
The answer is yes – at least one 3rd-person zero verb is found in 102 of the 151 speech events (68%). But is it more frequent in some domains than in others? The VOICE corpus consists of five domains: leisure, educational, and three professional domains (business, organisational, and research & science). 3rd-person zero forms are distributed fairly evenly among them. As Figure 1 below shows, the rate of 3rd-person zero ranges only from 5% in the leisure domain to 8% in professional research & science. Widely distributed, but not especially prominent in any individual domain.
A similar picture emerges when the same figures are generated for the 10 different event types in the corpus (e.g. conversation, interview, meeting, working group discussion). The rate of 3rd-person zero again ranges only from 4% to 8% across event types, with the corpus-wide figure of 6% falling in the middle. There must be something wild around here somewhere…
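Both the domain breakdown and the event-type breakdown amount to the same pivot-table operation: count zero forms and all fVVZ verbs per group, then divide. The following minimal Python sketch assumes each row is a dict like those produced by `csv.DictReader`. The column names `domain`, `event_type` and `form` (valued "s" or "zero") are hypothetical, and the rows below are toy data, not VOICE figures.

```python
from collections import defaultdict


def rates_by(rows, key):
    """Return {group: % zero forms} for each value of `key` across fVVZ rows.

    `rows` are dicts such as csv.DictReader yields; the "form" column
    ("s" or "zero") and the grouping column names are hypothetical.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [zero count, fVVZ total]
    for row in rows:
        group = counts[row[key]]
        group[1] += 1
        if row["form"] == "zero":
            group[0] += 1
    return {g: 100 * z / t for g, (z, t) in counts.items()}


# Toy rows for illustration only.
rows = [
    {"domain": "leisure", "event_type": "conversation", "form": "zero"},
    {"domain": "leisure", "event_type": "conversation", "form": "s"},
    {"domain": "research", "event_type": "meeting", "form": "s"},
    {"domain": "research", "event_type": "meeting", "form": "s"},
]
print(rates_by(rows, "domain"))  # {'leisure': 50.0, 'research': 0.0}
```

Swapping `"domain"` for `"event_type"` gives the second breakdown without any other change.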
Focus on the outlier events
If your corpus won’t yield wild diversity, then maybe your outlier events will: those single speech events with the highest rate of occurrence of whatever you’re hoping to find. Sorting the VOICE events by percentage of 3rd-person zero alone skews the results, since a 50% rate of 3rd-person zero isn’t interesting if there are only two 3rd-person (fVVZ) verbs in the event. So I took a different approach and sorted the events by the raw number of 3rd-person zero forms in each event.
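The difference between ranking by percentage and ranking by raw count is easiest to see on a toy example. This sketch is illustrative and not the author's code. It assumes rows with hypothetical `event` and `form` keys, ranks events by their number of zero forms, and reports each event's rate alongside.

```python
from collections import Counter


def rank_events(rows, min_zero=1):
    """Rank events by raw count of zero forms, with the rate shown alongside.

    `rows` must be a list (it is iterated twice) of dicts with hypothetical
    "event" and "form" keys. Returns (event, zero count, fVVZ total, zero %)
    tuples, highest zero count first.
    """
    total = Counter(r["event"] for r in rows)
    zero = Counter(r["event"] for r in rows if r["form"] == "zero")
    ranked = sorted(zero.items(), key=lambda kv: kv[1], reverse=True)
    return [(ev, n, total[ev], 100 * n / total[ev]) for ev, n in ranked if n >= min_zero]


# Toy data: event A has the higher rate (1 of 2) but only one zero form;
# event B has a lower rate (3 of 10) but more instances.
rows = (
    [{"event": "A", "form": "zero"}, {"event": "A", "form": "s"}]
    + [{"event": "B", "form": "zero"}] * 3
    + [{"event": "B", "form": "s"}] * 7
)
print(rank_events(rows))  # [('B', 3, 10, 30.0), ('A', 1, 2, 50.0)]
```

Event B ranks first despite its lower percentage, which is the point of sorting by raw count.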
Among the 102 events in VOICE where 3rd-person zero occurs, it’s found only 1-5 times in 88 of them. The remaining 14 events contain 6-15 instances of 3rd-person zero, the highest raw counts in the corpus. See Table 1 above for the breakdown of these 14 outlier events by rate of 3rd-person zero variation. The highest single rate of occurrence is 23% (6 out of 26 fVVZ verbs), and the lowest is 5% (7 out of 139). Even the highest rate is less than half the 52% reported in Dewey’s exceptionally high findings.
Surely some wild variation must be found to at least partially vindicate Dewey’s anomalous figures. Let’s look at the real outlier events: the three events with the highest number of instances of 3rd-person zero, though even these contain only 13-15 instances each. When these events are broken down by speaker, we find that each has one or two speakers who exceed a 20% rate of the 3rd-person zero form. Consider the mega-outlier event, a working group discussion with 15 instances of 3rd-person zero, the highest in the corpus. Table 2 below shows the rate of occurrence by speaker: 4 of the speakers use 3rd-person zero at least once, but the other 5 use only the conventional 3rd-person -s.
Focus on individual speakers
Judging by our data so far, 3rd-person zero is commonly found in speech events, but it’s not especially frequent, and nobody seems to use it all the time. Nor does everyone use it some of the time. But our outlier event in Table 2 shows one speaker using the 3rd-person zero form in 7 of the 16 verbs with a 3rd-person present singular function. That is a Dewey-like rate of 44% of forms, so maybe individual speakers demonstrate the “emerging trend” he has claimed.
There are 182 speakers in the corpus who use 3rd-person zero at least once, and together they average a 27% rate of 3rd-person zero. But we have to remember that these 3rd-person zero users still deploy the 3rd-person -s morpheme 73% of the time. Moreover, these 182 speakers amount to a mere 14% of the 1260 speakers in VOICE.² Even more depressingly, 615 speakers in VOICE are found using only the conventional 3rd-person -s form. So much for a “default option”.
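Per-speaker figures come from grouping the same rows by speaker. A speaker who never produces an fVVZ verb never appears in these rows at all. On the post's figures, 182 zero users plus 615 -s-only users make 797 speakers, so the other 463 of the 1260 contribute no fVVZ verbs. The post does not say whether the 27% is a mean of individual speakers' rates or a pooled rate, so the following sketch computes both. It assumes rows with hypothetical `speaker` and `form` keys and uses toy data, not VOICE figures.

```python
from collections import defaultdict
from statistics import mean


def speaker_summary(rows):
    """Summarise zero-form use per speaker from fVVZ rows.

    Rows are dicts with hypothetical "speaker" and "form" ("s"/"zero") keys.
    """
    per = defaultdict(lambda: [0, 0])  # speaker -> [zero count, fVVZ total]
    for r in rows:
        counts = per[r["speaker"]]
        counts[1] += 1
        if r["form"] == "zero":
            counts[0] += 1
    users = [(z, t) for z, t in per.values() if z > 0]
    return {
        "zero_users": len(users),
        "s_only": len(per) - len(users),
        # Two possible readings of an "average rate" for the zero users:
        "mean_of_speaker_rates": mean(100 * z / t for z, t in users) if users else 0.0,
        "pooled_rate": 100 * sum(z for z, _ in users) / sum(t for _, t in users) if users else 0.0,
    }


# Toy data: X uses zero 1/1, Y uses zero 1/3, Z uses only -s (0/2).
rows = [
    {"speaker": "X", "form": "zero"},
    {"speaker": "Y", "form": "zero"},
    {"speaker": "Y", "form": "s"},
    {"speaker": "Y", "form": "s"},
    {"speaker": "Z", "form": "s"},
    {"speaker": "Z", "form": "s"},
]
summary = speaker_summary(rows)
```

In this toy case, the mean of speaker rates (about 66.7%) and the pooled rate (50%) differ noticeably. This is why it matters which one an “average rate” refers to.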
Let us hope against hope and take one more look at the outlier cases. Only 12 speakers in the entire million-word corpus use the 3rd-person zero form at least 5 times (see Table 3). Just one of these would make Dewey truly proud, with 100% 3rd-person zero forms: 5 for 5. Altogether, four speakers have rates of 3rd-person zero of at least 50% of forms, approaching Dewey’s 52%. Although these 4 speakers are a tiny group of outliers in a corpus of 1260 speakers, they are carrying Dewey’s banner in what he describes as a “competition” between the 3rd-person -s and zero forms, one in which “the 3rd person zero is the variant that is winning this competition” (Cogo & Dewey 2006: 77).
Focus on speakers of different first languages
We see from the outlier list of 12 individual speakers (Table 3 above) that a nice mix of first languages (L1s) is represented: 9 different L1s in total. This is further evidence that 3rd-person zero is a widespread feature in ELF interaction. It may not be frequent or found at high rates, but it’s spread broadly across speech events, domains, event types, and speakers. But can big differences be found between speakers of different L1s? Our list of outlier speakers suggests the answer is no. But this is the question everyone wants answered, so I’ll finish with what I believe are the most striking findings of all.
We’ve learned from Dewey that we have to be careful about making representative claims based on small sets of data. The same rule applies when making claims of representativeness that extend to a whole group of L1 speakers. The VOICE corpus includes speakers of 49 different L1s, but many of these L1s are represented by just a few thousand words of speech, hardly enough to even make generalisations about a single speaker. In order to get somewhat meaningful findings, we have to look at just the L1s that are represented in some abundance.
As it turns out, only 7 L1s are each represented by at least 4% of the total words in VOICE (see Table 4 below). German is the largest at 25% of words, and even the L1s at the 4% mark each yield at least 40,000 words of speech. We’ll treat that as our somewhat arbitrary cut-off point. Interestingly, L1 English speakers are among the best-represented L1 groups in VOICE, with 7% of the words in the corpus. So, taking these top 7 L1s (German, Dutch, English, Danish, Italian, Polish, and Spanish) and calculating their rates of occurrence of 3rd-person zero, some Crocodile-Hunter-worthy findings emerge.
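The 4% cut-off is a simple share-of-words filter applied before any rates are calculated. The following minimal sketch assumes a hypothetical `words_by_l1` mapping from an L1 label to the number of words its speakers contribute. The labels and counts used below are placeholders, not VOICE figures.

```python
from __future__ import annotations


def well_represented(words_by_l1: dict[str, int], min_share: float = 0.04) -> list[tuple[str, float]]:
    """Return (L1, share of all words) for L1s at or above `min_share`, largest first."""
    total = sum(words_by_l1.values())
    shares = ((l1, n / total) for l1, n in words_by_l1.items())
    kept = [(l1, s) for l1, s in shares if s >= min_share]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Placeholder counts: "C" falls below the 4% threshold and is dropped.
counts = {"A": 50, "B": 30, "C": 2, "D": 18}
print([l1 for l1, _ in well_represented(counts)])  # ['A', 'B', 'D']
```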
First of all, the 3rd-person zero distributions are all lower than the outlier cases we’ve looked at above. L1 speakers of Spanish and Italian show higher rates of 3rd-person zero occurrence at 11% and 13% respectively, but these figures are still quite low and I’m not inclined to make any sweeping proclamations based on 40,000 words of L1 data. The most striking figure comes from the three L1s most strongly represented in VOICE – German, Dutch, and English. Their rate of occurrence of 3rd-person zero is identical at 3% of verbs functioning as 3rd-person present singular. That’s right – even the all-glorious, “perfect English” speakers use 3rd-person zero as often as native speakers of German and Dutch.
In sum, these 7 L1s make up 60% of the tokens in VOICE, with an average rate of 3rd-person zero variation of 5%, just below the corpus-wide average. The remaining 40% of tokens come from speakers of the 42 other L1s in the corpus, and their somewhat higher use of 3rd-person zero lifts the corpus-wide rate only to 6%. Thus, it appears the 3rd-person zero form is widely and commonly encountered, though not in abundance: just often enough to get stuck in researchers’ intuitions that it must be a prominent feature in ELF.
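The way each group's rate feeds into the corpus-wide figure can be written out. Write $Z$ for zero forms and $N$ for fVVZ verbs, with $A$ the 7 well-represented L1s and $B$ the remaining 42. The corpus-wide rate $r$ is then a weighted mean of the two group rates. The weights are each group's share of fVVZ verbs, not its share of word tokens (which is the figure reported above):

$$
r = \frac{Z_A + Z_B}{N_A + N_B} = \frac{N_A}{N_A + N_B}\, r_A + \frac{N_B}{N_A + N_B}\, r_B,
\qquad r_A = \frac{Z_A}{N_A},\quad r_B = \frac{Z_B}{N_B}
$$

Because $r$ is a weighted mean, a rate for group $A$ below $r$ implies a rate for group $B$ above it. The exact value of $r_B$ would depend on the fVVZ counts per group, which are not given here.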
1 To get these findings, I wrote a Python script that generates a comma-separated values (.csv) file containing all 5335 instances of verbs (excluding the common auxiliaries be and have) with a 3rd-person present singular function tag (fVVZ). By processing the corpus in XML format, I can associate each instance with its metadata: form (-s form or zero), speaker ID and first language, speech event, etc. I opened this .csv output in MS Excel as a spreadsheet and analysed the data with its PivotTable feature. You can download or look at my Python code from this link, or download the .csv file here and explore the raw data yourself (right-click the link, “Save target as…”, open in Excel, and enjoy).
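The footnote describes the workflow but not the code. The following is therefore only a minimal sketch of the same idea, built on Python's standard `xml.etree.ElementTree` and `csv` modules. The XML layout is invented for illustration and is not VOICE's actual schema. This covers the `event`, `u` and `w` elements and the `id`, `domain`, `who`, `l1`, `func` and `form` attributes. In particular, the sketch assumes that a word whose function tag is VVZ counts as the -s form only when its form tag is also VVZ. The names would need adapting to the real files.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["event", "domain", "speaker", "l1", "word", "form"]


def extract_fvvz(xml_text: str, out) -> int:
    """Write one CSV row per word whose (hypothetical) func attribute is VVZ.

    Returns the number of rows written (excluding the header).
    """
    root = ET.fromstring(xml_text)
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    n = 0
    for event in root.iter("event"):
        for u in event.iter("u"):
            for w in u.iter("w"):
                if w.get("func") != "VVZ":
                    continue
                writer.writerow({
                    "event": event.get("id"),
                    "domain": event.get("domain"),
                    "speaker": u.get("who"),
                    "l1": u.get("l1"),
                    "word": w.text,
                    # -s form if the form tag is also VVZ, otherwise zero
                    "form": "s" if w.get("form") == "VVZ" else "zero",
                })
                n += 1
    return n


# Toy document in the invented layout.
sample = (
    '<corpus><event id="E1" domain="leisure">'
    '<u who="S1" l1="L1a">'
    '<w func="VVZ" form="VVZ">says</w>'
    '<w func="VVZ" form="VV">say</w>'
    '<w func="NN" form="NN">dog</w>'
    '</u></event></corpus>'
)
buf = io.StringIO()
count = extract_fvvz(sample, buf)
```

Writing to any file-like object (here an in-memory buffer) keeps the function usable for both testing and producing a real .csv file.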
2 This is a somewhat confusing figure. According to its online statistics, the VOICE corpus contains 753 unique individuals, but some of these people appear in more than one unique speech event. So, they clarify that there are 1260 individual speakers within the 151 speech events, but some of these speakers are the same person in different events. There’s no easy way to tease this out of the data (although it’s annotated in the XML) so I’m going with the 1260 speaker figure.
Cogo, A. & Dewey, M. (2006) Efficiency in ELF Communication: From Pragmatic Motives to Lexico-grammatical Innovation. Nordic Journal of English Studies, 5 (2), 59-93. Available online: http://ub016045.ub.gu.se/ojs/index.php/njes/article/view/65
Dewey, M. (2007) English as a Lingua Franca: an empirical study of innovation in lexis and grammar. Unpublished PhD thesis, King’s College London.
VOICE (2013) The Vienna-Oxford International Corpus of English (version POS XML 2.0). Director: Barbara Seidlhofer. Available online: http://www.univie.ac.at/voice/page/download_voice_xml