On the other side: variations in organising chunks in ELF

Variations in organising chunks aren’t that common, but they do tend to stand out.
Source: Livio Bourbon via The Telegraph

When working with ELF data – English used as a lingua franca between second/foreign-language speakers – one of the things that stands out is the slight variation in conventional chunks of language. A formulaic chunk like as a matter of fact might be realised as as the matter of fact, or you could hear now that you mention it spoken as now that you say it. There’s no sense in calling these errors: the variants won’t cause miscommunication, they resemble their conventional counterparts in both function and form, and the less-preferred variant is likely found elsewhere. It’s just not the English native-speaker preference.

These variations are interesting linguistically and they tend to stand out impressionistically for researchers, but I’ve wondered how often these variations actually occur in ELF – both in frequency and also in their distribution relative to conventional forms. It’s not an easy question to answer. Many of these formulaic chunks of language occur infrequently, so finding a couple of variants doesn’t really tell you much. The example above of now that you say it occurs twice in the million-word ELFA corpus, with just one instance of the conventional form. By contrast, as the matter of fact is found in ELFA 21 times compared to just eight occurrences of the expected chunk, but only two speakers account for those 21 instances.

We can see from these examples that a formulaic chunk that rarely shows up won’t reveal much about how often variation occurs among ELF users, across speech events, in different times and places. To find out more, I wanted to start with the highest frequency chunks I could find. These are described by Linear Unit Grammar as organising chunks, the recurring and relatively fixed chunks we use to structure our speech and writing, like on the other hand. Using the corpus freeware AntConc, I looked at the most frequent 3-, 4- and 5-word clusters (aka n-grams) in the ELFA corpus of spoken academic ELF.
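For readers who have not worked with cluster lists, the idea behind an n-gram count can be sketched in a few lines of Python. This is only an illustration under simple assumptions (lower-cased, whitespace-split tokens and a made-up toy transcript); it is not how AntConc is implemented, and the function name ngram_counts is hypothetical.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous n-word sequence in a list of tokens."""
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )

# Hypothetical toy transcript, not taken from the ELFA corpus.
text = (
    "on the other hand we can see that at the same time "
    "on the other hand it is at the same time difficult"
)
tokens = text.lower().split()

# List the most frequent 3-, 4- and 5-word clusters.
for n in (3, 4, 5):
    print(n, ngram_counts(tokens, n).most_common(3))
```

On a real corpus the tokenisation step matters a great deal (transcription markup, hesitations and punctuation all affect which clusters are counted), so a sketch like this is a starting point rather than a replacement for a dedicated tool.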

From these lists, I selected the highest frequency, stand-alone organising chunks I could find that might have “movable parts” inviting variations:

  • on the other hand (n=155), together with its optional predecessor, on (the) one hand (n=34)
  • at the same time (n=133)
  • so to speak (n=20), together with its less conventional counterpart, so to say (n=39)
  • from my point of view (n=19) and the related in my view (n=5), as discussed in Mauranen 2009

Using these as my base, I used AntConc to manually search for any possible variations on these conventional chunks. By treating both conventional and variant forms as different surface realisations of the same organising chunks, I could calculate the rate of occurrence of conventional and approximate forms. But I also went a step further. One might assume that these approximations are just the product of the pressures of producing speech in real time. To see if this is true, I searched for the same chunks in the first 300,000 words of the in-progress WrELFA corpus of written academic ELF. My findings are reported in September’s issue of the Journal of English as a Lingua Franca.
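The search for approximate forms can be thought of as a frame with open slots. The sketch below shows one way to express such a frame as a regular expression in Python and to separate conventional hits from candidate variants. The slot choices (four prepositions, an optional article, hand or side), the sample sentence, and the helper name classify_hits are all assumptions made for illustration; the searches described above were done manually in AntConc.

```python
import re

# Hypothetical frame: a preposition slot, an optional article,
# and either "hand" or "side" as the final element.
FRAME = re.compile(r"\b(on|from|in|at) (the )?other (hand|side)\b")
CONVENTIONAL = "on the other hand"

def classify_hits(text):
    """Split frame matches into conventional and approximate forms."""
    conventional, approximate = [], []
    for match in FRAME.finditer(text.lower()):
        hit = match.group(0)
        if hit == CONVENTIONAL:
            conventional.append(hit)
        else:
            approximate.append(hit)
    return conventional, approximate

# Made-up sample text, not corpus data.
sample = (
    "On the other hand it works, but on the other side it is slow, "
    "and from other side it is cheap."
)
conv, approx = classify_hits(sample)
print(conv)
print(approx)
```

A pattern like this only produces candidates. Each hit still has to be read in context to confirm that it is functioning as an organising chunk and not, say, as a literal reference to the other side of something.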

Low frequency, high variation

I started the study with the least frequent organising chunk on the list – the 5-gram from my point of view. This particular chunk was first analysed in ELFA corpus data by Mauranen (2009), who shows how this conventional chunk can be seen as blending with the similarly functioning in my view organiser, as seen in approximate forms such as in my point of view. Looking further into the data, I found this in my view to be a productive frame for other one-off chunks expressing stance toward the main message. Interestingly, none of the approximate forms found in the spoken ELFA corpus were found in written WrELFA. Instead, the written and spoken corpora displayed their own unique forms:

WrELFA corpus
• in my view point
• to my view
• in my eyes
• in my feeling
• in my own private delusion
ELFA corpus
• in my point of view
• on my point of view
• in my sense
• in my sense of judgment
• in my belief
• in my thoughts

In these cases, we can see how a low-frequency chunk like from my point of view or in my view can be easily adapted to form unique chunks that serve the same interactive function. However, these unique forms are still infrequent, with a normalised frequency of only 1.5 occurrences per 100,000 words in the ELFA corpus. Yet, if the unique forms are treated as variants of the conventional from my point of view and in my view chunks, the variants make up 40% of all these forms. This example shows a commonly noticed class of variation in ELF – low frequency chunks of language that show a high rate of unique variations.

It stands to reason that English native speakers might likewise employ similar variations in low-frequency chunks of language. However, none of the unique forms listed above were found in a comparable L1 English corpus, the Michigan Corpus of Academic Spoken English (MICASE). In fact, the conventional chunks only occur eight times per million words in MICASE, compared to 24 times per million words in ELFA. So, this example shows ELF speakers using a pair of formulaic chunks three times more often than English native speakers in similar academic settings, and with a greater likelihood of extending them into unique variants. Should we also expect to find such variants becoming ELF-specific preferences?

When low frequency meets high: speak vs. say

A similarly low-frequency organising chunk is so to speak, which occurs 20 times in the ELFA corpus. When searching for variant forms, I only came across one – so to say, which I verified in context to be serving the same organising function as a single item. However, this less conventional choice was found 39 times in ELFA, and if both chunks are treated as forms of the same organiser, so to say makes up 66% of forms. Could this be a candidate for an emerging ELF-specific preference?
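The arithmetic behind the share of forms and the normalised frequencies is simple enough to sketch in Python. The counts are the ones reported above for ELFA; the corpus size is an assumed round figure of one million words (the exact word count is not given here), so the normalised rates it produces are approximate. The function names are hypothetical.

```python
def per_100k(count, corpus_words):
    """Normalise a raw count to occurrences per 100,000 words."""
    return count / corpus_words * 100_000

def variant_share(conventional, variant):
    """Proportion of all forms of the chunk realised as the variant."""
    return variant / (conventional + variant)

so_to_speak, so_to_say = 20, 39   # raw counts reported for ELFA
ELFA_WORDS = 1_000_000            # assumption: round figure for a "million-word" corpus

share = variant_share(so_to_speak, so_to_say)
print(f"so to say share of forms: {share:.0%}")  # 66%
print(per_100k(so_to_speak, ELFA_WORDS), per_100k(so_to_say, ELFA_WORDS))
```

Normalising by corpus size is what makes counts from corpora of different sizes comparable, while the share of forms shows how the two realisations are distributed relative to each other regardless of how rare the chunk is overall.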

The frequency of the organising chunk so to speak (blue line) is compared with so to say (red line). Click on the image to view the Google n-gram.

Turning again to the MICASE corpus, I also found 20 instances of so to speak, but only five instances of so to say that were verified to function as a single item. Moreover, these five instances were spoken by four non-native English speakers and one native English speaker. This Google n-gram chart shows the general distribution over time: so to say is indeed attested in English, but so to speak is relatively much more common. The same corpora will also tell you that say in itself is far more frequent than speak, so the predominance in ELFA of so to say might seem to be a case of “low meets high” – a low-frequency chunk appears to be moving toward a variant with a much higher frequency component, say.

Interestingly, this so to say chunk is attested twice in the written WrELFA data as well. I also extended my search to the similarly sized VOICE corpus (Vienna-Oxford International Corpus of English), which includes spoken ELF interactions from a variety of domains, including academic. Here the results were startling: the distributions between the two spoken ELF corpora are practically identical in both frequency and percentage of forms (see Table 1). The conventional so to speak occurs 1.9 times per 100k words in each corpus, with so to say occurring 3.3 times per 100k words in VOICE and 3.8 times per 100k words in ELFA.

Table 1. Distributions of so to speak and so to say in the ELFA and VOICE corpora.
Source: Carey 2013: 217

Considering this evidence from two large ELF corpora compiled at similar times in different places – ELFA in Finland and VOICE mainly in Vienna – we might guess that the so to say preference could be taking hold, at least in Europe. But since we’re dealing with a low-frequency chunk in itself, another set of data could show quite different results. It will ultimately be the corpora of future generations that will show whether we have guessed right or wrong.

High frequency, low variation

The examples so far have dealt with variation observed among low-frequency organising chunks. Now we turn to a group of much higher frequency chunks that organise text. The first of these is at the same time, which occurs 133 times in the ELFA corpus. When searching for alternate forms, all I could find were variable uses of prepositions and articles. The most common variation was in the same time, used seven times by five different speakers in five events. Apart from this, I only found a handful of shortened versions that omit a minor element from the chunk:

  • and the same time i think er <COUGH> perpetuating a number of quite standard er european academic (talks)
  • and the same time some sides of your research academic libraries were mhm the same
  • but at same time it transgresses the boundaries between the private and public spheres

In the last example, it’s interesting to note that in the very next sequence the same speaker uses at the same time, suggesting that these fixed chunks can be stored as a whole unit but still be uttered with minor variations that don’t change the function of the chunk. Overall, I only found 12 instances of variation of this high-frequency chunk in ELFA. So while it’s easy to find examples of variable preposition and article usage in ELF, this tendency seems to be diminished in the context of high-frequency, relatively fixed chunks. The distribution of conventional and approximate forms of this organising chunk was similar in the spoken and written ELF corpora:

Table 2. The conventional chunk at the same time and its total approximations found in the ELFA and WrELFA corpora.
Source: Carey 2013: 219

“On the other side”

As with at the same time, the organising chunk on the other hand is high-frequency (158 occurrences in the ELFA corpus) with a low rate of occurrence of approximate forms. In this chunk, however, there’s a tendency for approximate forms to feature a lexical replacement: side for hand. This on the other side variant is uttered 11 times by seven different speakers in ELFA, with further one-off variants of from the other side and from other side. Likewise, the only three approximate forms I could find in the written WrELFA texts were on the other side, attested by three different authors.

As further evidence that these approximate forms are functioning in the same way as the conventional on the other hand, they are often accompanied by similar approximations of the optional preceding chunk on (the) one hand: on (the) one side, from one side, or in one side. While these patterns are interesting, it’s hard to see them as an emerging trend in ELF. When taken together with the conventional on the other hand, the approximations only make up 12% of forms in the ELFA corpus:

Table 3. Conventional and approximate forms of on the other hand in ELFA and WrELFA.
Source: Carey 2013: 223

Again we see a high-frequency organising chunk that is overwhelmingly attested in its fixed, conventional form. Turning again to the North American MICASE corpus, on the other side can only be found three times with this organising function, uttered by three speakers with L1s (first languages) of English, Dutch, and German. At last June’s Changing English conference here in Helsinki, a participant pointed out to me that on the other side is a direct translation from German, and he wondered if this would account for these occurrences in the ELF corpora. I didn’t have a definite answer then, but I looked into it and this is what I found.

The on the other side chunk is used by speakers in ELFA with L1s of German, French, Romanian, Spanish, Portuguese, and Swedish. Turning to WrELFA, on the other side is used by three authors with L1s of German, Italian, and Czech. With a bit of work in Google Translate, I found that most of these Indo-European languages indeed have adverbial equivalents of on the other hand in which the hand element can be translated as side. Does this mean that on the other side could emerge as a preference in Europe? Considering the high frequency of on the other hand, along with the low rate of variation in this European ELF data, I’ll place my bet with on the other hand to remain the dominant form of this chunk.

Wrapping up: spoken vs. written ELF

When searching for approximate forms of these organising chunks, I was curious to see if variations would appear in the written ELF data at all. Variations did appear, but not with particularly dramatic frequencies, so I concluded my study by adding all the occurrences of conventional and approximate chunks from this study in the spoken ELFA and written WrELFA data. What emerged was a surprisingly similar picture:

Table 4. An overview of all occurrences of the conventional chunks and their approximate forms included in this study.
Source: Carey 2013: 223

The conventional chunks occurred in both academic ELF corpora about 36 times per 100k words. The approximated chunks were found slightly more often in the spoken data, where there was also a higher rate of approximation (21% of forms). However, I found no statistically significant difference between the spoken and written data, suggesting that the variations found in ELF cannot be dismissed as mere aberrations of speech. Yet, it should be kept in mind that this study is based on the first 300,000 words of WrELFA, and we’ve more than doubled the size since I carried out this research. Once we reach a million words, I’ll verify these findings on the full written dataset and see how these figures hold up. In the meantime, it appears that the gap between written and spoken academic ELF is not as great as one might guess.



Carey, Ray (2013) On the other side: formulaic organizing chunks in spoken and written academic ELF. Journal of English as a Lingua Franca, 2 (2), 207-228. DOI: 10.1515/jelf-2013-0013.

ELFA (2008) The corpus of English as a lingua franca in academic settings. Director: Anna Mauranen. http://www.helsinki.fi/elfa/elfacorpus.

Mauranen, Anna (2009) Chunking in ELF: Expressions for managing interaction. Intercultural Pragmatics, 6 (2). DOI: 10.1515/IPRG.2009.012.

Simpson, Rita C., Sarah L. Briggs, Janine Ovens & John M. Swales (2002) The Michigan corpus of academic spoken English. Ann Arbor, MI: The Regents of the University of Michigan. http://quod.lib.umich.edu/m/micase/.

VOICE (2013) The Vienna-Oxford international corpus of English (version 2.0 online). Director: Barbara Seidlhofer. http://voice.univie.ac.at.

WrELFA (2013) The corpus of written English as a lingua franca in academic settings. Director: Anna Mauranen. http://www.helsinki.fi/elfa/wrelfa.

3 thoughts on “On the other side: variations in organising chunks in ELF”

  1. eflnotes says:

    another very rich corpus post, thanks!

    i was inspired to do a quick look at the TED corpus (which can be found here https://wit3.fbk.eu/)

    there are 4 instances of so to say (total corpus size approx 2.4million)
    compared to 16 instances of so to speak. those 4 – Swedish, Indian, Swiss, South African.

    also found a use of in the same time by an American http://www.ted.com/talks/bonnie_bassler_on_how_bacteria_communicate.html, the video shows that she had a slight hesitation after saying it but carried on without “correcting” it.

    by the way what was the performance of AntConc like when doing 3, 4 and 5 ngrams? it took between 30 to 40 mins ( though could be longer as i did not actually time it but based on how long I was away from computer!) on the 2.4 mill TED corpus running on a 2.8Ghz Intel Core Duo Apple mac


    • Ray Carey says:

      Thanks Mura for your interest! And thanks especially for mentioning the TED corpus, which I wasn’t aware of. It could be an interesting reference corpus for academic presentations. Too bad the XML doesn’t encode the speakers’ first languages, but I expect it has potential as an ELF corpus in its own right. How did you get the L1s for the “so to say”ers? just Googling names?

      The ‘so to say’ case is funny. As a reformed Californian, it sounds very marked. But it’s out there and seeing examples of it in use makes it seem much more natural. I’m not surprised that a native speaker might say it spontaneously and wonder if it’s “right”. Interesting that it otherwise turns up in TED among L2 users.

      As for AntConc performance on n-grams, I don’t remember, but I don’t think it took that long. After looking at the XML, I wonder if all those subtitle encodings in [square brackets] might trip up the algorithm.


      • eflnotes says:

        hi ray, yes that’s right used info from corpus to google people;

        i used the txt version not the xml, i think maybe performance could also be due to some incompatibilities with latest mac osx and last version of antconc, eagerly awaiting new version, was mentioned it may be out in december some time

        the TED corpus is very interesting e.g. found a lot of this construction – What I’m going to do is, I’m going to; something i would not tell students to say but said by these top presentational speakers!


