Fluent chunks 2: How to label your chunks

Photo by Alan Chia via Wikimedia Commons

Most people recognise that we don’t speak in “sentences”. Still, speech is analysed and described using the concepts of sentence grammars, even when these writing-based systems must be bent and stretched to fit speech, or the speech bent to fit the grammar. Isn’t it cheating to “clean up” naturally occurring speech so it fits into a sentence grammar?

In a previous post I introduced Linear Unit Grammar, or LUG, a chunk-based approach to analysing spoken and written text, along with the linear, word-by-word process of chunking up a string of transcribed speech by placing intuitively directed chunk boundaries. The discussion focused on this short extract from an academic conference in the ELFA corpus. When asked about her experience with students in Brazil, the speaker responded:

er i c- i i so i i went to portugal er i live in portugal er for 13 years so i er my experience with brazilian students is is a long way @@ okay a long time ago (note: @@ = laughter and er is like uh in the US style)

How do you divide this into a well-formed constituency tree? The short answer is you don’t, and neither do speakers in actual interaction. LUG analysis attempts to mirror the real-time, linear processing of language as multi-word chunks, regardless of “grammaticality”. The first step in LUG analysis is placing chunk boundaries, which was discussed in detail in the previous post, with the transcribed extract above as an example. This is where we ended up:

er | i c- | i | i | so | i | i went to portugal | er | i live in portugal | er | for 13 years | so | i | er | my experience | with brazilian students | is | is a long way | @@ | okay | a long time ago

The convention established by Sinclair & Mauranen (2006) is next to move each chunk to its own line, like this:

er
i c-
i
i
so
i
i went to portugal
er
i live in portugal
er
for 13 years

so
i
er
my experience
with brazilian students
is
is a long way
@@
okay
a long time ago
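For anyone who keeps their transcripts in plain text, the move from the boundary-marked string to one chunk per line is easy to sketch in code. This is only an illustration and not part of Sinclair & Mauranen's procedure: the function name `split_chunks` and the use of `|` as the boundary marker are my own assumptions, and the boundaries themselves still come from the analyst's intuition.

```python
# Boundary-marked extract, copied from the post; "|" marks a chunk boundary.
marked = (
    "er | i c- | i | i | so | i | i went to portugal | er | "
    "i live in portugal | er | for 13 years | so | i | er | "
    "my experience | with brazilian students | is | is a long way | "
    "@@ | okay | a long time ago"
)


def split_chunks(text: str) -> list[str]:
    """Split on the boundary marker and trim the whitespace around each chunk."""
    return [chunk.strip() for chunk in text.split("|") if chunk.strip()]


chunks = split_chunks(marked)

# One chunk to a line, as in the listing above.
print("\n".join(chunks))
```

The code does nothing clever: it only changes the layout, so the chunking decisions stay exactly as they were made by hand.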

And then we can start to label our chunks based on their orientation. Sinclair & Mauranen have proposed two broad chunk categories – those which are oriented to the message (M), and those which are oriented to organising (O) the text. The M chunks serve to increment the shared meaning (i.e. they provide propositional content), while the O chunks structure the text and especially the interaction. Even interactive features like laughter and filled pauses (er or uh) are treated as organising chunks.

Consider the chunks from our example utterance after this first round of analysis of Message (M) and Organising (O) chunks:

[O] er
[M] i c-
[M] i
[M] i
[O] so
[M] i
[M] i went to portugal
[O] er
[M] i live in portugal
[O] er
[M] for 13 years

[O] so
[M] i
[O] er
[M] my experience
[M] with brazilian students
[M] is
[M] is a long way
[O] @@
[O] okay
[M] a long time ago

The message-oriented M chunks are those that provide – or appear to be starting – some content that contributes new information. The organising O chunks, however, tend to be relatively fixed chunks that are used and re-used (i.e. are high-frequency) to relate the M chunks to each other. These O chunks are often multi-word units like ‘on the other hand’ or ‘in my view’, but even single-word organisers like ‘and’ and ‘but’ are treated as O chunks in their own right.

In our extract here, there isn’t a big variety of O chunks. The only textual organiser (OT) present is so, an all-purpose transitional marker. The other O chunks are all oriented to organising the interaction (OI) – several filled pauses (er), an instance of laughter (@@) and okay. There is no further subdivision of O chunks proposed in LUG; they’re either OT or OI, like this:

[OI] er
i c-
i
i
[OT] so
i
i went to portugal
[OI] er
i live in portugal
[OI] er
for 13 years

[OT] so
i
[OI] er
my experience
with brazilian students
is
is a long way
[OI] @@
[OI] okay
a long time ago
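Because O chunks tend to be fixed and frequently re-used, a first pass at labelling them can be sketched as a simple lookup. The two word lists below are hypothetical and contain only the organisers found in this extract; a real list would be much longer, and a form like okay would not always be an OI. The function name `label_organiser` is likewise my own invention.

```python
# Hypothetical lookup tables, limited to the organisers seen in this extract.
TEXT_ORGANISERS = {"so"}                       # OT: organise the text
INTERACTION_ORGANISERS = {"er", "@@", "okay"}  # OI: organise the interaction


def label_organiser(chunk: str) -> str:
    """Return OT or OI for a known organiser, otherwise fall back to plain M."""
    if chunk in TEXT_ORGANISERS:
        return "OT"
    if chunk in INTERACTION_ORGANISERS:
        return "OI"
    return "M"  # message-oriented; to be refined in a later pass


chunks = [
    "er", "i c-", "i", "i", "so", "i", "i went to portugal", "er",
    "i live in portugal", "er", "for 13 years",
    "so", "i", "er", "my experience", "with brazilian students",
    "is", "is a long way", "@@", "okay", "a long time ago",
]

labelled = [(label_organiser(chunk), chunk) for chunk in chunks]
for label, chunk in labelled:
    print(f"{label:<2} {chunk}")
```

A lookup like this ignores context entirely, so it can only ever be a starting point for the analyst's own judgment.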

With the organising chunks taken care of, the next step is analysing the message-oriented M chunks. Taking the initial chunks in our first column, they consist of a series of repetitions and re-starts as the speaker formulates her response. These fragmentary openings seem to be the start of an unfinished M chunk, but it’s hard to say with so little prospection, or forward-looking anticipation of what likely follows. In other words, there aren’t enough clues to guess where the idea might be going.

These types of M chunks are labeled MF, for Message Fragment. The final chunk, i went to portugal, can be treated as a whole M chunk in its own right (i.e. a complete incrementation of meaning), as it is interpretably complete and does not prospect or anticipate some further completive element. These complete M chunks are just labeled as M, for a stand-alone Message chunk.

[OI] er
[MF] i c-
[MF] i
[MF] i
[OT] so
[MF] i
i went to portugal

The next message-oriented chunk i live in portugal can be interpreted in the same way. The difference here is that, after a filled pause, an additional chunk (for 13 years) adds extra information to the just-preceding M chunk. This extra chunk doesn’t create any new prospection of its own; it is a Message Supplement (MS) chunk, as shown below:

[OI] er
i live in portugal
[OI] er
[MS] for 13 years

These supplementary MS chunks are common, but they’re not always in this “trailing” position, linearly speaking. Sometimes MS chunks appear before prospection has been completed. In these cases, the supplementary chunk suspends the prospection that has already been established. Consider the next few chunks in our extract:

[OT] so
[MF] i
[OI] er
[M-] my experience
[MS] with brazilian students

I’ll stop here in mid-prospection to focus on this analytical judgment. It’s been a topic of discussion in some talks I’ve given using this extract. Some would argue that my experience with brazilian students should be a single chunk. In practice, there’s no reason why both judgments couldn’t be equally valid interpretations. LUG is based on the assumption that different analysts/language users will chunk a text in different ways. However, in developing a corpus of LUG-annotated ELF interactions, I try to systematise recurring judgments such as these as much as possible, for the sake of consistency.

So, taking this judgment as a compilation principle, we find a new chunk label, the M- (‘M dash’). Unlike the fragmentary MF chunks we’ve seen, my experience more strongly prospects some completive element. Moreover, the following MS chunk with brazilian students adds more information about this experience without providing any completion itself. So, M- is used to designate a substantial M chunk that is awaiting a completive chunk, the +M (‘plus M’). The MS that follows can thus be seen as a suspension, in the terms of David Brazil (1995), since it suspends the prospection already established by my experience.

Here’s how it looks with the completive +M in place:

[M-] my experience
[MS] with brazilian students
[MF] is
[+M] is a long way
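The relationship between an M- and its completive +M, with suspending chunks in between, can be sketched as a small pairing routine. This is my own simplification and not something LUG prescribes: it assumes that each M- is completed by the first +M that follows it and that prospections do not nest. The function name `pair_completions` is hypothetical.

```python
from typing import List, Optional, Tuple

Chunk = Tuple[str, str]  # (label, text)
Pairing = Tuple[str, List[Chunk], Optional[str]]


def pair_completions(labelled: List[Chunk]) -> List[Pairing]:
    """Pair each M- chunk with the first +M chunk that follows it.

    Each result is (opener, suspended, completer), where `suspended` holds
    the chunks sitting between the M- and its +M. An M- that is never
    completed is reported with completer None.
    """
    pairs: List[Pairing] = []
    open_at: Optional[int] = None
    for i, (label, text) in enumerate(labelled):
        if label == "M-":
            if open_at is not None:
                # A new M- started before the earlier one was completed.
                pairs.append((labelled[open_at][1], labelled[open_at + 1:i], None))
            open_at = i
        elif label == "+M" and open_at is not None:
            pairs.append((labelled[open_at][1], labelled[open_at + 1:i], text))
            open_at = None
    if open_at is not None:
        pairs.append((labelled[open_at][1], labelled[open_at + 1:], None))
    return pairs


extract = [
    ("M-", "my experience"),
    ("MS", "with brazilian students"),
    ("MF", "is"),
    ("+M", "is a long way"),
]

pairs = pair_completions(extract)
for opener, suspended, completer in pairs:
    print(opener, "->", completer, "| in between:", [label for label, _ in suspended])
```

For this extract the routine pairs my experience with is a long way and reports the MS and MF chunks as the material that sits between them.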

I would argue that with the preceding context, this utterance is interpretably complete. The less generous reader might protest that it doesn’t make sense. Though there was no outcry from her audience, the speaker also seemed to recognise that a clarification was in order. With a laugh (@@) and okay as interactive padding, she offers a long time ago as a rephrasing of the preceding +M. These common chunks that revise a just-preceding chunk are labeled as Message Revision (MR), and they encompass word-for-word repetitions as well. Like the supplementary MS chunks, Message Revision chunks are not prospective (forward-looking), but retrospective insofar as they refer back to a just-preceding chunk.

Apart from a couple of other types of M chunk, that’s pretty much the whole LUG system at work. Here’s the entire extract with full LUG annotation:

[OI] er
[MF] i c-
[MF] i
[MF] i
[OT] so
[MF] i
i went to portugal
[OI] er
i live in portugal
[OI] er
[MS] for 13 years

[OT] so
[MF] i
[OI] er
[M-] my experience
[MS] with brazilian students
[MF] is
[+M] is a long way
[OI] @@
[OI] okay
[MR] a long time ago
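Once an extract is annotated, it can be stored as simple (label, chunk) pairs and tallied. The sketch below is only an illustration of that idea, not the tooling used for the corpus. It assumes that the two complete chunks carry the plain M label described earlier, and the variable names are my own.

```python
from collections import Counter

# The fully annotated extract as (label, chunk) pairs.
# Assumption: stand-alone message chunks are given the plain label "M".
annotated = [
    ("OI", "er"), ("MF", "i c-"), ("MF", "i"), ("MF", "i"), ("OT", "so"),
    ("MF", "i"), ("M", "i went to portugal"), ("OI", "er"),
    ("M", "i live in portugal"), ("OI", "er"), ("MS", "for 13 years"),
    ("OT", "so"), ("MF", "i"), ("OI", "er"), ("M-", "my experience"),
    ("MS", "with brazilian students"), ("MF", "is"), ("+M", "is a long way"),
    ("OI", "@@"), ("OI", "okay"), ("MR", "a long time ago"),
]

# Tally how often each label occurs.
label_counts = Counter(label for label, _ in annotated)

# Organising labels (OT, OI) are the only ones that start with "O".
organising = sum(n for label, n in label_counts.items() if label.startswith("O"))
message = len(annotated) - organising

for label, n in label_counts.most_common():
    print(f"{label:>2}  {n}")
print(f"organising chunks: {organising}, message-oriented chunks: {message}")
```

Counts like these are only the most superficial view of the annotation, but they show how a labelled extract becomes data that can be compared across speakers.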

It seems like a lot of work for a few seconds of spoken text! In a way it is, but you quickly find a rhythm once you get familiar with LUG. In fact, Prof. Mauranen has confirmed my experience of spontaneously performing real-time LUG analysis while listening to others speak (incidentally, this is also a sign of the need for fresh air). Once you realise how robust LUG is for handling all sorts of utterances, you appreciate the deceptive simplicity of the model.

Working with LUG also reveals repeated patterns in speech/text, and one’s analytical judgments start to get systematised. Sinclair and Mauranen (2006: 158-162) already anticipated the potential for automating LUG analysis computationally. My PhD project takes up this programming challenge, but only as a preliminary step in corpus compilation. The real interest is finding the broadly distributed patterns of fluency and dysfluency in spoken ELF, first at the broad corpus level – what ordinary (dys)fluency looks like – and then in comparing the output of individual speakers, with native speakers of English also in the ELF mix.

The value of LUG is that the analysis goes deeper than mere counts of dysfluency features; instead dysfluencies are seen in their broader context of meaning-making, including how they function within and around those M chunks. The extract here only gives a hint of what I mean, but what’s the rush? I’ll be back with some more examples this summer.

 

Brazil, David (1995) A Grammar of Speech. Describing English Language series. Oxford: Oxford University Press.

Mauranen, Anna (2012) Linear Unit Grammar. The Encyclopedia of Applied Linguistics. DOI: 10.1002/9781405198431.wbeal0707.

Sinclair, John McH. and Anna Mauranen (2006) Linear Unit Grammar: Integrating speech and writing. Studies in Corpus Linguistics, vol. 25. Amsterdam: John Benjamins.

3 thoughts on “Fluent chunks 2: How to label your chunks”

  1. James Thomas says:

    LUG is of considerable interest to me, but I’m more interested in its potential to work with written language, from both receptive and productive angles. The 2006 vol and your work are my only LUG sources to date and both are spoken language oriented.

    I’d be interested to know your thoughts on my idea (prob. not mine alone), that an academic text’s O language consists primarily of discourse markers, adjuncts plus the usual conjunctions, relative pronouns etc. The M language, on the other hand, consists of the clauses made up of the key lexical words of the topic. Sentence initial non-finite clauses are sometimes M, sometimes O, it would seem.

    There’s more, but I’d like to hear your thoughts before I go on.

    Thanks,

    James

    • Ray Carey says:

      Hi James, great to hear from you — Anna Mauranen mentioned your name to me some time ago as someone who had contacted her with your LUG interests. You’re right about the emphasis on spoken language, but I’m happy to say that the first PhD based on LUG was recently submitted by a student named Cameron Smart from Birmingham University. I haven’t yet seen it, but he will defend the thesis next month with Anna as his examiner. His work is based on written texts taken from the internet, so it will be interesting to see how LUG has been applied to a written corpus.

      Your characterisation of the O and M elements matches my own experience. Much of the products of LUG analysis can be described in these more conventional terms of grammatical analysis, and Sinclair & Mauranen were clear that LUG does not replace traditional grammatical analysis and will often overlap with it. So, if you’re describing these chunks as adjuncts, non-finite clauses, etc., then what’s the added value that’s brought by LUG analysis? When traditional grammars are already focused on sentence-based parsing, I wonder what LUG could additionally contribute to analysis of written text?

      Applying LUG to written text is something I’ve thought about, but my interests are indeed centered on describing spoken interaction. I’d be interested to hear more of your experiences and how you see LUG from the standpoint of written text.

      Best,
      Ray

  2. James Thomas says:

    The added value is the ‘subtractive’ value, or reductionist. The grammatical labels for O language may be of use to various taxonomists, but labeling them simply as words and phrases that are intuitively satisfying chunks that have Organisational roles in the text and occur in between the Message units is an intuitively satisfying lumping process. They can be split later if necessary.
