When my misguided and unconventional life crash-landed in the University of Helsinki’s English department, I had no idea I was in the midst of a world-class center of linguistic research. It took less than a year to figure it out, and before I had finished my bachelor’s degree, I was working as a research assistant. Now as a PhD student in the same department, I understand very well that I’m part of a proud tradition of innovators in English linguistics.
Earlier this year, I was awarded a three-year grant from the Finnish Cultural Foundation (Suomen Kulttuurirahasto) to pursue my research in Linear Unit Grammar (LUG) and English as a lingua franca (ELF). On August 1, the foundation dispensed the first installment of my grant from the Ossi Ihalainen trust, and I want to take a moment to honor the man who supports my work, some 20 years after his passing. Oddly enough, we have a few things in common.
Prof. Ossi Ihalainen of the University of Helsinki’s English department was a pioneer in the use of computers to research linguistic corpora. This was during the 1980s, when mainframe computers were needed to perform searches and the programming language of the day was still FORTRAN. He was also an expert in British dialectology, having conducted extensive fieldwork in the 1970s on the dialects of the Somerset region of southwest England. His crowning achievement was a hundred-page chapter on English dialects for the Cambridge History of the English Language.
Prof. Ihalainen died before his time at the age of 51. He fought leukemia for over a year before his death in 1993, and, in an act of academic heroism, he labored through his illness to complete the abovementioned chapter on English dialects. At the time of his death, his will established a trust to support linguistic research in Finland. And still in 2013, that’s exactly what he’s doing, and I do indeed thank him for that.
A digital pioneer
Prof. Ihalainen’s passion was the study of English dialects using computational methods. The field of corpus linguistics was in its infancy in the 1980s, and it was limited to the study of standard, written English. Ihalainen pioneered the extension of this technology to the study of non-standard dialects of English. I wanted to get a sense of the technical nature of his work, so I sought out his article from 1988 on his early work with the Helsinki Corpus of British English Dialects, one of the corpora still maintained by his colleagues in the University of Helsinki’s VARIENG research unit.
I was happy to find that Ihalainen was a programmer, and his article even contains bits of FORTRAN code that he used for querying his corpus. Like mine, his programming was driven by linguistic curiosity, and he shows how computers can be used to explore the features of naturally occurring, non-standard English. Surprisingly, I can see from his paper that the programming principles behind his work are no less relevant today than they were in the 1980s. In fact, his corpus queries are still more sophisticated than could be grasped by today’s “average” corpus linguist.
Moreover, the basic challenge of computing natural speech remains the same. Like Ihalainen, I’ve spent many hours transcribing recordings of naturally occurring conversation, and I too can attest that
… entering data manually is both expensive and slow. Given the rate at which data processing technology is advancing, one can envisage a time when tape-recorded speech can be converted into writing automatically. However, this piece of futurology does not help us in our present situation.
(Ihalainen 1988: 582)
Nor does it help me today. Voice-recognition technology leaves much to be desired, especially for the wide world of accents encountered with English in its lingua franca use. Yet, there’s no question that computing technology has exploded in my lifetime. One challenge I haven’t shared with Ihalainen is that of data storage, as can be seen from the following description of his dialectal field recordings:
The tape-recordings have been transcribed orthographically and stored on diskettes and computer tape. We have at the moment some three hundred thousand words on tape and we are aiming at about half a million words. There is much more data available, but we do not have the means to store it all.
(Ihalainen 1988: 572)
Happily, data storage is no longer an issue, and the dialect corpus started by Ihalainen today stands at one million words with over 200 hours of recordings stored in digital form. At the same time, I’m sure that in 20 more years young scholars will laugh at our one-terabyte hard drives the way we laugh at 5¼-inch floppies. Nor have I forgotten that my first computer, purchased by my parents in 1991, contained a state-of-the-art 40-megabyte hard drive.
The democratisation of computation
Not all linguists, of course, have at their disposal a mainframe computer, and even if they did, they might not have the time to learn all the intricacies of its use.
(Ihalainen 1988: 575)
Prof. Ihalainen was among the elite of early corpus linguists. Performing searches required expensive hardware, even a mainframe computer. Today, a cheap laptop allows you to write programs that can query a million-word corpus in seconds. There has never been a better time to study language with a computer, and yet researchers today still don’t “have the time to learn all the intricacies of its use.” Concordance software like WordSmith keeps linguists chained to the graphical user interface. But even with basic programming skills, linguistic research is limited only by the imagination of the scholar.
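To give a rough idea of what “basic programming skills” can mean here, the following is a minimal sketch in Python of a keyword-in-context (KWIC) search, the kind of query a concordancer performs. It is not Ihalainen’s FORTRAN, nor the way any existing tool is implemented; the function name `kwic`, the sample sentences, and the search pattern are all invented for illustration.

```python
import re


def kwic(text, pattern, width=30):
    """Return one keyword-in-context line per regex match in `text`.

    `kwic` is a hypothetical helper written for this post, not part of
    any corpus toolkit. `width` is the number of characters of context
    shown on each side of the match.
    """
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the matches line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


# Invented sample standing in for a transcribed corpus (assumption:
# plain orthographic text held in a single string).
sample = "I be going home now. They was tired, and we was hungry. She were late."

# Look for non-standard "was" after the plural pronouns "we" or "they".
hits = kwic(sample, r"\b(?:we|they)\s+was\b", width=15)
for line in hits:
    print(line)
```

For a real corpus, one would first read the transcription files into a string; the function itself would stay the same, and only the pattern changes with the research question.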
In the same spirit as Prof. Ihalainen, I’m working to extend existing technologies to the study of language chunking and the description of today’s dominant form of spoken, non-standard English – English used as a lingua franca between its second-language speakers. Much of this is uncharted territory, and it’s not entirely clear if I’m up to the task. But with the support of Ihalainen’s colleagues who carry on his work, and with the support of Ossi himself, I have no doubt that I’ll enjoy the pleasure of a puzzle much greater than myself.
Helsinki Corpus of British English Dialects. Research Unit for the Study of Variation, Contacts and Change in English (VARIENG). http://www.helsinki.fi/varieng/CoRD/corpora/Dialects/index.html. Accessed 8.8.2013.
Ihalainen, Ossi (1988) Creating linguistic databases from machine-readable texts. In Methods in Dialectology, Alan R. Thomas (ed.). Multilingual Matters, 48. Clevedon: Multilingual Matters. 569–584.
Rissanen, Matti (1993) Ossi Ihalainen (1941–1993): In memoriam. Neuphilologische Mitteilungen (Bulletin of the Modern Language Society of Helsinki), 94. Helsinki: Neuphilologischer Verein. 241–242.