Pugs and flower-beds: Jane Austen’s use of language

Those who write variations of Jane Austen’s novels often make a big effort to emulate her style. I certainly did, with my short story collection Crime and Prejudice. It matters especially to me, because I’m a bit of a language nerd, and my nerdery (there’s a new word!) led me to take a corpus linguistic approach. I’d published in that field before, as an academic, so now it was time to put what I knew into practice in the service of literature. Well, my own literature. And it was great fun.

But first, what is corpus linguistics, and why is there a picture of a skeleton at the bottom of the page? “Corpus” means “body”, and “linguistics” is the scientific study of language. Corpus linguistics therefore involves the creation of a body of text that can be analysed. For writers, this enables us to search for details such as which words are used frequently by a particular writer, or which characters in their books use certain words. For my own research for Crime and Prejudice, I downloaded all Jane Austen’s works from the wonderful Project Gutenberg website and created a simple Word document. Then I used the file to check if words I was unsure about actually appeared in Jane Austen’s novels.
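For readers who would rather script this kind of check than use Word's find function, here is a minimal Python sketch. It is an illustration only, not the method used for the book: the function names are invented for the example, and the sample text is a single sentence standing in for the full corpus, which would normally be read from a plain-text file.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words.

    Internal hyphens and apostrophes are kept, so "flower-beds" and
    "don't" each count as one word. Curly apostrophes are normalised
    to straight ones first.
    """
    text = text.replace("’", "'").lower()
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text)

def word_counts(text):
    """Return a Counter mapping each word to its number of occurrences."""
    return Counter(tokenize(text))

# Stand-in for the corpus: one sentence from Mansfield Park.
sample = ("Sitting and calling to Pug, and trying to keep him from the "
          "flower-beds, was almost too much for me.")
counts = word_counts(sample)

# A Counter returns 0 for words that never appear in the text.
for word in ["flower-beds", "pug", "handcuffs"]:
    print(word, counts[word])
```

A count of zero for a word is the cue to look it up elsewhere, such as in the OED, before using it.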

My final vocabulary list was of course not identical to Jane Austen’s, but this was a good start. I was also lucky enough to have online access to the complete Oxford English Dictionary (OED), so I could check if a particular word was used in Jane Austen’s time. Jane Austen does not use the words “constable” or “handcuffs”, for example, but the OED confirmed that they were appropriate for that era. And I drew material from the Old Bailey online records of crimes that occurred around the time Jane Austen was writing. This was another wonderful source of vocabulary, such as the poisonous “decoction of foxglove”. The phrase appears in my story about Mr Bennet, and is based on a case in which an untrained doctor prescribed far more than the safe dose to an unfortunate apprentice.

A different use of my Jane Austen corpus was to examine contracted forms in speech. Certain characters in her novels and juvenilia (pieces she wrote when she was younger) adopt the following shortened forms (the figures indicate the total number of uses in her work): an’t – 8; can’t – 10; don’t – 72; he’ll – 2; I’d – 1; I’ll – 9; I’m – 5; it’s – 3; shan’t – 9; ’tis – 18; t’other – 5; ’twould – 1; we’d – 1; we’ll – 1; won’t – 18; you’ll – 2. It’s important to look closely at these numbers, because the totals hide how unevenly the forms are spread. There are 18 uses of “won’t”, for example, but they are nearly all in Sense and Sensibility, and many are spoken by Mrs Jennings and her daughter Charlotte, highlighting their chatty way of speaking. The only contraction commonly used by a wide range of characters is “don’t”, appearing often as an admonition and sometimes reinforced by repetition, e.g. “don’t talk so, don’t talk so” (Fanny Price in Mansfield Park) and “don’t speak it, don’t speak it” (Emma).
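A tally like the one above could also be produced with a short script. The following Python sketch is a hedged illustration rather than the procedure actually used: the function name is hypothetical, and the sample text combines the two “don’t” quotations given above with one invented sentence added purely to test the matching.

```python
import re
from collections import Counter

# The contracted forms listed above, lowercased, with straight apostrophes.
CONTRACTIONS = ["an't", "can't", "don't", "he'll", "i'd", "i'll", "i'm",
                "it's", "shan't", "'tis", "t'other", "'twould", "we'd",
                "we'll", "won't", "you'll"]

def count_contractions(text, forms=CONTRACTIONS):
    """Count each contracted form as a whole word, ignoring case."""
    text = text.replace("’", "'").lower()  # normalise curly apostrophes
    counts = Counter()
    for form in forms:
        # The lookarounds reject matches inside longer words, so "an't"
        # is not counted inside "can't" or "shan't".
        pattern = r"(?<![a-z])" + re.escape(form) + r"(?![a-z])"
        counts[form] = len(re.findall(pattern, text))
    return counts

sample = (
    "“Don’t talk so, don’t talk so.” "    # Mansfield Park, quoted above
    "“Don’t speak it, don’t speak it.” "  # Emma, quoted above
    "I can't go, and I shan't try; 'tis too late."  # invented test sentence
)
totals = count_contractions(sample)

# Print only the forms that actually occur in the sample.
for form, n in totals.items():
    if n:
        print(form, n)
```

Running the same function over each novel separately, rather than over the whole corpus at once, would show how the totals are distributed between books.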

There were a couple of surprises in my corpus searches. I needed a flower-bed in Mr Bennet’s story about foxglove and pugs, and the earliest example in the OED dated from 1873. And yet, there it is in Mansfield Park, where Lady Bertram complains about the heat: “Sitting and calling to Pug, and trying to keep him from the flower-beds, was almost too much for me.” An even bigger surprise was Jane Austen’s use of the word “sister-in-law”. I had heard that this term was not used in her novels, but it appears twenty times, with the same meaning that we have today – the sister of someone’s spouse. Project Gutenberg does not have a record of the editions used in their Jane Austen collection, but I have verified the use of “sister-in-law” in other editions of her work, including some published by Penguin. If anyone knows the origin of this mystery regarding Jane Austen’s supposed non-use of the term, please enlighten me. Have I made a discovery here, or have I missed something? I’d genuinely love to know!

All in all, I was very satisfied with the corpus linguistics approach. It gave me inspiration for criminal cases to use in my stories, and it allowed me to check the accuracy of my vocabulary. And I promise no pugs were harmed by the decoction of foxglove.

Reference

Old Bailey Proceedings Online (www.oldbaileyonline.org, version 9.0) October 1826. Trial of JACOB EVANS (t18261026-215). Available at: https://www.oldbaileyonline.org/record/t18261026-215

N.B. Corpus linguistics usually involves the use of a computer program called a concordancer. I only did very basic research, so a Word document and the find function were sufficient, but there are some great concordancers out there. My favourite is Sketch Engine, which will give you all kinds of word patterns and collocations (words that go together).
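To give a rough idea of what a concordancer does at its simplest, here is a small keyword-in-context sketch in Python. It is a toy illustration under stated assumptions and makes no claim about how Sketch Engine or any other real tool works: the function name and the `width` parameter are invented for the example.

```python
import re

def concordance(text, word, width=30):
    """Return one keyword-in-context line per occurrence of `word`.

    Each line shows up to `width` characters either side of the match,
    with the keyword in square brackets. Matching ignores case and
    skips occurrences inside longer words.
    """
    pattern = r"(?<![A-Za-z])" + re.escape(word) + r"(?![A-Za-z])"
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()].replace("\n", " ")
        right = text[m.end():m.end() + width].replace("\n", " ")
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines

# Stand-in for the corpus: one sentence from Mansfield Park.
sample = ("Sitting and calling to Pug, and trying to keep him from the "
          "flower-beds, was almost too much for me.")

for line in concordance(sample, "to"):
    print(line)
```

Lining the keyword up in a column makes it easy to scan the words that tend to appear on either side of it, which is the starting point for spotting collocations.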