◎ JADH2016

Sep 12-14, 2016 The University of Tokyo

Jane Austen in Vector Space: Applying vector space models to 19th century literature
Sara J Kerr (Maynooth University)

Jane Austen is known as one of the founders of the modern English novel. Traditionally she has been seen as a writer who focused on the minutiae of domestic life, but more recent critics have been finding ideas which challenge this view, positioning her as a far more political writer than previously thought.

An exploration of Austen’s ideology is challenging for a number of reasons. A lack of contextual material means that no reliable first-person account exists, and the historical representation of Austen has been heavily mediated by her family. With the publication in 1869 of A Memoir of Jane Austen by her nephew James Edward Austen-Leigh, some fifty years after her death, Austen’s reputation as a skilled but uncontroversial writer, in keeping with the Victorian ideal of conservative, religious womanhood, was set.

Yet there is a disconnect between the Austen presented by her family and the Austen we glimpse through her novels. Most modern scholars agree that there are some political elements in Austen’s work, but there is considerable disagreement as to their nature. Austen was writing with the goal of publication and was clearly aware that publishers and the public had firm expectations of novels and that openly contentious works were unlikely to be published. Her writing aimed, first and foremost, to entertain; as a result, it is perhaps unsurprising that her political ideology is hard to identify. This research takes a quantitative view of Austen’s novels, focusing on her representation of independence and dependence at a personal, social and political level, to explore her political ideas in detail.

Traditional close reading by necessity focuses on the detailed analysis of small sections of text. Although the corpus of Austen’s novels is not large, scholars have identified very different political views using the same source material. For example, Butler (Jane Austen and the War of Ideas. Oxford: Clarendon Press, 1997. Print.) presents Austen as a conservative writer whereas Johnson (Jane Austen: Women, Politics, and the Novel. University of Chicago Press, 1990. Print.) and Neill (The Politics of Jane Austen. Macmillan, 1999. Print.) claim that she is more radical in her views. In searching for insight into the traces of Austen’s political views, we need to look for more subtle patterns within and across the texts. In effect, we are looking for an understanding which goes beyond the individual novel, and in Moretti’s words “close reading will not do it” (Distant Reading. London: Verso, 2013. Print. p48).

The advent of distant and scaled reading techniques within literary studies has enabled the exploration of texts in a manner which “defamiliarize…making them unrecognizable in a way…that helps scholars identify features they might not otherwise have seen” (Clement, Tanya. “Text Analysis, Data Mining and Visualisations in Literary Scholarship.” MLA Commons | Literary studies in the digital age. Oct. 2013. Web.). Topic modelling is, perhaps, the most popular of these tools for Digital Humanists who wish to transform texts and view them through a different lens. However, the application of ‘word2vec’ (an algorithm which represents words as points in space, and the meanings and relationships between them as vectors) has the potential to be of even greater use. It can work effectively on a smaller corpus and can be applied to full texts, whereas, as Jockers has noted (“‘Secret’ recipe for topic modeling themes.” matthewjockers.net. 12 Apr. 2013. Web.), topic modelling is more effective when working with a large, noun-only corpus. In addition, ‘word2vec’ allows the exploration of discourses surrounding a theme. Rather than asking ‘which topics or themes are in this corpus of texts?’ the application of the ‘word2vec’ algorithm allows us to ask ‘what does the corpus say about this theme?’

Typical results are illustrated through an investigation of the words ‘independent’ and ‘independence’. The ten nearest words to ‘independent’ include ‘decorum’, ‘fortune’ and ‘greatness’, but also ‘contemptible’ and ‘deplorable’. Similarly, the fifty nearest words to ‘independent’ and ‘independence’ contain many of the expected words (‘privilege’, ‘matrimonial’, ‘property’, ‘fortune’) but also a different group: ‘inequality’, ‘littleness’, ‘illiberal’ and ‘unfair’. This group of words suggests that while much of Austen’s discourse surrounding independence is in keeping with views of her writing as conservative, a more critical discourse also exists.

Expanding the number of nearest words examined to one hundred reveals two negative clusters:

  1. bias, inequality, insignificance, wounding, littleness, unfair, degradation, shameful
  2. deplorable, arrogance, contemptible, conceit, spoiled

This suggests that two separate ideas are being expressed: the negative impact of the unfair distribution of wealth on the less fortunate, and the negative impact independence could have on its recipient. As wealth and property underpinned the existing social hierarchy, these views may be seen as political.
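The nearest-word queries reported above rest on some measure of closeness between word vectors. The abstract does not say which measure was used; cosine similarity is a common choice and is assumed here. The following is a minimal sketch only: the three-dimensional vectors in `toy_vectors` are invented for illustration and are not taken from the Austen model, and the helper names `cosine_similarity` and `nearest_words` are hypothetical.

```python
import math

# Hypothetical 3-dimensional vectors, invented for illustration only.
# A trained model would hold far more dimensions, learned from the corpus.
toy_vectors = {
    "independent": [0.9, 0.1, 0.3],
    "fortune": [0.8, 0.2, 0.2],
    "property": [0.7, 0.3, 0.1],
    "unfair": [0.5, 0.1, 0.8],
    "garden": [0.0, 1.0, 0.1],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_words(query, vectors, n=10):
    """Return the n words most similar to `query`, most similar first."""
    query_vector = vectors[query]
    scored = [
        (word, cosine_similarity(query_vector, vector))
        for word, vector in vectors.items()
        if word != query  # a word is trivially nearest to itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


for word, score in nearest_words("independent", toy_vectors, n=3):
    print(f"{word}\t{score:.3f}")
```

Raising `n` from ten to fifty to one hundred widens the neighbourhood in the way the analysis above describes, which is how less expected words can come into view.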

The ‘word2vec’ model, originally created by Tomas Mikolov and his colleagues at Google in 2013, takes in a corpus of texts and, by training an artificial neural network, represents words as points in a multi-dimensional space. Word meanings and relationships between words are encoded as distances and paths in that space. The resulting model can then be interrogated and the results visualised.
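To make the idea of learning word positions with a small neural network more concrete, the sketch below trains a toy skip-gram model with negative sampling, one of the training schemes associated with word2vec. It is not the implementation used in this research, which the abstract does not describe. The function name `train_skipgram`, every hyperparameter value and the sentences in `toy_sentences` are assumptions made for illustration, and negative examples are drawn uniformly at random, which simplifies the frequency-weighted sampling of the original algorithm.

```python
import math
import random


def train_skipgram(sentences, dim=8, window=2, negatives=3,
                   epochs=50, lr=0.05, seed=0):
    """Toy skip-gram with negative sampling; returns {word: vector}."""
    rng = random.Random(seed)
    vocab = sorted({word for sentence in sentences for word in sentence})
    index = {word: i for i, word in enumerate(vocab)}
    # Two weight matrices: input (word) vectors and output (context) vectors.
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]

    def sigmoid(x):
        # Clamp the argument so math.exp cannot overflow.
        return 1.0 / (1.0 + math.exp(-max(-20.0, min(20.0, x))))

    for _ in range(epochs):
        for sentence in sentences:
            ids = [index[word] for word in sentence]
            for pos, centre in enumerate(ids):
                start = max(0, pos - window)
                stop = min(len(ids), pos + window + 1)
                for ctx_pos in range(start, stop):
                    if ctx_pos == pos:
                        continue
                    # One observed pair (label 1) plus random pairs (label 0).
                    targets = [(ids[ctx_pos], 1.0)]
                    targets += [(rng.randrange(len(vocab)), 0.0)
                                for _ in range(negatives)]
                    grad_centre = [0.0] * dim
                    for target, label in targets:
                        score = sum(a * b for a, b
                                    in zip(w_in[centre], w_out[target]))
                        err = (sigmoid(score) - label) * lr
                        for d in range(dim):
                            grad_centre[d] += err * w_out[target][d]
                            w_out[target][d] -= err * w_in[centre][d]
                    for d in range(dim):
                        w_in[centre][d] -= grad_centre[d]
    return {word: w_in[index[word]] for word in vocab}


# Invented sentences for illustration; these are not quotations from Austen.
toy_sentences = [
    "independence requires fortune and property".split(),
    "a fortune secures independence".split(),
    "property without fortune is unfair".split(),
    "unfair inequality wounds the dependent".split(),
]

vectors = train_skipgram(toy_sentences)
print(len(vectors), "words,", len(vectors["fortune"]), "dimensions each")
```

A corpus this small cannot yield meaningful neighbours. The sketch only shows the mechanism: words that share contexts are nudged towards one another, and the input vectors that remain after training are the points in space that are later interrogated.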

Applying ‘word2vec’ to literary studies allows the discursive space surrounding a particular topic to be examined, highlighting areas for further exploration. While close readings can identify specific examples where Austen is critical of the world in which she lives, the application of ‘word2vec’ suggests that a more consistent discourse, critical of the existing power structures, exists across her novels. Exploring the novel corpus within vector space places this discourse in a semantic space through which Austen’s ideological views can be interrogated in combination with a more traditional close reading, leading to a thicker, more nuanced interpretation.

Although vector space models are a relatively recent addition to the range of computational tools used in Digital Humanities, these initial analyses suggest that there is good reason to explore their application to corpora of literary texts further.