Sep 12-14, 2016 The University of Tokyo

HYU:MA - A Model for Library-Supported Projects in Japanese Digital History
Peter Broadwell, Tomoko Bialock (University of California, Los Angeles)

This presentation provides an overview of a cluster of new digital research projects at UCLA that focus upon Japanese historical literature, visual arts, and theater traditions. These projects are noteworthy for their use of emerging technologies in digital scholarship and instruction, as well as their basis in an active partnership between faculty members and the university library — specifically, between subject librarians, library technology specialists, and professors in the department of Asian Languages and Cultures. In addition to contributing new digital scholarship tools and resources, research findings, and instructional methods, these projects may demonstrate a promising alternative model for enabling more sustainable and collaborative digital history projects both within and across institutions.

The projects described here are meant to develop novel ways of making Japanese cultural history more accessible and compelling as a field of study for new generations of undergraduate and graduate students. They use as a guiding metaphor the emergent notion of the digital “humanities macroscope” (stylized as ヒュー:マ, or HYU:MA): an infrastructure for research and instruction that facilitates computational access to large historical corpora, tools, and visualizations. In doing so, the macroscope enables exploration of historical phenomena across a range of perspectives: from close reading (micro-scale), to distant reading (macro-scale), and at all levels in between (meso-scale).

The forerunner of the new generation of digital projects in Japanese cultural history at UCLA is the Hentaigana App, a highly successful collaborative effort between faculty, library staff, and students at UCLA and Waseda University to develop an app for iOS and Android smartphones that helps students learn to read premodern Japanese calligraphic writing.[1] The highly interactive, customizable, and even entertaining features of the Hentaigana App allow scholars to develop substantial facility in reading historical manuscripts very quickly. Experiencing Japanese classical literature in this form, rather than solely from typeset critical editions, enables scholars to engage much more closely with the original historical context of the texts and encourages them to ask more detailed questions about the circumstances that produced them. Importantly, the app uses actual historical texts selected and digitally curated by librarians and faculty at Waseda and UCLA as the source materials for its interactive lessons.

Fig 1. An advanced n-gram search and visualization interface for Japanese poetic texts, based on the Bookworm open-source project (http://bookworm.culturomics.org/).

The HYU:MA tools developed subsequent to the Hentaigana App follow the model it exemplifies: employing modern digital technologies to provide new perspectives on historical Japanese forms of cultural expression (primarily poetry and prose fiction, but also aspects of visual culture) while making use of digitized library collections and benefiting from library staff members' increased proficiency with the development and use of digital scholarship tools. Among such products is an interactive resource, based on the open-source Bookworm “n-gram” viewer project, that enables scholars to produce and browse interactive visualizations of word frequencies across tens of thousands of Japanese poems spanning several centuries, beginning from approximately 600 AD and including the imperial anthologies of waka poems (see Figure 1).
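The kind of count that underlies such an n-gram view can be sketched in a few lines of Python. This is a minimal illustration only, not the Bookworm implementation: the toy poems, their years, and the function names are hypothetical, and the use of overlapping character n-grams (rather than a proper Japanese tokenizer) is a simplifying assumption.

```python
from collections import Counter

# Hypothetical toy corpus: (year, kana text) pairs standing in for dated poems.
POEMS = [
    (905, "はるのはな"),
    (905, "はなのいろ"),
    (1205, "あきのつき"),
    (1205, "はなのかぜ"),
]

def char_ngrams(text, n):
    """Return all overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def relative_frequency_by_year(poems, query):
    """For each year, the share of n-grams (of the query's length) equal to the query."""
    n = len(query)
    hits = Counter()
    totals = Counter()
    for year, text in poems:
        grams = char_ngrams(text, n)
        totals[year] += len(grams)
        hits[year] += grams.count(query)
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}

# Relative frequency of the bigram "はな" in each year of the toy corpus.
freqs = relative_frequency_by_year(POEMS, "はな")
print(freqs)
```

Normalizing by the total number of n-grams per year, rather than reporting raw counts, keeps years with more surviving poems from dominating the resulting curve.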

Besides allowing scholars to gain a “distant reading” perspective on fluctuations in word usage over time via interactive kanji- and kana-based search features, the n-gram search interface provides visualizations of the contributions of individual poets and the compilations of specific anthologies, as well as the genre categories assigned to the poems within the anthologies. This last feature enables contemporary scholars to re-engage with prior analyses of poem types and vocabularies from the early years of mixed quantitative and qualitative waka studies,[2] as well as more recent inquiries in the field of computational corpus linguistics.[3] A related type of analysis that is presently underway involves a comparison of human-labeled poem genres to those assigned by a computational classifier that has been “trained” by observing the words most commonly associated with traditional genres. The so-called “confusion matrix” that results from this comparison highlights the poems on which human and computerized classifiers disagree (see Figure 3). These “disagreements” tend to highlight poems that do not fit neatly into a single category; in studies of other bodies of literature, such liminal cases have proven upon further, close reading to be some of the most culturally and historically significant works within the entire corpus.[4]
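The comparison can be illustrated with a small multinomial naïve Bayes classifier (the classifier type named in the Figure 3 caption) written in plain Python. This is a sketch under stated assumptions, not the project's code: the romanized tokens, genre labels, poems, and function names are all hypothetical, and add-one smoothing is an assumed choice.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (official genre, tokenized poem) pairs.
TRAIN = [
    ("spring", ["hana", "haru", "kasumi"]),
    ("spring", ["hana", "uguisu"]),
    ("autumn", ["tsuki", "aki", "momiji"]),
    ("autumn", ["tsuki", "kaze"]),
    ("love", ["koi", "namida"]),
    ("love", ["koi", "tsuki"]),
]
# Hypothetical further poems whose official labels are compared with predictions.
HELD_OUT = [
    ("love", ["tsuki", "kaze", "namida"]),
    ("spring", ["hana", "kasumi"]),
]

def train_naive_bayes(examples):
    """Count documents per genre and word occurrences per genre."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, tokens in examples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict(model, tokens):
    """Return the genre with the highest log-probability (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def confusion_matrix(model, examples):
    """matrix[official][predicted] = number of poems in that cell."""
    matrix = defaultdict(Counter)
    for label, tokens in examples:
        matrix[label][predict(model, tokens)] += 1
    return matrix

model = train_naive_bayes(TRAIN)
labeled = TRAIN + HELD_OUT
matrix = confusion_matrix(model, labeled)
# Poems on which the official label and the classifier disagree.
disagreements = [(lab, toks) for lab, toks in labeled if predict(model, toks) != lab]
print(disagreements)
```

In this toy example the only off-diagonal cell holds a poem officially labeled as a love poem but built largely from autumn vocabulary, which is the kind of boundary case the paragraph above singles out for close reading.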

Fig 2. A sample topic model browser for 1,564 novels from Aozora Bunko, based on the open-source dfr-browser project (http://agoldst.github.io/dfr-browser/).

Further tools in development at UCLA involve the use of Latent Dirichlet Allocation-based topic modeling to generate interactive visualizations of the concentrations of various computationally detected semantic “topics” within single works (for example, Genji monogatari), as well as across very large corpora, including the publicly accessible prose materials in the Aozora Bunko digital library (see Figure 2). When applied to single works, topic modeling can reveal interesting mid-level (meso-scale) attributes of the work that may have escaped researchers’ notice. When used on much larger collections that no single reader can realistically expect to read and comprehend in full, this tool provides a helpful “distant,” aggregate view of the primary topics in the entire corpus and their attributes over time; scholars may then choose to examine a few specific works, topics, or time periods in greater detail.
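For readers unfamiliar with how such topics are detected, the following is a minimal sketch of Latent Dirichlet Allocation fitted by collapsed Gibbs sampling, using only the Python standard library. It is not the software used for the browser in Figure 2; the toy documents, the hyperparameter values (alpha, beta), the number of topics, and the function name are assumptions chosen for illustration.

```python
import random
from collections import Counter

# Hypothetical toy corpus of pre-tokenized documents.
DOCS = [
    ["hana", "haru", "hana", "kasumi"],
    ["tsuki", "aki", "kaze", "tsuki"],
    ["hana", "kasumi", "haru"],
    ["aki", "tsuki", "momiji"],
]

def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns doc-topic and topic-word counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for word in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word] += 1
            topic_total[z] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][word] -= 1
                topic_total[z] -= 1
                # Resample its topic in proportion to the conditional weights.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][word] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

doc_topic, topic_word = lda_gibbs(DOCS)
# Share of each document's tokens assigned to each topic.
proportions = [[count / sum(row) for count in row] for row in doc_topic]
for k, words in enumerate(topic_word):
    print(k, [w for w, c in words.most_common(3) if c > 0])
```

The per-document proportions are what a topic browser charts within a single work or across a corpus, while the most frequent words in each topic serve as its human-readable label.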

Fig 3. A “confusion matrix” visualization of official genre classifications (vertical axis) of waka poems from the imperial anthologies, versus the classifications made by a naïve Bayes classifier trained on the official classifications (horizontal axis).

One other experimental approach that we are pursuing involves utilizing recent advances in computational image analysis to provide distant and meso-scale perspectives on large collections of images. These methods include automatic generation of image mosaics, analyses and visualizations of the color profiles of images, and even the ability to search for similar objects across multiple images. Such techniques are particularly applicable to the rich visual characteristics of many Japanese cultural products, from Ehon banzuke playbills to ukiyo-e prints (see Figure 4); ironically, these same characteristics often make text-based analytical approaches considerably more difficult.

Fig 4. An aggregate visualization of the front pages of the UCLA Library’s collection of digitized Ezukushi banzuke playbills, sorted horizontally by hue and vertically by brightness, using the open-source Coverspace tool (http://benschmidt.org/projects/coverspace/).
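The hue-by-brightness arrangement described in this caption can be approximated as follows. This is a rough sketch and not the Coverspace implementation: the file names and average colors are hypothetical, the grid size is arbitrary, and each image is assumed to have been reduced to a single average RGB color beforehand.

```python
import colorsys

# Hypothetical average colors (R, G, B in 0-255), one per page image.
AVERAGE_COLORS = {
    "page_a.jpg": (200, 40, 40),
    "page_b.jpg": (40, 40, 200),
    "page_c.jpg": (230, 220, 60),
    "page_d.jpg": (20, 90, 20),
}

def hue_and_brightness(rgb):
    """Convert an RGB triple to (hue, brightness), each in the range 0-1."""
    r, g, b = (channel / 255 for channel in rgb)
    hue, _saturation, value = colorsys.rgb_to_hsv(r, g, b)
    return hue, value

def grid_position(rgb, columns=10, rows=10):
    """Map hue to a column and brightness to a row (brightest rows at the top)."""
    hue, value = hue_and_brightness(rgb)
    col = min(int(hue * columns), columns - 1)
    row = min(int((1 - value) * rows), rows - 1)
    return col, row

layout = {name: grid_position(rgb) for name, rgb in AVERAGE_COLORS.items()}
# Image names ordered left to right by hue.
by_hue = sorted(AVERAGE_COLORS, key=lambda name: hue_and_brightness(AVERAGE_COLORS[name])[0])
print(by_hue)
```

A single average color discards most of an image's color profile; it is used here only because it gives each image one unambiguous position on the two axes.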

As a concluding observation, it is important to emphasize again the substantial and ongoing involvement of librarians and library technical staff in the digital scholarship endeavors described above. This arrangement may demonstrate an alternative model for digital history research projects, one that perhaps can help to provide lasting, sustained benefits for instruction and research in an otherwise dynamic and highly changeable scholarly landscape.


[1] Cynthia Lee. “New app helps students learn to read ancient Japanese writing form.” UCLA Daily Bruin, Oct. 8, 2015. http://newsroom.ucla.edu/stories/new-app-helps-students-learn-to-read-ancient-japanese-writing-form.

[2] Heizō Kitagawara 北川原平造. “Shikika no kōzō: Kokin wakashū nōto 四季歌の構造: 古今和歌集ノート.” Kiyō: Ueda Joshi Tanki Daigaku 紀要: 上田女子短期大学. March 31, 1993. http://ci.nii.ac.jp/els/110006406589.pdf?id=ART0008407967&type=pdf&lang=jp&host=cinii&order_no=&ppv_type=0&lang_sw=&no=1462493837&cp=.

[3] Hilofumi Yamamoto. “The Differences of Connotations between Two Flowers, Plum and Cherry, in Classical Japanese Poetry, 10th Century.” JADH Conference 2015: Encoding Cultural Resources. September 1-3, 2015, Kyoto University Institute for Research in Humanities: 31-32.

[4] See David Mimno, et al. “The Telltale Hat: LDA and Classification Problems in a Large Folklore Corpus.” Digital Humanities 2014. Lausanne, Switzerland, July 10, 2014. http://dharchive.org/paper/DH2014/Paper-163.xml.