◎ JADH2016

Sep 12-14, 2016 The University of Tokyo

The Kanseki Repository: A new online resource for Chinese textual studies
Christian Wittern (Kyoto University)

The Kanseki Repository (KR) has been developed by a research group at the Institute for Research in Humanities, Kyoto University, under the leadership of the author. It features a large compilation of premodern Chinese texts, collected and curated according to firm philological principles that draw on more than 20 years of experience with digital texts. Among its unique features is that the texts can be accessed, edited, annotated and shared not only through a website but also through a specialized text editor, which thus becomes a powerful workspace for reading, researching and translating Chinese texts. The Kanseki Repository includes all texts in the Daozang and Daozang jiyao and a large collection of Buddhist material, including all texts created by the CBETA team, enhanced where applicable through the inclusion of recensions from the Tripitaka Koreana, in addition to a large selection from general collections such as the Sibu congkan and Siku quanshu.

The source texts of the Kanseki Repository are available in the @kanripo account on github.com. These texts are displayed at www.kanripo.org and are also used by Mandoku, a package for the Emacs editor (see www.mandoku.org).

This presentation will outline the main considerations for creating this repository of texts and its associated tools and methods. These points are discussed further below.

Philological foundations

In a seminal article, the Swiss scholar Hans Zeller[1] emphasised that all scholarly editing should make a clear distinction between the record of what is transmitted and the scholarly interpretation thereof. While this distinction is blurry at times, it has informed the design of the Kanseki Repository, which arranges the editions of a text it represents into those that strive to faithfully reproduce a text according to some textual witness ('record') and those that critically consider the content and make alterations to the text by adding punctuation, normalizing characters, collating from other evidence etc. ('interpretation').
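
The record/interpretation distinction can be pictured as a simple classification of editions. The following is only an illustrative sketch: the class names, the identifier "text-0001" and the edition labels are hypothetical and do not reflect the repository's actual data model.

```python
from dataclasses import dataclass
from enum import Enum


class EditionKind(Enum):
    RECORD = "record"                  # faithful reproduction of one textual witness
    INTERPRETATION = "interpretation"  # critically edited: punctuated, normalized, collated


@dataclass(frozen=True)
class Edition:
    text_id: str       # hypothetical identifier of the text
    label: str         # hypothetical name of the edition
    kind: EditionKind


def group_by_kind(editions):
    """Arrange edition labels into 'record' and 'interpretation' groups."""
    groups = {kind: [] for kind in EditionKind}
    for edition in editions:
        groups[edition.kind].append(edition.label)
    return groups


# Hypothetical sample data: two witnesses and one critically edited version.
editions = [
    Edition("text-0001", "witness-A", EditionKind.RECORD),
    Edition("text-0001", "witness-B", EditionKind.RECORD),
    Edition("text-0001", "punctuated", EditionKind.INTERPRETATION),
]
groups = group_by_kind(editions)
```

Keeping the kind as an explicit attribute of each edition means a reader can always tell whether a given version documents a witness or reflects editorial judgement.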

Basic technologies
Git and GitHub

The distributed version control software git is used as a low-level transport layer and maintenance technology. It allows users to download texts, upload revised versions, create their own versions and keep track of revisions. GitHub is a commercial web service based on git that adds social-networking functions and cloud services.
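
As a rough illustration of this workflow, the sketch below assembles the standard git calls for downloading a text, recording a revision, uploading it and listing the history. It assumes that each text lives in its own repository under the @kanripo account and uses a placeholder identifier ("TEXT-ID"); the helper names are hypothetical, and nothing is executed unless run() is called.

```python
import subprocess

GITHUB_ACCOUNT = "kanripo"  # the @kanripo account named in the text


def clone_command(text_id, target_dir):
    """Build the `git clone` call that downloads one text repository."""
    # Assumption: one repository per text, named after the text identifier.
    url = f"https://github.com/{GITHUB_ACCOUNT}/{text_id}.git"
    return ["git", "clone", url, target_dir]


def commit_commands(message):
    """Build the calls that record a revision of the local working copy."""
    return [
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
    ]


def push_command(branch):
    """Build the call that uploads local revisions to the remote 'origin'."""
    return ["git", "push", "origin", branch]


def history_command():
    """Build the call that lists the recorded revisions, one per line."""
    return ["git", "log", "--oneline"]


def run(command, cwd=None):
    """Execute one prepared command; raises CalledProcessError on failure."""
    return subprocess.run(command, cwd=cwd, check=True,
                          capture_output=True, text=True)
```

A user with git installed could, for example, call run(clone_command("TEXT-ID", "work")) with a real identifier, edit the files in "work", and then run the commit and push commands in that directory. Pushing requires write access, so in practice it would target the user's own copy of the text.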


Emacs is the main user interface for users who require a sophisticated editing environment. An extension has been developed on top of the Emacs package “Org mode” that adds functionality to facilitate interaction with the digital archive.

Web interface at www.kanripo.org

This website provides access to the texts, including full-text search and display of the transcribed text alongside facsimile(s) of different editions. Users can log in using their GitHub credentials to gain access to more advanced functions, such as selecting lists of texts of special interest, advanced sorting by text category or date, cloning texts to their GitHub user account, and editing on site. The site went into testing mode in October 2015 and is scheduled for a first public release in March 2016.

Towards a platform for text-based Chinese studies

All modes of interaction described above are based on the distributed version control system git, using the GitHub site as 'cloud storage'. In addition to providing storage, however, GitHub also provides a feedback mechanism through “pull requests”, with which users can propose corrections to a text for the @kanripo editors to consider for inclusion in the canonical version, thus making them available to all users.
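
To make the feedback mechanism concrete, the sketch below prepares (but does not send) a request to GitHub's REST endpoint for creating a pull request. The repository name, user name, branch name and token are placeholders, and the helper function is hypothetical; it is meant only to show what information a proposed correction carries.

```python
import json
import urllib.request


def build_pull_request(owner, repo, user, branch, title, body, token, base="master"):
    """Prepare a 'create pull request' call against the GitHub REST API.

    The request proposes merging `branch` from `user`'s copy of the text
    into the `base` branch of `owner`'s repository.
    """
    payload = {
        "title": title,
        "head": f"{user}:{branch}",  # where the corrections live
        "base": base,                # branch the editors would merge into
        "body": body,                # explanation for the editors
    }
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/pulls",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


# Placeholder values for illustration only; the request is not sent here.
request = build_pull_request(
    owner="kanripo", repo="TEXT-ID", user="reader", branch="fix-typo",
    title="Correct a character", body="Corrected against the facsimile.",
    token="PLACEHOLDER-TOKEN",
)
```

Sending the prepared request (for instance with urllib.request.urlopen) would require a valid access token. Most users would instead open the pull request through the GitHub web interface, which collects the same information.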

The model outlined here is extensible and allows other developers of websites related to Chinese studies to access the same texts and provide specialized services to the user, for example by enhancing the text through NLP processing. These enhanced versions can be saved (“committed” in git language) in the same way to the user's account and are then also visible to the client programs described here[*1].

This will open the door to an open platform of texts for Chinese studies, where the texts of interest to the users form the center of a digital archive, with different services and analytical tools interacting with and enhancing it. The user, who makes a considerable investment in time and effort when close reading, researching, translating and annotating the text, never loses control of the text and does not need to worry about losing access to it when one of the websites goes offline.

By providing versioned access to the texts in question, it is also possible to make any analytical results reported in research publications reproducible[2] by indicating the additional tools and processes needed, ideally also in a GitHub repository in the same ecosystem.
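
One way to picture this kind of reproducibility is a small source record that pins the exact version of a text used in an analysis. The sketch below is an assumption about how such a record might look, not a format defined by the repository: the field names, the repository name, the file path and the commit identifier are placeholders, and the sample string merely stands in for the content of a text file.

```python
import hashlib


def content_digest(content):
    """Return the SHA-256 hex digest of the text content (UTF-8 encoded)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def source_record(repository, commit, path, content):
    """Describe exactly which version of which file an analysis was run on."""
    return {
        "repository": repository,  # e.g. a text repository on GitHub
        "commit": commit,          # git revision identifier of the version used
        "path": path,              # file within the repository
        "sha256": content_digest(content),
    }


def matches(record, content):
    """Check that `content` is identical to the text the record was made from."""
    return record["sha256"] == content_digest(content)


# Placeholder values; a real record would carry an actual commit identifier.
sample_text = "道可道非常道"
record = source_record("kanripo/TEXT-ID", "PLACEHOLDER-COMMIT",
                       "TEXT-ID_001.txt", sample_text)
```

Publishing such a record alongside the analysis code lets a later reader check out the same commit and confirm with matches() that they are working from identical input.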

The aim is not just to provide a static, completed, definitive edition of a text, but as fertile a ground as possible for the interaction between the text and its readers, hopefully improving both through this process.


[*1] A “shadow” of the texts in the @kanripo account, in a format suitable for text mining, has been made available for specialized processing in @kr-shadow (http://github.com/kr-shadow). These texts will be updated from the master branch of the corresponding text in @kanripo.


[1] Hans Zeller, Befund und Deutung: Interpretation und Dokumentation als Ziel und Methode der Edition, in: G. Martens and H. Zeller (eds.), Texte und Varianten: Probleme ihrer Edition und Interpretation, München 1971, pp. 45-89, translated as Record and Interpretation: Analysis and Documentation as Goal and Method of Editing, in: Hans H. W. Gabler, G. Bornstein, and G. B. Pierce (eds.), Contemporary German Editorial Theory, Ann Arbor 1995, pp. 17-58.

[2] Vikas Rawal, Reproducible Research Papers using Org-mode and R: A Guide, at https://github.com/vikasrawal/orgpaper [accessed 2016-05-04]