JADH2016

Sep 12-14, 2016, The University of Tokyo

Linking Scholars and Semantics: Developing Scholar-Supportive Data Structures for Digital Dūnhuáng
Jacob Jett, J. Stephen Downie (University of Illinois at Urbana-Champaign), Xiaoguang Wang (Wuhan University), Jian Wu, Tianxiu Yu (Dunhuang Research Digital Center), Shenping Xia (Dunhuang Research Academy)

The Digital Dūnhuáng Project (Wu, 2015; Zhou, 2015) is a very large-scale field digitization project that is digitizing the contents of the Mògāo Caves, Dūnhuáng’s vast system of 492 Buddhist temples and cave sites. The caves contain thousands of sculptures, murals, and other cultural artifacts fashioned during the thousand years (~400-1400 CE) in which the city served as a crossroads on the Silk Road and a vital Buddhist cultural center. The Mògāo Caves are a UNESCO World Heritage Site and are of interest to scholars and the general public alike. The level of interest in this cultural treasure is reflected in the 1.1 million visitors to the caves in 2015 alone.

There has been a great deal of effort, realized through the International Dūnhuáng Project (IDP), to digitally preserve and publish the many manuscripts found in Cave 17. More recently, the Digital Dūnhuáng project of the Dūnhuáng Academy has been digitally capturing the sculptures, paintings, and other important cultural artifacts found within the caves. The project is creating high-resolution images so that these artifacts can be made more accessible to scholars worldwide and shared with those unable to travel to Dūnhuáng (Wang, 2015). Thus far the project has digitized the contents of only 120 of the 492 caves. Despite this modest number, the Digital Dūnhuáng project has already produced 941,421 digital images of the caves’ cultural artifacts. We estimate that by the project’s end, almost four million digital images will have been produced.
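
The “almost four million” figure is consistent with a simple linear extrapolation. The sketch below is our illustration, not the project’s stated method: it assumes that the remaining caves yield, on average, as many images per cave as the 120 already photographed.

```python
# Linear extrapolation of the final image count, ASSUMING the average
# number of images per cave stays constant across all 492 caves.
IMAGES_SO_FAR = 941_421
CAVES_DIGITIZED = 120
TOTAL_CAVES = 492


def projected_total(images: int, done: int, total: int) -> float:
    """Scale the per-cave average observed so far up to the full set of caves."""
    per_cave = images / done
    return per_cave * total


estimate = projected_total(IMAGES_SO_FAR, CAVES_DIGITIZED, TOTAL_CAVES)
print(f"{IMAGES_SO_FAR / CAVES_DIGITIZED:,.0f} images per cave")  # 7,845 images per cave
print(f"about {estimate:,.0f} images in total")  # about 3,859,826 images in total
```

Caves differ greatly in size and decoration, so this is only a rough consistency check on the estimate.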

Digital Infrastructure

In this poster abstract, we present a proposed formal metadata model designed to improve the utility of the soon-to-be millions of Dūnhuáng cave images, with the particular aim of enhancing the impact of these important resources on digital and traditional humanities and religion scholarship worldwide. The digitization (that is, the production of digital photographs) of the Mògāo Caves’ rich repository of cultural heritage is still under way.

Figure 1. Persistent identifiers and base taxonomic classification

We assert that the digital annotation of the Dūnhuáng photographs, and of the things denoted in them, is key to giving remote scholars the means to interact with this treasure trove of historic works. Accordingly, before any digital annotation can take place, we propose that a necessary first step is to inventory and identify the cultural artifacts in the caves (Downie, 2015). Figure 1 (above) illustrates one way in which this can be done: creating a rich, interlinked web of man-made objects and the conceptual objects they depict.

The creation of persistent identifiers for all of the caves’ contents, at each intellectual level of scholarly interest, is the cornerstone upon which our proposed interactive digital infrastructure is to be built. Once an inventory of persistent, web-accessible objects is in place, scholars can engage with these intellectual targets of scholarship by adding their own layers of digital annotation.
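
To make the idea of an inventory concrete, the following is a minimal sketch, not the project’s actual data model. It assumes a hypothetical identifier namespace (example.org), hypothetical identifier paths, and two hypothetical class labels standing in for the base classification of Figure 1: man-made objects and the conceptual objects they depict. Each identifier is minted once and never reassigned, and “depicts” links can be followed in reverse.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical namespace: the project's actual identifier scheme is not given here.
BASE = "https://example.org/dunhuang/"


@dataclass
class Entity:
    pid: str    # persistent, web-resolvable identifier
    kind: str   # hypothetical base class: "ManMadeObject" or "ConceptualObject"
    label: str
    depicts: List[str] = field(default_factory=list)  # PIDs of things this object depicts


class Inventory:
    """Registry that mints each identifier once and records 'depicts' links."""

    def __init__(self) -> None:
        self._entities: Dict[str, Entity] = {}

    def mint(self, path: str, kind: str, label: str) -> Entity:
        pid = BASE + path
        if pid in self._entities:  # persistent means never reassigned
            raise ValueError(f"identifier already assigned: {pid}")
        entity = Entity(pid, kind, label)
        self._entities[pid] = entity
        return entity

    def link_depicts(self, obj: Entity, concept: Entity) -> None:
        obj.depicts.append(concept.pid)

    def depicted_by(self, concept_pid: str) -> List[str]:
        """Reverse lookup: every inventoried object that depicts a given concept."""
        return [e.pid for e in self._entities.values() if concept_pid in e.depicts]


inv = Inventory()
statue = inv.mint("object/0001", "ManMadeObject", "Statue of an elderly disciple")
disciple = inv.mint("concept/0001", "ConceptualObject", "Elderly disciple")
inv.link_depicts(statue, disciple)
print(inv.depicted_by(disciple.pid))  # ['https://example.org/dunhuang/object/0001']
```

Keeping the statue and the disciple it depicts as separate identified entities is what later lets annotations say which of the two they are about.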

Figure 2. Simple scholarly annotation[*1]

As Wang et al. (2016) observe, metadata, deep semantic analysis, and topical indexing are among the kinds of annotation being applied to the digital photographs produced by Digital Dūnhuáng. Figure 2 (above) illustrates a simple scholarly annotation scenario. In this example, a scholar has labeled the target conceptual object (the disciple) in the red box with a name, “Kaspaya.”
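
As a sketch of how the Figure 2 scenario might be serialized, the code below builds a minimal annotation following the W3C Web Annotation Data Model in JSON-LD: a textual body carrying the name and a target that selects a rectangular region of a photograph with a Media Fragments selector. The image and annotation IRIs and the pixel coordinates are hypothetical, and, as in the figures, provenance properties such as the creation date are omitted.

```python
import json

# Hypothetical IRIs and pixel coordinates, for illustration only.
IMAGE = "https://example.org/dunhuang/image/0001.jpg"
ANNO_1 = "https://example.org/dunhuang/anno/1"


def region_target(source: str, x: int, y: int, w: int, h: int) -> dict:
    """Target a rectangular region of an image via a Media Fragments selector."""
    return {
        "source": source,
        "selector": {
            "type": "FragmentSelector",
            "conformsTo": "http://www.w3.org/TR/media-frags/",
            "value": f"xywh={x},{y},{w},{h}",
        },
    }


def name_annotation(anno_id: str, target: dict, name: str) -> dict:
    """A minimal Web Annotation that attaches a name to a selected region."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": "identifying",
        "body": {"type": "TextualBody", "value": name, "format": "text/plain"},
        "target": target,
    }


anno1 = name_annotation(ANNO_1, region_target(IMAGE, 120, 80, 300, 640), "Kaspaya")
print(json.dumps(anno1, indent=2, ensure_ascii=False))
```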

Figure 3. Direct scholarly discourse through digital annotation

The annotation technologies illustrated here make use of linked data (Berners-Lee, 2006; Bizer et al., 2009) through RDF[*2]-conformant ontologies and serialization formats such as JSON-LD[*3]. Once a digital foundation of persistent identifiers and basic categorization has been established and annotation infrastructure has been implemented, scholars may interact with one another, directly or indirectly, through the act of annotating (illustrated in Figures 3 (above) and 4 (below)). In these examples, a second scholar adds a dissenting view of what the disciple’s name should be, saying “no, this disciple’s name is ‘Maudgalyayana’.”
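
The difference between direct and indirect discourse can be sketched in the same Web Annotation terms. In this illustration (our assumption about how the figures would be encoded, with hypothetical IRIs and coordinates), a direct reply targets the first annotation itself with the “replying” motivation, while an indirect response is an independent annotation on the same image region.

```python
# Hypothetical IRIs and coordinates, for illustration only.
IMAGE = "https://example.org/dunhuang/image/0001.jpg"
REGION = {
    "source": IMAGE,
    "selector": {
        "type": "FragmentSelector",
        "conformsTo": "http://www.w3.org/TR/media-frags/",
        "value": "xywh=120,80,300,640",
    },
}


def annotation(anno_id: str, motivation: str, text: str, target) -> dict:
    """Minimal Web Annotation; the target may be an IRI string or a selected region."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": motivation,
        "body": {"type": "TextualBody", "value": text},
        "target": target,
    }


first = annotation("https://example.org/dunhuang/anno/1", "identifying", "Kaspaya", REGION)

# Direct discourse: the reply's target is the first annotation itself.
direct = annotation("https://example.org/dunhuang/anno/2", "replying",
                    "No, this disciple's name is 'Maudgalyayana'.", first["id"])

# Indirect discourse: an independent annotation on the same image region.
indirect = annotation("https://example.org/dunhuang/anno/3", "identifying",
                      "Maudgalyayana", REGION)


def is_reply_to(anno: dict, other: dict) -> bool:
    return anno["target"] == other["id"]


def shares_target(a: dict, b: dict) -> bool:
    return a["target"] == b["target"]


print(is_reply_to(direct, first))      # True
print(shares_target(indirect, first))  # True
```

Note that neither check tells a system that the direct reply is about the disciple rather than about the first annotation; that gap is what the target-focus property discussed later addresses.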

Figure 4. Indirect scholarly discourse through digital annotation

These illustrative examples showcase only one of the many roles that digital annotations of this kind can play in scholarship: promoting discourse. Such annotations may also be part of a process for reaching consensus on the identity of the monk depicted by the statue, or they might record a narrative of discussions about the caves’ contents. Digital annotations like these might also be used in classroom settings, giving students and instructors a means of interacting with the cultural objects that they would not normally have.

Of course, the mechanics and limitations of digital systems are such that it is not always apparent that the annotators are actually naming the same entity. As Arms (1995) observes, the scholarly users of Digital Dūnhuáng’s images do not want to interact with the digital photographs so much as to make assertions about the things denoted within them. One potential remedy is to extend the annotation framework with properties designed to operate in parallel with the process of anchoring annotations to their targets. An example of this appears in Jett et al. (2016) and is illustrated in Figure 5 (below).

Figure 5. Preserving the intellectual focus of scholarly discourse

In this case, the property “hasTargetFocus” is used to record that the two scholars are discussing the same abstract thing, the old disciple, even though their annotations are anchored to two completely different entities (i.e., to a region of a photograph and to an annotation of that region, respectively). This level of representation would be useful even if their annotations were anchored to precisely the same target, because it makes clear that the annotations are about the monk depicted by the statue and not the statue itself or the photograph that depicts it.
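
The practical payoff of such a property can be sketched as a grouping step. The sketch below uses simplified annotation records rather than full Web Annotations; the property name follows Jett et al. (2016), but the namespace IRI and all identifiers are hypothetical placeholders. Although the two annotations have different targets, grouping on the focus property brings them together.

```python
from collections import defaultdict
from typing import Dict, List

# "hasTargetFocus" follows Jett et al. (2016); this namespace is a hypothetical
# placeholder, not the property's published IRI.
FOCUS = "https://example.org/vocab#hasTargetFocus"
DISCIPLE = "https://example.org/dunhuang/concept/0001"  # hypothetical PID

# Simplified records: different targets, same intellectual focus.
annotations = [
    {"id": "anno/1", "target": "image/0001.jpg#xywh=120,80,300,640",
     "body": "Kaspaya", FOCUS: DISCIPLE},
    {"id": "anno/2", "target": "anno/1",
     "body": "No, this disciple's name is 'Maudgalyayana'.", FOCUS: DISCIPLE},
]


def group_by_focus(annos: List[dict]) -> Dict[str, List[str]]:
    """Collect annotation ids by what they are about, ignoring where they are anchored."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for a in annos:
        focus = a.get(FOCUS)
        if focus is not None:
            groups[focus].append(a["id"])
    return dict(groups)


print(group_by_focus(annotations))
# {'https://example.org/dunhuang/concept/0001': ['anno/1', 'anno/2']}
```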

Another advantage that digital knowledge representation systems bring is the flexibility of extensible frameworks. Not only do extensible frameworks allow more of a scholar’s intentions to be preserved, they also permit a choice of domain vocabularies for describing resources (e.g., CIDOC-CRM[*4]) and support for specialized digital tools. For example, scholars using Digital Dūnhuáng might wish to use the International Image Interoperability Framework’s image selector[*5], which allows them to rotate an image as well as to specify a particular region of it. Similarly, this framework will allow scholars to gather all of the annotated instances of, for example, the disciple “Kaspaya” from all of the Dūnhuáng caves, across time and space. Persistent identifiers and a basic categorical framework are the cornerstone for building a digital scholarly workplace.
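
As an illustration of how a IIIF-based selector relates to an actual image request, the sketch below builds a selector object carrying a region and a rotation, then the corresponding IIIF Image API 2.x request URL ({server}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}). The server, image identifier, and coordinates are hypothetical, and the selector’s field names reflect our reading of the IIIF annex cited in the footnotes; both should be checked against the specifications before use.

```python
# Hypothetical IIIF Image API server and image identifier.
SERVER = "https://iiif.example.org/iiif"
IDENTIFIER = "dunhuang-0001"


def rotation_param(degrees: float) -> str:
    """IIIF rotation is an in-plane angle in degrees between 0 and 360."""
    if not 0 <= degrees <= 360:
        raise ValueError("rotation must be between 0 and 360 degrees")
    return f"{degrees:g}"


def image_api_selector(x: int, y: int, w: int, h: int, degrees: float = 0) -> dict:
    """Selector carrying Image API region and rotation parameters (cf. the IIIF annex)."""
    return {
        "@type": "iiif:ImageApiSelector",
        "region": f"{x},{y},{w},{h}",
        "rotation": rotation_param(degrees),
    }


def image_api_url(selector: dict, size: str = "full",
                  quality: str = "default", fmt: str = "jpg") -> str:
    """Image API 2.x: {server}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}"""
    return (f"{SERVER}/{IDENTIFIER}/{selector['region']}/{size}/"
            f"{selector['rotation']}/{quality}.{fmt}")


sel = image_api_selector(120, 80, 300, 640, degrees=90)
url = image_api_url(sel)
print(url)  # https://iiif.example.org/iiif/dunhuang-0001/120,80,300,640/full/90/default.jpg
```

Because the selector is plain data, an annotation can store it alongside the persistent identifier of the photograph, and any IIIF-compliant viewer can reproduce the same view.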


[*1] Note that for the sake of readability, many core annotation properties concerning the annotations’ provenance, such as date created, have been left out of these illustrated examples. The annotation model’s full property set can be found at: https://www.w3.org/TR/annotation-model/

[*2] https://www.w3.org/RDF/

[*3] http://json-ld.org/

[*4] http://www.cidoc-crm.org/html/5.0.4/cidoc-crm.html

[*5] http://iiif.io/api/annex/openannotation/#status-of-this-document


[1] Arms, W. Y. (1995). Key concepts in the architecture of the digital library. D-Lib Magazine 1(1). Available via: http://www.dlib.org/dlib/July95/07arms.html

[2] Berners-Lee, T. (2006). Linked data. Design Issues: Architectural and Philosophical Points. Available via: https://www.w3.org/DesignIssues/LinkedData.html

[3] Bizer, C., Heath, T. & Berners-Lee, T. (2009). Linked data—The story so far. International Journal on Semantic Web and Information Systems 5(3), pp. 1-22. DOI: 10.4018/jswis.2009081901

[4] Downie, J. S. (2015). “Enhancing the impact of Digital Dunhuang on digital humanities scholarship.” Panel presentation given at DH 2015 (Sydney, Australia, 30 June – 3 July 2015).

[5] Jett, J., Cole, T. W., Dubin, D. & Renear, A. H. (under review). “Discerning the intellectual focus of annotations.” Paper submitted to Balisage: The Markup Conference 2016 (North Bethesda, MD, 2-5 August 2016).

[6] Wang, E. (2015). “Explicating the potentials of Digital Dunhuang on scholarship and teaching.” Panel presentation given at DH 2015 (Sydney, Australia, 30 June – 3 July 2015).

[7] Wang, X., Song, N., Zhang, L., Jiang, Y. & Marcia, Z. (2016). Understanding the subject hierarchies and structures contained in Dunhuang murals for deep semantic annotation: A content analysis. Unpublished working paper to be submitted.

[8] Wu, J. (2015). “Introducing the ‘real’ Dunhuang and the Digital Dunhuang project.” Panel presentation given at DH 2015 (Sydney, Australia, 30 June – 3 July 2015).

[9] Zhou, P. (2015). “Digital Dunhuang: Digitally capturing, preserving, and enhancing real Dunhuang.” Panel presentation given at DH 2015 (Sydney, Australia, 30 June – 3 July 2015).