◎ JADH2016

Sep 12-14, 2016 The University of Tokyo

High-throughput Collation Workflow for the Digital Critique of Old Japanese Books Using Computer Vision Techniques
Asanobu Kitamoto (National Institute of Informatics), Kazuaki Yamamoto (National Institute of Japanese Literature)

A massive digital image collection of about 300,000 pre-modern Japanese books is expected to be released as open data in the coming years, thanks to the project “Building International Collaborative Research Network for Pre-modern Japanese Texts” led by the National Institute of Japanese Literature. One of the fundamental tasks for such a massive collection is collation, or more specifically, the comparison of books to identify different editions and their relationships. Books with the same title may differ not only in textual content but also as variants and impressions, evidenced by small differences that are difficult to notice by human inspection. The goal of our research is to develop a high-throughput workflow for comparing different editions of books at the pixel level of digital images.

In contrast to text-based comparison, image-based comparison has the following advantages. First, it does not require transcription of books before comparison. Second, it is also effective for non-textual comparison, such as differences in paintings or in printing quality, as long as the books being compared have the same layout with only minor differences. Although text-based comparison is powerful in that it allows comparison across different physical layouts, we believe that image-based comparison is relevant because this simple but tedious task is one that computers can perform better than humans. This work, however, is still in a preliminary phase, and the following results are preliminary rather than comprehensive.

The whole workflow can be summarized as follows. First, a page divider splits a digitized image into a set of page images for page-to-page comparison. Because the page divider depends heavily on the specific capturing conditions, either an automatic or a manual approach can be chosen for this task. Second, using computer vision techniques, feature points are automatically extracted from page images of different editions. Extracting feature points is an active area of research in computer vision, and existing methods generally give satisfactory results. Please note, however, that comparison across images of different quality, such as full-color, gray-scale, and (nearly) binary images, remains an unsolved problem. Third, feature points are used as reference points for registration using rigid or non-rigid registration techniques. Rigid registration, which involves only shift, rotation, and scale, usually gives satisfactory results for the purpose of inspecting minor differences, but non-rigid registration may be required for advanced analysis, such as of local distortion of the woodblock. Fourth, after registration, the two images are superimposed and compared pixel by pixel, and the intensity difference is color-coded to highlight large differences. A useful color scale for a human inspector assigns red and blue to large differences and white to small differences.
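
The rigid registration step (shift, rotation, and scale) can be sketched as follows. This is a minimal illustration rather than the implementation used in this work, which relied on OpenCV. It assumes that matched reference points are already available as (x, y) pairs, and the function names `fit_similarity` and `apply_similarity` are hypothetical. The sketch writes the transform as w = a * z + b with complex numbers and solves for a and b by least squares.

```python
import cmath
import math


def fit_similarity(src_pts, dst_pts):
    """Least-squares shift/rotation/scale that maps src_pts onto dst_pts.

    Points are (x, y) pairs. The transform is w = a * z + b in complex
    form: abs(a) is the scale, the phase of a is the rotation angle,
    and b is the shift.
    """
    if len(src_pts) != len(dst_pts) or len(src_pts) < 2:
        raise ValueError("need at least two matched point pairs")
    z = [complex(x, y) for x, y in src_pts]
    w = [complex(x, y) for x, y in dst_pts]
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    # Closed-form least-squares solution on mean-centered points.
    num = sum((zi - z_mean).conjugate() * (wi - w_mean) for zi, wi in zip(z, w))
    den = sum(abs(zi - z_mean) ** 2 for zi in z)
    if den == 0:
        raise ValueError("source points must not all coincide")
    a = num / den
    b = w_mean - a * z_mean
    return a, b


def apply_similarity(a, b, pts):
    """Apply w = a * z + b to a list of (x, y) points."""
    out = []
    for x, y in pts:
        w = a * complex(x, y) + b
        out.append((w.real, w.imag))
    return out


# Synthetic example: page corners rotated by 2 degrees, scaled by 1.05,
# and shifted by (12, -7). These values are made up for illustration.
corners = [(0.0, 0.0), (100.0, 0.0), (100.0, 150.0), (0.0, 150.0)]
moved = apply_similarity(cmath.rect(1.05, math.radians(2.0)), complex(12, -7), corners)
a, b = fit_similarity(corners, moved)
print("scale:", abs(a))
print("rotation (degrees):", math.degrees(cmath.phase(a)))
print("shift:", (b.real, b.imag))
```

With real feature matches, some correspondences are wrong, so a robust estimator that rejects outliers would normally wrap a fit like this one. The plain least-squares form is shown only to make the three parameters of rigid registration concrete.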

Figure 1 shows a preliminary result of comparing two editions of the same book. The left panel shows the correspondence between reference points on the two images. The right panel shows the color-coded difference between the two editions after registration, illustrating that most of the pixels become white or gray because the same characters in the two editions cancel out. A human inspector can easily identify the large differences between the two editions, shown in red or blue, namely stamps in different locations.

Figure 1: Matching two images using reference points extracted from both images, and the comparison of the two images using a red/white/blue color scale.
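
The red/white/blue color scale can be sketched as follows. This is a minimal illustration under stated assumptions: the two images are already registered, have the same size, and are given as rows of gray-scale intensities from 0 to 255. Which sign of the difference maps to red and which to blue is an assumption made here, and the function name `color_code_difference` is hypothetical.

```python
def color_code_difference(img_a, img_b, max_diff=255):
    """Color-code img_a - img_b as rows of (r, g, b) tuples.

    Zero difference gives white. Positive differences (img_a brighter)
    fade toward pure red, negative differences toward pure blue.
    Differences larger than max_diff are clipped to the full color.
    """
    if len(img_a) != len(img_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(img_a, img_b)
    ):
        raise ValueError("images must have the same size")
    out = []
    for row_a, row_b in zip(img_a, img_b):
        out_row = []
        for pa, pb in zip(row_a, row_b):
            d = pa - pb
            # fade is 255 for no difference and 0 for the largest difference.
            fade = 255 - round(255 * min(abs(d), max_diff) / max_diff)
            if d >= 0:
                out_row.append((255, fade, fade))  # white to red
            else:
                out_row.append((fade, fade, 255))  # white to blue
        out.append(out_row)
    return out


# Tiny example: the middle pixel has ink (0) only in the first image.
page_a = [[255, 0, 128]]
page_b = [[255, 255, 128]]
print(color_code_difference(page_a, page_b))
```

Lowering `max_diff` makes small intensity differences more visible, at the cost of also amplifying noise such as stains.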

Even if two editions are the same, however, they cannot cancel completely to produce a purely white image, for the following reasons. First, a page image contains not only characters but also noise, such as stains on the paper or show-through of characters from the other side of the paper. Second, local variation, such as local distortion of the woodblock at the edge or intensity variation of the ink in the middle, cannot be removed by simple rigid registration. A human inspector, however, can quickly filter out such noise and can easily identify meaningful differences without the influence of subjectivity in human reading.

Future work is to build an edition comparison service for comprehensive image-based analysis of book editions. When an image of one edition is uploaded to the service, the server compares the uploaded image with other editions in the storage and suggests whether it is one of the existing editions or a new one. This may be a killer app for the archive of old Japanese books, because having more editions, variants, and impressions in the storage means higher accuracy of comparison, which in turn attracts more users. This kind of positive feedback is known as a network effect.
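
The decision made by such a service could be sketched as follows. Everything in this sketch is hypothetical, since the service is future work: the score (mean absolute pixel difference after registration), the threshold value, and the names `mean_abs_difference` and `suggest_edition` are assumptions chosen only to illustrate the idea of matching an upload against stored editions.

```python
def mean_abs_difference(img_a, img_b):
    """Mean absolute intensity difference of two registered gray-scale images."""
    total = 0
    count = 0
    for row_a, row_b in zip(img_a, img_b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


def suggest_edition(uploaded, stored, threshold=10.0):
    """Return (edition_name, score) for the closest stored edition.

    stored maps an edition name to an image already registered to the
    upload. If even the closest edition differs by more than threshold,
    the name is None, meaning the upload may be a new edition.
    """
    if not stored:
        return None, None
    scores = {name: mean_abs_difference(uploaded, img) for name, img in stored.items()}
    best = min(scores, key=scores.get)
    if scores[best] <= threshold:
        return best, scores[best]
    return None, scores[best]


# Toy 2x2 images standing in for registered page images.
upload = [[0, 255], [255, 255]]
storage = {
    "edition A": [[0, 255], [255, 250]],
    "edition B": [[255, 255], [255, 255]],
}
print(suggest_edition(upload, storage))
```

A single global score is a simplification: differences such as stamps are localized, so a practical service would likely also need to look at where the differences occur.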

Lastly, we would like to emphasize that the target of this research is at the level of text critique, not at the level of text interpretation. This is one example of our proposed concept of “digital critique,” which uses information technology to enhance traditional human-based criticism. We expect this workflow to benefit scholars because it reduces the burden of the tedious text critique task of character-by-character comparison and allows them to focus more on higher-level research such as text interpretation.


The project was supported by a collaborative research grant from the National Institute of Japanese Literature. Registration was performed using the open source software OpenCV. The books used in the experiment are (1) 枕草子春曙抄, 国文研高乗, and (2) 春曙抄, 国文研鵜飼.