◎ JADH2016

Sep 12-14, 2016 The University of Tokyo

Development of the Dictionary of Poetic Japanese Description
Hilofumi Yamamoto (Tokyo Institute of Technology), Bor Hodošček (Osaka University)
Introduction

The main purpose of this project is to develop a dictionary for the description of Yamato Japanese (Yamamoto et al. 2014). To this end, the present study proposes a method of extracting sub-communities of classical Japanese poetic vocabulary. The analysis is based on co-occurrence patterns, where a co-occurrence is defined as any two words appearing in the same poem.
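
As a rough illustration of the co-occurrence definition above (any two words appearing in the same poem), the following Python sketch counts unordered word pairs per poem. The function name and the romanized toy poems are hypothetical and are not data from the Hachidaishū; tokenization is assumed to have been done beforehand.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(poems):
    """Count unordered word pairs that appear together in the same poem."""
    counts = Counter()
    for tokens in poems:
        # Use distinct words only, so each pair is counted once per poem.
        vocab = sorted(set(tokens))
        for pair in combinations(vocab, 2):
            counts[pair] += 1
    return counts


# Hypothetical, pre-tokenized toy poems (illustration only).
poems = [
    ["satsuki", "matsu", "hana", "tachibana", "ka", "mukashi", "hito", "sode"],
    ["tachibana", "ka", "mukashi", "sode"],
    ["ume", "hana", "ka"],
]

counts = cooccurrence_counts(poems)
# Pairs are stored in alphabetical order, e.g. ("mukashi", "tachibana").
print(counts[("mukashi", "tachibana")])
```

The resulting pair counts can serve as the edge list (optionally weighted) of a word network; how the study itself weights the pairs is not specified here.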

Many scholars of classical Japanese poetry have tried to explain the constructions of poetic vocabulary based on their intuition and experience. Because scholars can only describe constructions that they can consciously point out, those of which they are unconscious will never be uncovered. If we develop a dictionary of poetic vocabulary using only our intuitive knowledge, the description will lack important lexical constructions. We believe that, in order to produce more exact and unbiased descriptions, it is necessary to use computer-assisted descriptions of poetic word constructions that apply co-occurrence weighting methods to corpora of classical Japanese poetry.

A typical item in a general dictionary contains the item’s definition, part of speech, explanation, and example sentences. An item in the proposed dictionary contains not only these four types of information but also lists of words grouped into sub-communities, which allow one to better grasp the construction of poetic words.

In lexical studies, many quantitative analyses of vocabulary focus on the frequency of word occurrences. However, research relying on word frequency alone does not contribute to the analysis of mid-range words, that is, words whose frequencies are neither very high nor very low (Hodošček and Yamamoto 2013). We therefore use the R package ‘linkcomm’ to calculate network centrality between collocations (Freeman 1978). In the context of lexical analysis, we regard this sub-community discovery as a way to describe the poetic roles of mid-range words.

Methods

We will attempt to extract all of the sub-communities of ume (plum), sakura (cherry), and tachibana (mandarin orange) from the Hachidaishū database[*1]. We will use the ‘linkcomm’ procedure to calculate word centrality and uncover the key sub-communities (Csardi and Nepusz 2006, Ahn et al. 2010). As material for this research we will use the Hachidaishū (ca. 905–1205). We mainly collect the data from Kokkataikan (Shin-pen Kokkataikan Henshū Committee 1996), the Nijūichidaishū database published by NIJIL (Nakamura et al. 1999), Shin-Nihon Koten Bungaku Taikei (Kojima and Arai 1989), and Shin-kokinshū (Kubota 1979).
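
The link community approach of Ahn et al. (2010) clusters links rather than nodes, using a Jaccard similarity between links that share a node. The Python sketch below illustrates that similarity on a toy word network. It is a minimal sketch under stated assumptions, not the ‘linkcomm’ implementation: the helper names and the toy edges are hypothetical.

```python
def inclusive_neighbors(edges):
    """Map each node to the set containing itself and its neighbours."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)
    return nbrs


def link_similarity(nbrs, i, j):
    """Jaccard similarity of two links (i, k) and (j, k) that share a node k.

    Following Ahn et al. (2010), only the inclusive neighbourhoods of the
    two unshared endpoints i and j enter the calculation.
    """
    a, b = nbrs[i], nbrs[j]
    return len(a & b) / len(a | b)


# Hypothetical toy co-occurrence network around tachibana.
edges = [
    ("tachibana", "mukashi"),
    ("tachibana", "ka"),
    ("mukashi", "ka"),
    ("tachibana", "hototogisu"),
]
nbrs = inclusive_neighbors(edges)

# Links (mukashi, tachibana) and (ka, tachibana) share the node tachibana.
print(link_similarity(nbrs, "mukashi", "ka"))
# Links (mukashi, tachibana) and (hototogisu, tachibana) are less similar.
print(link_similarity(nbrs, "mukashi", "hototogisu"))
```

Hierarchical clustering over these pairwise link similarities then yields a dendrogram of links, which is cut to obtain the communities.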

Results

Figure 1: Network of tachibana (mandarin orange)

Table 1: The sub-cluster of tachibana (mandarin orange): the top 10 words with the highest density values are extracted; we used the average, McQuitty, and single clustering methods; values in parentheses indicate the maximum partition density.

Table 1 and Figure 1 were extracted based on the network of tachibana (mandarin orange). We found that the three methods (average, McQuitty, and single) do not differ in terms of community discovery. The largest community we discovered, mukashi (old times), includes 15 nodes in the graph of tachibana.
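
The caption of Table 1 refers to the maximum partition density. As a hedged illustration of that quantity, the Python sketch below computes partition density as defined by Ahn et al. (2010): for a community with m links and n nodes, the link density is (m - (n - 1)) / (n(n - 1)/2 - (n - 1)), and the partition density is the average of these values weighted by m. The function name and the toy edge lists are hypothetical, and this is not the ‘linkcomm’ code itself.

```python
def partition_density(communities):
    """Partition density D of a link partition (Ahn et al. 2010).

    `communities` is a list of communities, each given as a list of
    (u, v) links. Communities with only two nodes contribute zero.
    """
    total_links = sum(len(links) for links in communities)
    if total_links == 0:
        return 0.0
    weighted = 0.0
    for links in communities:
        m = len(links)
        n = len({node for link in links for node in link})
        if n <= 2:
            continue  # a single link has density 0 by definition
        weighted += m * (m - (n - 1)) / ((n - 2) * (n - 1))
    return 2.0 * weighted / total_links


# Hypothetical toy partition: one triangle and one two-link chain.
triangle = [("tachibana", "mukashi"), ("mukashi", "ka"), ("ka", "tachibana")]
chain = [("tachibana", "hototogisu"), ("hototogisu", "satsuki")]

print(partition_density([triangle]))         # fully connected community
print(partition_density([triangle, chain]))  # the tree-like chain adds nothing
```

A fully connected community reaches the maximum value of 1 and a tree-like community contributes 0, so the cut of the link dendrogram that maximizes D favours densely interlinked word groups.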

Discussion

Table 1 lists the centrality values given by the three methods, which show similar tendencies. These words clearly relate to the famous poem on tachibana flowers[*2], which was written by an anonymous author but is commonly attributed to Ariwara no Narihira.

Every poem has supporting words that support a key word acting as the central player; these can be extracted with the function getCommunityCentrality(). However, the proper number of words to extract is not yet known in the present study.

Conclusion

The present paper proposes to further the development of a dictionary of classical Japanese poetry using pairwise term information generated by the community centrality procedure. We conducted an experiment using the R package ‘linkcomm’ (link communities) and showed that the methods in the experiment extracted similar sub-cluster terms, which contribute to the description of classical Japanese poetry.

Note

[*1] We will report only on tachibana because of limited space.

[*2] Satsuki matsu / hana tachibana no / ka o kageba / mukashi no hito no / sode no ka zo suru, No. 13 in Chapter 3: Summer of the Kokinshū (ca. 905), which appears in the Tales of Ise (ca. 800) as well.


References

[1] Ahn, Yong-Yeol, James P Bagrow, and Sune Lehmann Jørgensen (2010) “Link communities reveal multiscale complexity in networks”, Nature, Vol. 466, No. 7307, pp. 761–764.

[2] Csardi, Gabor and Tamas Nepusz (2006) “The igraph software package for complex network research”, InterJournal, Vol. Complex Systems, p. 1695.

[3] Freeman, Linton C. (1978) “Centrality in social networks conceptual clarification”, Social Networks, pp. 215–239.

[4] Hodošček, Bor and Hilofumi Yamamoto (2013) “Analysis and Application of Midrange Terms of Modern Japanese”, in Computer and Humanities 2013 Symposium Proceedings, No. 4, pp. 21–26.

[5] Kojima, Noriyuki and Eizō Arai (1989) Kokinwakashū, Vol. 5 of Shin-Nihon bungaku taikei (A new collection of Japanese literature), Tokyo: Iwanami shoten.

[6] Kubota, Jun (1979) Shinkokinwakashū, Shincho Nihon Koten Shūsei, Tokyo: Shinchosha.

[7] Nakamura, Yasuo, Yoshihiko Tachikawa, and Mayuko Sugita (1999) Kokubungaku kenkyūshiryōkan dētabēsu koten korekushon (Database Collection by National Institute of Japanese Literature “Nijūichidaishū” the Shōho edition CD-ROM): Iwanami Shoten.

[8] Shin-pen Kokkataikan Henshū Committee ed. (1996) Shimpen Kokka-taikan: CDROM Version: Kadokawa Shoten.

[9] Yamamoto, Hilofumi, Hajime Murai, and Bor Hodošček (2014) “Development of an Asymptotic Word Correspondence System between Classical Japanese Poems and their Modern Translations”, in Proceedings of Computer and Humanities 2014, Vol. 2014, pp. 157–162.