◎ JADH2016

Sep 12-14, 2016 The University of Tokyo

Attributes of Agent Dictionary for Speaker Identification in Story Texts
Hajime Murai (Tokyo Institute of Technology)

To interpret and analyze story structure automatically, it is necessary to identify the agents that appear in the story. This involves identifying the general expressions that refer to story agents in the text and resolving pronouns, omissions, and aliases of agents.

These goals assume the use of natural language processing techniques such as morphological analysis [1] and dependency analysis [2]. Once morphological information and dependency relationships have been obtained, the next step is to identify the agents and their behaviors in order to analyze the narratological structure of the story texts.

In this article, agents in story texts are generally proactive beings who have a will, though there may be some exceptions. In many cases, the agents are human beings. However, there are also various other agents, such as aliens, space creatures, devils, ghosts, robots, and automated machines, depending on the genre of the stories.

In general texts, an agent may be referred to by a proper noun at its first mention. However, in many cases it is referred to by pronouns from the second mention onward. Moreover, most agents have several aliases, such as a nickname, an official position, or a role in the family. Therefore, it is necessary to identify the relationships among proper nouns, pronouns, and other expressions that refer to agents in a story text.

Moreover, in Japanese text, agent words are frequently omitted from sentences. Therefore, it is also necessary to estimate the omitted agent words in order to extract the story structure. In addition, the speaker and listener are often not made explicit in the dialogue of many stories. In such cases, the agents must also be estimated.

Attributes for Agent Estimation

These agent estimation tasks are very complex, and the accuracy of the results is not sufficient even with recent technologies [3]. However, there are some clues for identifying the agents. First, the type of pronoun gives information about the agent word it refers to. For example, “He” signifies that the referent is male and singular. If “He” appears in a text that contains only one male singular proper noun, that “He” probably refers to that proper noun.
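The pronoun clue described above can be sketched as follows. This is a minimal illustration, not the method used in this work: the `Agent` class, the `PRONOUN_ATTRIBUTES` table, and the function `resolve_pronoun` are hypothetical names introduced here, and the sketch assumes that gender and number have already been assigned to each proper noun.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Agent:
    """A proper-noun agent with the attributes needed for the pronoun clue."""
    name: str
    gender: str  # "male", "female", or "unknown"
    number: str  # "singular" or "plural"


# Hypothetical pronoun table (assumption): pronoun -> (gender, number).
# A gender of None means the pronoun does not constrain gender.
PRONOUN_ATTRIBUTES = {
    "he": ("male", "singular"),
    "she": ("female", "singular"),
    "they": (None, "plural"),
}


def resolve_pronoun(pronoun: str, agents: List[Agent]) -> Optional[Agent]:
    """Return the agent a pronoun probably refers to, or None if ambiguous."""
    attrs = PRONOUN_ATTRIBUTES.get(pronoun.lower())
    if attrs is None:
        return None  # unknown pronoun: no clue available
    gender, number = attrs
    candidates = [
        a for a in agents
        if a.number == number and (gender is None or a.gender == gender)
    ]
    # Only commit when exactly one agent matches, as in the "He" example.
    return candidates[0] if len(candidates) == 1 else None


agents = [
    Agent("Taro", "male", "singular"),
    Agent("Hanako", "female", "singular"),
]
match = resolve_pronoun("He", agents)
print(match.name if match else "ambiguous")
```

When two or more agents share the pronoun's attributes, the sketch returns no answer rather than guessing, so other clues would be needed in that case.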

In addition, honorific expressions appear frequently in the dialogue of story texts. If the hierarchical relationships between the agents that appear in a story text can be extracted, honorific expressions become an important clue for estimating and identifying agents. Moreover, terms of address in dialogue, such as “Honey”, also indicate relationships between agents. Therefore, general knowledge about relationships between agents should be stored in a database to enable precise agent estimation.
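As a rough illustration of how honorific expressions and hierarchical relationships might be combined, the sketch below guesses the speaker and listener of an utterance exchanged between two agents. It rests on a simplifying assumption that is not stated in this abstract: honorifics are used by the lower-ranked agent toward the higher-ranked one. The rank values and the function `guess_speaker_listener` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical hierarchy (assumption): a larger value means a more senior agent.
RANK: Dict[str, int] = {"president": 3, "manager": 2, "employee": 1}


def guess_speaker_listener(
    uses_honorific: bool, a: str, b: str, rank: Dict[str, int]
) -> Optional[Tuple[str, str]]:
    """Return (speaker, listener) for an utterance between a and b, or None.

    Assumes honorific speech is directed upward in the hierarchy.
    """
    if not uses_honorific:
        return None  # plain speech gives no hierarchical clue in this sketch
    if a not in rank or b not in rank:
        return None  # the hierarchy is unknown for at least one agent
    if rank[a] == rank[b]:
        return None  # equal rank: the clue cannot decide
    return (a, b) if rank[a] < rank[b] else (b, a)


print(guess_speaker_listener(True, "president", "employee", RANK))
```

In real stories honorific use also depends on politeness, distance, and context, so such a rule could only serve as one clue among several.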

For instance, there are agent words in story texts that indicate family relationships (father, mother, sister, brother, etc.), vocational relationships (president, employee, etc.), and the general nature of relationships (enemy, ally, friend, etc.). In some stories, not only individuals but also specific groups, organizations, regions, states, tribes, and nations become agents. First, those agent words should be collected and categorized. In the next step, attributes for agent estimation can be assigned to those words.

Table 1 shows the current list of attributes necessary for agent estimation. It is desirable to extract these attributes from elements in the story texts.

Table 1: Attributes and Potential Clues for Agent Estimation

Structures for Agent Dictionary

In order to utilize the attributes of agent words in agent estimation tasks, it is necessary to construct a dictionary or database that contains this information about agent attributes.

As shown above, there is a wide range of agent vocabulary indicating proactive beings in story texts. Nevertheless, it is possible to extract these agent words from story texts and to compile them into a list. Moreover, it may be possible to build a machine-readable, structured database based on the categorization of vocabulary types and relationships.

Table 2: Category for Agent Words

Table 3: Example of Attributes of Agent Words

Therefore, agent vocabulary appearing in story texts and general vocabulary from dictionaries that can be used as agent vocabulary were collected. The vocabulary was then categorized and a structured list of agent vocabulary was developed [4] (Table 2).

In addition to the category, attributes are assigned to the collected agent words. Table 3 shows an example of the attributes stored for each agent word. In Table 3, agent words about family were assigned family-related attributes.
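One possible machine-readable form of such a dictionary is sketched below. The entries, category labels, and attribute names (`gender`, `generation`, `rank`) are hypothetical examples chosen for illustration and do not reproduce the actual contents of Table 2 or Table 3.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentEntry:
    """One agent word with its category and stored attributes."""
    word: str
    category: str
    attributes: Dict[str, str] = field(default_factory=dict)


# Hypothetical entries (assumption); a real dictionary would be far larger.
AGENT_DICTIONARY: List[AgentEntry] = [
    AgentEntry("father", "family", {"gender": "male", "generation": "parent"}),
    AgentEntry("mother", "family", {"gender": "female", "generation": "parent"}),
    AgentEntry("sister", "family", {"gender": "female", "generation": "same"}),
    AgentEntry("president", "vocation", {"rank": "high"}),
    AgentEntry("enemy", "relationship"),
]


def find_agent_words(tokens: List[str], dictionary: List[AgentEntry]) -> List[AgentEntry]:
    """Return the dictionary entries for tokens that are known agent words."""
    index = {entry.word: entry for entry in dictionary}
    return [index[t.lower()] for t in tokens if t.lower() in index]


found = find_agent_words("Her father met the president".split(), AGENT_DICTIONARY)
for entry in found:
    print(entry.word, entry.category, entry.attributes)
```

For Japanese text, the token list would come from a morphological analyzer rather than whitespace splitting, and the lookup would use the analyzer's base forms.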

Conclusions and Future Works

In order to estimate relationships between agent words in story texts, relevant attributes were examined and structured together with the categories of agent words. By utilizing the developed database of agent vocabulary, candidate expressions that may indicate agents in a story text can be identified easily. If likely candidates for agents can be detected, they will become the foundation for more precise story structure analysis.


[1] Matsumoto Y, Kitauchi A, Yamashita T, Hirano Y, Matsuda H, Takaoka K, Asahara M. Japanese morphological analysis system ChaSen version 2.0 manual. NAIST Technical Report. Apr. 1999.

[2] Daisuke Kawahara, Sadao Kurohashi. A fully-lexicalized probabilistic model for Japanese syntactic and case structure analysis. In Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pages 176–183, June 2006.

[3] Hua He, Denilson Barbosa, and Grzegorz Kondrak. Identification of speakers in novels. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 1312–1320, Sofia, Bulgaria, August 2013.

[4] Hajime Murai. Creating a subject vocabulary dictionary for story structure extraction. IPSJ Symposium Series, 2015:111–116, December 2015 (In Japanese).