This paper presents an analysis of the relationship between film information and audience measurement at a film festival. The aim of the analysis is to build a model that predicts hall attendance and hall congestion rates and to identify which attributes of a screening matter most. The analysis indicated that the program category under which a film is screened is the most important factor in audience attendance.
Artistic content is increasingly delivered to audiences in digital format via digital networks. In contrast to this convenient mode of consumption, live performances are sometimes considered a better way to fully enjoy artistic content, a notion that has gained popularity in recent times. For films, film festivals are considered a form of live performance (Bordwell, Thompson, & Ashton, 2004). During a festival, creators and audiences come together and discuss the films that are screened. Until now, little has been known about film festivals as a medium beyond the publicly known festival organization and the official statistics provided by the organizers. Exceptions are an analysis of the film selection process in a film festival (Inoue & Sakuma, 2014) and a special journal issue on the historical and geographical diversity of film festivals (Papadimitriou & Ruoff, 2016). The current work attempts to understand the properties of film festivals in terms of audience participation by building prediction models of hall attendance and congestion rates. A similar attempt has been made to predict box-office revenues from search statistics on upcoming films (Google, 2013). However, compared with major commercial films, artistic films shown at a film festival come with little information about their potential audiences. Therefore, we built the prediction model from information about the films and the organization of the film festival.
We took the Yamagata International Documentary Film Festival (YIDFF) as the target event. YIDFF is held biennially. We used data on the 174 films screened in 2011. The information about the films was either provided by the organizers or retrieved by crawling the festival website.
We used multiple regression and random forest to construct the prediction models. These methods were chosen because we prioritized interpretability over prediction accuracy. The dependent variable was either the raw audience count or the congestion rate of the hall. We mainly discuss the congestion rate model here. The independent variables were as follows: number of countries involved in film production (real number), running time (real number), capacity of the hall (real number), talk held after the screening (binary), weekday or holiday (binary), number of films by the director shown at previous YIDFFs (real number), program (one of 8 categories), starting time (real number), and audience size at the previous screening in the same hall (real number). The 8 programs considered are as follows: IC (International Competition: 15 outstanding films selected from entries from around the world); NAC (New Asian Currents: Introducing up-and-coming Asian documentary filmmakers); NDJ (New Docs Japan: A selection of new Japanese documentaries); IS (Islands/I Lands, NOW—Vista de Cuba: A program focusing on Cuba as an “Island”); MT (My Television: A program featuring Japanese TV documentaries, with a focus on works from the 1960s and 1970s); TJ (A Reunion of Taiwan and Japanese Filmmakers: 12 Years Later: Filmmakers from YIDFF New Asian Currents ’99 return with old and new films); FY (Films about Yamagata: The third edition of this regular program that looks at Yamagata and its relation to cinema); CU (Great East Japan Earthquake Recovery Support Screening Project “Cinema with Us”).
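The variable list above can be turned into a numeric feature vector in several ways; the following is a minimal sketch of one possible encoding, not the procedure actually used in the analysis. It assumes that the congestion rate is the audience count divided by hall capacity (the text does not define it), and that the program is dummy-coded with IC as the reference category. The `Screening` class, its field names, and the example values in the usage note are hypothetical.

```python
from dataclasses import dataclass

# The 8 program codes listed in the text.
PROGRAMS = ["IC", "NAC", "NDJ", "IS", "MT", "TJ", "FY", "CU"]


@dataclass
class Screening:
    """One screening; field names are illustrative, not from the source data."""
    n_countries: int           # countries involved in production
    running_time_min: float    # running time in minutes (unit assumed)
    hall_capacity: int         # seats in the hall
    has_talk: bool             # talk held after the screening
    is_holiday: bool           # True for holiday, False for weekday
    director_prior_films: int  # director's films at previous YIDFFs
    program: str               # one of PROGRAMS
    start_hour: float          # starting time as hour of day (unit assumed)
    prev_audience: int         # audience at previous screening in same hall
    audience: int              # observed audience (used for the target only)


def congestion_rate(s: Screening) -> float:
    """Assumed definition: audience divided by hall capacity."""
    return s.audience / s.hall_capacity


def encode(s: Screening) -> list[float]:
    """Return 8 numeric/binary features followed by 7 program dummies.

    The first program (IC) is the reference category and gets no dummy,
    which avoids perfect collinearity with a regression intercept.
    """
    if s.program not in PROGRAMS:
        raise ValueError(f"unknown program: {s.program}")
    numeric = [
        float(s.n_countries),
        float(s.running_time_min),
        float(s.hall_capacity),
        1.0 if s.has_talk else 0.0,
        1.0 if s.is_holiday else 0.0,
        float(s.director_prior_films),
        float(s.start_hour),
        float(s.prev_audience),
    ]
    dummies = [1.0 if s.program == p else 0.0 for p in PROGRAMS[1:]]
    return numeric + dummies
```

For instance, a made-up screening `Screening(1, 90.0, 200, True, False, 0, "CU", 13.5, 80, 100)` would yield a 15-element feature vector and, under the assumed definition, a congestion rate of 0.5. Dropping one dummy matters mainly for the regression model; a random forest could equally take all 8 indicators.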
With multiple regression, the adjusted coefficient of determination was 0.43. With random forest, it was 0.37. Both values are less than 0.5, which is often used as a threshold for reliability. Therefore, we could not obtain a reliable prediction model from the available data. The factors contributing to the prediction of congestion rates were the capacity of the hall (according to the regression analysis) and the program (according to both methods).
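For readers unfamiliar with the measure, the adjusted coefficient of determination penalizes the ordinary R² for the number of predictors: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. The sketch below computes it in plain Python. The function names are illustrative, and the text does not state the n and p behind the reported 0.43 and 0.37, so no attempt is made to reproduce those values.

```python
def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Ordinary coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2: float, n_samples: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof
```

Because the correction factor (n − 1)/(n − p − 1) is greater than 1 whenever p > 0, the adjusted value is never larger than the unadjusted R², and the gap widens as predictors are added to a fixed number of observations.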
We analyzed film popularity based on audience measurement at the Yamagata International Documentary Film Festival (YIDFF). The analysis, based on multiple regression and random forest methods, indicated that the program in which a film is screened is an important factor in predicting higher audience participation. For example, the organizers assigned halls of similar capacity to two special programs: CU (Great East Japan Earthquake) and IS (Cuba). However, CU drew more audience participation than IS, probably because audiences were more attracted to a familiar and current topic.
This work is based on an analysis performed by Yuri Koseki. Kazunori Honda helped to improve this abstract.
Bordwell, D., Thompson, K., & Ashton, J. (2004). Film art: An introduction (7th ed.). New York: McGraw-Hill.
Google. (2013, June). Quantifying Movie Magic with Google Search.
Inoue, M., & Sakuma, S. (2014). Analysis of the film selection process for a film festival. The 7th International Workshop on Information Technology for Innovative Services (ITIS-2014), (pp. 582-587). Victoria, Canada.
Papadimitriou, L., & Ruoff, J. (2016). Film festivals: Origins and trajectories. New Review of Film and Television Studies, 14(1), 1-4.