Universidad de Costa Rica
Sistema de Estudios de Posgrado

IMPROVING COOPERATION BETWEEN HUMAN AND COMPUTER PLAYERS THROUGH PLAN RECOGNITION IN THE GAME PANDEMIC

Tesis sometida a la consideración de la Comisión del Programa de Estudios de Posgrado en Computación e Informática para optar al grado y título de Maestría Académica en Computación e Informática

Pablo Sauma Chacón

Ciudad Universitaria Rodrigo Facio, Costa Rica
2020

Dedication

To Artificial Intelligence, may we awake the right Basilisk one day.

Acknowledgements

I would like to thank my advisor, Markus Eger, for all the attention and help he provided me in the development of this thesis. From midnight chats to awesome discussions on varied topics, he gave me the "little push" I needed to extend beyond my Shire. I would also like to thank the other members of my committee, Juan José Vargas Morales and José Andrés Guevara Coto, who also supported me on this journey and gave me valuable feedback on my work.

Beyond my committee, I want to thank all the other faculty members who helped me in this endeavor: professors and administrative staff alike. Special thanks to professor Gabriela Marín Raventós, who was always willing to answer all of my questions.

Special thanks to my family, who have supported me my entire life. My parents always inspired me to reach higher and follow my dreams and curiosity, which ultimately is what brought me here. A special mention to my sister, who is fighting her own battle with a master's in the UK: you are missed. And to my extended family: Pedro, Nacho, Luisa, Alex and Mariana, you know you are family to me.

I want to thank all of my friends, regardless of time and place; you have all left a footprint on my heart. Special mention to Pepe, Moriato, Pancha, Achille and Leo, who have shared laughs with me all these years; my high school friends: Josem, Camacho, Andrix and Tacsan; the members of Team Miaw; and Luisito's company.
All my other friends who share memories with me: I keep you always in my mind and heart. These acknowledgments would not be complete without a mention of Mich, who has been by my side for the last eight years. Thank you for standing by me, for your commitment, patience (lots of patience) and loyalty in bearing with me all these years. Finally, I just want to say I am grateful for having this opportunity. Oh Fortuna, velut luna, statu variabilis.

Contents

Dedication
Acknowledgements
Hoja de Aprobación
Contents
Resumen
Abstract
List of Tables
List of Figures
Index of Abbreviations
1 Introduction
1.1 Objectives
1.1.1 Justification
2 Theoretical Framework
2.1 Pandemic
2.1.1 Starting the game
2.1.2 Player turn
2.1.3 Overall strategy
2.1.4 Complexity of the problem
2.1.5 Previous implementations
2.2 Planning
2.2.1 State-Space planning
2.2.2 Problem complexity
2.2.3 Planning in games
2.3 Plan recognition
2.4 Monte Carlo sampling
3 Methodology
3.1 Game platform
3.2 Planning model
3.2.1 Restricted Pandemic
3.2.2 Planning goals
3.2.3 Heuristic
3.2.4 Evaluating unknown information
3.3 Plan recognition
3.4 Data collection and comparison
4 Results
4.1 Agent evaluation
4.2 Experiment results
4.2.1 Survey results
4.2.2 Performance results
4.2.3 Qualitative analysis of the agents
5 Conclusions
5.1 Game platform
5.2 Intelligent agents
5.3 Comparing the performance of the agents
5.4 Future Work
A Player Experience Questionnaire
A.1 Previous game experience
A.2 Experience with the AI agent
Resumen

Los juegos cooperativos son un desafío importante en el ámbito de la investigación de la inteligencia artificial. El estudio de agentes computacionales, para la cooperación con jugadores humanos, ha tenido un interés creciente en el campo con propuestas de diferentes juegos que presentan una variedad de desafíos. Uno de estos es Pandemic, un juego cooperativo en el que los jugadores deben organizarse para salvar el mundo de cuatro enfermedades representadas con piezas en un tablero. Este juego presenta el desafío de perseguir objetivos múltiples, información incompleta y toma de decisiones, mientras los jugadores se desplazan por el tablero para descubrir la cura a las enfermedades y evitar perder el juego, ya sea al ser sobrepasados por las enfermedades o al quedarse sin acciones restantes. En esta tesis se presenta un agente planificador para el juego Pandemic y se describen dos experimentos que se desarrollan para evaluarlo. En el primer experimento, se evalúa el desempeño del agente al jugar consigo mismo, al variar el nivel de dificultad y la cantidad de jugadores en el juego. En el segundo experimento, se evalúa el desempeño del agente al jugar con jugadores humanos mientras se utilizan dos variantes del agente: una con un módulo de reconocimiento de planes y otra sin este. Este agente utiliza un horizonte limitado y una heurística específica del dominio al efectuar su planificación; además, realiza un procedimiento de muestreo para manejar la información incompleta del juego. Al utilizar estas técnicas, el agente logra ganar alrededor del 34% de los juegos consigo mismo, lo cual es un hito en este ámbito de investigación. El módulo de reconocimiento de planes le permite al agente analizar las acciones tomadas previamente por otro jugador y determinar cuál es su meta actual.
Esta información es utilizada por el agente en su proceso de planificación, al incorporar los futuros movimientos potenciales del otro jugador para elaborar planes que les beneficien a ambos. Este experimento demuestra que los jugadores perciben de manera diferente al agente cuando se utiliza reconocimiento de planes, al hacer que los jugadores humanos lo entiendan con mayor facilidad. Finalmente, se discuten otras aplicaciones para los métodos presentados y para la investigación futura del agente desarrollado.

Abstract

Cooperative games are an important challenge in artificial intelligence research. The study of agents that cooperate with human players has been of increasing interest in the field, with different games proposed as providing a variety of challenges. One of these games is Pandemic, a cooperative game in which the players must cooperate to save the world from four diseases represented by game pieces on a board. This game presents the players with the challenge of pursuing multiple goals, hidden information and decision making while moving across the board to discover the cures for the diseases and avoiding the ways of losing the game: getting overwhelmed by the diseases or running out of actions. In this thesis we present a planning agent for the game Pandemic and describe two experiments we performed to evaluate it. In the first experiment we evaluated the performance of the agent while playing with itself, varying the difficulty setting and the number of players in the games. In the second experiment we evaluated the agent's performance by making it play with human players, using two variants of the agent: with and without a plan recognition module. Our agent uses a limited horizon during planning with a domain-specific heuristic, and performs a sampling procedure to handle hidden information. Using these techniques our agent wins about 34% of the games when playing with itself as partner, which is a milestone in this research domain.
The plan recognition module allows our agent to analyze the previous actions performed by the other player and determine what their current goal is. This information is used as part of its own planning algorithm to compute a plan that supports the other player, while incorporating the other player's potential future actions. Our experiment shows that players perceive our agent differently when it uses this plan recognition technique, making the agent easier for the human players to understand. Finally, we also discuss other applications of our techniques and future research for the developed agent.

List of Tables

2.1 List of basic actions that can be performed by all players.
2.2 List of specialized actions and effects for the player roles.
4.1 Win rate and average number of cures discovered per game by the heuristic, planning and plan recognition agents over 3000 games with p players and e epidemics (up to ±1.70% for the win rate and ±0.04 for the average number of cures discovered with a confidence of 95%).
4.2 Spearman's rank correlation coefficient between evaluated skill, understanding and helpfulness for the planning agent.
4.3 Spearman's rank correlation coefficient between evaluated skill, understanding and helpfulness for the plan recognition agent.
4.4 Observed win rate and average discovered cures for the agents with human participants in the first game played by each participant.
4.5 Observed win rate and average discovered cures for the agents with human participants in the second and later games.

List of Figures

2.1 Image of the game board of Pandemic depicting the cities of the game, their disease color and their connections with other cities. (Source: boardgamegeek.com, accessed: 11/11/2019)
2.2 Pandemic's outbreak chain example with three cities.
2.3 Graph representation of a Pandemic game in state-space planning.
3.1 Client-server architecture implemented as the game platform.
3.2 Screenshot of the game client for Pandemic depicting the game board, players' meeples and cards, and other game information.
3.3 Comparison of the values obtained for equation 3.2 produced by two different states. The value is reduced as the player moves closer to a nexus of infection.
3.4 Comparison of the values obtained for equation 3.4 produced by two different states. The value is reduced as the players concentrate cards of the same color in a single hand.
4.1 Observed win rate for the planning agent when using different role combinations. 'S' for Scientist, 'R' for Researcher, 'Q' for Quarantine Specialist and 'M' for Medic. Comparison of the results for each combination across different numbers of epidemics.
4.2 Observed win rate for the plan recognition agent when using different role combinations. 'S' for Scientist, 'R' for Researcher, 'Q' for Quarantine Specialist and 'M' for Medic.
4.3 Survey answers on the participants' previous experience with the game Pandemic.
4.4 Survey answers on the participants' impressions of the agents.
4.5 Ease of use of the client interface by the participants.
4.6 Distribution of the reasons for the game to end for each of the agents.
Index of Abbreviations

In this work the following abbreviations and acronyms are used:

AI Artificial Intelligence
AIIDE Artificial Intelligence and Interactive Digital Entertainment
STRIPS Stanford Research Institute Problem Solver

Chapter 1
Introduction

In recent years there has been a growing interest in artificial intelligence research applied to games. One reason is that games provide a domain in which to try and test different methods and algorithms: they offer a controlled environment, a well-defined set of rules and, generally, a scoring system, all of which eases the development of an intelligent agent [BFGM06, BNVB13]. In comparison to the use of static datasets (like the iris and MNIST datasets), games provide a dynamic environment where actions have effects and rewards might not be immediate, posing different tasks to accomplish when developing an intelligent agent. An intelligent agent can be understood as a computational abstraction which acts with a certain degree of autonomy in a physical or simulated environment [SW10]. Different algorithms can be applied in the development of intelligent agents. Machine learning is the term coined for algorithms in which the agent or system learns and improves through experience instead of being directly programmed to perform a task. These algorithms have become popular within the field due to the application of neural networks to a variety of problems, including games. Examples of the application of machine learning to games can be seen in the latest developments by Google, like the creation of the AlphaGo artificial intelligence, which was able to defeat a professional human player at the game Go in 2016 [SHM+16], and the creation of AlphaStar, which marked a reinforcement learning milestone when it defeated a professional player at the game StarCraft II five times in a row [VBC+19].
Another example of machine learning in games is the Gym project by OpenAI, which provides a development platform for reinforcement learning algorithms by enabling users to test their agents in different virtual environments and games [BCP+16, Ope18]. The previously mentioned examples share the characteristic of the games being adversarial in nature, be it between players or the player versus the environment; nevertheless, research regarding cooperative games has grown lately, as demonstrated by the recent attention on the game Hanabi by DeepMind [BFC+19]. However, while all these approaches use machine learning to great success, the need for lengthy training periods and large repositories of data makes them impractical for developing agents that play with human players when developing a game. Classical AI methods, such as planning, can be considered another approach to the development of intelligent agents, since they allow the development of agents which adapt to the challenges presented by a problem, even in situations they have never been presented with before. Planning is the process by which, given an initial state, a set of goals and a set of actions with their preconditions and effects, a solution is found so that a state satisfying the goals is reached [Wel94]. An example of this is the robot Shakey, developed from 1966 to 1972. Shakey was the first mobile robot with problem-solving capabilities; it used the Stanford Research Institute Problem Solver (STRIPS) program to solve a variety of problems through planning [Nil84, FN71]. Therefore, planning algorithms can be used to design an intelligent agent for a game if its state, actions, preconditions and effects can be represented logically. An example of the application of this technique in a game was given by the company Monolith Productions in their game F.E.A.R., where they used planning as a tool to create smarter and easier-to-develop artificial intelligence [Ork06].
This became a milestone as one of the first well-documented cases of the application of planning algorithms in a "AAA" game (an informal industry classification given to high-end, high-cost video games), and demonstrated the great potential this kind of application could have in systems that interact with users. An important application of planning can be found in plan recognition, which can be explained as the inverse process of planning: instead of finding a set of actions to achieve a goal, the objective is to determine which goal, and which plan to reach it, a set of observed actions is likely to serve in pursuit of [RG09]. As such, the recognition of an agent's intention can be seen as a tool to improve systems that interact with human beings or autonomous agents. A similar application has already been shown for the cooperative game Hanabi [Bau10], in which communication theory was used to identify the intention of the players' actions, and an agent was created which could interpret human players in a way that was understandable and enjoyable for them [EMS+20]. In this research we propose an agent design for a cooperative game with no means of communication between itself and the human players. This agent utilizes planning to determine the best course of action. Additionally, we propose a variant of the agent that utilizes plan recognition, to test whether its cooperation with the human players is enhanced. We also report the performance of our agents when playing in simulated games with themselves, as well as with human players, and compare the performance between the two agent variants. The game in which the interaction between players and the planning agent takes place is Pandemic, a cooperative game in which players work together to discover a cure for four different diseases while trying to prevent outbreaks¹, the spreading of the diseases and running out of time [Lea08].
The game's rules and mechanics will be described in detail in section 2.1.

1.1 Objectives

The main objective of this research is to develop a planning agent for the board game Pandemic that is capable of recognizing the actions taken by human players and cooperating with them in playing the game. To accomplish the main objective, the following specific objectives were proposed:

1. Create a system that allows human players and an AI agent to participate in a game of the board game Pandemic on a controlled platform.

2. Define a planning model for the game Pandemic that allows an AI agent to play the game.

3. Develop a plan recognition model for the game Pandemic that allows an AI agent to gain insight into the actions taken by human players and incorporate that information into its plan.

4. Compare the performance of the agent with and without plan recognition when playing the game with human players, and its impact on the players' satisfaction.

The specific objectives satisfy the following goals accordingly:

• The creation of a virtual board for the game Pandemic that allows the participation of human and AI agents.

• The definition of the planning model for the game Pandemic.

• The model of plan recognition for the actions of a player in the game Pandemic.

• The results of the comparison of the models given by the metrics of player satisfaction and win rate.

¹ This work was initiated in early 2019, well before, and entirely unrelated to, the outbreak of COVID-19.

1.1.1 Justification

As previously stated, the application of plan recognition to cooperative environments (in which a human and an autonomous agent must interact and help each other) is a field of research that remains understudied. Previous applications of plan recognition have mostly focused on adversarial environments, and not solely on games.
Cooperation has mostly been implemented through decision trees and finite state machines [Isl05] or through the application of communication theory [EMS+20]. This research presents a case study of the application of plan recognition to human-AI interaction in games, and of how this application could improve the experience of human players. Also, by taking feedback from the players, our work allows a better understanding of what kinds of behaviours or characteristics affect the way humans feel about cooperation and, by doing so, provides a new perspective on how to improve systems that cooperate with humans. The game Pandemic was chosen because of the challenges it presents, which include cooperation between players in a game with non-verbal communication, incomplete information regarding the game state, and multiple goals which must be balanced to succeed. All of these characteristics make Pandemic a challenge different from any other before it, and we proposed the game as a new domain for human-AI cooperation research, a proposal which has been well received by the research community [SCE19].

Chapter 2
Theoretical Framework

In this chapter we explain some of the key notions required for understanding the proposed research. The following sections cover the rules of the game Pandemic, planning, plan recognition and Monte Carlo sampling. The rules of the game are necessary to understand the nature of the problem. Planning is used as the central strategy of the agents, and plan recognition allows the development of an agent that can interpret the players' moves. Monte Carlo sampling simulates outcomes for a given situation and gives an approximation of the expected result.

2.1 Pandemic

Pandemic is a cooperative board game created by Matt Leacock in 2008.
In Pandemic, the rules of the game set the players to work together to discover the cure to four diseases that are plaguing the world, while preventing the diseases from spreading. As it is a cooperative game, all the players win or lose the game as a team. The game was designed to be played by two to four players on a game map representing Earth. This map contains forty-eight cities, each with a specific color defining which of the four diseases will emerge in that city, and connections to nearby or adjacent cities, as shown in figure 2.1. The game comprises a series of markers that assist the players in tracking the game's progress. There are player meeples of different colors, which assist players in keeping track of their current locations within the game map.

Figure 2.1: Image of the game board of Pandemic depicting the cities of the game, their disease color and their connections with other cities. (Source: boardgamegeek.com, accessed: 11/11/2019)

Additionally, there are colored cubes (twenty-four for each of the four colors) representing the diseases present in each city. The game also includes an outbreak marker and an infection rate marker. These are used to keep track of the number of outbreaks that have occurred during the game and of the infection rate throughout the game. The game contains six research station markers which can be placed on cities. These markers indicate the sites at which the discovery of cures can be undertaken. Lastly, there are four different cure markers, one for each disease, that keep track of the diseases that have already been cured.
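The markers described above can be collected into a compact game-state record. The following is a minimal sketch of such a representation; the class and field names are our own illustration, not the thesis implementation:

```python
from dataclasses import dataclass, field

@dataclass
class PandemicState:
    """Simplified snapshot of the board markers described above."""
    cubes: dict            # city name -> number of cubes of its own color
    stations: set = field(default_factory=lambda: {"Atlanta"})  # research stations
    cured: set = field(default_factory=set)   # disease colors already cured
    outbreaks: int = 0     # outbreak marker (the game is lost at eight)
    infection_rate: int = 2  # infection rate marker (starts at two)

    def won(self) -> bool:
        # The players win as soon as all four cures are discovered.
        return len(self.cured) == 4

    def lost_by_outbreaks(self) -> bool:
        return self.outbreaks >= 8
```

A full implementation would also have to track the meeple positions, both decks and the players' hands, which carry the hidden information discussed later.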
The game also includes a variety of cards: role cards, which grant the players additional specific abilities or actions during their turns; infection cards, composing the infection deck, which determine which city will suffer an infection of its colored disease at a given moment; city cards, which are required to win the game and have a wide range of uses (explained in section 2.1.2), and which carry the name of one of the forty-eight cities of the game, its population (which has no effect on game play) and its disease color; event cards, which grant the players a specific bonus and can be used at any moment; and epidemic cards, which trigger an epidemic event (described below). As stated previously, the main objective of the game is for the players to collaboratively discover the cures to all four colored diseases (in which case they all win). Nonetheless, there are three ways in which the players may lose the game: eight outbreaks occur during the game (representing a worldwide panic), there are not enough disease cubes left when needed (representing a disease spreading too much), or the players run out of cards in the player deck when needed (representing the players running out of time).

2.1.1 Starting the game

When the game starts, the outbreak counter is set to zero, no cures have been discovered and the infection rate is set to two. Additionally, an initial research station and the player meeples are placed in Atlanta, where the headquarters of the Centers for Disease Control is located. Nine cards with city names on them are drawn from the infection deck. Each card represents a city where infection begins, with a varying degree of infection: the first three cities get three disease cubes each, the next three get two disease cubes each and the last three get just one each, of their corresponding city color.
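This setup step can be sketched directly. In this simplified illustration the infection deck is just a list of city names (fictional here), and each city receives cubes of its own color:

```python
import random

def initial_infection(infection_deck, cubes):
    """Draw nine cities from the top of the deck: the first three receive
    three cubes each, the next three two, and the last three one."""
    for count in (3, 3, 3, 2, 2, 2, 1, 1, 1):
        city = infection_deck.pop()            # draw the top infection card
        cubes[city] = cubes.get(city, 0) + count
    return cubes

# Example with a small shuffled deck of made-up city names:
deck = [f"city{i}" for i in range(12)]
random.shuffle(deck)
first = deck[-1]                               # top card before drawing
board = initial_infection(deck, {})
print(board[first])                            # the first city drawn has 3 cubes
```

Note that 3 + 3 + 3 + 2 + 2 + 2 + 1 + 1 + 1 = 18 cubes are placed in total during setup.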
Each player receives a unique role card and from two to four player cards, depending on the number of players. The player deck consists of city cards and event cards. It is separated into four to six stacks, based on the selected game difficulty, and one epidemic card is included in each stack. The stacks are then shuffled individually and piled up to create the player deck. This way of shuffling and building the player deck provides a constant rate (or smoothed randomness [Lan18]) for the epidemic events, which means that these events will be spread out somewhat evenly over the duration of the game. Finally, the player who has the city card with the highest population among their player cards starts the game.

2.1.2 Player turn

In Pandemic, on each turn a player goes through the following steps: an action phase, a drawing phase and an infection phase. Each of these is explained in detail in the following sections.

Action phase

In this phase a player must perform up to four game actions, chosen among eight possible basic actions and potentially a special bonus action provided by the player's role. The basic actions any player can perform are listed and explained in table 2.1. By discovering cures and treating diseases the players can eradicate a disease. A disease is considered eradicated if it has been cured and all of its disease cubes have been removed from the cities. When a disease is eradicated, any further infection by that disease in a city will not add any disease cubes. Additionally, the players may perform specialized actions depending on their role. The specialized actions and effects the different roles have on the game are listed and explained in table 2.2.

Table 2.1: List of basic actions that can be performed by all players.

Drive/ferry: Move to a neighboring city connected on the map by a white line.
Direct flight: Move to a given city, discarding the card with that city's name on it.
Charter flight: Move to any city, discarding the card with the current city's name on it.
Shuttle flight: Move from a city with a research station to another city that also has a research station on it.
Build research station: Place a research station marker on the current city, discarding the card with the current city's name on it.
Treat a disease: Remove one disease cube from the current city or, if the disease has been cured, all cubes of that color.
Share knowledge: Give a city card to, or receive it from, another player if both players are in that city.
Discover cure: Discover a cure to a disease by discarding five cards of that disease's color in a city with a research station.

Drawing phase

In this phase, the player must draw two cards from the player deck. If at any time the player holds more than seven cards, the excess must be used or discarded. When a player draws an epidemic card from the player deck, an epidemic occurs. The infection rate marker advances by one and the current infection rate might increase: it increases by one at the third and fifth epidemics. The bottom card of the infection deck is drawn, the city indicated by the card gets infected with three disease cubes of its color and the card is discarded. Finally, the discard pile of the infection deck is shuffled, put on top of the infection deck, and the epidemic card is removed from the game.

Infection phase

In this final phase, a number of cards equal to the current infection rate is drawn from the top of the infection deck. Each of the cities indicated by the cards gets infected with one disease cube of its color. The game then continues with the turn of the next player to the left.

Table 2.2: List of specialized actions and effects for the player roles.

Contingency planner: Can take an event card from the discard pile and store it on their role card.
Only one card may be stored at a time; after use, the card is removed from the game.
Dispatcher: Can move any player's meeple to a city containing another player, and can also move other players' meeples as if they were their own.
Medic: Removes all cubes of the same color when treating a disease, using only one action. If a disease has been cured, the medic automatically treats it when in a city, without consuming an action.
Operations expert: Can build a research station in their current city without discarding any card. Also, once per turn, can move from a research station to any city by discarding a city card.
Quarantine specialist: Prevents outbreaks and the placing of cubes in their current city and all neighboring cities, except during the game setup.
Researcher: Can give any city card to other players in the same city, and other players can also take the researcher's cards on their turn.
Scientist: Requires only four city cards of the same color (instead of five) to discover a cure for a disease.

The disease cubes added to a city by infections stay in the city until the players remove them through an action. Nevertheless, there can be a maximum of only three cubes of the same color in each individual city. If an infection would exceed that maximum, the city stays at three cubes and an outbreak occurs in the city. When an outbreak occurs, the outbreak counter increases by one and all the surrounding cities add one cube of the outbreak's disease color. This may cause another outbreak in one or more of the neighboring cities, causing a chain reaction. In a chain reaction, each outbreak must be resolved individually, but without causing outbreaks in any cities which have already been part of that same outbreak chain, as shown in figure 2.2.

2.1.3 Overall strategy

The main characteristic of Pandemic is that it challenges the players by forcing them to make decisions regarding their current objective.
It forces the players to balance between containing diseases that are already spreading and discovering cures. Treating diseases can be considered a simpler task than discovering a cure, because the latter often requires card exchanges that can only occur when players are in the appropriate city. Therefore, the main problem faced by the players is their cooperation and the way they decide to manage their actions effectively. The overall strategy of the game [Boa, Jam17] may vary depending on the roles randomly assigned to each player. Usually the players with the medic and quarantine specialist roles are best used to control infections in the cities, moving promptly to prevent outbreaks around the globe. The contingency planner, operations expert and dispatcher take on more of a support role, using their abilities to help the other players achieve their objectives, usually by easing movement on the map. Finally, the researcher and scientist roles are best used when focusing on discovering cures, be it through sharing their cards or receiving them, respectively. When the game begins, it can be helpful for the players to focus on establishing research stations. These stations ease the players' movement and facilitate disease treatment, which helps prevent outbreaks. Outbreak prevention is a strategic priority, since each outbreak implies extra work for the players, and it is easier to treat one disease cube in the current city than one disease cube in each neighboring city.
(a) The city at the bottom gets infected by the red disease but already has three red disease cubes. (b) An outbreak occurs at the bottom city, infecting the neighboring cities with the red disease. (c) The city on the left outbreaks, infecting neighboring cities which are not already part of the outbreak chain.
Figure 2.2: Pandemic's outbreak chain example with three cities.
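The chain-reaction rule described above (a city never outbreaks twice within the same chain) is naturally expressed as a traversal with a "visited" set. The following is a hypothetical sketch, not the thesis's actual implementation; the data layout and names are illustrative:

```python
# Hypothetical sketch of Pandemic's outbreak chain: cities that have already
# outbroken in the current chain are tracked in `exploded` and never outbreak
# again. `cubes` maps city -> color -> cube count; `neighbors` is the board.
from collections import deque

MAX_CUBES = 3  # a city holds at most three cubes of one color

def infect(cubes, neighbors, city, color):
    """Add one cube of `color` to `city`, resolving any outbreak chain.

    Returns the number of outbreaks triggered (these advance the counter)."""
    outbreaks = 0
    exploded = set()               # cities already part of this chain
    queue = deque([city])
    while queue:
        current = queue.popleft()
        if cubes[current][color] < MAX_CUBES:
            cubes[current][color] += 1
        elif current not in exploded:
            exploded.add(current)  # this city may not outbreak again
            outbreaks += 1
            queue.extend(n for n in neighbors[current] if n not in exploded)
    return outbreaks
```

With three mutually connected cities where two are already at three red cubes, infecting one of them produces exactly two outbreaks, matching the example in figure 2.2.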
A general rule of thumb is to keep track of the cards in both discard piles, since the cards discarded from the player deck may make it impossible to achieve victory, while the ones in the infection discard pile represent cities which will not be infected until a new epidemic card strikes. On the other hand, when an epidemic card causes the infection discard pile to be shuffled and set on top of the infection deck, the players know for certain which cities will be infected next. The players should try to use cards instead of discarding them down to the maximum hand limit. Additionally, it is important to increase the focus on discovering cures as the game grows closer to its end, since the number of remaining actions grows smaller. While eradicating diseases is not mandatory for winning the game, it can provide some relief in the later stages. Nonetheless, spending too much effort on eradicating a disease might be counterproductive, so the players must be vigilant about making efficient choices. Discovering an early cure for a disease is another case in which spending too much effort might be counterproductive, and the players may want to wait a little longer before making any rash decisions.
2.1.4 Complexity of the problem
The described game poses a challenging problem when trying to create an intelligent agent for it. The generalized problem of the game itself has been proven to be NP-complete [NT12], which suggests that traditional approaches to finding a solution are unlikely to be effective. The game can be understood as a graph composed of forty-eight vertices, in which multiple variables are accounted for at each vertex. Also, the players enjoy a wide variety of possible actions, which increases the complexity of the game, giving a massive range of possible final states in just a single player's turn.
Another fact to take into account is the incomplete information that the players have during the game, due to the randomness of the two decks. The effect of this unknown information is that the players always need to rethink their current plans when new information is revealed, and also adapt their behaviour to the new circumstances.
2.1.5 Previous implementations
Previous work regarding a virtual implementation of the game Pandemic has been performed. In 2012, an implementation of the tabletop game was developed for the purpose of comparing how different levels of automation impacted the game [WPC+12]. This automation was defined as getting rid of the processes which might be routine to the game, like shuffling cards or setting up the board, and acting as an impartial referee. An implementation of the game's mechanics (which does not possess a graphical user interface) was created by Thomas Keefe and is available for free on GitHub1. The game has also been implemented by Asmodee Digital together with Z-Man Games to create a digital version of the game available for sale in the Steam store2, but that game allows only human players to participate. When we started developing this project, we did not find evidence of other previous intelligent agents developed for this game. However, at the International Conference on the Foundations of Digital Games 2020, Konstantinos Sfikas and Antonios Liapis published their own agent for the game Pandemic, which uses a Rolling Horizon Evolutionary Algorithm (RHEA) to develop plans for its agent [SL20]. For their implementation the authors used macro-action encoding to reduce the complexity of the plans and used their genetic algorithm to develop a better plan to be taken by the agent. This approach allowed their agent to achieve a 22% win rate in its best setup (three players and four epidemics) on randomized games.
2.2 Planning
As mentioned in chapter 1, planning is the process by which, given an initial state, a set of goals and a set of actions with their preconditions and effects, a solution is found so that the set of goals is satisfied [Wel94]. To do this, a model is created that formalizes the problem space into its three main entities: the initial world model (what is true and what is false at the beginning of the problem), the goal conditions (conditions a state must satisfy to be considered a goal state) and the actions. Actions have preconditions, conditions that need to be satisfied for the action to be applicable, and effects, which are the changes applied to the world model [FN71]. This way of formalizing a planning problem was first described as part of the Stanford Research Institute Problem Solver (STRIPS) program and has since become widespread in the form of multiple mathematical representations and algorithmic implementations. An important distinction to be made regarding actions is the difference between a specific action and an action schema. An action schema is used to describe an action that may take place with a wide variety of parameters or variables. This allows flexibility when declaring the set of actions by not having to specify the same action for different variables. Actions without free variables (or where all free variables have been bound) are also called ground actions. An example of an action schema is 'move to X', while a grounded instance of the schema would be 'move to London'. By defining a planning problem as previously described, the planning problem can then be understood as a search or path-finding problem. There are two main representations for this search problem, known as state-space planning and plan-space planning, the first of which will be explained in the following subsection.
1 https://github.com/thomaskeefe/pydemic
2 https://store.steampowered.com/app/622440/Pandemic_The_Board_Game/
The second one will only be mentioned briefly: it is a representation in which possible plans are used as the vertices of a graph, with the edges being plan refinement operations (like the addition of an action) [Wel94].
2.2.1 State-Space planning
One of the possible representations of the planning problem is to represent the problem as a finite set of all possible states and a finite set of actions which allow the transformation from one state into another [BG01]. In this formulation, a planning problem can be interpreted as a graph, where the states are the vertices of the graph (with the initial state as the starting vertex) and the actions are the edges [Wel94]. An example of such a graph for the game Pandemic is shown in figure 2.3. This representation allows the application of known search algorithms and heuristics that may speed up finding a solution. One possible heuristic is obtained by solving a simplified version of the problem to get an estimate of the complexity of solving the full one. With a heuristic it is possible to apply the A* (A-star) search algorithm [HNR68], which performs a best-first search on the graph to find an optimal solution.
Figure 2.3: Graph representation of a Pandemic game in state-space planning.
2.2.2 Problem complexity
As explained above, the planning process can be represented as a graph and, therefore, the planning problem can be understood as a path-finding problem. Nevertheless, the main difference between these two problems resides in their complexity. While path-finding is known to be solvable in polynomial time, planning is not: it is known to be a PSPACE-complete problem. This difference arises from the fact that path-finding problems tend to have an explicitly specified graph, while planning problems' graphs are implicit to the problem and may be exponentially large [BJ11, Byl94].
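The state-space formulation above (states as vertices, actions as edges, a heuristic guiding best-first search) can be sketched with a generic A* routine. This is a minimal illustrative sketch, not the thesis's implementation; `successors` and `h` stand in for a domain's transition function and heuristic:

```python
# Minimal A* over an implicit state graph: `successors(state)` yields
# (action, next_state, cost) triples and `h(state)` estimates the remaining
# cost to a goal. All names here are illustrative.
import heapq
import itertools

def astar(start, is_goal, successors, h):
    """Return the cheapest action sequence from `start` to a goal state."""
    counter = itertools.count()      # tie-breaker so states are never compared
    frontier = [(h(start), next(counter), start, 0, [])]
    best_cost = {start: 0}
    while frontier:
        _, _, state, g, plan = heapq.heappop(frontier)
        if is_goal(state):
            return plan
        for action, nxt, cost in successors(state):
            new_g = g + cost
            if new_g < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_g
                heapq.heappush(frontier,
                               (new_g + h(nxt), next(counter), nxt, new_g,
                                plan + [action]))
    return None                       # no plan reaches the goal
```

For example, on a toy problem where states are integers and the actions are "+1" and "+2" (each with cost 1), reaching 5 from 0 with the admissible heuristic `(5 - s) / 2` yields an optimal three-action plan.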
2.2.3 Planning in games
Planning has been used as part of the development of artificial intelligence agents for games in recent years. One of the best examples of planning applied to games is the previously mentioned case of F.E.A.R., which used planning to decide the actions to be performed by the non-playable enemy characters when interacting with the human player [Ork06]. This system was later used by the developing company (Monolith Productions) in the games F.E.A.R. 2, Condemned, Condemned 2 and, most recently, Middle-earth: Shadow of Mordor [Hig15]. Other games which made use of planning are Tomb Raider by the developer Crystal Dynamics [Con15] and Deus Ex: Human Revolution by the developer Eidos Montreal [Rab13]. In both games, planning algorithms were used to decide the actions of the non-playable enemy characters, similar to what had previously been developed. Other companies have also used planning when developing their agents, as is the case of The Creative Assembly, developers of the Total War saga. Planning has been used in their recent releases starting with Empire: Total War3, where the game AI uses planning to manage its troops' actions and movements on the battlefield. This approach has been extended to other games of the saga like Napoleon: Total War, Total War: Shogun 2 and Total War: Rome 24, with increasing complexity allowing for a more challenging experience for the players.
3 Interview with a developer by Adam Pavlacka. Available at: worthplaying.com/article/2008/12/1/interviews/57018/, accessed: August 10, 2020
4 As described by Tommy Thompson in the video series "The AI of Total War", available at youtube.com/user/tthompso and gamasutra.com/blogs, accessed: August 10, 2020
2.3 Plan recognition
As mentioned in chapter 1, plan recognition is the problem of identifying a set of goals such that the observed actions would be part of an optimal plan for those goals [RG09].
There are two main approaches to solving this problem: the plan library approach and the domain theory approach. The first approach focuses on the use of recipes that make up a plan library. A recipe encodes a partial plan (a set of observable actions) and its corresponding goal. When a set of actions is observed, the plan library is checked to find a partial plan which matches the observed actions, and a goal is predicted. The goals in the library can be either final or partial, with partial goals being used as part of other goals' partial plans [Car01]. The second approach focuses on the domain theory used for planning (initial state, goals and actions), but uses it as a means to understand the observations of another agent. This approach uses the observations as part of a new planning problem in which those observations are included as required actions that must be taken to satisfy the final goal [RG09, CRY15]. As such, a planning representation of a problem can be made suitable for plan recognition by modifying the goals. A planning process is applied using the different goals to develop plans, and these are used to discover the goal which makes for the most feasible explanation of the observed actions.
2.4 Monte Carlo sampling
Monte Carlo sampling, also called the Monte Carlo method, is the process of obtaining information about a problem by performing random simulation modeling. In this process, random instances of an object of study are generated, simulated and evaluated to obtain additional information regarding its result. These results are used to gain information about the problem, for example its expected result [MU49]. These kinds of methods have been used in a wide spectrum of problems, ranging from engineering and physics to economics and finance [KBTB14], and have additionally given way to further, more specialized methods such as Monte Carlo Tree Search [Cou06, YT18].
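The core of the method, estimating an expected result by averaging many random simulations, can be illustrated with a toy example. This sketch is purely illustrative (it uses the sum of two dice as the simulated process; none of the names come from the thesis):

```python
# Minimal illustration of Monte Carlo sampling: the expected result of a
# random process is estimated by averaging many simulated trials.
import random

def monte_carlo_mean(simulate, samples, rng):
    """Average the outcome of `simulate` over `samples` random trials."""
    return sum(simulate(rng) for _ in range(samples)) / samples

def two_dice(rng):
    """One trial of the process of interest: here, the sum of two dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)
```

With enough samples, `monte_carlo_mean(two_dice, ...)` converges on the true expectation of 7, and the estimate can be read off at any point during sampling, which is what makes the method an anytime algorithm.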
One of the advantages these algorithms present is that they are anytime algorithms, which means their execution can be stopped at any moment and the data calculated so far can be used. These methods have been applied previously in various games, including card games like poker [SPPF09, VdBDR09] and Magic: The Gathering [CWP12].
Chapter 3
Methodology
In this chapter we will explain the methodology applied to fulfill each of the objectives of this thesis.
3.1 Game platform
The first objective of this thesis is to create a system that allows a human player and a simulated player to participate in a game of Pandemic in a controlled platform or environment. As mentioned in section 2.1.5, previous implementations of the game have been developed as part of different experiments and engines. However, only one of these frameworks is available online for free. While the implementation of the game by Thomas Keefe could be used to develop the game engine, it does not provide a graphical user interface in which human players can play with a simulated player in real time. The approach taken was the creation of an implementation of the game based on web technologies to make the game playable in a browser. To this end, a client-server architecture and a user interface to facilitate the players' control and understanding of the game were developed. The client was also used to get feedback on the players' experiences. The game logic and server were implemented in Python. The server has the capability to serve HTML webpages, images and files. The game client was developed using Unity and C#, exporting the final product as a WebGL build which could be accessed by the players through the server. The client then communicated with the server through regular HTTP requests and received game information from the server as JSON-structured data (as shown in the code snippet). A representation of the architecture used is shown in figure 3.1.
JSON data snippet:
{
    ...
"players": { "p0": { "location": "atlanta", "role": "RESEARCHER", "cards": [ "lagos", "algiers" ] }, "p1":{ ... } }, ... } The obtained result of this development is the client-server system where the client provides the user interface for the players, while the game logic is kept entirely on the server side.A screenshot of the game client can be seen on figure 3.2. 3.2 Planning model The second objective of this thesis is the definition of a planning model for the game. This was achieved by designing a mapping for the game state, the game goals and actions to an AI planning problem. This representation was then used by an agent to perform planning to win the game. In this research, a representation of the game state was developed that allows a state-space planning approach to be used. The game logic represents the game as a 22 Figure 3.1: Client-server architecture implemented as the game platform. 23 Figure 3.2: Screenshot of the game client for Pandemic depicting the game board, players’ meeples and cards, and other game information. game state and provides information regarding all possible actions for the current state. These actions can be used to obtain all neighboring states the would result from taking each specific action, without modifying the original game state. This representation is used by our planning agent which performs A* search to find a path (sequence of actions) that leads to a goal state. However, given the complexity of the problem, some minor restrictions were necessary to create a simplified version of the game without affecting the core gameplay. 3.2.1 Restricted Pandemic Compared to the original, our version of Pandemic has some restrictions. These restric- tions remove the event cards and three of the seven player roles from the game. 
The removal of the event cards was chosen because they could be used at any given moment (even between the steps of an effect like an epidemic, or between phases) and many of them had a large action space to choose from (for example, adding a research station at any city). Therefore, they present a problem in the form of a search space explosion when performing planning. Nonetheless, their removal does not affect any of the core mechanics of the game, and the overall goals and actions remain the same. However, this removal adds additional difficulty to the game, because the maximum number of turns in the game is reduced (the player deck runs out of cards faster) and the event cards were a powerful tool for the players, allowing them to perform powerful actions once per game. The second restriction in our version of Pandemic is that three of the seven player roles were removed: the Contingency planner, the Dispatcher and the Operations expert. The reason for the removal of the Dispatcher and the Operations expert lies in the fact that both of them have powerful movement options that caused search space explosions. The Dispatcher can move any other player's meeple as if it were their own and can also rally a player to any other player's position; this greatly increases the number of available movement actions (and more so with a higher number of players). The Operations expert has the ability, once per turn, to move from a city with a research station to any other city by discarding a card; this also greatly increased the number of possible movement actions, especially considering that this player could also build research stations without discarding any cards. The removal of the Contingency planner was decided on the grounds that their abilities are related to event cards (in the discard pile and in their hand). As these cards were removed, the role was left without a purpose, so it was removed from the restricted version.
With these restrictions in place, we will move on to the planning goals to be used by the agent.
3.2.2 Planning goals
A goal function allows the evaluation of a state to determine whether it is a goal state or not. In Pandemic, players are faced with the problem that there is a main goal (to discover all four cures), but to achieve it they need to avoid losing the game (commonly called a maintenance goal). This forces the players to deal with competing goals, where they must choose at each moment which of the two goals to work towards. Another thing to consider when performing planning is that planning too far ahead presents several problems. First, the search space of the problem increases about seven-fold with each action taken, with each turn consisting of at least four actions. This increase is the result of the possible actions the players can perform in their turns. It causes the search space to grow exponentially, making planning until the end of the game infeasible, especially in early states of the game. Also, the hidden information of the game (the order of the cards in the player and infection decks) can make planning too far ahead wasteful, since making plans that take all possible scenarios into account becomes increasingly complex and many of the planned scenarios will never occur. Lastly, since the planning agent is expected to play with human players, the time the agent takes for its turn also becomes an important consideration, since human players dislike waiting too long for a computer player to take its turn. For these reasons we added an upper limit to the search to be performed. The upper limit cuts off planning after approximately 10 seconds, which after some initial tests was set at 3500 expanded nodes of the graph, and the best plan found so far is then used. Two specific goals were developed to be used by the planning agent. We named these goals "discover cure" and "survive", respectively.
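The cutoff just described (stop after a node budget or deadline and keep the best plan found so far) can be sketched as a budgeted best-first search. This is an illustrative sketch only, greedy on the heuristic for brevity rather than a full A*; `successors` and `h` stand in for the real game logic:

```python
# Sketch of a search cutoff: expand nodes best-first, stop after a fixed
# node budget (3500 in the thesis) or time limit, and return the most
# promising (lowest-heuristic) plan seen so far. Names are illustrative.
import heapq
import itertools
import time

def budgeted_search(start, successors, h, max_nodes=3500, max_seconds=10.0):
    counter = itertools.count()       # tie-breaker for the priority queue
    frontier = [(h(start), next(counter), start, [])]
    best_plan, best_value = [], h(start)
    deadline = time.monotonic() + max_seconds
    expanded = 0
    while frontier and expanded < max_nodes and time.monotonic() < deadline:
        value, _, state, plan = heapq.heappop(frontier)
        expanded += 1
        if value < best_value:        # remember the best plan found so far
            best_plan, best_value = plan, value
        for action, nxt in successors(state):
            heapq.heappush(frontier,
                           (h(nxt), next(counter), nxt, plan + [action]))
    return best_plan
```

Because the best partial result is tracked throughout, the search can be interrupted at either limit and still return a usable plan.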
The "discover cure" goal is satisfied by discovering a previously unknown cure, and the "survive" goal is satisfied if the players manage to avoid losing the game for a given number of turns. Whenever our planning agent performs planning, one of the two goals is selected and the agent tries to come up with a plan to satisfy that goal. The agent assumes all the other players are currently working towards the same goal when planning their actions. In the case of the "survive" goal we used an upper bound of two turns, which means the agent will plan how to survive (treat infections and prevent outbreaks) up to two turns ahead. When performing planning, the current state is evaluated to choose the goal to be used. If any of the players has the number of cards required to discover a cure, or is missing just one card while being able to receive a card currently in the game from another player, then the "discover cure" goal is chosen. In any other case the "survive" goal is chosen. The agent performs a planning cycle whenever it must take an action and new information has become available to the players, such as when players draw cards or cities become infected. The planning usually happens at the beginning of the agent's turn, but can also happen if, for example, a player transferred a card which caused the agent to discard in the middle of another player's turn. After coming up with a plan, the agent will execute its actions until new information becomes available or there are no more actions left in the plan. However, there is an issue to address regarding the "survive" goal: how does the agent choose among different plans when all of them manage to reach the set goal? The answer to this question lies in a useful parameter of the A* algorithm: the heuristic.
3.2.3 Heuristic
A heuristic is necessary to improve the efficiency of the A* algorithm, since it is used as a means to give priority to nodes which are more likely to lead to the goal quickly.
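The goal-selection rule described above (choose "discover cure" when some player holds enough cards, or is one receivable card short, otherwise "survive") can be sketched as follows. The function and parameter names are illustrative, not the thesis's code:

```python
# Sketch of the goal-selection rule: `hands` maps each player to a list of
# card colors, `required` gives each player's cure threshold (5, or 4 for
# the Scientist), and `active_colors` holds the diseases not yet cured.
from collections import Counter

def choose_goal(hands, required, active_colors):
    for player, hand in hands.items():
        counts = Counter(hand)
        for color in active_colors:
            missing = required[player] - counts[color]
            if missing <= 0:
                return "discover cure"      # enough cards already
            if missing == 1 and any(color in other_hand
                                    for other, other_hand in hands.items()
                                    if other != player):
                return "discover cure"      # one card transfer away
    return "survive"
```

For instance, a player holding four blue cards while an ally holds a fifth blue card triggers the "discover cure" goal; with no such hand on the board, the rule falls back to "survive".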
For our implementation of the planning agent, a domain-specific heuristic was developed. This heuristic is composed of seven different equations which evaluate different aspects of the game. The purpose of these evaluations is to give an estimate of the number of actions remaining until the agent can win the game or satisfy a specific goal. To evaluate a state, the heuristic takes into account the following factors: the cures discovered, the number of disease cubes in cities, the cards in the players' hands and in the discard pile, the distance of the current player to locations of interest, and the distances between cities. The number of cures discovered (or, conversely, remaining to be discovered) is directly related to the main objective of the game and is the main indicator of how close the players are to winning. The number of disease cubes currently in cities is an indicator of how much the diseases have spread, which has a direct effect on the number of outbreaks. The distance of a player to the nearest city with a research station establishes the minimum number of actions it would take for that player to be able to discover a cure. The distance to cities with disease cubes allows us to evaluate how many actions, on average, it would take the player to arrive at an infected city. The cards in each player's hand allow the calculation of how many cards are missing until the players are capable of discovering another cure, and the cards in the discard pile allow us to keep track of the number of cards remaining for each color. Lastly, the distances among cities allow us to estimate how costly, in terms of actions, it is for players to move around the board. For the description of the heuristic the following expressions will be used. The expression active(k) evaluates whether the disease of color k is currently active (its cure has not been discovered). The expression distance(p, c) calculates the distance between player p and city c.
Similarly, infection(c) calculates the number of disease cubes present in city c. The expression cards(p, k) counts the number of cards of color k that player p has in their hand. Meanwhile, discard(k) counts the number of cards of color k currently in the discard pile. Lastly, the constant R_p is the number of cards required by player p to discover a cure (always 5, except for the Scientist, for whom it is 4). To perform counts over the different elements of the game, four sets are used in the expressions. The set Color contains the colors of the four diseases, which is used to evaluate whether they are active. The set City contains the forty-eight cities spread across the map, which is used to calculate distances and infections among them. The set Player contains the players participating in the game, which is used to count the cards in their hands and their distances to cities. Finally, the set Crs contains the cities with a research station built on them, used in a similar fashion to the City set. The first equation in our heuristic evaluates the number of cures that remain to be discovered in the current state. Equation 3.1 calculates the number of diseases that still require a cure to be discovered by counting the number of active diseases.

h_{cures} = \sum_{k \in Color} active(k)    (3.1)

To evaluate the current positions of the players, two different equations were developed. The first one, seen in equation 3.2, evaluates the positions of the players in relation to the cities as the average distance to each city weighted by the number of disease cubes it has. The purpose of this equation is to motivate planning towards positions closer to nexuses of infection, which is necessary when trying to prevent outbreaks. An example of this equation is shown in figure 3.3, where two different states are compared. The second equation, seen in equation 3.3, evaluates the positions of the players with regard to their proximity to the nearest research station.
The purpose of this equation is twofold: it motivates players to stay near research stations, which allows them to quickly discover a cure when the right cards are available, and it also motivates the construction of more research stations distributed across the board.

h_{dsurv} = \sum_{p \in Player} \frac{\sum_{c \in City} distance(p, c) \cdot infection(c)}{\sum_{c \in City} infection(c)}    (3.2)

h_{dcure} = \sum_{p \in Player} \min_{c \in Crs} distance(p, c)    (3.3)

(a) The player is closer to a nexus of infection; the heuristic value for h_dsurv in this state is 1.5. (b) The player is farther away from a nexus of infection; the heuristic value for h_dsurv in this state is 2.25.
Figure 3.3: Comparison of the values obtained for equation 3.2 in two different states. The value is reduced as the player moves closer to a nexus of infection.

To evaluate the value of the cards in the players' hands, equation 3.4 is used. This equation calculates, for each disease color, the minimum number of cards missing to discover a cure among the players' hands, while ignoring diseases which have already been cured. The purpose of this equation is to motivate players to concentrate cards of the same color in one player's hand, bringing them closer to discovering a cure. An example of this equation is shown in figure 3.4, where two states are compared. Another factor to take into account is the number of cards remaining of each color since, if too few cards of a certain color remain, it might be impossible to discover a cure for that disease. For this reason equation 3.5 was included, whose value increases with the number of cards in the discard pile whose cure has not yet been discovered. An increase in value in terms of the heuristic means that the players are farther away from winning the game.

h_{cards} = \sum_{k \in Color} active(k) \cdot \min_{p \in Player} \left( R_p - cards(p, k) \right)    (3.4)

h_{disc} = \sum_{k \in Color} active(k) \cdot discard(k)    (3.5)

(a) A single player has the red cards; the heuristic value for h_cards in this state is 11. (b) Both players have red cards; the heuristic value for h_cards in this state is 12.
Figure 3.4: Comparison of the values obtained for equation 3.4 in two different states. The value is reduced as the players concentrate cards of the same color in a single hand.

Another important value to take into account is the total number of disease cubes on the board, as treating the disease cubes is imperative to avoid losing the game. To this end, equation 3.6 calculates the total number of disease cubes present across all cities, and thus motivates players to treat the diseases.

h_{inf} = \sum_{c \in City} infection(c)    (3.6)

As for building research stations, these are important not only because they allow players to discover cures for diseases, but also because players can move directly between cities with research stations on them. This reduces the number of actions needed to get from one city to the others and has a great impact in the long run. To evaluate these distances, equation 3.7 is used. This equation calculates the average current distance from each city to each of the other cities and then multiplies it by a linearly decreasing value which represents the fraction of turns remaining. The distance term is not constant, since cities with research stations allow players to travel between them as if they were directly connected. This motivates players to build research stations in cities which would reduce the travelling actions needed to go from one city to another; however, the relevance of this value decreases over time.

h_{dist} = \frac{turns_{remaining}}{turns_{max}} \cdot \sum_{c_1 \in City} \sum_{c_2 \in City} \frac{distance(c_1, c_2)}{48 \cdot 47}    (3.7)

Finally, with the seven equations specified, it is possible to define the heuristic used to evaluate each state. However, each of these values must be multiplied by a specific weight which determines its relevance relative to the others. The final equation can be seen in equation 3.8.
Additionally, only one of the two distance equations is applied when evaluating a state, depending on the goal, with the distance to the nearest research station being used for the "discover cure" goal. The values for the weights used in this final heuristic are the result of a grid search in which different values for each parameter were tested. The candidate values for the grid search were chosen from preliminary tests in which different values for each of the heuristic's equations were tried.

h_{state} = 0.5 \cdot h_{dsurv} + 0.5 \cdot h_{dcure} + 1 \cdot h_{cards} + 0.5 \cdot h_{disc} + 0.6 \cdot h_{inf} + 0.6 \cdot h_{dist} + 24 \cdot h_{cures}    (3.8)

This heuristic is applied when evaluating the states as part of the A* algorithm. However, there is still a last question regarding how to tackle the planning problem: how to handle the hidden information of the game.
3.2.4 Evaluating unknown information
The evaluation of hidden information poses a challenge to the planning process because the outcome of a specific plan is unknown. To tackle this problem, a Monte Carlo sampling approach was taken. Our planning agent performs planning using only the currently available information until a goal state is reached. After a goal state is reached, random simulations of the possible draws that could happen during the plan are performed. The resulting states are evaluated using the heuristic function to calculate an average expected value for the state resulting from such a plan. When considering the possible draws that may occur, while the exact ordering of the cards is unknown, there is partial information about the order which is used when generating the possible draws for the algorithm. This information is often used by veteran players to counteract the uncertainty to some extent. The information regarding the player deck comes from the partial distribution of epidemic cards in it. When assembling the player deck, the cards are split into piles and each pile gets an epidemic card.
While each pile is shuffled individually, the piles are simply stacked to form the deck, which gives players a certain ability to predict when an epidemic will or will not show up (depending on the number of cards remaining in the deck and whether the current pile’s epidemic has already been drawn). The information regarding the infection deck comes from the epidemic mechanic, which shuffles the infection deck’s discard pile and puts it on top of the infection deck. This implies that, while the exact order is unknown, players can be certain that the next cards can only come from a much smaller pool of possible cards, which are precisely the ones previously put into the discard pile. This information is used by our agent when generating the possible draws for both decks while simulating the outcome of a specific plan. By simulating the draws using the known information, a more precise estimation of the expected heuristic value for the plan is obtained. This expected value for a state allows our agent to compare multiple equally valid plans and choose the one with the best expected value. An example of this would be the agent having to choose between treating diseases or transferring cards to its ally: while transferring cards would allow the agents to discover a cure, treating diseases could be equally important. If the players are one or two outbreaks away from losing the game and one of the next cards might trigger a game-losing outbreak, the agent would consider the expected value of that scenario to be much worse than the benefit of discovering the cure. All of the previously explained mechanics were used to implement the planning agent. The planning agent is the result of creating the planning model for the game Pandemic and fulfills the second objective of this research. The results of the evaluation of this agent will be presented in section 4.1.
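The pile structure of the player deck can be exploited directly when sampling possible draws. The sketch below (hypothetical names; a simplified stand-in for the agent’s actual sampler) assembles a deck the way the rules describe and shows that each epidemic card is confined to its pile’s section of the deck, which is exactly the partial information the simulations rely on.

```python
import random

def build_player_deck(city_cards, n_epidemics, rng):
    """Assemble the player deck as in the rules: split the city cards into
    n_epidemics piles, shuffle one epidemic card into each pile, and stack
    the piles. Order inside a pile is random; pile boundaries are not."""
    piles = [list(city_cards[i::n_epidemics]) for i in range(n_epidemics)]
    deck = []
    for pile in piles:
        pile = pile + ["EPIDEMIC"]
        rng.shuffle(pile)
        deck.extend(pile)
    return deck

def epidemic_positions(deck):
    """Indices at which epidemic cards sit; each stays within its pile."""
    return [i for i, card in enumerate(deck) if card == "EPIDEMIC"]
```

Sampling many such decks, restricted to orderings consistent with the cards already drawn, yields the draw distributions used when estimating a plan’s expected heuristic value.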
3.3 Plan recognition

The third objective of this thesis is to develop a plan recognition model for Pandemic. The main goal of this objective is to allow the planning agent to analyze and react to the actions of its cooperator and deduce what the current goal of that player may be. The agent can use this information to predict which actions the other player may perform in the future, in order to cooperate with them. As explained in section 3.2, a state-space representation is used to describe the planning problem. When performing plan recognition we use the same representation used by the planning agent. The agent has knowledge of the actions the other players took, and the different goals are evaluated to choose the one which best matches the observed actions. This approach essentially transforms the problem of plan recognition into one of planning [RG09]. When performing a planning cycle, before starting with its own plan, the plan recognition agent performs a recognition process on the actions taken by the player before it. The agent takes the state in which the previous turn started and performs a planning cycle on it using both the “discover cure” and the “survive” goals. For each of the goals a different action sequence is obtained; these action sequences are the equivalent of the agent figuring out which plan it would have executed had it been in the position of the other player. The goal which produces the action sequence most similar to the observed actions taken by the player is chosen as the current goal for that player. The actions are compared in terms of their effect on the game; for example, treating the red disease would be considered equal to treating the blue disease in terms of purpose. Also, in case the player discovered a cure last turn, only the actions taken after the cure was discovered are considered, since the goal of “discover cure” was already satisfied and what matters is the future goal and plans.
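The goal-selection step described above can be sketched as follows, assuming a planner function and an abstract action representation (both hypothetical here): actions are compared by effect type, so treating any disease counts as the same kind of action.

```python
def recognize_goal(prev_state, observed_actions, plan_for,
                   goals=("discover_cure", "survive")):
    """Plan recognition as planning: replan from the state at the start of
    the other player's turn under each candidate goal, then pick the goal
    whose plan is most similar to the actions actually observed."""
    def similarity(plan, observed):
        # compare step by step on abstract effects ('treat', 'move', ...)
        return sum(1 for a, b in zip(plan, observed) if a == b)
    return max(goals,
               key=lambda g: similarity(plan_for(prev_state, g),
                                        observed_actions))
```

Here `plan_for(state, goal)` stands in for the thesis’s A* planner and is assumed to return a sequence of abstract action effects.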
Once the agent has recognized the goal the other player is trying to achieve, it performs its planning cycle as normal, with a single exception: when building up the action sequence the other player will perform, the plan recognition agent uses the recognized goal as part of its planning. The prediction of the other players’ current goals allows the agent to make assumptions about their future actions and come up with the best plan to fulfill their shared goals. This process was implemented in our plan recognition agent and is the result of creating our plan recognition model for the game Pandemic.

3.4 Data collection and comparison

The final objective of this thesis is the comparison of the developed agents, with and without the plan recognition model, when playing the game with a human player, measuring the win rate and the player’s satisfaction. By the player’s satisfaction we mean the player’s perception of how well the agent performed in game and how well the agent reacted to the player’s actions. This evaluation allows us to assess how well our agents are able to cooperate with a human player, and to measure the impact that plan recognition has on the way intelligent agents cooperate and play games with human players. Our hypothesis is that plan recognition will improve the cooperation between the human and computer players, increasing the win rate and giving the human player a more enjoyable experience. Two treatments were used to perform this evaluation: the intelligent agent using plan recognition, and the basic planning agent as a control group. Participants were randomly and blindly assigned to one of these two treatments. Once each game is finished, a questionnaire is presented to the participant with questions regarding the player’s previous experience with the game and questions regarding their interaction with the intelligent agent and how the agent reacted to their actions.
The questionnaire presented to the participants is available in appendix A and uses a 5-point Likert scale for the evaluation of the agents. With the purpose of reducing the noise produced by the randomness of the in-game decks, five specific seeds for the random number generator were used to shuffle the card decks. The decks to be used in each game were randomly assigned at the beginning of each game. The same process was used to assign player roles; for the purpose of the experiment only the Scientist, Researcher and Medic roles were used. The Quarantine Specialist role was excluded because the Quarantine Specialist and the Medic serve a similar in-game purpose: both are roles specialized in fighting the spread of the diseases. Lastly, out of the three possible combinations available for those roles, only the Scientist-Medic and the Researcher-Medic combinations were used, to reduce the noise generated by having multiple possible combinations. The game platform developed in section 3.1 was made available online for volunteers to participate in the study by playing the game with an intelligent agent. The game was available online from June 8, 2020 until July 6, 2020 at the address pandemic.slothlab.info. Before participating in the study, participants were presented with an informed consent form containing the information related to the experiment, which they had to read and accept in order to participate. Knowledge of the study was propagated through social media, specifically Twitter, Facebook and Reddit. On Facebook and Twitter the post was made public, asking volunteers to participate and, if they wanted, to share the publication or webpage with other people who might be interested. On Reddit a thread was opened in the subreddits boardgames and samplesize asking the community to participate in the study.
These subreddits were chosen because of the areas of interest of their communities, which made them good sources of subjects for the study. With this experiment we obtained the results for the final objective of this thesis, which will be presented in the following chapter.

Chapter 4 Results

In this chapter we present the results achieved from the development of this thesis. The chapter is divided into two main sections: first the evaluation of the developed agents in individual tests, and second the results obtained from the experiment run with human participants.

4.1 Agent evaluation

To evaluate the developed agents, we used the same metrics that will be used when evaluating the results with human players: win rate and average number of cures discovered. The tests were performed with varying numbers of players (two to four) and epidemics (four to six), to see the impact these variables have on the performance of our agents. Additionally, to provide a baseline for comparison we implemented a simplified version of our agent called the “heuristic” agent, which decides its next action using the heuristic function, choosing the action which leads to the best possible state. Another, simpler “random” agent was implemented to see what the expected result would be for a player which just takes random actions each turn. These baselines were chosen because, at the time of development, there were no other agents implemented for this game in the literature which could be used as a direct comparison for our agents. The agents were made to play 3000 games for each combination of number of players and epidemics. In each game, all players were instances of the agent being evaluated. Therefore, for each agent a total of 27000 games were played and their results stored. As expected, the random agent did not perform well, and did not win a single game in any setup. It only managed to discover 5 cures over all 3000 games in the “easiest” setup (two players and four epidemics).
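The “heuristic” baseline described above amounts to one-step greedy search over the successor states. A minimal sketch (the state representation, action application and heuristic function are stand-ins, not the thesis implementation):

```python
def heuristic_agent_action(state, legal_actions, apply_action, h_state):
    """Greedy baseline: evaluate the state each legal action leads to with
    the heuristic and pick the best one (here, lower h_state is better)."""
    return min(legal_actions, key=lambda a: h_state(apply_action(state, a)))
```

Unlike the planning agent, this baseline never looks more than one action ahead, which is one likely reason for its much lower win rate.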
The results for the remaining setups decrease further in the number of cures discovered. From this, it can be concluded that Pandemic is not a game that can be won by luck alone; some kind of reasoning is required to win the game. The results for the heuristic agent are shown in table 4.1. While there is a noticeable improvement compared to the random agent, the agent performed poorly overall, which might indicate that Pandemic is not a game that can be won by a heuristic alone. A trend appears in both metrics for the heuristic agent: as the number of epidemics and the number of players increase, the win rate and average cures decrease. The decrease of both metrics with the increase in epidemics was to be expected since, in terms of the game, the number of epidemics is used to select the difficulty (four for easy, five for normal and six for hard). However, the lower win rate with a higher number of players was unexpected. An explanation for this is that, by increasing the number of players, the number of turns per player decreases and therefore cooperation becomes a bigger requirement to succeed. This, in turn, increases the difficulty of the game and reduces the win rate and average number of discovered cures. Lastly, it is worth noting that small changes in the average number of discovered cures seem to have a strong impact on the win rate: a change of just 0.08 in the average number of cures makes a difference of 1.67% in the win rate. The results obtained for the planning agent are also shown in table 4.1. The first thing to notice is the increase in both win rate and average discovered cures in comparison to the heuristic agent, which is used as the baseline. The planning agent manages to win around one third of its games, while on average discovering 2.80 cures per game in the two-player four-epidemic configuration. Similar to the heuristic agent, it can be observed that increases in the number of players and the number of epidemics decrease both metrics.
When analyzing the results of the planning agent in terms of role combinations (the randomly assigned roles the agents had during each game), some trends also appear. These results can be seen in figure 4.1. We will examine these results by the number of players participating in the game. In the two player games it is possible to see that the games in which the Scientist role was present had a higher win rate than the other scenarios. This can be explained by the fact that the Scientist requires fewer cards of the same color to discover a cure (four instead of five), thereby reducing the number of requirements to win the game.

Figure 4.1: Observed win rate for the planning agent when using different role combinations. ‘S’ for Scientist, ‘R’ for Researcher, ‘Q’ for Quarantine Specialist and ‘M’ for Medic. Comparison of the results for each combination across different numbers of epidemics.

Table 4.1: Win rate and average number of cures discovered per game by the heuristic, planning and plan recognition agents over 3000 games with p players and e epidemics (up to ±1.70% for the win rate and ±0.04 for the average number of cures discovered with a confidence of 95%).

p  e    Win rate                          Average number of cures
        Heuristic  Planning  Recognition  Heuristic  Planning  Recognition
2  4    2.27%      34.27%    34.00%       0.90       2.80      2.80
2  5    0.83%      17.63%    17.60%       0.69       2.27      2.27
2  6    0.37%      5.67%     5.40%        0.52       1.67      1.66
3  4    0.60%      21.73%    18.07%       0.82       2.41      2.33
3  5    0.10%      8.73%     8.23%        0.57       1.83      1.73
3  6    0.03%      2.53%     2.27%        0.47       1.30      1.24
4  4    0.03%      15.60%    12.07%       0.49       2.22      2.14
4  5    0.00%      5.13%     3.90%        0.32       1.54      1.47
4  6    0.00%      1.70%     1.37%        0.21       0.92      0.88

In the scenarios with the Scientist role, the ones with the Medic had a higher win rate than the ones with the Quarantine Specialist and the Researcher (with win rates of 49.50%, 44.62% and 44.10% respectively).
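The uncertainty bound quoted in table 4.1 can be reproduced with a normal-approximation confidence interval for a proportion over n = 3000 games. The sketch below is illustrative (not from the thesis) and checks the worst case, which occurs at the largest win rate in the table (34.27%):

```python
import math

def winrate_ci_halfwidth(p, n, z=1.96):
    """Half-width of the 95% normal-approximation confidence interval
    for a win rate p observed over n independent games."""
    return z * math.sqrt(p * (1 - p) / n)

half = winrate_ci_halfwidth(0.3427, 3000)  # widest interval in table 4.1
```

The result is about 0.0170, i.e. the ±1.70% bound stated in the caption; win rates closer to 0% have correspondingly tighter intervals.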
While the Scientist-Researcher combination has two abilities useful for discovering cures (since the Researcher can give cards to the Scientist), it lacks any special ability to stop the spread of the diseases, which can explain why the other role combinations (Scientist-Quarantine Specialist and Scientist-Medic) have a higher win rate. As to why the Scientist-Medic combination has a higher win rate than the Scientist-Quarantine Specialist, this can be explained by the fact that the Medic can treat diseases without consuming actions once the cure for the disease has been discovered. Since the Scientist can quickly discover cures, this can explain why their combination has the highest win rate of them all. Among the two player combinations that do not include a Scientist, the ones with the Researcher have the highest win rates. This can be explained because the Quarantine Specialist-Medic combination (12.36% win rate) lacks any kind of benefit which assists in discovering cures, which makes it difficult for them to discover all the cures needed to win the game. Among the two combinations with the Researcher, the Researcher-Medic is the one with the higher win rate (29.98% compared to 26.21%); this can be explained in the same way as for the Scientist-Medic, which seems to support the idea that the Medic is slightly superior to the Quarantine Specialist as a disease-fighting role. In the three player games we can divide the role combinations into those that have a double cure-discovery focus (Scientist-Researcher-X) and those with a double disease-fighting focus (X-Quarantine Specialist-Medic). Of these two groups the first one has the higher win rate; this can be explained by how the Scientist-Researcher dynamic helps players discover cures faster, while having a third player which focuses solely on fighting the diseases helps them keep the diseases at bay.
In these combinations, once again, the combination with the Medic shows a small advantage (31.33% compared to 30.16%) over the one with the Quarantine Specialist. In the combinations with the double disease-fighting focus, the one with the Researcher has a higher win rate than the one with the Scientist (16.44% compared to 8.90%). This can be explained by the fact that, with a reduced number of turns per player (because of the larger number of players), a role which has more options to interact with the other players to discover cures has a better chance of discovering all the cures. Lastly, since our version of Pandemic only contains four roles, there is only a single case to examine for the four player games. This four player game produces results similar to the three player games with the Researcher-Quarantine Specialist-Medic combination, with a win rate of 15.60%. This is an effect of the reduced number of turns for each player and the fact that each player has to wait three full turns before having the chance to act again, which increases the difficulty of the game, since the future state of the game, when the turn comes back to a player, is harder to predict. As for the plan recognition agent, the results of its evaluation are shown in table 4.1. One of the first things to notice is that, for the plan recognition agent, the results for two player games are similar to those of the planning agent. However, as the number of players increases, the overall win rate and average number of discovered cures become worse in comparison to the planning agent, by up to 3.66%. This can be explained by the fact that, when playing games with more players, assuming players will stick to their original plan after two or three turns is counterproductive. In games with three or four players, players tend to rethink their plans and goals every time they take their turn, because the game state has changed considerably in between.
Because of this, in games with a higher number of players, the plan recognition agent performs worse than the planning agent.

Figure 4.2: Observed win rate for the plan recognition agent when using different role combinations. ‘S’ for Scientist, ‘R’ for Researcher, ‘Q’ for Quarantine Specialist and ‘M’ for Medic.

Finally, the results for the plan recognition agent in terms of role combinations can be seen in figure 4.2. It is possible to observe that these results present the same trends for the two, three and four player games as for the planning agent. This was expected, as both agents use the same planning algorithm and heuristics to develop their plans. Additionally, the plan recognition performed by the agent should recognize the action pattern performed by the previous agent easily, as it is the same it would have executed.

4.2 Experiment results

In this section we analyze the results obtained from the experiment run with human participants, explained in section 3.4. We were able to recruit a total of 116 participants, who played 187 games. However, for the purpose of this analysis we only take into account the first game played by each participant, for a total of 116 games. Of these 116 games, only 77 participants answered the survey after the game, and not all of these participants answered all the questions. A total of 90 games were played with the planning agent and 97 with the plan recognition agent. However, after the previously mentioned filters, only 32 participants answered the survey for the planning agent and 45 for the plan recognition agent. As for the demographics of the population, we asked the participants about their previous experience with the game Pandemic. The answers to these questions can be seen in figure 4.3.
Participants were asked how many times they had played the game before (figure 4.3a), how long ago they had last played (figure 4.3b) and how often they manage to win the game (figure 4.3c). It is possible to see that the majority of the participant population consists of people with little to no experience with the game: about two thirds of the population have played fewer than eleven games, with over a third being participants who have never played the game. This division is strongly correlated (Spearman’s rank correlation coefficient of 0.76) with how often the players win the game, with about a third of the players winning more than one in four games. Another thing to notice is that, of the players with any previous experience with the game, about half have not played it in over a year.

Figure 4.3: Survey answers of the participants’ previous experience with the game Pandemic. (a) Previous experience with the game by number of times played. (b) Time passed since the last time the game was played. (c) How often the participant wins a game.

4.2.1 Survey results

In this subsection we present the results of the evaluation done by the players. After each game, players were asked to evaluate the agent they played with. The results of this evaluation are shown in figure 4.4. The agents were evaluated in terms of their perceived skill (figure 4.4a), whether the participants understood what the agent was doing (figure 4.4b) and their helpfulness towards the participant (figure 4.4c), using a 5-point Likert scale. It can be observed that while the plan recognition agent was overall rated higher by the participants in many cases, this grading was not consistent. When evaluating the skill of the agents, the plan recognition agent presents a higher number of ratings at the “good” qualification, but seems to perform worse at both extremes (having more “very bad” and fewer “very good” evaluations than the planning agent).
A similar behaviour can be seen in how the participants rated their understanding of the agent, with the plan recognition agent performing worse at both extremes but having better results at the middle-positive qualifications. In contrast, the helpfulness evaluation seems to behave differently, with the plan recognition agent performing better in the lower grades (fewer “never” and “rarely” evaluations) and also receiving a higher number of evaluations at the “always” (highest) grade. While this qualitative examination supports our hypothesis, we performed further statistical analysis of the obtained data to reach more accurate conclusions. First, we examined the evaluations made by the participants for correlation, by calculating their Spearman’s rank correlation coefficients. This coefficient measures the statistical dependence between the rankings of two variables. Calculating these coefficients allows us to measure the correlation between skill, understanding and helpfulness in the evaluations given by the participants. The coefficients for the planning agent and the plan recognition agent are shown in table 4.2 and table 4.3 respectively. From these results it can be concluded that the values for skill and helpfulness show a strong correlation, and a moderate correlation exists between understanding and helpfulness. The strong correlation gives us insight into how participants see helpfulness as a defining characteristic of a skilled player. The participants may consider that a skilled player is more aware of the current game state and more proactive in assisting its teammates, which would explain the correlation. Regarding the correlation between skill and understanding, there is a difference between the two agents, with the planning agent showing a moderate correlation and the plan recognition agent a weak one.
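Spearman’s coefficient is simply the Pearson correlation computed on the rank vectors of the two variables. A self-contained sketch (illustrative, using average ranks for ties; the analysis itself could equally use a statistics library):

```python
import math

def ranks(values):
    """Ranks starting at 1, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        # extend j over the whole group of tied values
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

A coefficient of 1 indicates a perfectly monotone increasing relationship, −1 a perfectly monotone decreasing one.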
It is interesting that for the plan recognition agent, which has the higher correlation between skill and helpfulness (0.76 compared to 0.69), the correlation between understanding and skill is lower. This might imply that, for players, it is more important that a player is helpful than being able to understand it when assessing their skill level. To evaluate the differences between how the participants rated the agents, the data was processed using two different tests. We compared the evaluations for the three survey questions using a Mann-Whitney U test to test for a difference in qualitative value, and a χ² test to test for a difference in distribution, using the Holm-Bonferroni method to account for multiple testing.

Figure 4.4: Survey answers of the participants’ impression of the agents. (a) Skill level of the agent perceived by the participant. (b) How often the participant understood the agent. (c) How often the agent was helpful towards the participant.

Table 4.2: Spearman’s rank correlation coefficient between evaluated skill, understanding and helpfulness for the planning agent.

               Skill  Understanding  Helpfulness
Skill          -      0.44           0.69
Understanding  0.44   -              0.53
Helpfulness    0.69   0.53           -

Table 4.3: Spearman’s rank correlation coefficient between evaluated skill, understanding and helpfulness for the plan recognition agent.

               Skill  Understanding  Helpfulness
Skill          -      0.15           0.76
Understanding  0.15   -              0.35
Helpfulness    0.76   0.35           -

The Mann-Whitney U test showed no statistically significant difference for any of the three metrics. However, the χ² test showed a statistically significant difference (p < 0.05) in the “understanding” metric, while for the remaining metrics the results were inconclusive. Finally, even though not directly relevant to the objective of this thesis, the participants were also asked to evaluate how easy the game client was to use. The results for this question can be seen in figure 4.5.
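The Holm-Bonferroni correction applied to the multiple comparisons can be sketched as a step-down procedure (illustrative implementation, not the thesis’s analysis script):

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Step-down Holm-Bonferroni: sort the m p-values ascending and compare
    the k-th smallest (k = 0, 1, ...) against alpha / (m - k); reject
    hypotheses until the first comparison fails. Returns one reject/keep
    decision per original position."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if p_values[i] <= alpha / (m - k):
            reject[i] = True
        else:
            break  # all remaining (larger) p-values are kept as well
    return reject
```

With three metrics (skill, understanding and helpfulness), the smallest p-value is compared against 0.05/3, the next against 0.05/2, and the largest against 0.05.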
The results show that more than half of the participants did not consider the game client difficult to use, but none of them considered it very easy to use. Also, the percentage of participants that considered the client difficult (29.9%) or very difficult (7.8%) to use is considerable, and further improvement should be pursued.

4.2.2 Performance results

In this section we analyze the success, in terms of the game, that participants had when playing with the agents. Success is measured in terms of the win rate and the average number of discovered cures per game, which are the most relevant metrics for the performance of the agents. To this end, the first game played by each of the 116 participants is grouped according to which of the agents they were assigned to. Using these groupings, a total of 51 participants were assigned to the planning agent, while the remaining 65 were assigned to the plan recognition agent.

Figure 4.5: Ease of use of the client interface as rated by the participants.

Table 4.4: Observed win rate and average discovered cures for the agents with human participants in the first game played by each participant.

Agent                   n   Win rate  Average cures
Planning agent          51  25.49%    2.49
Plan recognition agent  65  27.69%    2.72

The obtained win rate and the average number of discovered cures for the agents are shown in table 4.4. The results seem to indicate that the plan recognition agent outperformed the planning agent, even if just slightly, in both of the metrics used. A Student’s t-test was applied to determine if there was a difference in the average number of discovered cures between the agents, and did not find a statistically significant difference. It can be pointed out that the win rate was lower when the agents played with human participants than when they played with themselves: 25.49% in comparison to 34.27% for the planning agent and 27.69% in comparison to 34.00% for the plan recognition agent.
This can be explained by the fact that a large portion of the participants did not have much experience with the game. Another factor to take into account is that the participants were not accustomed to the game platform and to playing with the agent. The reasons why each of the games ended for the different agents are shown in figure 4.6. In the observed results there is a difference regarding the number of times the players lost the game due to outbreaks: the number of losses by outbreaks is lower for the plan recognition agent, with an increase in the number of times players lost due to running out of time. However, this difference is not statistically significant. When considering only the second and later games of participants who played more than one game, the results change dramatically, as shown in table 4.5. For these games the win rate and average discovered cures for both agents went up, and are even higher than those registered for the agents playing with themselves. The observed win rate for the planning agent is higher than that of the plan recognition agent, but what is interesting is the fact that the plan recognition agent had a higher observed average number of discovered cures. Nonetheless, this difference is not statistically significant.

Figure 4.6: Distribution of the reasons for the game to end for each of the agents.

Table 4.5: Observed win rate and average discovered cures for the agents with human participants in the second and later games.

Agent                   n   Win rate  Average cures
Planning agent          39  41.02%    2.92
Plan recognition agent  32  40.63%    3.09

4.2.3 Qualitative analysis of the agents

In this section, the game logs for the worst and best rated games of the planning and plan recognition agents are examined. The purpose of this is to gain insight into the possible reasons why a player gave a specific rating to the agent; because of this, only participants with previous experience with the game are considered.
For the planning agent the worst rated game is game log 167 (Game ID: 5a939559ed263d82). In this game the participant was assigned the Medic role and the planning agent the Researcher role, and the random decks #3 were assigned to this game. The game was characterized by an early discovery of the red cure, as a result of the agent transferring its cards to the human player, and a rush by the Medic to eradicate the disease. This was followed by a wave of black outbreaks at the center of the map, which the Medic had to attend to, and the discovery of the black cure by the planning agent. After the discovery of the latter cure, the Medic player proceeded to eradicate the disease, which helped the players reduce the number of cities they had to worry about. Afterwards the players managed to regroup on the American continent, which allowed the planning agent to transfer its yellow cards to the human player, forcing the player to discard their less useful cards. With three cures discovered, the players focused on discovering the last (blue) cure and winning the game. While the game had a successful resolution, the agent was rated poorly, which raises the question why. Observing the game log, it is possible to see that the planning agent sometimes “stalled” its turn waiting for new cards, performing actions that the human player could see as wasteful (e.g. moving from London to New York to treat a disease and then moving back to London). The fact that the agent forced the human player to discard might also not have been well received by the participant, since wasting cards is usually a poor strategy in the game. The highest rated game for this agent is game log 168. In this game the participant was assigned the Scientist role and the planning agent the Medic role, with the random decks #4 assigned to this game.
The game was characterized by a good distribution of assignments, with the planning agent focusing on treating the diseases while the Scientist focused on containing a portion of the map and discovering cures. The Scientist managed to discover the cure for the black disease relatively early in the game, allowing the Medic to eradicate it further ahead. The Medic discovered the cure for the red disease, followed shortly by the Scientist discovering the cure for the yellow disease. However, at this point both players had used blue cards as part of their movement actions, which made it impossible to discover the last remaining cure, and the players lost the game. In its last turn the planning agent managed to eradicate the red disease. Analyzing the log, the reason behind the good rating of the agent might come from the fact that it served its role with great dedication. As the Medic, the agent was expected to focus on treating the diseases, which it did diligently, even managing to eradicate two of them. The eradication of diseases is not required to win the game, but allows players to roam more freely without worrying about those diseases, and this performance might have impressed the participant enough to give the agent a good rating. For the plan recognition agent the worst rated game is game log 3. In this game the participant was assigned the Medic role and the plan recognition agent the Scientist role, with the random decks #2 assigned to this game. This game was characterized by a difficult start with three outbreaks, which forced the participant to move quickly to treat the spreading diseases. The Scientist remained near the research station at Atlanta fighting diseases and then found the cure for the black disease. The Medic then proceeded to fight the disease in the central portion of the map and managed to eradicate it.
In the meantime the agent continued its watch near Atlanta and discovered the cure for the red disease before moving to Seoul, regrouping with the medic, and building a research station. The medic used this new research station to discover the cure for the blue disease. However, shortly after this, the yellow disease reached a critical point at which it spread uncontrollably, causing a chain of outbreaks that lost the players the game. In the case of this game, the main reason behind the rating given by the participant could be the agent's focus on discovering cures instead of helping the participant treat the diseases. The agent spent too much time treating diseases in cities that posed little risk instead of engaging with the medic to treat highly affected areas; in the end, this allowed diseases to spread dangerously in a specific part of the map, causing the players to lose the game.

Finally, the highest rated game for the plan recognition agent is game log 171 (Game ID: 07f5221758f38062). In this game the participant was assigned the Medic role and the plan recognition agent the Researcher role, using random decks #5. This game was characterized by a large initial transfer of cards from the plan recognition agent to the participant, which allowed the participant to discover the cure for the black disease by its second turn. A series of outbreaks occurred at different locations, but the players managed to treat the spreading diseases. Afterwards the agent managed to discover the cure for the red disease and the participant the cure for the yellow disease. Later, another series of outbreaks occurred across the map, but the players managed to treat the diseases and then group together for a series of card exchanges. Finally, the agent managed to discover the last missing cure and win the game.
In this game the agent managed to keep an active role throughout, helping the participant fight the diseases and trading cards actively to discover two of the four cures. This behaviour could explain why the participant gave this game the highest rating for this agent.

From the analysis of these games we can conclude which kinds of behaviour participants seemed to regard as positive or negative. Participants tended to grade agents poorly when they performed actions that could be considered wasteful: allowing cards to be discarded, not participating actively in what the participant considered the priority goal, or performing actions with perceived low value. On the other hand, agents that used their actions more effectively and performed actions noticeable to the participant were rated higher. Transferring cards so that the participant can discover a cure seems to improve the rating, as long as it does not force the player to discard any cards.

Chapter 5

Conclusions

In this chapter we present the conclusions obtained from the work developed in this thesis and how the produced artifacts and results satisfy the specific objectives.

5.1 Game platform

The developed game platform was presented in section 3.1, together with the specification of how it was implemented. As shown in section 3.4, the platform performed satisfactorily as a means of enabling interaction between human participants and artificial intelligence agents. However, while the platform succeeded in supporting interaction with a large number of players and collecting their respective answers, it was not rated as easy to use by the participants. About 40% of the participants rated the platform as neither easy nor difficult to use, and almost 30% rated it as difficult to use. This points toward possible improvements to make the game platform easier for human players to use.
5.2 Intelligent agents

The developed agents proved to be good at playing the game both among themselves and with human players. While still below human skill level, the agents managed to win about a third of the games playing by themselves, which is a milestone in research on the Pandemic game. At the time we started our work, no other agent existed that would have been suitable for our purposes, which is why we had to implement our own baseline agent (the heuristic agent). In this sense, our agents can provide a baseline for future research on the subject and a new example of applying planning to create intelligent agents that cooperate with human players. Some of the results presented in this thesis have been published in articles at the Artificial Intelligence and Interactive Digital Entertainment (AIIDE) conference. The problem domain and game platform were presented at AIIDE 2019 as part of the Experimental AI in Games (EXAG) workshop in an article titled Pandemic as a Challenge for Human-AI Cooperation [SCE19]. The results for both the heuristic and planning agents were presented at the AIIDE 2020 conference in an article titled PAIndemic: A Planning Agent for Pandemic [SCE20], along with their source code. This article and its code were nominated (along with three other papers) for best artifact at the conference and acknowledged as of interest to the community.

5.3 Comparing the performance of the agents

The performance of the planning and plan recognition agents proved satisfactory in both the standalone tests and the evaluation done by the participants of the experiment. In the standalone tests both agents outperformed the heuristic agent used as a baseline and obtained similar results in the two-player versions of the game.
This was to be expected, since the interpretation of a possible plan produced by a planning agent should match the plan the agent would have performed itself. In the experiment with human participants, we were unable to demonstrate a significant difference between the two variants of our agent. We collected data from 187 games, but with the restrictions applied, these games only provided 77 samples of players answering the survey. This posed a challenge for the statistical analysis due to the sample size; a larger sample might have produced more significant results. Another explanation for the lack of an observable difference between the agents could be that, while the agents might behave differently, the difference is likely small, which makes it difficult to measure amid the noise introduced by other factors (for example, the participants). One result to highlight, though, is the statistically significant difference between the two agent types in terms of the participants' understanding of the agents. Whether this difference is positive or negative remains inconclusive, but it provides some evidence that plan recognition has an impact on cooperation with human players.

5.4 Future Work

While our agent provides a strong baseline for Pandemic-playing agents, our work directly enables several future improvements to our approach. The implemented agent could be improved either by defining a new heuristic function or by developing a new evaluation function that chooses the weights used by the heuristic function at each turn of the game. Further research on this topic could focus on other types of agents, perhaps using neural networks trained through reinforcement learning or a genetic algorithm approach (such as a rolling horizon evolutionary algorithm). A neural network could also be used together with a heuristic function to evaluate states in a hybrid approach.
The experiment could be reproduced with a larger population or a particular group of interest (e.g., people who have played the game before) to obtain more precise results, which would allow confirming a difference should there be one. To allow reproduction of the presented methodology and to enable further research, we have made parts of our code available on GitHub in the repository github.com/BlopaSc/PAIndemic. The repository will contain all the needed source code and a guide on how to use the developed software. The missing source code will be added after the publication of the papers relating to the developed agents. All of the assets used in the development of the game client were obtained from open sources with open licenses that request attribution of the authors for their work (the authors are duly credited inside the game client). As future work, the approach presented here could be applied to other cooperative games and problems. The use of a planning agent with a heuristic for a turn-based problem could be applied to cooperative games such as The Big Book of Madness or Eldritch Horror. Plan recognition planning agents could be applied to games with a “traitor” mechanic (a feature of cooperative games in which one of the players is a traitor scheming against the others), which poses an interesting problem regarding belief modeling. Lastly, further improvements to the platform could be obtained through a usability evaluation or by working with a study group to propose possible improvements. The idea of reusability was central when developing the artifacts of this thesis, so that further agents can be developed on top of the platforms produced.

Appendix A

Player Experience Questionnaire

A.1 Previous game experience

1. How familiar are you with the board game Pandemic?
• I have never played before participating in this experiment
• I have played a few (1-10) times
• I have played multiple (11-50) times
• I have played many (>50) times

2. When was the last time that you played Pandemic before this experiment?

• I have never played before or can't remember when I played the last time
• The last time I played was a long time (over a year) ago
• The last time I played was some time (between 3 months and a year) ago
• The last time I played was recently (up to 3 months ago)

3. How often do you win a game of Pandemic with 4 epidemics?

• I never win or I have never played Pandemic
• I almost never win the game (about one in 4 or more games)
• I sometimes win the game (about one in 2 or 3 games)
• I often win the game (more than one in 2 games)

A.2 Experience with the AI agent

1. How easy to use was the User Interface (game display/controls)?

• The User Interface was very difficult to use
• The User Interface was difficult to use
• The User Interface was neither hard nor easy to use
• The User Interface was easy to use
• The User Interface was very easy to use

2. How would you rate the play skill of this AI agent?

• The AI agent played very badly
• The AI agent played badly
• The AI agent played ok
• The AI agent played well
• The AI agent played very well

3. How often did you understand what the AI agent was trying to do?

• I never understood what the AI agent was trying to do
• I rarely understood what the AI agent was trying to do
• I sometimes understood what the AI agent was trying to do
• I often understood what the AI agent was trying to do
• I always understood what the AI agent was trying to do

4. How often did the AI agent help you achieve your plans and goals?
• The AI agent never helped me achieve my plans and goals
• The AI agent rarely helped me achieve my plans and goals
• The AI agent sometimes helped me achieve my plans and goals
• The AI agent often helped me achieve my plans and goals
• The AI agent always helped me achieve my plans and goals

Bibliography

[Bau10] Antoine Bauza. Hanabi. https://boardgamegeek.com/boardgame/98778/hanabi, 2010. Online; accessed 25 February 2019.

[BCP+16] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. CoRR, abs/1606.01540, 2016.

[BFC+19] Nolan Bard, Jakob N. Foerster, Sarath Chandar, Neil Burch, Marc Lanctot, Francis Song, Emilio Parisotto, Vincent Dumoulin, Subhodeep Moitra, Edward Hughes, Iain Dunning, Shibl Mourad, Hugo Larochelle, Marc G. Bellemare, and Michael Bowling. The Hanabi challenge: A new frontier for AI research. CoRR, abs/1902.00506, 2019.

[BFGM06] Michael Bowling, Johannes Fürnkranz, Thore Graepel, and Ron Musick. Machine learning and games. Machine Learning, 63(3):211–215, Jun 2006.

[BG01] Blai Bonet and Héctor Geffner. Planning as heuristic search. Artificial Intelligence, 129:5–33, 2001.

[BJ11] Christer Bäckström and Peter Jonsson. All PSPACE-complete planning problems are equal but some are more equal than others. In Daniel Borrajo, Maxim Likhachev, and Carlos Linares López, editors, SOCS. AAAI Press, 2011.

[BNVB13] M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279, Jun 2013.

[Boa] Ultra BoardGames. Tips to win Pandemic. ultraboardgames.com/pandemic/tips.php. Online; accessed 3 April 2019.

[Byl94] Tom Bylander. The computational complexity of propositional STRIPS planning. Artificial Intelligence, 69(1-2):165–204, September 1994.

[Car01] Sandra Carberry. Techniques for plan recognition. User Modeling and User-Adapted Interaction, 11(1):31–48, Mar 2001.
[Con15] Chris Conway. GOAP in Tomb Raider. Game Developers Conference, 2015.

[Cou06] Rémi Coulom. Efficient selectivity and backup operators in Monte-Carlo tree search. In Proceedings of Computers and Games 2006. Springer-Verlag, 2006.

[CRY15] Rogelio Cardona-Rivera and Robert Young. Symbolic plan recognition in interactive narrative environments, 2015.

[CWP12] P. I. Cowling, C. D. Ward, and E. J. Powley. Ensemble determinization in Monte Carlo tree search for the imperfect information card game Magic: The Gathering. IEEE Transactions on Computational Intelligence and AI in Games, 4(4):241–257, 2012.

[EMS+20] M. Eger, C. Martens, P. Sauma Chacón, M. Alfaro Córdoba, and J. Hidalgo Cespedes. Operationalizing intentionality to play Hanabi with human players. IEEE Transactions on Games, pages 1–1, 2020.

[FN71] Richard E. Fikes and Nils J. Nilsson. STRIPS: A new approach to the application of theorem proving to problem solving. In Proceedings of the 2nd International Joint Conference on Artificial Intelligence, IJCAI'71, pages 608–620, San Francisco, CA, USA, 1971. Morgan Kaufmann Publishers Inc.

[Hig15] Peter Higley. GOAP from FEAR to Shadow of Mordor. Game Developers Conference, 2015.

[HNR68] P. E. Hart, N. J. Nilsson, and B. Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2):100–107, July 1968.

[Isl05] Damian Isla. Handling complexity in the Halo 2 AI. Game Developers Conference, 2005.

[Jam17] Trevor James. Strategy tips for new players. boardgamegeek.com/thread/1813968, 2017. Online; accessed 3 April 2019.

[KBTB14] Dirk P. Kroese, T. Brereton, T. Taimre, and Z. Botev. Why the Monte Carlo method is so important today. Wiley Interdisciplinary Reviews: Computational Statistics, 6:386–392, 2014.

[Lan18] Nathan Langley. Unraveling Riot's critical strike algorithm. https://doranslab.gg/articles/crit-strike-algorithm.html, August 2018. Online; accessed 11 January 2020.
[Lea08] Matt Leacock. Pandemic. https://boardgamegeek.com/boardgame/30549/pandemic, 2008. Online; accessed 25 February 2019.

[MU49] Nicholas Metropolis and S. Ulam. The Monte Carlo method. Journal of the American Statistical Association, 44(247):335–341, 1949. PMID: 18139350.

[Nil84] Nils Nilsson. Shakey the robot. SRI International Technical Note, 323:135, 04 1984.

[NT12] Kenichiro Nakai and Yasuhiko Takenaga. NP-completeness of Pandemic. Journal of Information Processing, 20(3):723–726, 2012.

[Ope18] OpenAI. Gym: a toolkit for developing and comparing reinforcement learning algorithms. https://gym.openai.com/, 2018. Online; accessed 25 September 2018.

[Ork06] Jeff Orkin. Three states and a plan: The A.I. of F.E.A.R., 2006.

[Rab13] Steven Rabin. Game AI Pro: Collected Wisdom of Game AI Professionals. A. K. Peters, Ltd., USA, 2013.

[RG09] M. Ramirez and H. Geffner. Plan recognition as planning. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09), 2009.

[SCE19] Pablo Sauma-Chacón and Markus Eger. Pandemic as a challenge for human-AI cooperation. In Proceedings of the AIIDE Workshop on Experimental AI in Games, 2019.

[SCE20] Pablo Sauma-Chacón and Markus Eger. PAIndemic: A planning agent for Pandemic. 2020.

[SHM+16] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, Jan 2016.

[SL20] Konstantinos Sfikas and Antonios Liapis. Collaborative agent gameplay in the Pandemic board game. In Proceedings of the 15th International Conference on the Foundations of Digital Games, 2020.
[SPPF09] Immanuel Schweizer, Kamill Panitzek, Sang-Hyeun Park, and Johannes Fürnkranz. An exploitative Monte-Carlo poker agent. In Bärbel Mertsching, Marcus Hund, and Zaheer Aziz, editors, KI 2009: Advances in Artificial Intelligence, pages 65–72, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.

[SW10] Claude Sammut and Geoffrey I. Webb, editors. Agent, pages 35–35. Springer US, Boston, MA, 2010.

[VBC+19] Oriol Vinyals, Igor Babuschkin, Junyoung Chung, Michael Mathieu, Max Jaderberg, Wojtek Czarnecki, Andrew Dudzik, Aja Huang, Petko Georgiev, Richard Powell, Timo Ewalds, Dan Horgan, Manuel Kroiss, Ivo Danihelka, John Agapiou, Junhyuk Oh, Valentin Dalibard, David Choi, Laurent Sifre, Yury Sulsky, Sasha Vezhnevets, James Molloy, Trevor Cai, David Budden, Tom Paine, Caglar Gulcehre, Ziyu Wang, Tobias Pfaff, Toby Pohlen, Yuhuai Wu, Dani Yogatama, Julia Cohen, Katrina McKinney, Oliver Smith, Tom Schaul, Timothy Lillicrap, Chris Apps, Koray Kavukcuoglu, Demis Hassabis, and David Silver. AlphaStar: Mastering the real-time strategy game StarCraft II. https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/, 2019. Online; accessed 24 February 2019.

[VdBDR09] Guy Van den Broeck, Kurt Driessens, and Jan Ramon. Monte-Carlo tree search in poker using expected reward distributions. In Zhi-Hua Zhou and Takashi Washio, editors, Advances in Machine Learning, pages 367–381, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.

[Wel94] Daniel S. Weld. An introduction to least commitment planning. AI Magazine, 15(4):27, 1994.

[WPC+12] James R. Wallace, Joseph Pape, Yu-Ling Betty Chang, Phillip J. McClelland, T.C. Nicholas Graham, Stacey D. Scott, and Mark Hancock. Exploring automation in digital tabletop board game. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work Companion, CSCW '12, pages 231–234, New York, NY, USA, 2012. ACM.

[YT18] Georgios N. Yannakakis and Julian Togelius. Artificial Intelligence and Games. Springer, 2018. http://gameaibook.org.