Collaborative Museum Heist with Reinforcement Learning

Abstract:

Non-Playable Characters (NPCs) play a crucial role in enhancing immersion in video games. However, traditional NPC behaviors are often hard-coded using methods such as Finite State Machines, Decision Trees, and Behavior Trees. This approach has two main limitations: first, it is difficult to implement complex cooperative behaviors, and second, it makes it easy for human players to identify and exploit behavioral patterns. To overcome these challenges, Reinforcement Learning (RL) can be used to generate dynamic, real-time NPC responses to human player actions. In this paper, we report first results from applying RL techniques to a non-zero-sum, asymmetric adversarial game using a multi-agent team. The game environment simulates a museum heist, in which the trained team of robbers with different skills (Locksmith, Technician) must steal valuable items from the museum without being detected by the scripted security guards and cameras. Both agents were trained concurrently with separate policies and received both individual and group reward signals. Through this training process, the agents learned to cooperate effectively and to use their skills to maximize both individual and team benefits. These results demonstrate the feasibility of realizing the full game, in which both robbers and security guards are trained simultaneously to achieve their adversarial goals.

Background:

Our inspiration was multiplayer games in which two teams with different roles play against each other and each player has different skills; such games are called asymmetric. We also wanted our game to belong to the category of non-zero-sum games, in which one player's gain is not equal to the other player's loss. We wanted to examine how Reinforcement Learning (RL) techniques can be used in such games to train a multi-agent team in which each agent has a different skill and a unique playing style. Our aim was to understand whether the agents can produce interesting strategies in such games by helping both their teammates and themselves.

Design and Mechanics:

We wanted to design a game that relies on strategy rather than violence. Thus, we designed a game environment that simulates a museum heist with two teams: the guards and the robbers. In our design we took into account that training agents with RL is very challenging, and we therefore constrained the game design to achieve satisfactory agent behaviour; further game design complexities should be considered to expand on this research. Accordingly, we trained the robbers' team against a scripted behaviour for the security guards. The robbers' goal is to steal the valuable items from the museum without being detected by the guards, while the guards' objective is to protect the valuables. In addition, each team has members with different roles: the robbers consist of a Locksmith, who opens doors, and a Technician, who disables security cameras, whereas the guards' team consists of patrolling guards and security cameras, which can detect the robbers and raise the alarm. While designing the guards' behaviour, we simulated uncertainty in detection and allowed the guards to communicate with their fellow guards to decide when to raise the alarm, as in the sketch below.
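To make the scripted guard behaviour concrete, the following is a minimal C# sketch of one way such a guard could work in Unity. Every name in it (Alarm, ScriptedGuard, missProbability, the "Robber" tag) and the two-guard confirmation rule are illustrative assumptions, not the project's actual implementation; the sketch only reflects the ingredients described above: patrolling, detection with uncertainty, and communication between guards before the alarm is raised.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Illustrative sketch only: a shared "alarm" that guards report sightings to.
// The two-guard confirmation threshold is an assumption made for this example.
public static class Alarm
{
    static readonly HashSet<int> reportingGuards = new HashSet<int>();

    public static bool Raised { get; private set; }

    // A single sighting is only a report; the alarm is raised once enough
    // guards have communicated a sighting.
    public static void Report(int guardId, int guardsNeeded = 2)
    {
        reportingGuards.Add(guardId);
        if (reportingGuards.Count >= guardsNeeded)
            Raised = true;
    }
}

// Hypothetical scripted guard: patrols waypoints and detects robbers with
// some probability of missing them, to simulate uncertainty.
public class ScriptedGuard : MonoBehaviour
{
    public Transform[] patrolPoints;      // waypoints the guard cycles through
    public float detectionRange = 5f;     // sight radius
    [Range(0f, 1f)]
    public float missProbability = 0.2f;  // chance a robber in range goes unnoticed

    int currentPoint;

    void Update()
    {
        // Patrol: walk toward the current waypoint, advance when it is reached.
        Transform target = patrolPoints[currentPoint];
        transform.position = Vector3.MoveTowards(
            transform.position, target.position, 2f * Time.deltaTime);
        if (Vector3.Distance(transform.position, target.position) < 0.1f)
            currentPoint = (currentPoint + 1) % patrolPoints.Length;

        // Detection: robbers in range may still be missed (uncertainty),
        // and a sighting is communicated rather than raising the alarm directly.
        foreach (GameObject robber in GameObject.FindGameObjectsWithTag("Robber"))
        {
            float distance = Vector3.Distance(transform.position, robber.transform.position);
            if (distance < detectionRange && Random.value > missProbability)
                Alarm.Report(GetInstanceID());
        }
    }
}
```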

Many design issues arise when creating an environment in which agents can train. For example, to create a sufficiently challenging museum environment for the agents, we introduced a mechanism that generates a randomised map at the start of each training episode. This mechanism produces a museum consisting of four rooms connected by doors. Overall, there are eight different layouts of the museum in which the agents can be trained; one layout is selected whenever the environment resets, as sketched below.
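As a minimal sketch, assuming the eight four-room layouts exist as pre-built Unity GameObjects (the class and field names below are hypothetical), selecting one layout at random whenever the environment resets could look like this:

```csharp
using UnityEngine;

// Hypothetical layout randomiser: enables exactly one of the pre-built
// museum layouts each time the environment resets for training.
public class MuseumLayoutRandomiser : MonoBehaviour
{
    // The eight four-room layouts, assigned in the Unity inspector.
    public GameObject[] layouts = new GameObject[8];

    // Called whenever the environment resets for a new training episode.
    public void ResetLayout()
    {
        int chosen = Random.Range(0, layouts.Length);
        for (int i = 0; i < layouts.Length; i++)
            layouts[i].SetActive(i == chosen);
    }
}
```

Randomising the layout keeps the environment challenging: the agents cannot simply memorise a single floor plan and must cope with all eight configurations.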

Reinforcement learning requires defining observations, actions, and rewards. The actions are simply the actions that would be available to a human player of the game. Observations, on the other hand, are more complicated to set up: we had to consider what a real player could observe, so as not to give our agents an unfair advantage. Finally, the most challenging part was fine-tuning the rewards, which required thinking about all the possible situations in which to reward or punish the agents so that they would be reinforced to develop a strategy in which they both use their skills and help their partner. An illustrative sketch of such a setup is shown below; further details on this project are provided in our paper.
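As an example of what this setup could look like, the following is a hedged sketch using the Unity ML-Agents Agent API. The field names, reward values, and helper methods (Move, UseSkillSucceeded, StoleValuable, IsDetected) are illustrative assumptions rather than the project's actual code, and the reward structure is simplified: the project used separate individual and group reward signals, whereas this sketch only hints at that split in comments.

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

// Hypothetical robber agent illustrating observations, actions and rewards.
public class RobberAgent : Agent
{
    public Transform valuableItem;   // the loot to steal
    public Transform teammate;       // the other robber

    public override void CollectObservations(VectorSensor sensor)
    {
        // Observe only what a human player could plausibly see:
        // the agent's own position and the relative positions of key objects,
        // not the guards' internal state.
        sensor.AddObservation(transform.localPosition);
        sensor.AddObservation(valuableItem.localPosition - transform.localPosition);
        sensor.AddObservation(teammate.localPosition - transform.localPosition);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Discrete branch 0: movement (0 = stay, 1-4 = the four directions),
        // mirroring what a human player could do with the controls.
        int move = actions.DiscreteActions[0];
        // Discrete branch 1: use the role-specific skill
        // (the Locksmith opens a door, the Technician disables a camera).
        bool useSkill = actions.DiscreteActions[1] == 1;

        Move(move);

        if (useSkill && UseSkillSucceeded())
            AddReward(0.1f);   // individual reward for applying the agent's own skill

        if (StoleValuable())
        {
            AddReward(1.0f);   // team objective achieved (a group reward in the full setup)
            EndEpisode();
        }
        if (IsDetected())
        {
            AddReward(-1.0f);  // punished when caught by a guard or camera
            EndEpisode();
        }
    }

    // Placeholder game logic, deliberately left empty in this sketch.
    void Move(int direction) { }
    bool UseSkillSucceeded() { return false; }
    bool StoleValuable() { return false; }
    bool IsDetected() { return false; }
}
```

Limiting the observations to relative positions reflects the constraint above that the agents should not be given information a real player could not observe.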

Project information

  • Category: Paper Publication
  • Conference: 36th International Conference on Computer Animation & Social Agents 2023 (CASA 2023)
  • Journal: Computer Animation and Virtual Worlds
  • Project date: 2023
  • DOI: https://doi.org/10.1002/cav.2158
  • Source Code: GitHub page
  • Tools: Unity, Unity ML-Agents, C#, Blender

Contributions