[inspir.ai AI Call to Arms] Wilderness Scavenger, the First 3D Open-World FPS Game AI Competition, Invites You to Compete!

Inspir-AI

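Wilderness Scavenger, the first 3D open-world FPS game AI competition, developed and hosted by inspir.ai (启元世界), has officially kicked off and will make its formal debut at the IEEE Conference on Games 2022. As a sponsor of IEEE CoG, **inspir.ai, a general artificial intelligence platform company,** invites top universities, institutions, and companies at home and abroad, along with AI researchers and enthusiasts, to take part and jointly explore AI learning research in multiplayer 3D open-world FPS games.

In this top-tier event, you will be not only a witness to another deep integration of AI and games, but also a participant in and driver of making interactive entertainment better through AI.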

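Exploring the Metaverse in the First 3D Open-World FPS Game

As AI has successively defeated top human players in StarCraft and DOTA, we are witnessing more and more changes that "AI + games" brings to the world of interactive entertainment. This is not a sign of human players yielding to AI. Rather, it means that as agents are continually improved through reinforcement learning, in-game NPCs can progress faster from "artificial stupidity" to "artificial intelligence", game development can become more efficient, the gameplay experience can improve, and we can enjoy a more immersive world of interactive entertainment.

In recent years, 3D open-world and battle royale games have become widely popular, and whether AI can master 3D open-world scenarios has become a widely watched question, yet the industry has lacked a suitable environment for such research. Wilderness Scavenger, the first 3D open-world FPS game, developed in-house by inspir.ai, fills this gap by providing an FPS game environment similar to battle royale games such as PUBG.

AI with perception, cognition, and decision-making abilities in 3D open-world scenarios is also a key research direction for inspir.ai. The metaverse is commonly understood as an open 3D environment with multiple agents (large numbers of virtual and real humans) and multiple tasks. Wilderness Scavenger requires competing agents to contend for limited supplies, and in doing so to tackle 3D environment perception, multi-task learning, multi-agent cooperation and competition, and human-machine interaction, thereby enabling deeper exploration of the metaverse.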

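Compared with previous competitions, this 3D open-world FPS game AI competition offers the following highlights:

A larger world size (over 50,000 square meters)
A more realistic, visible 3D open world (built with the Unity3D engine)
PCG-based map generation that introduces more diverse environmental elements (over 100 different maps, with randomly generated buildings, plants, obstacles, and special terrain)
Highly customizable game mechanics (for example, random generation of bonus items with user-specified distribution, time periods, and quantities)

Thanks to the high degree of freedom of the game environment, participants can fully exploit rich input information from visual perception (for example, depth maps computed from the 3D scene mesh) and task-related game features (for example, target item locations) so that agents perform better across generalized world scenarios and tasks.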

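Rules in Detail: Competing to Be the AI Champion of a 3D Open-World FPS Game

During the competition, agents need to perceive their surroundings in a human-like way, adapt to dynamic environments and changing opponents based on 3D visual perception and state data, carry out long-term planning, and flexibly handle supply pickup and combat. To maximize the score on a given task, the AI also needs to generalize to unseen test environments.

A participant's final performance is determined by the number of supplies collected by the agent it controls in the game.

The competition is divided into the following tracks: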

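Track 1-1: Navigation (single-player)

The agent must navigate across a large open map from a randomly selected spawn point to another randomly selected target point as quickly as possible.

Each game has a fixed time limit. Participants are ranked by the average time their agent takes to reach the target point over multiple games, with a corresponding time penalty added on timeout. In the final evaluation, the participant with the shortest average time wins.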

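Track 1-2: Supply Gathering (single-player)

The agent must collect as many of the blue supply crates scattered across the large open map as possible. The crates are randomly distributed at different places on the map, such as open ground, different floors of buildings, under trees, and behind rocks. Each crate may yield a different number of supplies. Indoor crates yield noticeably more supplies than outdoor ones but are also rarer, which encourages participants to explore the environment thoroughly and discover winning strategies that accumulate supplies faster.

Because the total number of supply crates is limited, participants need to keep refining their agents' exploration and path-planning strategies in order to collect supplies faster than other competitors.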

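Track 2: Supply Battle (multiplayer)

Track 2 builds on Track 1 by introducing a multiplayer mode and a combat system. Multiple agents are placed in the same open-world environment and use weapons to attack other agents and seize the supplies they have already collected. When an agent is killed, it immediately drops half of its supplies and respawns in the starting area after a period of time.

Track 2 is divided into two stages:

The first stage is the qualifier, in which each participating team fights separately against our baseline agents;

The second stage is the final, in which the 10 winning teams from the previous stage compete against each other.

Multiplayer combat raises the bar for an AI's general capabilities: an agent must not only gather supplies but also learn to find cover to keep itself safe and to use environmental advantages (for example, high vantage points, or the doors and windows of houses) to attack others and seize their supplies.

Participants are ranked by the average number of supplies collected over multiple games. Each game has a fixed duration, during which agents can act freely. In the final evaluation, the participant with the highest average number of supplies wins.

This year's competition has a prize pool of over 100,000 yuan to reward the winners of Track 1 and the top three of Track 2, as well as other outstanding participants. To encourage algorithmic innovation, we have also set up a special technical breakthrough award of up to 5,000 US dollars.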

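Competition Schedule: Note the Following Seven Key Dates

Prospective participants should note the following dates:

April 15: Preparation stage; the participation guide and competition infrastructure are released;
April 29: Starter examples and baseline code for all tracks are released;
June 1: The submission system opens and submissions for Track 1 and for stage 1 of Track 2 begin; each submission is evaluated online immediately upon receipt;
July 1: Submissions for stage 1 of Track 2 close;
July 6: The shortlist for stage 2 of Track 2 is announced and stage 2 submissions begin;
August 7: The submission system closes for all tracks, and all valid stage 2 submissions for Track 2 are evaluated together;
August 14: The rankings of all participants and the winners of each track are announced, and the corresponding prizes are awarded.

Note: All stage start and end times refer to 9:00 a.m. Beijing time on the given day.

For detailed competition information or to register, visit the official competition website: https://sites.google.com/view/inspirai-wildscav-cog2022/home

[Competition questions, experience sharing, and Q&A discussion] http://www.deeprlhub.com/d/752-ai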


Competition Overview (GUIDANCE)

INTRODUCTION

Motivation

Following the success of AI commanders in StarCraft and DOTA2, enhancing gameplay experience with AI in various genres of games has been viewed as the next grand challenge. With the rise of open-world games in recent years, learning intelligent agents that have general task-solving capabilities in open-world environments has attracted increasing attention. However, the lack of satisfactory training and testing benchmarks remains an obstacle to research in this field. In this context, the goal of this competition is to advance research in the field of open-world intelligent agent learning. As a stepping stone to the ultimate goal of learning highly intelligent agents "living" in virtual worlds, we decided to focus first on open-world FPS games, taking into account the recent popularity of battle royale games.

Related Work

In general, modern 3D FPS games are inherently incomplete-information games in which winning strategies are extremely hard to learn in multiplayer scenarios, and they are known to have no optimal policy. Despite the difficulties, there have been attempts over the past decade to apply reinforcement learning to FPS games. To the best of our knowledge, the most influential work is the Augmented DRQN model proposed by Lample & Chaplot (2017), which leverages both visual input and game feature information (e.g. presence of enemies or items) and modularizes the model architecture into independent networks that handle different phases of the game. Their approach successfully learned a competitive FPS agent by minimizing a Q-learning objective and showed better performance than average human players. Following this success, more work on learning FPS game agents has been proposed, such as Arnold by Chaplot & Lample (2017), which benefits from the Action-Navigation architecture, and Divide&Conquer by Papoudakis et al. (2018), which further refined the idea of separating the control strategies for map exploration and enemy combat. Although these methods have shown promising results, their training and evaluation context is largely limited to old-fashioned video games with relatively small world sizes and low visual resolution, such as VizDoom (originally 1993) and Quake 3 (originally 1999). Recently, Pearce & Zhu (2021) tried to learn an FPS agent to play CSGO, a hugely popular modern 3D FPS game with high-resolution visual rendering. This new game environment not only introduces a greater computational burden (mostly due to extracting visual features) but also makes it more difficult for the agent to explore and adapt to the game world efficiently. Their approach addressed the challenge primarily by using behavioral cloning, and the learned agents showed reasonably good performance compared to normal human players in the Deathmatch mode.
Building upon these works, in this challenge we seek to further expand the frontiers of AI in playing large-scale, open-world modern FPS games.

Highlights

The WildScav challenge aims to advance research on learning agents with universal abilities in open-world gaming environments.

The new challenge will provide an FPS game environment similar to current popular battle royale games (e.g. PUBG), allowing multiple players to compete against each other with diverse tactics. Compared to previous work, the new game environment has

a larger world size (over 50,000 m²)
a higher diversity of PCG-based game battlefields (over 100 different maps with randomly spawned buildings, plants, obstacles, and ground area with damping effects)
highly customizable game mechanics (e.g. random generation of bonus items with user-specified distribution, time periods, and quantities)

With this high degree of freedom of the game environment, we expect participants to fully exploit rich input information from both visual perception (e.g. depth map computed from 3D scene mesh) and task-related game features (e.g. target item location) to learn agents that can perform well on generalized world scenarios and tasks.

CHALLENGE STRUCTURE

Infrastructure

This challenge is built upon an open-world FPS game environment. The infrastructure of this challenge mainly consists of the following parts:

Components
- Runtime: The backend runtime environment for game logic simulation developed with the Unity3D game engine.
- Gameplay API: the programming interface allowing users to build their own training environments and actually control agents to perform tasks in the game environment. The API not only provides the communication channel to the backend game runtime to get observation data and send back action commands but also allows users to set some game configuration parameters as they wish. While we are going to have a fixed set of game configurations for the online evaluation, we hope participants make good use of these features to customize their training environments and try to learn agents with more generalized and robust task-solving abilities.
- Replayer: a GUI application powered by the Unity3D game engine for visualizing a game replay. This is similar to the spectator mode common in multiplayer FPS games, which allows users to watch the game history interactively. Users can view the actions of an agent from different perspectives and also switch between multiple agents or different observation modes (e.g. first person, third person, free) to see the whole game in a more immersive way. Participants can find the download links for this tool in our GitHub repository. To use this tool, follow the instructions below (assuming the engine runs on Linux and the replay is watched on Windows):
  - Decompress the downloaded file to anywhere you like.
  - Turn on recording when running the game (check details in the Python API docs). One record file will be saved at the end of each episode of the game.
  - Copy the record file (e.g. xxx.bin) from "fps_Data/StreamingAssets/Replay" (under the root of Game Runtime) to "FPSGameUnity_Data/StreamingAssets/Replay" (under the root of Game Replayer).
  - Run the executable FPSGameUnity.exe to start the application.
  - Select the record you want to watch from the drop-down menu and click "Play" to start playing the record.
  - During playback, the following controls are available:
    - Press "Tab": pause or resume
    - Press "E": switch observation mode (between first person, third person, free)
    - Press "Q": switch between multiple agents
    - Press "Esc": stop replay and return to the main menu
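The copy step above can be scripted. The following is a minimal sketch, assuming both tool roots are reachable from one machine (for example through a shared folder). The function name `copy_replays` and the root paths passed to it are hypothetical; only the two `StreamingAssets/Replay` subfolders come from the steps above.

```python
import shutil
from pathlib import Path

# Subfolders named in the steps above, relative to each tool's root.
RUNTIME_REPLAY_DIR = Path("fps_Data") / "StreamingAssets" / "Replay"
REPLAYER_REPLAY_DIR = Path("FPSGameUnity_Data") / "StreamingAssets" / "Replay"


def copy_replays(runtime_root, replayer_root):
    """Copy every .bin record file from the runtime to the replayer.

    Returns the sorted list of copied file names.
    """
    src = Path(runtime_root) / RUNTIME_REPLAY_DIR
    dst = Path(replayer_root) / REPLAYER_REPLAY_DIR
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for record in sorted(src.glob("*.bin")):
        shutil.copy2(record, dst / record.name)  # keeps file timestamps
        copied.append(record.name)
    return copied
```

If the runtime and the replayer live on different machines, the same function can be pointed at a mounted or synced folder instead.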
Data & Resources
- World Mesh: We provide mesh data for each game map in the form of a .obj file.
- Location List: Since the game is based on open-world settings, players can (in theory) be spawned at arbitrary places in the game world. To avoid spawning a player in a potential "blind alley", we additionally provide all recommended spawning locations for a map in the form of a .json file. Though you may also set the spawning location manually, we suggest selecting locations from the provided candidates to avoid potential bad cases.
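Picking a spawn point from the location list could look like the sketch below. The schema is an assumption: the code treats the file content as a JSON array of [x, y, z] triples, which may not match the real .json file, and `sample_spawn_location` is a hypothetical helper rather than part of the provided API.

```python
import json
import random


def sample_spawn_location(location_json, rng=None):
    """Pick one recommended spawn location from a location-list document.

    ASSUMPTION: the document is a JSON array of [x, y, z] triples; the real
    schema of the provided .json file may differ.
    """
    rng = rng or random.Random()
    candidates = json.loads(location_json)
    if not candidates:
        raise ValueError("location list is empty")
    x, y, z = rng.choice(candidates)
    return float(x), float(y), float(z)


# Made-up coordinates, for illustration only.
example = "[[0, 0, 0], [12.5, 1.0, -3.0], [40, 2, 18]]"
print(sample_spawn_location(example, random.Random(0)))
```

Passing a seeded `random.Random` makes the choice reproducible across training runs.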

In general, we hope participants make good use of the provided tools and resources to build their own training environments (e.g. a Gym environment with customized observation and action spaces and reward functions) to learn agents with generalizable abilities to solve the tasks in the different tracks.
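As one illustration of a customized reward function for a navigation training environment, the sketch below rewards progress toward the target. It is not part of the provided API: the function name, the reach radius, and the bonus value are all assumptions chosen for the example.

```python
import math


def navigation_reward(prev_pos, curr_pos, target_pos,
                      reach_radius=1.0, reach_bonus=100.0):
    """Progress-based reward for one time step.

    The reward is the decrease in straight-line distance to the target,
    plus a one-off bonus when the agent gets within reach_radius.
    Both reach_radius and reach_bonus are hypothetical tuning values.
    Returns (reward, done).
    """
    prev_dist = math.dist(prev_pos, target_pos)
    curr_dist = math.dist(curr_pos, target_pos)
    reward = prev_dist - curr_dist
    done = curr_dist <= reach_radius
    if done:
        reward += reach_bonus
    return reward, done
```

A dense signal like this is usually easier to learn from than a reward given only on arrival, although straight-line distance can mislead the agent around obstacles.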

Game Configuration

We provide a high degree of freedom for the participants to control every single game. Specifically, for each new game, one can control settings like timeout, game mode, map id, and supply refresh time and distribution (for a detailed description of all configuration parameters, check out our GitHub repository) to customize the agent's learning environment. Note, however, that in our final evaluation stage we will use a fixed set of game configurations to maximize the fairness of the competition. Since this is a challenging open-world environment, we expect participants to fully exploit the diversity of the large space of possible game configurations and try to learn an agent that performs reasonably well across various environments, instead of naively overfitting to one specific game configuration.
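One way to avoid overfitting to a single configuration is to randomize the settings per training episode and keep some maps aside for validation. The sketch below assumes hypothetical field names, value ranges, and map ids 1 to 100; the real configuration parameters are described in the GitHub repository.

```python
import random
from dataclasses import dataclass


@dataclass
class GameConfig:
    # Hypothetical field names mirroring the settings listed above.
    timeout_s: int
    game_mode: str
    map_id: int
    supply_refresh_s: int


def split_maps(map_ids, n_holdout, rng):
    """Shuffle map ids and return (train_ids, holdout_ids)."""
    ids = list(map_ids)
    rng.shuffle(ids)
    return sorted(ids[n_holdout:]), sorted(ids[:n_holdout])


def sample_config(rng, train_map_ids, game_mode="navigation"):
    """Draw one randomized training configuration (value ranges are made up)."""
    return GameConfig(
        timeout_s=rng.choice([120, 300, 600]),
        game_mode=game_mode,
        map_id=rng.choice(train_map_ids),
        supply_refresh_s=rng.choice([30, 60, 120]),
    )


rng = random.Random(0)
train_ids, holdout_ids = split_maps(range(1, 101), 10, rng)
config = sample_config(rng, train_ids)
```

Evaluating on the held-out maps gives a rough proxy for the unseen maps used in the final evaluation.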

Agent Observation

The gameplay interface provides multiple sources of information about the agent's own condition as well as its surrounding environment. The observation mainly consists of two parts: visual perception data and game variables.

Visual Perception: Unlike previous similar competitions, we do not provide the screen buffer, which avoids the high computational overhead of rendering the game scenes and of extracting latent features from images (e.g. using a CNN). Instead, we implement an efficient way to compute a low-resolution depth map from the agent's camera using only the location, orientation values, and the mesh data of the static scenes.
Game Variables: We also provide access to multiple classes of game-related variables to allow participants the freedom to construct their own observational features. These variables include location and orientation, state of motion, health, state of combat, and task-related metrics.

From our experience, the above observations are reasonably informative for an agent to make good action decisions. For a detailed description of these variables, check out our Python API docs.
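To make the depth-map idea concrete, the sketch below casts one ray per pixel against a list of mesh triangles using the Möller-Trumbore intersection test. It is an illustration, not the competition's implementation: the axis convention (y up, yaw 0 looking along +z, positive pitch looking up), the field of view, and the resolution are all assumptions, and the result is the range along each ray rather than a perpendicular depth.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle_distance(origin, direction, triangle, eps=1e-9):
    """Moller-Trumbore test: distance along a unit ray to a triangle, or None."""
    v0, v1, v2 = triangle
    e1, e2 = _sub(v1, v0), _sub(v2, v0)
    p = _cross(direction, e2)
    det = _dot(e1, p)
    if abs(det) < eps:          # ray is parallel to the triangle plane
        return None
    inv = 1.0 / det
    t_vec = _sub(origin, v0)
    u = _dot(t_vec, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = _cross(t_vec, e1)
    v = _dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = _dot(e2, q) * inv
    return t if t > eps else None   # ignore hits behind the camera


def view_direction(yaw_deg, pitch_deg):
    """Unit view vector under the ASSUMED convention described above."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (math.cos(pitch) * math.sin(yaw),
            math.sin(pitch),
            math.cos(pitch) * math.cos(yaw))


def depth_map(position, yaw_deg, pitch_deg, triangles,
              width=8, height=4, hfov_deg=90.0, vfov_deg=45.0, far=100.0):
    """Return height x width ray ranges; pixels with no hit get the value far."""
    rows = []
    for r in range(height):
        pitch_off = vfov_deg * (0.5 - (r + 0.5) / height)   # top row looks up
        row = []
        for c in range(width):
            yaw_off = hfov_deg * ((c + 0.5) / width - 0.5)
            d = view_direction(yaw_deg + yaw_off, pitch_deg + pitch_off)
            nearest = far
            for tri in triangles:
                t = ray_triangle_distance(position, d, tri)
                if t is not None and t < nearest:
                    nearest = t
            row.append(nearest)
        rows.append(row)
    return rows
```

This brute-force loop is linear in the number of triangles per ray; a real implementation over a full map mesh would typically use a spatial acceleration structure.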

Agent Control

Track 1-1 (Navigation): The first track requires only some basic actions to control movement and orientation, including:
- WALK_DIR: a number in [0, 360] that determines the direction (angle) in which the agent walks.
- WALK_SPEED: a number in [0, 10] that determines how fast the agent walks (in m/s).
- TURN_LR_DELTA: the change in the horizontal camera angle (yaw) between two frames. A negative value means a left turn, and a positive value means a right turn.
- LOOK_UD_DELTA: the change in the vertical camera angle (pitch) between two frames. A negative value means looking down, a positive value means looking up.
- JUMP: a bool variable determining whether to jump at the current time step.
Track 1-2 (Supply Gathering): The second track requires players to collect supplies randomly distributed on the world map. Extending Track 1-1, it needs only one more action:
- PICKUP: a bool variable determining whether to actively collect a nearby supply.
Track 2 (Supply Battle): In the last track, we introduce the combat system that is very common in FPS games. We have two more actions for players to take, including:
- ATTACK: a bool variable that determines whether to fire the weapon and cost one bullet at the current time step.
- RELOAD: a bool variable that determines whether to refill the weapon's clip using spare ammo.
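The action set above can be modeled as a small record that is clipped to the documented ranges and stripped of actions a track does not use. This is a sketch under assumptions: the `Action` class, its lowercase field names, and the track keys "1-1", "1-2", and "2" are hypothetical and are not the types exposed by the gameplay API.

```python
from dataclasses import dataclass, replace

# Actions each track accepts beyond the Track 1-1 movement set.
EXTRA_ACTIONS = {
    "1-1": set(),
    "1-2": {"pickup"},
    "2": {"pickup", "attack", "reload"},
}


@dataclass(frozen=True)
class Action:
    walk_dir: float = 0.0       # WALK_DIR, degrees
    walk_speed: float = 0.0     # WALK_SPEED, m/s
    turn_lr_delta: float = 0.0  # TURN_LR_DELTA, yaw change per frame
    look_ud_delta: float = 0.0  # LOOK_UD_DELTA, pitch change per frame
    jump: bool = False          # JUMP
    pickup: bool = False        # PICKUP (Track 1-2 and Track 2)
    attack: bool = False        # ATTACK (Track 2)
    reload: bool = False        # RELOAD (Track 2)

    def sanitized(self, track):
        """Return a copy with ranges enforced and unsupported actions cleared."""
        allowed = EXTRA_ACTIONS[track]
        return replace(
            self,
            walk_dir=self.walk_dir % 360.0,  # 360 maps to the equivalent 0
            walk_speed=min(max(self.walk_speed, 0.0), 10.0),
            pickup=self.pickup and "pickup" in allowed,
            attack=self.attack and "attack" in allowed,
            reload=self.reload and "reload" in allowed,
        )
```

For example, `Action(walk_dir=-90, walk_speed=12, attack=True).sanitized("1-1")` yields a walk direction of 270, a speed of 10, and no attack.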

Other Details

About actions:
- All actions can be set and executed simultaneously within one time step (except ATTACK and RELOAD).
- PICKUP only takes effect if the supply is within the trigger range (fixed at 1m). If multiple supplies are within the trigger area, the closest one is picked up.
- The targeting direction of ATTACK is determined by the agent's current orientation. Therefore, agents may need to adjust TURN_LR_DELTA and LOOK_UD_DELTA in coordination with ATTACK to improve their chances of hitting enemies.
- The RELOAD action takes some time to complete. Additional ATTACK and RELOAD have no effect during this time.
- If the weapon clip is empty and there is still ammunition, a RELOAD is automatically triggered.
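- In summary, the valid actions for each track are: Track 1-1 uses WALK_DIR, WALK_SPEED, TURN_LR_DELTA, LOOK_UD_DELTA, and JUMP; Track 1-2 adds PICKUP; Track 2 adds ATTACK and RELOAD.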
About environment:
- The value range for yaw is set to (-180, 180] and for pitch to [-90, +90] (where -90 means looking up toward the sky).
- All walls, trees, rocks, and furnishings are indestructible in our environment and the agent can only fire one bullet at a time.
- The agent does not need to consider the effects of wind speed and gravity on bullet trajectories, although the bullet itself has a fixed velocity.
- The actual movement speed of the agent depends on the combined effect of WALK_SPEED and the environment terrain. For example, stones in the way may slow down or even hinder movement. Uphill terrain also reduces speed. Some special areas reduce movement speed by a certain decay factor (for example, a factor of 0.5 on ice).
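When tracking the camera angles on the agent side, the ranges above can be enforced as in the sketch below. The helper names are hypothetical, and three things are assumptions: that the deltas add directly to the angles, that the pitch is clamped rather than wrapped, and that the decay factor multiplies WALK_SPEED. The sign convention linking LOOK_UD_DELTA to pitch should be checked against the Python API docs.

```python
def wrap_yaw(yaw_deg):
    """Wrap an angle in degrees into the half-open range (-180, 180]."""
    wrapped = (yaw_deg + 180.0) % 360.0 - 180.0   # lands in [-180, 180)
    return 180.0 if wrapped == -180.0 else wrapped


def clamp_pitch(pitch_deg):
    """Clamp pitch into [-90, 90]."""
    return max(-90.0, min(90.0, pitch_deg))


def apply_camera_delta(yaw_deg, pitch_deg, turn_lr_delta, look_ud_delta):
    """Apply per-frame deltas (ASSUMED to add directly) and re-normalize."""
    return (wrap_yaw(yaw_deg + turn_lr_delta),
            clamp_pitch(pitch_deg + look_ud_delta))


def effective_speed(walk_speed, decay_factor=1.0):
    """ASSUMPTION: special areas scale the commanded speed multiplicatively."""
    return walk_speed * decay_factor
```

Under these assumptions, a commanded speed of 10 m/s on ice (factor 0.5) gives 5 m/s before any other terrain effects.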

GETTING STARTED

Our project is hosted in the GitHub repository Wilderness-Scavenger.
Follow the installation guide to set up your working environment.
Familiarize yourself with the framework and use of our gameplay API inspirai_fps.
Familiarize yourself with the use of the replay tool.
Check out examples and baselines to learn more about the API and start from here to build your own environments.

EVALUATION

Submission

Note: The detailed submission instructions are yet to be determined. We will release them before the submission system opens.

In brief, a participant should submit an agent controller that takes in the state data (e.g. location, orientation, health, etc.) at each time step and outputs a control action that tells the backend engine how to operate the player (e.g. move forward/backward). Specifically, the format of state and action is defined in our Python gameplay interface, and participants need to wrap their controller into a Python class that can be imported from an external module or package. To minimize potential problems during evaluation, we will provide a Docker image that sets up a Python runtime environment with a compatibility test template. Before submission, participants can test their controller class in this template and make sure the name of the class and all input and output parameters conform to the given specifications. They should also check that the results are consistent with their expectations. Finally, the controller class and any dependent libraries or data files should be packaged into a single Docker image (which can simply be adapted from ours) and sent to us.
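Since the official specification is still pending, the sketch below only illustrates the general shape of such a controller: a class that keeps its own memory and maps one state to one action per time step. The class name `MyController`, the `act` method, the dictionary-based state and action, and the "location" key are all hypothetical; the real names and types will come from the Python gameplay interface and the test template.

```python
import math


class MyController:
    """Hypothetical controller: walks at full speed and jumps when stuck."""

    def __init__(self, stuck_threshold_m=0.01):
        self.stuck_threshold_m = stuck_threshold_m
        self._last_location = None

    def act(self, state):
        """Map one state (a dict in this sketch) to one action dict."""
        location = state["location"]
        stuck = (
            self._last_location is not None
            and math.dist(location, self._last_location) < self.stuck_threshold_m
        )
        self._last_location = location
        return {
            "WALK_DIR": 0.0,     # fixed direction, whatever 0 denotes in-game
            "WALK_SPEED": 10.0,  # upper end of the documented range
            "TURN_LR_DELTA": 15.0 if stuck else 0.0,  # turn away when blocked
            "LOOK_UD_DELTA": 0.0,
            "JUMP": stuck,
        }
```

Keeping all per-episode memory inside the instance makes it easy to run the same class in both a training loop and an evaluation harness.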

Evaluation Rule

Track 1-1 (Navigation): An agent is measured by how fast it can navigate to target locations, so performance is evaluated mainly on the time the agent takes to reach the target location. All submitted agents will be evaluated on 10 new maps (unseen in training) for 100 games (10 games per map). The timeout of a game is 5 minutes. At the start of each game, a random start location and a random target location are sampled from our manually selected candidate location pool. To ensure fairness, all agents are tested with the same start and target locations in each game. The score of an agent is its average time cost over the 100 games. An agent may well fail a game, meaning it does not reach the target location before the timeout. In that case, its time cost is the timeout limit plus a penalty computed from the 3D spatial distance between the target location and the agent's end location. The score is used for the final ranking; lower is better.
Track 1-2 (Supply Gathering): The task for agents is to collect as many supplies as possible before the timeout. As in Track 1-1, we evaluate the submitted agents on multiple maps over multiple games. In this track, agents are tested across 10 maps for 100 games (10 games per map). The timeout of a game is set to 10 minutes. In each game, a start location is randomly selected from the candidate location pool, and all submitted agents have the same start location to ensure fairness.
Track 2 (Supply Battle): The goal in this track is the same as in Track 1-2, although the advantageous game strategy may be quite different due to the introduction of the combat system. We run the evaluation across 10 maps for 10 games (one game per map). The agent start location is again randomly picked from the candidate location pool. However, the agents cannot all be spawned at exactly the same location. To address this, we set up a small spawning area in each game, and all agents are randomly dropped in this area at the beginning, with a short period of invincibility to prevent early fights. The score of an agent is calculated as the average number of supplies it collected across all games. The score is used for the final ranking; higher is better.
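The Track 1-1 scoring rule can be sketched as follows. The 5-minute timeout comes from the rule above, but the exact penalty formula is not stated, so the linear form and the coefficient `penalty_s_per_m` are assumptions made purely for illustration.

```python
import math

TIMEOUT_S = 300.0  # 5-minute limit stated for Track 1-1


def game_time_cost(reached, elapsed_s, end_pos, target_pos, penalty_s_per_m=1.0):
    """Time cost of one game.

    On success the cost is the elapsed time. On failure it is the timeout
    plus a distance-based penalty (ASSUMED linear in the remaining distance).
    """
    if reached:
        return min(elapsed_s, TIMEOUT_S)
    return TIMEOUT_S + penalty_s_per_m * math.dist(end_pos, target_pos)


def track_1_1_score(games):
    """Average time cost over games; lower is better.

    Each game is a tuple (reached, elapsed_s, end_pos, target_pos).
    """
    costs = [game_time_cost(*game) for game in games]
    return sum(costs) / len(costs)
```

With a rule of this shape, an agent that times out close to the target is penalized less than one that times out far away, so partial progress still counts.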

Evaluation Process

Singleplayer (Track 1): Each submission for Track 1-1 and Track 1-2 is evaluated independently. A team can submit its solution a limited number of times, and only the latest submission counts towards its final ranking. The top team on each task will receive the prize.
Multiplayer (Track 2): We apply a two-stage workflow to simplify multiplayer evaluation. The first stage is a typical qualifier that selects the top 10 teams to enter the second stage. The second stage is the final contest, where agents from different teams compete directly against each other in a game.

Stage 1 (qualifier evaluation): In the first stage, all teams are independently tested by competing with our baseline agents. We use the same game configuration for all submissions. In each game, the agent is spawned randomly near the edge of the world range. Then within a timeout limit of 15 minutes, 10 agents (including 9 baseline agents and the submitted agent) will compete against each other.
Stage 2 (final contest): The top 10 teams from the first stage will enter the second stage. At this stage, each team controls an agent to compete in the game. We use different game configurations to run multiple games to evaluate all agents. The final score of a team will be the average of all scores it receives in all games.
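The two-stage bookkeeping amounts to averaging per-game scores and keeping the best teams. The sketch below shows that arithmetic; the function names are hypothetical, and breaking ties alphabetically by team name is an assumption, since the rules above do not say how ties are resolved.

```python
def average_scores(per_game_scores):
    """Map each team to the mean of its per-game scores."""
    return {team: sum(scores) / len(scores)
            for team, scores in per_game_scores.items()}


def top_teams(averages, k=10):
    """Return the k best teams, highest average first.

    ASSUMPTION: ties are broken alphabetically by team name.
    """
    return sorted(averages, key=lambda team: (-averages[team], team))[:k]


# Made-up qualifier results, for illustration only.
qualifier = {"alpha": [12, 18], "beta": [30, 10], "gamma": [5, 7]}
print(top_teams(average_scores(qualifier), k=2))
```

The same `average_scores` helper applies to the final contest, where a team's final score is the average over all games it plays.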
