原文pdf: https://arxiv.org/pdf/2206.08332.pdf
我们提出了 BYOL-Explore,这是一种概念上简单但通用的方法,用于在视觉复杂的环境中进行好奇心驱动的探索。 BYOL-Explore 通过优化潜在空间中的单个预测损失来共同学习世界表示、世界动态和探索策略,而无需额外的辅助目标。 我们证明 BYOL-Explore 在 DM-HARD-8 中是有效的,DM-HARD-8 是一个具有挑战性的部分可观察的连续动作硬探索基准,具有视觉丰富的 3-D 环境。 在这个基准上,我们纯粹通过使用 BYOL-Explore 的内在奖励来增加外在奖励来解决大部分任务,而之前的工作只能通过人工演示才能起步。 作为 BYOL-Explore 普遍性的进一步证据,我们表明它在 Atari 中最难的十种探索游戏中实现了超人的表现,同时比其他竞争代理具有更简单的设计。
We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven exploration in visually-complex environments. BYOL-Explore learns a world representation, the world dynamics, and an exploration policy alltogether by optimizing a single prediction loss in the latent space with no additional auxiliary objective. We show that BYOL-Explore is effective in DM-HARD-8, a challenging partially-observable continuous-action hard-exploration benchmark with visually-rich 3-D environments. On this benchmark, we solve the majority of the tasks purely through augmenting the extrinsic reward with BYOL-Explore’s intrinsic reward, whereas prior work could only get off the ground with human demonstrations. As further evidence of the generality of BYOL-Explore, we show that it achieves superhuman performance on the ten hardest exploration games in Atari while having a much simpler design than other competitive agents.