Using Bifurcations for Diversity in Differentiable Games

Jonathan Lorraine $^{1\,2\,3}$ Jack Parker-Holder $^{1\,4}$ Paul Vicol $^{2\,3}$ Aldo Pacchiano $^{5}$ Luke Metz $^{6}$ Tal Kachman $^{7}$ Jakob Foerster $^{1}$

Abstract

Ridge Rider (RR) is an algorithm for finding diverse solutions to optimization problems by following eigenvectors of the Hessian (“ridges”). RR is designed for conservative gradient systems (i.e., settings involving a single loss function), where it branches at saddles, the only relevant bifurcation points. We generalize this idea to non-conservative, multi-agent gradient systems by identifying new types of bifurcation points and proposing a method to follow eigenvectors with complex eigenvalues. We give theoretical motivation for our method, denoted Game Ridge Rider (GRR), by leveraging machinery from the field of dynamical systems. Finally, we empirically motivate our method by constructing novel toy problems where we can visualize new phenomena, and by finding diverse solutions in the iterated prisoners’ dilemma, where existing methods often converge to a single solution mode.

1. Introduction

In machine learning, optimizing non-convex losses to local minima is critical in a variety of applications. Often, being able to select a specific type of minimum is important, such as for policy optimization [1] or to avoid non-transferable features in supervised learning [2, 3].

However, an increasing number of applications require learning in games, generalizing single objective optimization to settings where each agent controls a different subset of parameters to optimize their own objective. Some examples are GANs [4, 5], actor-critic models [5], curriculum learning [6-9], hyperparameter optimization [10-15], adversarial examples [16, 17], learning models [18-20], domain adversarial adaptation [21], neural architecture search [22-26], multi-agent settings [27] and meta-learning [28-30].

There are two major challenges with learning in games: first, the learning dynamics can be unstable due to the nonstationarity induced by simultaneously learning agents, e.g., resulting in cycling dynamics rather than convergence. Second, different equilibria can result in vastly different rewards for the different agents. For example, in the iterated prisoners’ dilemma (Sec. 4.1), finding solutions that favor cooperation over selfishness results in higher returns for all agents; or in Hanabi, finding solutions that do not arbitrarily break
symmetries results in better coordination with humans [31].

Recently, Ridge Rider (RR) [32] provided a general method for finding diverse solutions in single-objective optimization by following eigenvectors of the Hessian with negative eigenvalues. We generalize RR to multi-agent settings, where each agent optimizes its own objective. The relevant generalization of the Hessian, the game Hessian, is not symmetric and thus, in general, has complex eigenvalues (EVals) and eigenvectors (EVecs). This leads to new types of bifurcation points, where small changes in initial parameters lead to very different optimization outcomes.
Our method, Game Ridge Rider (GRR), branches from these novel bifurcation points along the (complex) EVecs to discover diverse solutions in these settings.
Figure 1. We visualize the Game Ridge Rider (GRR) algorithm when branching at different types of bifurcations. We have two eigenvectors and can branch in opposite directions, so we have four paths displayed in different colors. Steps with the eigenvector have magenta boundaries. Top: For the small Iterated Prisoner’s Dilemma (IPD), finding and then branching at the max-entropy saddle allows us to find defect-defect and tit-for-tat solutions. Bottom: For the mixed problem of small IPD and matching pennies, branching at the Hopf bifurcation allows us to find both solutions.

2. Background

Appendix Table 2 summarizes our notation. Consider the minimization problem:
$$\boldsymbol{\theta}^{*} := \arg\min_{\boldsymbol{\theta}} \mathcal{L}(\boldsymbol{\theta}) \tag{1}$$
We denote the gradient of the loss at parameters $\boldsymbol{\theta}^{j}$ by $\boldsymbol{g}^{j} := \boldsymbol{g}(\boldsymbol{\theta}^{j}) := \nabla_{\boldsymbol{\theta}} \mathcal{L}(\boldsymbol{\theta})\big|_{\boldsymbol{\theta}^{j}}$. We can locally minimize the loss $\mathcal{L}$ using (stochastic) gradient descent with step size $\alpha$:
$$\boldsymbol{\theta}^{j+1} = \boldsymbol{\theta}^{j} - \alpha \boldsymbol{g}^{j}$$
Due to the potentially non-convex nature of $\mathcal{L}$, multiple solutions can exist, and simply following the gradient will not explore the parameter space.
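For concreteness, below is a minimal JAX sketch of this update; the quadratic loss, initialization, and step size are illustrative assumptions rather than details from the paper.

```python
import jax
import jax.numpy as jnp

def gradient_descent(loss, theta, alpha=0.1, steps=100):
    """Iterate theta^{j+1} = theta^j - alpha * g^j, with g^j the gradient at theta^j."""
    grad = jax.grad(loss)
    for _ in range(steps):
        theta = theta - alpha * grad(theta)
    return theta

# Assumed toy quadratic loss; gradient descent converges to its unique minimum at [1, -2].
loss = lambda th: jnp.sum((th - jnp.array([1.0, -2.0])) ** 2)
print(gradient_descent(loss, jnp.zeros(2)))
```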

2.1. Ridge Rider

Ridge Rider (RR) [32] finds diverse solutions in gradient systems for single-objective minimization. RR first finds a saddle point, and then branches the optimization procedure following different directions (or “ridges”) given by the EVecs of the Hessian $\mathcal{H} = \nabla_{\boldsymbol{\theta}} \boldsymbol{g} = \nabla_{\boldsymbol{\theta}}(\nabla_{\boldsymbol{\theta}} \mathcal{L})$. Full computation of $\mathcal{H}$, and of its EVecs and EVals, is often prohibitively expensive; however, we can efficiently access a subset of the eigenspaces in libraries with efficient Hessian-vector products $\mathcal{H}\boldsymbol{v} = \nabla_{\boldsymbol{\theta}}\left(\left(\nabla_{\boldsymbol{\theta}} \mathcal{L}\right) \boldsymbol{v}\right)$ [33-36]. Algorithm 1 summarizes the RR method.
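As a concrete illustration of the Hessian-vector product identity above, the following is a minimal JAX sketch; the toy loss and vectors are assumptions for illustration, not the RR implementation from the paper. Repeated products of this form (e.g., power iteration) can then recover extreme eigen-directions without ever materializing $\mathcal{H}$.

```python
import jax
import jax.numpy as jnp

def hvp(loss, theta, v):
    """Hessian-vector product H @ v at theta, without forming H explicitly.

    Forward-over-reverse: take the JVP of the gradient map along direction v.
    """
    return jax.jvp(jax.grad(loss), (theta,), (v,))[1]

# Assumed toy saddle: loss = x^2 - y^2, so H = diag(2, -2) at the origin.
loss = lambda th: th[0] ** 2 - th[1] ** 2
theta = jnp.array([0.0, 0.0])
v = jnp.array([0.0, 1.0])
print(hvp(loss, theta, v))  # [ 0., -2.] -> v is an EVec with negative EVal -2
```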

2.2. Optimization in Games

Instead of simply optimizing a single loss, optimization in games involves multiple agents, each with a loss function that potentially depends on the other agents. Examples include multiplayer games (e.g., Go [37, 38] and Hanabi [39]), the iterated prisoners’ dilemma (IPD) [40, 41], and generative adversarial networks (GANs) [4]. For simplicity, we look at 2-player games with players denoted by $A$ and $B$. Each player wants to minimize their loss $\mathcal{L}_{A}, \mathcal{L}_{B}$ with their parameters $\boldsymbol{\theta}_{A} \in \mathbb{R}^{d_{A}}, \boldsymbol{\theta}_{B} \in \mathbb{R}^{d_{B}}$:
$$
\begin{aligned}
&\boldsymbol{\theta}_{A}^{*} \in \arg\min_{\boldsymbol{\theta}_{A}} \mathcal{L}_{A}(\boldsymbol{\theta}_{A}, \boldsymbol{\theta}_{B}^{*}), \quad \boldsymbol{\theta}_{B}^{*} \in \arg\min_{\boldsymbol{\theta}_{B}} \mathcal{L}_{B}(\boldsymbol{\theta}_{A}^{*}, \boldsymbol{\theta}_{B}) \\
\Longleftrightarrow\; &\boldsymbol{\theta}_{A}^{*} \in \arg\min_{\boldsymbol{\theta}_{A}} \mathcal{L}_{A}^{*}(\boldsymbol{\theta}_{A}), \quad \boldsymbol{\theta}_{B}^{*} \in \arg\min_{\boldsymbol{\theta}_{B}} \mathcal{L}_{B}^{*}(\boldsymbol{\theta}_{B})
\end{aligned} \tag{2}
$$
If $\mathcal{L}_{B}$ and $\mathcal{L}_{A}$ are differentiable in $\boldsymbol{\theta}_{B}$ and $\boldsymbol{\theta}_{A}$, we say the game is differentiable. Unfortunately, for a player to do gradient-based optimization of their objective, we must compute $d\mathcal{L}_{B}^{*} / d\boldsymbol{\theta}_{B}$, which often requires $d\boldsymbol{\theta}_{A}^{*} / d\boldsymbol{\theta}_{B}$, but $\boldsymbol{\theta}_{A}^{*}(\boldsymbol{\theta}_{B})$ and its Jacobian are typically intractable. There are various algorithms that try to find solutions like Eq. 2, often efficiently approximating $d\boldsymbol{\theta}_{A}^{*} / d\boldsymbol{\theta}_{B}$ with methods that rely on efficient Jacobian-vector products. We overview these in our related work (Appendix A).
One of the most straightforward optimization methods is to find local solutions with simultaneous SGD (SimSGD). This does not take into account $d\boldsymbol{\theta}_{A}^{*} / d\boldsymbol{\theta}_{B}$ and often fails to converge. Here, $\boldsymbol{g}_{A}^{j} := \boldsymbol{g}_{A}(\boldsymbol{\theta}_{A}^{j}, \boldsymbol{\theta}_{B}^{j})$ and $\boldsymbol{g}_{B}^{j} := \boldsymbol{g}_{B}(\boldsymbol{\theta}_{A}^{j}, \boldsymbol{\theta}_{B}^{j})$ are estimators for $\nabla_{\boldsymbol{\theta}_{A}} \mathcal{L}_{A}\big|_{\boldsymbol{\theta}_{A}^{j}, \boldsymbol{\theta}_{B}^{j}}$ and $\nabla_{\boldsymbol{\theta}_{B}} \mathcal{L}_{B}\big|_{\boldsymbol{\theta}_{A}^{j}, \boldsymbol{\theta}_{B}^{j}}$:
θAj+1=θAjαgAj,θBj+1=θBjαgBj(SimSGD)θAj+1=θAjαgAj,θBj+1=θBjαgBj( SimSGD )theta_(A)^(j+1)=theta_(A)^(j)-alphag_(A)^(j),quadtheta_(B)^(j+1)=theta_(B)^(j)-alphag_(B)^(j)quad(" SimSGD ")\boldsymbol{\theta}_{A}^{j+1}=\boldsymbol{\theta}_{A}^{j}-\alpha \boldsymbol{g}_{A}^{j}, \quad \boldsymbol{\theta}_{B}^{j+1}=\boldsymbol{\theta}_{B}^{j}-\alpha \boldsymbol{g}_{B}^{j} \quad(\text { SimSGD })
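A minimal JAX sketch of the SimSGD update follows; the bilinear toy losses, initialization, and step size are assumptions chosen to illustrate the cycling/divergence behavior mentioned above, not an experiment from the paper.

```python
import jax
import jax.numpy as jnp

def sim_sgd(loss_A, loss_B, theta_A, theta_B, alpha=0.1, steps=100):
    """SimSGD: each player simultaneously descends its own loss in its own parameters."""
    grad_A = jax.grad(loss_A, argnums=0)  # estimator of dL_A / dtheta_A
    grad_B = jax.grad(loss_B, argnums=1)  # estimator of dL_B / dtheta_B
    for _ in range(steps):
        g_A, g_B = grad_A(theta_A, theta_B), grad_B(theta_A, theta_B)
        # Simultaneous (not alternating) updates with a shared step size alpha.
        theta_A, theta_B = theta_A - alpha * g_A, theta_B - alpha * g_B
    return theta_A, theta_B

# Assumed matching-pennies-style bilinear game: L_A = a*b, L_B = -a*b.
# SimSGD spirals away from the equilibrium at (0, 0) instead of converging.
L_A = lambda a, b: a * b
L_B = lambda a, b: -a * b
print(sim_sgd(L_A, L_B, jnp.array(0.5), jnp.array(0.5)))
```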
We simplify notation by using the concatenation of all players’ parameters (or joint parameters) $\boldsymbol{\omega} := [\boldsymbol{\theta}_{A}, \boldsymbol{\theta}_{B}] \in \mathbb{R}^{d}$ and the joint-gradient vector field $\hat{\boldsymbol{g}}: \mathbb{R}^{d} \rightarrow \mathbb{R}^{d}$, which at the $j^{\text{th}}$ iteration is denoted:
$$\hat{\boldsymbol{g}}^{j} := \hat{\boldsymbol{g}}(\boldsymbol{\omega}^{j}) := \left[\boldsymbol{g}_{A}(\boldsymbol{\omega}^{j}), \boldsymbol{g}_{B}(\boldsymbol{\omega}^{j})\right] = \left[\boldsymbol{g}_{A}^{j}, \boldsymbol{g}_{B}^{j}\right]$$
We can write the next iterate in (SimSGD) with a fixed-point operator $\boldsymbol{F}_{\alpha}$:
$$\boldsymbol{\omega}^{j+1} = \boldsymbol{F}_{\alpha}(\boldsymbol{\omega}^{j}) = \boldsymbol{\omega}^{j} - \alpha \hat{\boldsymbol{g}}^{j}$$
The Jacobian of the fixed-point operator $\boldsymbol{F}_{\alpha}$ is useful for analysis, including bounding convergence rates near fixed points and finding points where local changes to parameters may cause convergence to qualitatively different solutions. The fixed-point operator’s Jacobian crucially depends on the Jacobian of our update $\hat{\boldsymbol{g}}$. When our update is the gradient, we call this the game Hessian because it generalizes the Hessian:
$$\hat{\mathcal{H}} := \nabla_{\boldsymbol{\omega}} \hat{\boldsymbol{g}} = \begin{bmatrix} \nabla_{\boldsymbol{\theta}_{A}}^{2} \mathcal{L}_{A} & \nabla_{\boldsymbol{\theta}_{A}} \nabla_{\boldsymbol{\theta}_{B}} \mathcal{L}_{A} \\ \nabla_{\boldsymbol{\theta}_{B}} \nabla_{\boldsymbol{\theta}_{A}} \mathcal{L}_{B} & \nabla_{\boldsymbol{\theta}_{B}}^{2} \mathcal{L}_{B} \end{bmatrix}, \qquad \nabla_{\boldsymbol{\omega}} \boldsymbol{F}_{\alpha}(\boldsymbol{\omega}) = I - \alpha \hat{\mathcal{H}}$$
For single-objective optimization, $\hat{\mathcal{H}}$ is the Hessian of the loss, which is symmetric and has real EVals, yielding parameter updates that form a conservative gradient vector field. However, in games with multiple objectives, $\hat{\mathcal{H}}$ is not symmetric and can have complex EVals, resulting in updates that form a non-conservative vector field.
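To make this concrete, here is a minimal JAX sketch that builds the game Hessian by differentiating the joint gradient and inspects its EVals; the two-player bilinear game below is an assumed toy example, not one from the paper, chosen because its game Hessian has purely imaginary EVals.

```python
import jax
import jax.numpy as jnp
import numpy as np

def game_hessian(loss_A, loss_B, omega, d_A):
    """Jacobian of the joint gradient g_hat(omega) = [dL_A/dtheta_A, dL_B/dtheta_B]."""
    def joint_grad(w):
        a, b = w[:d_A], w[d_A:]
        g_A = jax.grad(loss_A, argnums=0)(a, b)
        g_B = jax.grad(loss_B, argnums=1)(a, b)
        return jnp.concatenate([g_A, g_B])
    return jax.jacfwd(joint_grad)(omega)

# Assumed bilinear game: L_A = a.b, L_B = -a.b. The game Hessian is antisymmetric,
# [[0, 1], [-1, 0]], so its EVals are the complex conjugate pair +/- i.
L_A = lambda a, b: jnp.dot(a, b)
L_B = lambda a, b: -jnp.dot(a, b)
H_hat = game_hessian(L_A, L_B, jnp.array([0.5, -0.3]), d_A=1)
print(np.linalg.eigvals(np.asarray(H_hat)))  # [0.+1.j, 0.-1.j]
```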

2.3. Local Bifurcations and Separatrices

The goal of RR was to obtain a method for finding diverse solutions in minimization. There are multiple ways to try to find solutions with RR, but we focus on finding separatrices, i.e., boundaries between phase-space regions with different dynamical behavior, and on branching over these boundaries. Crossing such a boundary leads to a different solution under our update flow. This change in behavior when crossing a region boundary is in fact a local bifurcation: a qualitative change in the behavior of the solutions.
When applying RR in conservative gradient systems, saddle points and their EVecs play a key role in the shape of the phase portraits, which are a geometrical representation of the underlying dynamics. The negative EVecs often align with unstable manifolds that are orthogonal to our separatrices [42], thus giving directions in which we can perturb to find different solutions (Lemma 14.3 [43]). However, in non-conservative systems there are a variety of other local bifurcations [44] besides saddle points [45]. For example, by Thm. 11.2 of [43], if all EVals have negative real part except a conjugate non-zero pair of purely imaginary EVals, then a Hopf bifurcation occurs when changing the parameters causes the pair to cross the imaginary axis. A Hopf bifurcation is a critical point where the stability of a system switches, resulting in a periodic solution.
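As a rough numerical illustration of this criterion, the Python sketch below flags a conjugate pair of (near-)purely-imaginary EVals, which signals a candidate Hopf bifurcation; the example EVals, tolerance, and the helper name `imaginary_pairs` are assumptions for illustration, and a full Hopf test would also require verifying that the pair actually crosses the imaginary axis as parameters change.

```python
import numpy as np

def imaginary_pairs(eigvals, tol=1e-6):
    """Return EVals that are (numerically) purely imaginary and non-zero,
    i.e. candidates for the conjugate pair involved in a Hopf bifurcation."""
    return [ev for ev in eigvals if abs(ev.real) < tol and abs(ev.imag) > tol]

# Assumed EVals of a game Hessian at some joint parameters: one real EVal plus
# a purely imaginary conjugate pair -> candidate Hopf bifurcation point.
eigvals = np.array([-0.8 + 0.0j, 0.0 + 1.0j, 0.0 - 1.0j])
print(imaginary_pairs(eigvals))  # [1j, -1j]
```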

3. Our Method: Game Ridge Rider (GRR)

In this section we introduce Game Ridge Rider (GRR), a generalization of Ridge Rider for learning in games.