
Neural circuit policies enabling auditable autonomy

Mathias Lechner^{1,4✉}, Ramin Hasani^{2,3,4✉}, Alexander Amini^{3}, Thomas A. Henzinger^{1}, Daniela Rus^{3} and Radu Grosu^{2}

Abstract

A central goal of artificial intelligence in high-stakes decision-making applications is to design a single algorithm that simultaneously expresses generalizability, by learning coherent representations of its world, and interpretable explanations of its dynamics. Here, we combine brain-inspired neural computation principles and scalable deep learning architectures to design compact neural controllers for task-specific compartments of a full-stack autonomous vehicle control system. We discover that a single algorithm with 19 control neurons, connecting 32 encapsulated input features to outputs by 253 synapses, learns to map high-dimensional inputs into steering commands. This system shows superior generalizability, interpretability and robustness compared with orders-of-magnitude larger black-box learning systems. The obtained neural agents enable high-fidelity autonomy for task-specific parts of a complex autonomous system.

We set out to design a brain-inspired intelligent agent that learns to control an autonomous vehicle directly from its camera inputs (end-to-end learning to control^{1,2}). The agent has to learn a coherent representation of its world from multidimensional sensory information, and utilize it to generalize well in unseen situations. Surprisingly, animals as small as the nematode Caenorhabditis elegans have mastered such an ability, to perform locomotion^{3}, motor control^{4} and navigation^{5}, through their near-optimal nervous system structure^{6,7} and their harmonious neural information-processing mechanisms^{8}. In complex real-world scenarios, for instance, autonomous driving, such neural computation inspiration^{9,10} can lead to more expressive artificial intelligence agents with models that are simultaneously accurate and explainable^{11}.
Although deep learning algorithms have achieved noteworthy successes in various high-dimensional tasks^{2,12-16}, there still are important representation-learning challenges^{17-19} that have to be addressed. For instance, the domain of end-to-end control is safety critical^{20}. This demands interpretable dynamics of the intelligent controllers, as a first step towards investigating their safety issues. Furthermore, while learned vehicle control agents often show great performance in offline testing and simulations, this considerably degrades during live driving. In addition, it is desirable that agents learn the true causal structure^{21,22} between the observed driving scenes and their corresponding optimal-steering commands (the specific task of the agent). Ideally, for a lane-keeping task, we wish that the agent implicitly learns to attend to the road's horizon when taking a current steering decision, while maintaining an attractive performance on short-term steering. However, in practice, performant models have been shown to learn a variety of unfair^{23} and suboptimal^{22} input-output causal structures^{24,25}. Finally, within the processing pipeline of the high-dimensional data-stream input, the agent has to incorporate a short-term memory mechanism capturing temporal dependencies.
The successful end-to-end autonomous-control approaches to lane-keeping^{2,26-28} (Fig. 1) rely solely on deep convolutional neural network architectures^{29}, steering a vehicle at a time $t$, based on the most recent camera frame^{30} (Fig. 2a). While such feedforward models can properly drive the vehicle in case of ideal input data, they often fail if the data are noisy. This is because they do not exploit the temporal nature of the task, which would enable them to filter out transient disturbances. As a result, temporary corruptions of the input stream (that is, sudden sunlight, as illustrated in Fig. 2a) lead to unstable predictions. On the contrary, recurrent neural networks (RNNs)^{31,32} are a class of artificial neural networks that take into account past observations at a current output decision, through a feedback mechanism. Thus, in principle, they should lead to more robust end-to-end controllers (Fig. 2b). RNNs are trained over finite-length labelled training sequences by the backpropagation algorithm^{33} applied to their unfolded feedforward representation^{32} (Fig. 2c,d). Historically, training RNNs has been challenging due to their exploding or vanishing gradients during the learning phase^{31,32}. Owing to the development of advanced, gated RNNs, such as the long short-term memory (LSTM)^{34}, this challenge is tackled by enforcing a constant error flow, through fixing the recurrent weights to 1 and removing nonlinearities within the feedback path^{31}.
From a time-series-modelling point of view, having a constant error flow is a desirable property, as arbitrary data sequences may have long-term relations (Fig. 2d, right). However, in the case of end-to-end autonomous driving, learning long-term dependencies can be detrimental, due to the short-term causality of the underlying task. When driving a vehicle to follow the lane, humans do not recall images of the road from more than a few seconds ago to operate the steering wheel^{35}. Consequently, LSTM networks may capture spurious long-term dependencies that may have been present in the training data, and thus learn inadequate causal models^{21}. On the contrary, vanishing of gradients prevents RNNs from learning correlations of events with long-term dependencies^{36-38}. This property counterintuitively enhances the real-world control performance of a learned RNN agent, as it places a prior on the temporal attention span of the network, restricting it to the most recent few observations.
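The unrolling-and-backpropagation scheme of Fig. 2c,d can be made concrete with a minimal sketch (the dimensions and the single scalar readout here are illustrative, not the paper's architecture): a tanh RNN is unrolled over a short sequence, and the gradient of a last-step squared error is propagated backwards through every step. The repeated multiplication by the transposed recurrent matrix in the backward pass is exactly the mechanism behind exploding or vanishing gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical, for illustration only).
n_in, n_hid, n_steps = 4, 8, 5

Wx = rng.normal(0, 0.1, (n_hid, n_in))   # input-to-hidden weights
Wh = rng.normal(0, 0.1, (n_hid, n_hid))  # recurrent hidden-to-hidden weights
wo = rng.normal(0, 0.1, n_hid)           # hidden-to-output readout weights

def forward(inputs):
    """Unroll the RNN in time (Fig. 2c), keeping all states for BPTT."""
    xs = [np.zeros(n_hid)]
    for u in inputs:
        xs.append(np.tanh(Wx @ u + Wh @ xs[-1]))
    y = wo @ xs[-1]  # steering-like scalar read out at the final step
    return xs, y

def bptt(inputs, target):
    """Backpropagation through time for the loss 0.5 * (y - target)^2."""
    xs, y = forward(inputs)
    dWx, dWh = np.zeros_like(Wx), np.zeros_like(Wh)
    dy = y - target
    dwo = dy * xs[-1]
    dx = dy * wo                          # gradient entering the last state
    for t in range(len(inputs) - 1, -1, -1):
        dpre = dx * (1 - xs[t + 1] ** 2)  # backprop through tanh
        dWx += np.outer(dpre, inputs[t])
        dWh += np.outer(dpre, xs[t])
        dx = Wh.T @ dpre  # repeated products with Wh.T shrink or blow up
                          # the gradient over many steps
    return dwo, dWx, dWh

inputs = [rng.normal(size=n_in) for _ in range(n_steps)]
dwo, dWx, dWh = bptt(inputs, target=0.5)
```

A finite-difference check on any single weight confirms the analytic gradient, which is a standard sanity test for hand-rolled BPTT.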
Fig. 1 | End-to-end driving. The process starts by collecting a considerable amount of human driving experience, in a car that is equipped with camera(s) and in-car computing units. The diverse set of training samples is then edited (green boxes) and labelled with the corresponding steering angles. An end-to-end training algorithm trains and validates an artificial neural network agent, in a supervised learning fashion, to directly turn camera inputs into steering decisions. The obtained network is then deployed on the high-performance computing units mounted inside the car to drive the car autonomously in real unseen environments.

Fig. 2 | Recurrent network modules are essential for the lane-keeping tasks. a, A feedforward CNN network computes its output, $P(y_t \mid I_t)$, by relying solely on the current observation, $I_t$. Consequently, inputs that are corrupted by transient perturbations (bottom) will result in high output variance and faulty decisions. b, An RNN has access to past observations at a current driving step, enabling it to filter out transient corruptions that are present in the input stream. c, Training RNNs by unrolling their state in time. d, Then, applying backpropagation through time in an unfolded RNN. Purple derivatives indicate the dependency of the loss function's derivative with respect to an RNN's state weights on the evolution of the RNN's state, $x(t)$, in time. Blurred images depict weaker attention of the RNN when computing a current decision. $n$ is the number of unfolding steps.
The development of a single, task-specific algorithm that universally satisfies the representation-learning challenges described above has been a central goal of artificial intelligence^{9,10}. To advance towards this goal, we draw inspiration from the neural computations known to happen in biological brains^{6,7,39,40}, which achieve a remarkable degree of controllability^{3-5,8}. We develop compact representations called neural circuit policies (NCPs), where each neuron has increased computational capabilities^{41} compared with contemporary deep models. We show that NCPs lead to sparse networks that are more easily interpretable and demonstrate this in the context of autonomous driving. We discovered that for the lane-keeping task mentioned above, very small networks of brain-inspired neural models (that is, networks with a control compartment consisting of only 19 neurons), in combination with compact convolutional neural networks (CNNs)^{29}, achieved superior performance, compared with state-of-the-art models, in learning how to steer a


Fig. 3 | Designing NCP networks with an LTC neural model. a, Representation of the neural state, $x_i(t)$, of a postsynaptic LTC neuron $i$ receiving input currents from a presynaptic neuron, $j$. The neural state is determined by the aggregation of the inflows/outflows to/from the cell. $I_{\text{in}}$ is the external input current, $I_{\text{leakage}}$ is the leakage current. Synaptic currents ($I_{s_{ij}}$) are set by an input-dependent nonlinearity $f$ that is a function of the presynaptic neural state, $x_j(t)$, and its synaptic parameters (see Methods for further details). b, Representation of an NCP end-to-end network; it perceives the camera inputs, which are transformed by a set of convolutional layers to a latent representation that is exploited by the designed NCP (based on the steps described in c) to produce control actions and issue control commands. c, NCP design procedure based on rules 1 to 4 in the main text (see algorithms 2 to 6 in Methods).

vehicle directly from high-dimensional inputs. Here we use the representation-learning challenges as the main criteria for assessing the performance of autonomous-control agents.

Designing and learning NCPs

To address the representation-learning challenges and the complexity of autonomous lane-keeping, we design an end-to-end learning system that perceives the inputs by a set of convolutional layers^{42}, extracts image features and performs control by an RNN structure, termed an NCP.
The network structure of NCPs is inspired by the wiring diagram of the C. elegans nematode^{43}. Many neural circuits within the nematode's nervous system are constructed by a distinct four-layer hierarchical network topology. They receive environmental observations through sensory neurons. These are passed on to inter-neurons and command neurons, which generate an output decision. Finally, this decision is passed to the motor neurons to actuate the muscles. The wiring diagram of C. elegans achieves a sparsity of around 90% (ref. 6), with predominantly feedforward connections from sensors to intermediate neurons, highly recurrent connections among inter-neurons and command neurons, and feedforward connections from command neurons to motor neurons. This specific topology was shown to have attractive computational advantages, such as efficient distributed control requiring a small number of neurons^{6}, hierarchical temporal dynamics^{8}, robot-learning capabilities^{44} and maximal information propagation in sparse-flow networks^{45}.
Neural dynamics of NCPs are given by continuous-time ordinary differential equations (ODEs), originally developed to capture the dynamics of the nervous system of small species, such as C. elegans^{41} (Fig. 3a). At their core, NCPs possess a nonlinear time-varying synaptic transmission mechanism that improves their expressive power in modelling time series, compared with their deep learning counterparts^{41}. The foundational neural building blocks of NCPs are called liquid time-constant (LTC) networks^{41}. Further details about LTCs are given in Methods.
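The current-balance picture of Fig. 3a can be integrated numerically. Below is a minimal explicit-Euler sketch of one LTC layer step, assuming a sigmoidal presynaptic nonlinearity $f$, synaptic reversal potentials $E_{ij}$, and unit membrane capacitance; the parameter names and shapes are illustrative assumptions, not the paper's implementation (see Methods and ref. 41 for the exact model and solver).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ltc_step(x, I_in, W, gamma, mu, E, tau, C_m=1.0, dt=0.01):
    """One explicit-Euler step of a layer of LTC neurons.

    x      : (n,) postsynaptic states x_i(t)
    I_in   : (n,) external input currents
    W      : (n, n) synaptic conductances, W[i, j] from neuron j to neuron i
    gamma, mu : parameters of the presynaptic nonlinearity f
    E      : (n, n) synaptic reversal potentials (sign sets
             excitation vs inhibition)
    tau    : (n,) leakage time constants
    """
    f = sigmoid(gamma * x + mu)  # presynaptic activations f(x_j)
    # Synaptic current I_s_ij = W_ij * f(x_j) * (E_ij - x_i). The
    # state-dependent factor (E_ij - x_i) makes the effective time
    # constant vary with the input -- the "liquid" part of the model.
    I_syn = (W * f[None, :] * (E - x[:, None])).sum(axis=1)
    I_leak = -x / tau  # leakage pulls the state back to its resting value
    dx = (I_leak + I_syn + I_in) / C_m
    return x + dt * dx
```

With all synaptic conductances set to zero, the step reduces to pure leakage and the state decays exponentially with time constant tau, which is a quick way to sanity-check an integration of this kind.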
The architecture of an NCP network is determined by the design principles introduced in rules 1–4, corresponding to the steps presented in Fig. 3c, as follows:

(1) Insert four neural layers: $N_s$ sensory neurons, $N_i$ inter-neurons, $N_c$ command neurons and $N_m$ motor neurons ((1) in Fig. 3c).

(2) Between every two consecutive layers, $\forall$ source neuron, insert $n_{\text{so-t}}$ synapses ($n_{\text{so-t}} \leq N_t$), with synaptic polarity $\sim \operatorname{Bernoulli}(p_2)$, to $n_{\text{so-t}}$ target neurons, randomly selected $\sim \operatorname{Binomial}(n_{\text{so-t}}, p_1)$ ((2) in Fig. 3c). $n_{\text{so-t}}$ is the number of synapses from source to target. $p_1$ and $p_2$ are probabilities corresponding to their distributions.

(3) Between every two consecutive layers, $\forall$ target neuron $j$ with no synapse, insert $m_{\text{so-t}}$ synapses ($m_{\text{so-t}} \leq \frac{1}{N_t} \sum_{i=1, i \neq j}^{N_t} L_{t_i}$), where $L_{t_i}$ is the number of synapses to target neuron $i$, with synaptic polarity (being excitatory or inhibitory) $\sim \operatorname{Bernoulli}(p_2)$, from $m_{\text{so-t}}$ source neurons, randomly selected $\sim \operatorname{Binomial}(m_{\text{so-t}}, p_3)$ ((3) in Fig. 3c). $m_{\text{so-t}}$ is the number of synapses from source neurons to target neurons with no synaptic connections.

(4) Recurrent connections of command neurons: $\forall$ command neuron, insert $l_{\text{so-t}}$ synapses ($l_{\text{so-t}} \leq N_c$), with synaptic polarity $\sim \operatorname{Bernoulli}(p_2)$, to $l_{\text{so-t}}$ target command neurons, randomly selected $\sim \operatorname{Binomial}(l_{\text{so-t}}, p_4)$ ((4) in Fig. 3c).
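Rules 1–4 above can be sketched as a random-wiring procedure. The layer sizes, probabilities and without-replacement target sampling below are illustrative assumptions, not the paper's algorithms 2 to 6: each entry of the adjacency matrix holds +1 for an excitatory synapse, -1 for an inhibitory one, and 0 for no connection.

```python
import numpy as np

rng = np.random.default_rng(0)

def wire_layer(n_src, n_tgt, n_sot, p1=0.5, p2=0.8):
    """Rules 1-2: for every source neuron, draw ~Binomial(n_sot, p1)
    target neurons (here sampled without replacement) and assign each
    synapse a polarity ~Bernoulli(p2): +1 excitatory, -1 inhibitory."""
    A = np.zeros((n_src, n_tgt), dtype=int)
    for i in range(n_src):
        k = rng.binomial(n_sot, p1)
        targets = rng.choice(n_tgt, size=min(k, n_tgt), replace=False)
        A[i, targets] = np.where(rng.random(targets.size) < p2, 1, -1)
    return A

def wire_orphans(A, m_sot, p2=0.8, p3=0.5):
    """Rule 3: every target neuron left with no incoming synapse gets
    ~Binomial(m_sot, p3) (at least one) randomly chosen sources."""
    n_src, n_tgt = A.shape
    for j in range(n_tgt):
        if not A[:, j].any():
            k = max(1, rng.binomial(m_sot, p3))
            sources = rng.choice(n_src, size=min(k, n_src), replace=False)
            A[sources, j] = np.where(rng.random(sources.size) < p2, 1, -1)
    return A

# Hypothetical layer sizes, loosely echoing the paper's 19-neuron
# control compartment fed by 32 input features.
Ns, Ni, Nc, Nm = 32, 12, 6, 1
sensory_to_inter  = wire_orphans(wire_layer(Ns, Ni, n_sot=4), m_sot=4)
inter_to_command  = wire_orphans(wire_layer(Ni, Nc, n_sot=3), m_sot=3)
command_recurrent = wire_layer(Nc, Nc, n_sot=2)               # rule 4
command_to_motor  = wire_orphans(wire_layer(Nc, Nm, n_sot=2), m_sot=2)
```

Rule 3 guarantees that no inter-neuron, command neuron or motor neuron is left disconnected, while the overall matrices stay sparse, mirroring the roughly 90% sparsity of the C. elegans wiring diagram described above.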