
LATICS: A Low-Overhead Adaptive Task-Based Intermittent Computing System

Publisher: IEEE
Songran Liu; Wei Zhang; Mingsong Lv; Qiulin Chen; Nan Guan

Abstract:
Energy harvesting promises to power billions of Internet-of-Things devices without being restricted by battery life. The energy output of harvesters is typically tiny and highly unstable, so the computing system must store program states into nonvolatile memory frequently to preserve the execution progress in the presence of frequent power failures. Task-based intermittent computing is a promising paradigm to provide such capability, where each task executes atomically and only states across task boundaries need to be saved. This article presents LATICS, a low-overhead adaptive task-based intermittent computing system, which dynamically decides the granularity of atomic execution to avoid unnecessarily frequent state saving when energy supply is sufficient. The novel feature of LATICS is to drastically reduce the amount of states to be saved at task boundaries compared with existing solutions. Notably, we disclose that skipping state saving at some task boundary may cause the system to store more states at other places, and thus leads to higher overall overhead. Therefore, LATICS enforces mandatory state saving at certain task boundaries regardless of the current energy condition to reduce state saving overhead. We implement LATICS on a real energy-harvesting platform based on MSP430 and experimentally compare against the state-of-the-art under different settings. The experimental results show that LATICS significantly reduces state saving overhead and improves execution efficiency compared to existing solutions.
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems ( Volume: 39, Issue: 11, November 2020)
Page(s): 3711 - 3723
Date of Publication: 02 October 2020

SECTION I.

Introduction

Energy-harvesting systems are powered by energy collected from their environment, and thus can sustain operation for a long time without being restricted by battery life [1]–[4]. As the energy output of harvesters is typically tiny and unstable, the device may experience frequent power failures. Therefore, energy-harvesting systems must be able to timely store program states and keep making progress in the presence of frequent power failures.

To this end, checkpointing, a widely used technique in fault-tolerant computing, has been applied to energy-harvesting systems [5]–[15]. When program execution reaches a checkpoint, system states are saved to nonvolatile memory (NVM) for future restoration. However, checkpointing is generally expensive, as all volatile states, including processor context and data on volatile memory (VM), have to be saved to NVM.

The task-based paradigm [16]–[20] offers a lightweight alternative to checkpointing. Take the state-of-the-art task-based intermittent computing system InK [18] as an example: an application is programmed as a collection of atomic execution blocks called tasks. Each task is written as a function by the programmer, and the control flow between tasks is also explicitly defined by the programmer. System states are saved at task boundaries, and when recovering from a power failure that occurred during the execution of a task, the system resumes execution from the beginning of that task. At task boundaries, the system only needs to store the global data, i.e., the variables whose lifetimes cross task boundaries and which are shared by multiple tasks, but not the local variables accessed inside each task. Essentially, task boundaries play a similar role to checkpoints. However, as the programmer has full control over how to group related functionalities and variables into a task, typically much fewer data need to be saved at task boundaries than with the standard checkpointing approach.

Many modern ultralow-power processors are equipped with NVM, such as FRAM, whose read/write speed and lifetime are good enough for it to serve as the main memory [21]. Therefore, the system can allocate data on NVM and programs can address them directly, which is more efficient than copying them back and forth between VM and NVM. However, directly operating on data in NVM may cause memory inconsistency due to the so-called write-after-read (WAR) problem [6] in the presence of power failures (detailed in Section II). To solve the problem, the system should maintain a backup version of the WAR variables. When recovering from a power failure that occurred in some task, the system restores the WAR variables from the backup version to main memory, and then correctly resumes execution from the beginning of the aborted task.

In the task-based paradigm, task granularity plays an important role in system performance. In general, the larger the tasks are, the more incomplete execution is aborted at power failures. In the extreme case, if a task is too large to be finished with the energy harvested before the next power failure, the system will be trapped re-executing this task and never make any progress (the so-called “Sisyphus effect”). On the other hand, if tasks are too small, the execution flow will cross task boundaries, and thus save program states, frequently, which leads to high overhead. To address this dilemma, some systems adopt an adaptive execution approach, which can skip state saving at task boundaries under good energy conditions.

Skipping state saving at task boundaries makes ensuring memory consistency a complex problem. On the one hand, the overhead is reduced as state saving is done less frequently; on the other hand, skipping state saving may introduce new WAR variables. To keep the memory consistent, newly introduced WAR variables have to be identified and saved, which may adversely increase state saving overhead. A related work, Coala [22], addressed the problem by a very conservative approach: copying used data pages on demand to VM in case operations on the variables in those pages cause the WAR problem. During state saving, Coala pessimistically saves all modified data pages to NVM. As shown in our experiments (Section V), this approach makes program execution very inefficient.

In this work, we present LATICS, a low-overhead adaptive task-based intermittent computing system that saves much fewer states at task boundaries than existing approaches. It is generally believed that skipping state saving at task boundaries is always beneficial, so related systems try to skip as many task boundaries as possible if the energy condition allows. However, our work discloses that skipping state saving at one task boundary may cause the system to save more data at other places, and thus leads to higher overall overhead. In light of this observation, we propose a novel task-based intermittent computing system that enforces mandatory state saving at certain task boundaries, regardless of the current energy conditions, and conducts analysis to decide the breaking points and the states to be saved to ensure correct execution. Experiments conducted on an MSP430 platform show that LATICS drastically reduces the amount of states to save at task transitions and improves execution efficiency, compared with existing solutions.

SECTION II.

Overview

The main objective of LATICS is to reduce state saving overhead and improve the execution efficiency of adaptive task-based intermittent computing systems.

At runtime, LATICS will coalesce the execution of several consecutive tasks into a large execution block called a transactional execution block (TEB). The first task in the TEB is called the leading task. System states are only saved at the beginning of the leading task but not at any other task boundary inside the TEB. If a power failure occurs and aborts a TEB, the system will resume from its leading task. If the TEB successfully finishes execution, a new TEB will be decided. The above process repeats until the program is completed. The TEB is now the atomic execution block of the system. LATICS tends to use a larger TEB, and thus skip state saving on more task boundaries, if it has high confidence that the current energy condition is good and the possibility of power failures is low. A unique feature of LATICS is that it also forces TEBs to end at certain task boundaries and conduct state saving, in order to reduce the overall state saving overhead. To see its benefit, we first explain what kind of states must be saved at the beginning of a TEB.

We begin by explaining the WAR problem [6]. Fig. 1(a) shows three sequentially executed tasks, T1, T2, and T3. Each task executes as an individual TEB, i.e., if power fails during the execution of a task, the system resumes execution at the beginning of that task. u, x, y, and z are variables allocated on NVM, and a is a variable allocated on VM. Suppose power fails after “z = a + 1”; the system will resume execution from the beginning of T2. The correct read of z at “a = z” should return −1 (assigned in T1). However, the value 0 is actually returned as a result of the partial execution before the power failure. This is the so-called WAR problem. We say z is a WAR variable, and the read and write to z have a WAR dependency. Note that y is a WAR variable in T3 (“y = y + 1” means y's value is first read, then incremented by 1, and finally written back to y).
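To make the scenario concrete, the following is a minimal C sketch mirroring Fig. 1(a); the __nv placement attribute, the exact statement layout, and the task function signatures are illustrative assumptions rather than the LATICS API.

/* Illustrative sketch of the WAR problem in Fig. 1(a); the __nv macro and the
 * task signatures are assumptions made for illustration, not the LATICS API. */
#define __nv __attribute__((section(".persistent")))  /* assumed NVM placement */

__nv int u, x, y, z;            /* variables allocated on NVM (FRAM) */

void T1(void) {
    z = -1;                     /* T1 writes z                                   */
}

void T2(void) {
    int a;                      /* local variable on VM (SRAM)                   */
    a = z;                      /* read z: should see the -1 written by T1       */
    z = a + 1;                  /* write z: if power fails after this statement  */
                                /* and T2 restarts, "a = z" now reads 0, not -1, */
                                /* so z is a WAR variable of T2                  */
    u = x;                      /* read x (x is written later, in T3)            */
}

void T3(void) {
    x = u + 1;                  /* write x: WAR on x only arises if T2 and T3    */
                                /* are coalesced into one TEB (Fig. 1(c))        */
    y = y + 1;                  /* y is read then written: WAR variable in T3    */
}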

Fig. 1. WAR problem and breaking point insertion.

To avoid the WAR problem, a system should save all WAR variables at the beginning of each task. As shown in Fig. 1(b), if the value of z is saved at the beginning of T2, when the system recovers from a power failure, z's correct value can be restored and then T2 continues to execute consistently. Saving the WAR variables at task boundaries is a major overhead in task-based intermittent computing systems.

Now, we consider an adaptive task-based intermittent computing system that only saves states at the beginning of a TEB and skips state saving at the task boundaries inside the TEB. In Fig. 1(c), assume that at runtime T2 and T3 are coalesced into a TEB, so the state saving between T2 and T3 is skipped. As power may fail anywhere in T2 and T3, WAR variables inside both tasks have to be saved at the beginning of the TEB. Notably, x now becomes a new WAR variable as a result of coalescing T2 and T3. If power fails after “x = u + 1” and the system resumes execution from T2, reading x at “u = x” may be incorrect as a result of the “x = u + 1” in the incomplete execution. Saving x introduces new overhead, a result of task coalescing.

Reduce State Saving by Breaking Point Insertion: Assume x is a large array; the benefit of skipping state saving between T2 and T3 may well be outweighed by the extra overhead of saving x. To address this problem, LATICS inserts breaking points into the program on task boundaries. When program execution reaches a breaking point, state saving must be conducted immediately. In the above example, LATICS inserts a breaking point between T2 and T3 and breaks the read and write dependency of the WAR variable x. As a result, x is excluded from the state saving at the beginning of the TEB.

The rest of this article details the two main components of LATICS: 1) a static analysis that decides the breaking points and computes the set of variables to save for each task if it leads a TEB and 2) a runtime system, which decides how to group tasks into a TEB, manages state saving and restoration, and, most importantly, enforces the breaking point mechanism.

SECTION III.

Analysis

Saving states at TEB boundaries guarantees memory consistency for adaptive task-based intermittent computing systems. The main objective of this work is to minimize the state saving overhead during the execution of TEBs that are decided at runtime. This section presents our static analysis to achieve this goal. First, we show it is more precise to compute a specific set of breaking points and a set of variables to save for each task that leads a TEB than for the whole program; second, we present an algorithm to efficiently decide the breaking points and compute the variables to save at each leading task.
TEB 边界保存状态可保证基于任务的自适应间歇计算系统的内存一致性。这项工作的主要目标是尽量减少运行时决定的 TEB s 执行过程中的状态保存开销。本节将介绍我们为实现这一目标而进行的静态分析。首先,我们证明了为每个引导 TEB 的任务计算一组特定的断点和一组需要保存的变量,比为整个程序计算更精确;其次,我们提出了一种算法,可以高效地决定断点,并为每个引导任务计算需要保存的变量。

A. Rationale

To tightly identify a set of variables to save at the leading task of a TEB requires two levels of information: 1) the WAR variables that need to be saved and 2) which tasks are grouped into a TEB.

To precisely identify the WAR variables in a TEB, we need to consider the following cases. First, a WAR variable in a task remains a WAR variable in the TEB that includes the task [12], [22]. Second, WAR variables introduced by coalescing tasks, such as x in Fig. 1(d), should be saved [12], [22]. Third, there is no need to save the variables which are only read or only written, as power failures do not make the values of such variables inconsistent. Last but not least, coalescing multiple tasks may as well exclude some WAR variables by destroying the WAR dependency of their reads and writes. Suppose a TEB groups tasks T1 and T2, and v is a WAR variable local to T2. If there is a write operation to v in T1, v is no longer a WAR variable in the TEB. When a power failure occurs in the TEB, the system will resume execution from T1 (the leading task of the TEB). The write to v in T1 overwrites any value of v written in the TEB's incomplete execution before the power fails, thus we do not have to save v at the beginning of the TEB.

In adaptive task-based intermittent computing systems, each TEB is decided only at runtime. The variables that need to be saved depend on which tasks are grouped together. The grouping is affected by the current power condition and future program paths. Adaptively deciding TEBs brings difficulties in precise identification of the variables to be included in state saving.

Computing the States to Save at the Leading Task of a TEB: Due to the dynamic nature of how TEBs are decided, computing a fixed set of variables to save for a whole program would be too pessimistic. So in this work, we propose to compute an individual set of variables for each task. If a TEB starts, only the subset of variables associated with its leading task needs to be saved. This approach helps to exclude unnecessary state saving. Furthermore, for a given leading task, the decision on the scope of a TEB may vary according to different power conditions, so the computed set of variables should cover TEBs with all possible scopes.

Deciding Breaking Points for the Leading Task of a TEB: A major observation in our study is that forcing the TEB to end before a task and conducting an in-situ state saving may reduce the overall amount of states to save, as shown in Fig. 1(d). Now, we have two options: 1) to decide a set of breaking points for the whole program or 2) to decide an individual set of breaking points for each leading task. LATICS chooses the latter option, for the following reason.

Consider an example in Fig. 2 with three tasks T1, T2, and T3. WAR(a) in T1 means a is a WAR variable local to T1. R(d) in T1 and T2 are read operations on variable d, and W(d) in T3 is a write operation on d. Consider coalescing T1 and T3 into TEB1, and T2 and T3 into TEB2. We can compute the sets of variables involved in the state saving at the beginning of TEB1 and TEB2, which are {a, d} and {b, c, d}, respectively. In LATICS, we associate with each leading task a memory copy range, which is the minimal address region that covers the state variables. For example, given the memory layout in Fig. 2(b), the memory copy ranges for T1 and T2 are [1, 6] and [0, 5], respectively. The main objective is to leverage the performance benefit of bulk copying (further discussed in Section V-D).
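As a concrete illustration of the memory copy range, the following C sketch computes the minimal address region covering a set of variables; the var_info/copy_range layout is an assumption made for illustration, not a LATICS data structure.

#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t addr;              /* start address of the variable in the shared-data region */
    size_t    size;              /* size of the variable in bytes                           */
} var_info;

typedef struct {
    uintptr_t lo;                /* inclusive lower bound of the copy range */
    uintptr_t hi;                /* exclusive upper bound of the copy range */
} copy_range;

/* Smallest contiguous region covering all variables in a set, so that one
 * bulk copy (e.g., via DMA) saves them all at a TEB boundary. */
copy_range memory_copy_range(const var_info *vars, size_t n)
{
    copy_range r = { 0, 0 };
    if (n == 0)
        return r;                /* empty set: nothing to copy */
    r.lo = vars[0].addr;
    r.hi = vars[0].addr + vars[0].size;
    for (size_t i = 1; i < n; i++) {
        uintptr_t lo = vars[i].addr;
        uintptr_t hi = vars[i].addr + vars[i].size;
        if (lo < r.lo) r.lo = lo;
        if (hi > r.hi) r.hi = hi;
    }
    return r;
}

With the layout of Fig. 2(b), the set {a, d} would yield the range [1, 6] and {b, c, d} the range [0, 5], matching the example above.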

Fig. 2. Deciding breaking points for each leading task.

In order to reduce state saving overhead for T1 and T2, let us consider inserting breaking points on the incoming edges to T3. As a result, d is no longer a WAR variable of either TEB1 or TEB2. The set of variables to save at the beginning of T1 is changed to {a}, with the corresponding memory copy range [5, 6]. For TEB2, the set of variables to save at the beginning of T2 is now {b, c}, but the memory copy range that covers {b, c} remains [0, 5]. Therefore, inserting a breaking point for TEB2 is pointless for reducing the total amount of data copying. Motivated by this observation, LATICS chooses to decide an individual set of breaking points for each task.

To summarize, to reduce state saving overhead, we propose to decide an individual set of breaking points for each leading task and also compute the variables to be saved at the leading task. The analysis is detailed in the following sections.

B. Analysis Target and the Definition of Breaking Point

The analysis target is the WAR variables that are either local to a task or introduced by task coalescing (inserting breaking points may exclude WAR variables). A program is a directed graph, denoted by G, where each node is a task and the edges are control flows among tasks. Each task T can be further modeled by a control flow graph CFG_T, comprising basic blocks (denoted by BB) connected by control flow edges.

The WAR variables in a task can be defined and identified over the CFG of the task.

Definition 1 (WAR Variable in a Task): If a variable x is a WAR variable in task Ti, there must exist a basic block BB_R which reads x, such that:

  1. Condition 1: BB_R is reachable from the starting basic block of CFG_Ti;

  2. Condition 2: along one of the paths satisfying condition 1 there is no basic block that writes x;

  3. Condition 3: there exists a basic block BB_W that writes x and BB_W is reachable from BB_R in CFG_Ti.

By the above definition, WAR variables in a task can be obtained by analyzing the reachability between the basic blocks in CFG_T. A widely used polynomial time algorithm to find reachability between the nodes on a directed graph is to compute the graph's transitive closure. We also apply this algorithm in our approach to find the WAR variables in a task Ti and represent them with a set WAR_Ti.
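The following C sketch illustrates this transitive-closure-based identification of WAR variables (Definition 1). The CFG representation (adjacency matrix, per-block read/write bitmasks) and the small fixed bounds are assumptions made for illustration; a block that both reads and writes a variable is treated as reading it first.

#include <stdbool.h>
#include <stdint.h>

#define MAX_BB  16
#define MAX_VAR 32

typedef struct {
    int      nbb;                      /* number of basic blocks; block 0 is the start */
    bool     edge[MAX_BB][MAX_BB];     /* edge[i][j]: control flow edge from i to j    */
    uint32_t reads[MAX_BB];            /* bit v set: block reads variable v            */
    uint32_t writes[MAX_BB];           /* bit v set: block writes variable v           */
} cfg_t;

/* Warshall's transitive closure; blocks whose write set intersects "skip"
 * are removed first, which is how condition 2 (a path with no write to x)
 * is checked. Every surviving block is reachable from itself. */
static void reachability(const cfg_t *c, uint32_t skip, bool reach[MAX_BB][MAX_BB])
{
    for (int i = 0; i < c->nbb; i++)
        for (int j = 0; j < c->nbb; j++)
            reach[i][j] = (i == j) || c->edge[i][j];
    for (int i = 0; i < c->nbb; i++)
        if (c->writes[i] & skip)
            for (int j = 0; j < c->nbb; j++)
                reach[i][j] = reach[j][i] = false;     /* drop blocks that write x */
    for (int k = 0; k < c->nbb; k++)
        for (int i = 0; i < c->nbb; i++)
            for (int j = 0; j < c->nbb; j++)
                if (reach[i][k] && reach[k][j])
                    reach[i][j] = true;
}

/* Returns a bitmask of the WAR variables of one task (Definition 1). */
uint32_t war_variables(const cfg_t *c)
{
    bool full[MAX_BB][MAX_BB], nowrite[MAX_BB][MAX_BB];
    uint32_t war = 0;
    reachability(c, 0, full);                          /* plain reachability                 */
    for (int v = 0; v < MAX_VAR; v++) {
        uint32_t bit = 1u << v;
        reachability(c, bit, nowrite);                 /* reachability avoiding writes to v  */
        for (int r = 0; r < c->nbb; r++) {
            if (!(c->reads[r] & bit))
                continue;                              /* BB_R must read v                   */
            /* conditions 1 and 2: BB_R reachable from the start along a path with
             * no earlier write to v (the read in BB_R itself comes first)          */
            bool cond12 = (r == 0);
            for (int p = 0; p < c->nbb && !cond12; p++)
                if (c->edge[p][r] && nowrite[0][p])
                    cond12 = true;
            if (!cond12)
                continue;
            /* condition 3: some BB_W writing v is reachable from BB_R */
            for (int w = 0; w < c->nbb; w++)
                if ((c->writes[w] & bit) && full[r][w]) {
                    war |= bit;
                    break;
                }
        }
    }
    return war;
}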

As discussed above, given a task Ti, TEBs with different scopes can be decided with Ti as the leading task, and each TEB may require a different set of variables to save at its beginning. To overcome this uncertainty, we propose to compute a set of variables for state saving considering TEBs with all possible scopes and led by Ti. We call the set of variables the possible WAR set of task Ti, denoted by PWS_Ti.

Definition 2 (Possible WAR Set): If a variable x ∈ PWS_Ti, there must exist a basic block BB_R in some task that reads x, such that:

  1. Condition 1: BB_R is reachable from the starting basic block of Ti in G;

  2. Condition 2: along one of the paths satisfying condition 1 there is no basic block of any task that writes x;

  3. Condition 3: there exists a basic block BB_W in some task that writes x and BB_W is reachable from BB_R on G.

Note that PWS_Ti is defined over G, so BB_R and BB_W in Definition 2 can reside in tasks other than Ti.

Definition 3 (Breaking Point): A breaking point is associated with an edge on G such that when the execution of a TEB reaches a breaking point, the PWS for the outgoing task of the edge must be saved.

An important feature is that inserting breaking points does not introduce new WAR variables. First, the WAR variables in the tasks before the breaking point are not changed and should be saved at the beginning of the TEB. Second, the WAR variables in the tasks after the breaking point are not changed and should be saved at the leading task after the breaking point. Third, the WAR variables whose read occurs before the breaking point and whose write occurs after the breaking point need not be saved, as their WAR dependency is destroyed.

C. Computing PWS and Inserting Breaking Points

Now, we present an algorithm to compute the PWS for each leading task, whose complexity comes from breaking point insertion. As a breaking point may destroy the WAR dependency of some WAR variables, inserting a new breaking point into G may cause a recomputation of the PWS for all leading tasks. Finding the optimal breaking point assignment would be computationally intractable, as one needs to explore all possible edge combinations. For efficiency, we present a heuristic algorithm which incrementally computes the PWS for a leading task and inserts breaking points on the fly. For a leading task Ti, the algorithm starts a traversal of the induced subgraph of G whose nodes are all reachable from Ti. During each step, the algorithm collects read and write operations on data allocated on NVM and identifies WAR variables. At each step that processes a node, an evaluation is performed to decide whether breaking point(s) should be inserted on its incoming edge(s). If an insertion is decided, its impact on the PWS computation takes effect immediately.

Program Transformation: If a program graph G has loop structures, the result of PWS depends on the order of node traversal, and it is, in general, unclear when the traversal would terminate. To address this problem, we propose to transform G into G′ by the following unroll approach and compute PWS over G′. We build a new construct for a loop structure which contains two copies of the loop body serially connected, and replace the original loop structure with the new one. We say the first and second loop bodies are at level 1 and level 2, respectively. Each loop structure in G will be replaced by the unrolled structure to form G′. If there are nested loops, the unroll is conducted from the innermost level to the outermost level. The resulting G′ is now a directed acyclic graph, which is much easier to process for PWS computation. By the unroll operation, nodes in the same loop body may be replicated multiple times. If the nodes in G′ are generated from the same replication, we say they are in the same unroll level, e.g., T1 and T2 are in the same unroll level in the example of Fig. 3.

Fig. 3. An example to explain PWS computation and breaking point insertion.

As for the structure of task-based programs, one might worry that “goto”-style control flows could be produced, since programmers are allowed to freely specify the control flow among tasks by task transition directives. But this situation will rarely occur. A program is a set of instructions implementing the functionality of an application, regardless of whether the program progresses continuously or intermittently. The task is only a construct allowing a programmer to specify places where system states are saved to survive power failures. Thus, a task-based program should be naturally programmed by inserting task boundaries into a correctly implemented ordinary program. As most programs are well structured, the task-based version obtained as above will remain well structured.

Definition 4 (Image and Image Set): In G′, each replica of a node/edge v is an image of v, and all the images of node/edge v, including v, form an image set of v, denoted by IS_v.

For example, T1 and its replica in Fig. 3 form an image set IS_T1.

From now on, we compute the PWS for each leading task and insert breaking points over G′ instead of G. We will prove that any WAR variable in G will definitely be identified in G′.

Definition 5 (Reachability Over Image Set): In G′, an image set IS_B is reachable from another image set IS_A if there is at least one node y from IS_B and one node x from IS_A such that y is reachable from x in G′.

Lemma 1 (Reachability Maintenance): If node B is reachable from node A in G, then IS_B is reachable from IS_A in G′.

Proof: We prove this lemma by distinguishing two cases.
  • Case 1: If A and B are in the same loop body in G, there are two subcases: 1) B is reachable from A by a path that does not include the back edge of the loop body; this path belongs to the loop body and is always retained in G′ according to our unroll method; and 2) B is reachable from A by a path that includes the back edge; then there must be a path from A (at level 1) to B (at level 2) in G′. So, we can conclude IS_B is reachable from IS_A in G′.
  • Case 2: If A and B are in different loop bodies in G, without loss of generality, we assume A is in the outer loop and B is in the inner loop. There must be a path p1 from A to the entry node N_B of B's loop body and a path p2 from N_B to B. Note that p1 belongs to the outer loop and p2 belongs to the inner loop. Both paths are retained in G′, so IS_B is reachable from IS_A. If otherwise A is in the inner loop and B is in the outer loop, then A's exit node will be used to bridge the reachability from A to B. The above property can be extended to nodes that span across more than two levels of loop structures, by repeating the above reasoning for adjacent loop levels.

Lemma 2 (Reachability Maintenance With Node Removal): By removing node X from G and at the same time removing IS_X from G′, if node B is reachable from node A in G ∖ {X}, then IS_B is reachable from IS_A in G′ ∖ {IS_X}.

Proof: If node B is reachable from node A in G ∖ {X}, there must exist a path p in G from A to B, and p does not contain X. By the construction of G′, p also exists in G′, making IS_B reachable from IS_A in G′ by path(s) in IS_p. Removing IS_X from G′ does not destroy IS_p, as X is not on p in G. So IS_B must be reachable from IS_A in G′ ∖ {IS_X}.

Theorem 1 (PWS Maintenance): Any variable which is in PWS_T over G is in PWS_T over G′.

Proof: Recall that if a variable x ∈ PWS_T, x must satisfy three conditions. Note that conditions 1 and 3 in the definition of PWS are essentially reachability between nodes, and condition 2 can be checked by removing all nodes that write x from G and then exploring reachability between nodes. By Lemma 1 and Lemma 2, if x satisfies all three conditions over G, x must satisfy all the conditions over G′, which means x is also in PWS_T if it is evaluated over G′.

PWS Computation and Breaking Point Insertion: Before introducing the algorithm to compute PWS, some definitions and notions used in the analysis are given here. We use WIN_T to denote the set of variables written in task T. Furthermore, we define the read-through variables and write-break variables of task T used in the analysis.

Definition 6 (Read-Through Variable and Write-Break Variable): A variable x is a read-through variable of task T if there exists a path from the starting BB to the end BB of T such that there is a BB_R on the path that reads x, and there is no write to x on the subpath from the starting point of T to BB_R. The set of read-through variables of T is denoted by RT_T. A variable x is a write-break variable of task T if, for all paths from the starting BB to the end BB of T, there is a write to x on the path. The set of write-break variables of T is denoted by WB_T.

Computing PWS: Algorithm 1 gives the proposed analysis, which takes the graph G as input, computes the PWS for leading task Ti, and decides a set of edges to insert breaking points for Ti. The analysis starts with node Ti (line 4) and iteratively explores all the task nodes reachable from Ti (lines 5 and 18–20). Note that before computing PWS, the sets WAR_T, RT_T, WB_T, and WIN_T are precomputed for each task T.

Algorithm 1 Compute PWS and Insert Breaking Points for a Leading Task

Input: Program graph G, leading task Ti
Output: PWS for Ti; BPS, the set of breaking points for Ti

1:  PWS = ∅, BPS = ∅
2:  for each direct preceding node Tp of Ti do
3:      R_Tp = ∅, W_Tp = ∅
4:  Ready queue RdyQ = {Ti}
5:  while RdyQ is not empty do
6:      T = Dequeue(RdyQ)  /* to evaluate task T */
7:      for each incoming edge E of T do
8:          let T_E be the starting node of E
9:          PWS′ = PWS ∪ (WAR_T ∖ W_TE) ∪ (R_TE ∩ WIN_T)
10:         if Range(PWS′) - Range(PWS) > α × Cost then
11:             insert a dummy task T_BE on each edge in IS_E
12:             BPS = BPS ∪ T_BE
13:         else
14:             PWS = PWS′
15:     R_T = (⋃_{p ∈ pre(T)} R_p ∖ WB_T) ∪ (RT_T ∖ ⋂_{p ∈ pre(T)} W_p)
16:     W_T = (⋂_{p ∈ pre(T)} W_p) ∪ WB_T
17:     mark T as VISITED
18:     for each node Tn NOT VISITED do
19:         if all its direct preceding nodes are VISITED then
20:             Enqueue(RdyQ, Tn)
In the evaluation of each task T, the PWS for Ti is updated by adding the newly identified WAR variables (line 9), which includes two cases.

  1. (WAR_T ∖ W_TE): If a variable is already a WAR variable in T, then it is potentially a WAR variable in a TEB led by Ti. If the variable is also a write-break variable in the task preceding T, the variable is no longer a WAR variable in the TEB, which has been explained in Section III-A.

  2. (R_TE ∩ WIN_T): If there is a read of a variable in the preceding node of T and a write to the variable in T, a new WAR variable resulting from coalescing tasks is identified.

The above evaluation is performed for all the incoming edges of T (line 7).

Inserting Breaking Points: Once the PWS for Ti is updated (the new PWS′), a breaking point decision will be made immediately (lines 10–14). Here, Range(PWS) refers to the memory copy range for the set of variables in PWS. If the increase in the amount of memory copying exceeds a threshold, α × Cost, inserting a breaking point here is considered beneficial for reducing state saving overhead, and thus a breaking point is inserted. If a breaking point is inserted on edge E, Ti's PWS will not be updated, because: 1) the breaking point breaks the WAR dependency of the variables that fall into the set R_TE ∩ WIN_T and 2) a state saving is forced at the beginning of T, so T's local WAR variables WAR_T need not be saved at the beginning of Ti. The effect of the new breaking point on the computation of Ti's PWS takes effect right away and is carried into the computation in later iterations.
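A compact sketch of this per-edge update (lines 9–14) is given below, with variable sets represented as bitmasks; range_of() is an assumed helper returning the size in bytes of the memory copy range of a set (cf. the sketch after Fig. 2), and ALPHA/COST mirror α and Cost.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ALPHA 5
#define COST  1

extern size_t range_of(uint32_t varset);   /* assumed helper: bytes of the memory copy range */

typedef struct {
    uint32_t pws;     /* current PWS of the leading task Ti                 */
    uint32_t war_T;   /* WAR_T: WAR variables local to task T               */
    uint32_t w_TE;    /* W_TE: variables surely written before edge E       */
    uint32_t r_TE;    /* R_TE: variables possibly read (unwritten) before E */
    uint32_t win_T;   /* WIN_T: variables written inside task T             */
} edge_eval;

/* Returns true if a breaking point is inserted on edge E (and all its images,
 * lines 11-12, omitted here); otherwise PWS is updated in place (line 14). */
bool evaluate_edge(edge_eval *e)
{
    uint32_t pws_new = e->pws
                     | (e->war_T & ~e->w_TE)    /* WAR_T minus W_TE (case 1)     */
                     | (e->r_TE  &  e->win_T);  /* R_TE intersect WIN_T (case 2) */
    if (range_of(pws_new) - range_of(e->pws) > (size_t)(ALPHA * COST))
        return true;                            /* line 10: insert a breaking point */
    e->pws = pws_new;                           /* line 14: accept the new PWS      */
    return false;
}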

Note that when a breaking point is decided, it is inserted on all edges in the image set of E on G′ (line 11). This is because if we allowed breaking points to be inserted at only one level of the loop in G′, the runtime system would have to distinguish which iteration of the loop is executing in order to implement such a breaking point decision, which would introduce considerable runtime overhead. For runtime efficiency, we choose not to distinguish loop iterations when inserting breaking points.

The breaking point operation may bring a new issue. Consider an edge E from A to B in G′. It is possible that a breaking point is decided only when a later image of E is evaluated, which means that in the earlier evaluation of E, the effect of inserting the breaking point on E was not considered. Now, we prove that not considering the breaking point effect when evaluating E will not cause an actual WAR variable to be missed in the computed PWS.

Not considering a breaking point for an edge E is equivalent to including a set of new paths starting from E, and the nodes along these paths, in the PWS computation. So we prove, by Lemma 3, that including the new paths does not cause WAR variables to be missed in the PWS computation.

Lemma 3: If a variable x is a WAR variable in graph G, then x is still a WAR variable in graph G_N, where G_N is produced by adding new edges and nodes into G.

Proof: By the definition of PWS, there must exist a path p1 satisfying conditions 1 and 2 and a path p2 satisfying condition 3 to make x a WAR variable. No matter what read and write operations may occur on a new path p_n, adding p_n will not destroy p1 or p2 in G_N, so all the conditions are still met and x is a WAR variable in G_N.

Then, we discuss the cost evaluation. Whether to insert a breaking point is decided by comparing the benefit of reducing state saving with the cost of conducting a state saving operation.

The benefit is the memory copying saved by the breaking point, i.e., Range(PWS′) - Range(PWS). The state saving operation cost, denoted by Cost, is the overhead of the common operations to prepare the execution environment of the next TEB, such as updating the scope of the next TEB and setting the pointer of the leading task for the runtime system. Note that an important feature of breaking points is that inserting a breaking point will not introduce new WAR variables. In LATICS, we multiply a factor α on the cost side, meaning that a breaking point will be inserted only if the benefit is sufficiently larger than Cost. The value of α should be set by the system designer, and different values of α may lead to different breaking point insertions. In this work, we set α to 5. Further exploration of other α settings is left to future work.

In our implementation, we add a dummy task T_BE on the edge with a breaking point to avoid runtime maintenance of program paths. Once T_BE is reached, the PWS of the immediately succeeding task of T_BE will be saved.

Updating Read and Write Sets: After the PWS is computed, R_T and W_T are updated for task T (lines 15 and 16). The purpose of maintaining R_T and W_T is to transfer the read and write behaviors of the data on NVM to T's succeeding nodes, to be involved in identifying new WAR variables in future iterations.

R_T first includes the read sets of all incoming nodes (⋃_{p ∈ pre(T)} R_p), but if a write to a variable x in R_T is guaranteed to occur on all paths in T, x is excluded, as the read of x no longer causes WAR dependency in the future nodes to be explored. The read-through variables of T are also included in R_T. But if a write to a read-through variable is guaranteed on all possible incoming edges, the variable will not cause WAR dependency in the future and is thus excluded. (RT_T ∖ ⋂_{p ∈ pre(T)} W_p) implements the latter case.

The update of W_T is relatively simple. If a write-break variable appears in all incoming nodes, it is included in W_T. The write-break variables of T are also included in W_T and carried to the evaluation of future nodes.
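The corresponding set updates of lines 15 and 16 can be sketched as follows, again with bitmask sets; the array-of-predecessor-sets layout is an assumption made for illustration.

#include <stddef.h>
#include <stdint.h>

/* Lines 15-16 of Algorithm 1: propagate the read/write behavior of task T to
 * its successors. pred_r[i]/pred_w[i] are the R_p/W_p sets of T's i-th direct
 * predecessor; rt_T and wb_T are T's read-through and write-break sets. */
void update_rw_sets(const uint32_t *pred_r, const uint32_t *pred_w, size_t npred,
                    uint32_t rt_T, uint32_t wb_T,
                    uint32_t *r_T, uint32_t *w_T)
{
    uint32_t union_r = 0;                 /* union of R_p over all predecessors        */
    uint32_t inter_w = ~(uint32_t)0;      /* intersection of W_p over all predecessors */
    for (size_t i = 0; i < npred; i++) {
        union_r |= pred_r[i];
        inter_w &= pred_w[i];
    }
    if (npred == 0)
        inter_w = 0;                      /* no predecessor: nothing is surely written */

    *r_T = (union_r & ~wb_T) | (rt_T & ~inter_w);   /* line 15 */
    *w_T = inter_w | wb_T;                          /* line 16 */
}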

The algorithm iterates until all nodes reachable from Ti are explored. The computational complexity of the proposed algorithm is O(N), where N is the number of nodes in G.

A Running Example: To illustrate how Algorithm 1 works, we use the example in Fig. 3 to show the computation of PWS and breaking point insertion. For simplicity, we omit the evaluation of T1. After T1 is evaluated, we have:

PWS = {a}

RdyQ = {T2, T3}, R_T2... wait: R_T1 = {b}, W_T1 = ∅.

Step 1: Dequeue T2 and compute PWS′:

PWS = {a}, PWS′ = {a, b}

RdyQ = {T3}, R_T2 = {b}, W_T2 = {d}.

In this step, there is a path from T1 to T2 on which T1 reads b and T2 writes b, so b is a new WAR variable and is included in PWS′. Given α = 5 and Cost = 1, Range(PWS′) - Range(PWS) < 5, so no breaking point is inserted, and PWS is updated to PWS′. Then, R_T2 and W_T2 are also updated with the information in Fig. 3(b).

Step 2: Dequeue T3 and compute PWS′:

PWS = {a, b}, PWS′ = {a, b}

RdyQ = {T4}, R_T3 = {b, c}, W_T3 = {d}.

Similar to step 1, R_T3, W_T3, and PWS are updated. After the analysis of T3, T4 is inserted into RdyQ as all of its preceding nodes (T2 and T3) have been visited.

Step 3: Dequeue T4 and evaluate the edge from T2 to T4:

PWS = {a, b}, PWS′ = {a, b}.

Then, evaluate the edge from T3 to T4:

PWS = {a, b}, PWS′ = {a, b, c}.

Since Range(PWS′) - Range(PWS) = 20 > 5, we insert a breaking point on this edge and do not update PWS. R_T4, W_T4, and RdyQ are updated as follows:

RdyQ = {T1′}, R_T4 = {b}, W_T4 = {c, d}.

After all the nodes in G′ have been explored, the final PWS_T1 is {a, b}. A breaking point is inserted on the edge from T3 to T4. Each time this breaking point is reached and T1 is the leading task of a TEB, PWS_T4 will be saved.

SECTION IV.

Runtime System

We build a runtime system to implement the core functionalities of the proposed approach. The runtime system makes dynamic decisions on task coalescing, manages data buffers, conducts state saving and restoration, and most importantly enforces mandatory state saving at breaking points. We will first give the workflow of LATICS’ runtime system and then detail its main components.

A. Main Work Flow

The behavior of the runtime system is specified by Algorithm 2, and we use the example in Fig. 4 to explain how it works. A possible execution trace may be as follows.

  1. The program starts and the runtime system decides a TEB led by task T1 and containing three tasks.

  2. System states are saved by copying the data in the memory copy range [0, 5] covering PWS_T1 from the working buffer to the backup buffer.

  3. T1, T2, and T3 execute until T3 is finished.

  4. As the current TEB is done, the runtime system creates a new TEB at this point and makes a decision to let the TEB coalesce six tasks.

  5. As T5 is the leading task of the new TEB, data in the memory copy range [2, 5] covering PWS_T5 are copied from the working buffer to the backup buffer.

  6. A power failure occurs in the execution of T2 (after T5).

  7. After recovery, the runtime system identifies the power failure; therefore, it restores the system states at the beginning of the aborted TEB by copying data in the memory copy range [2, 5] from the backup buffer to the working buffer.

  8. The runtime system creates a new TEB, the TEB coalesces four tasks, and execution resumes from T5. The new TEB continues and the program takes the right branch.

  9. After T2 is finished, the runtime system encounters a breaking point on the incoming edge to T4. Although the current TEB is not finished yet, a state saving is forced at this point, saving the data in the memory copy range [2, 5] covering PWS_T4 to the backup buffer. As a result, if a power failure occurs later, the system will resume execution from T4 instead of the original leading task T5.

  10. T5 is finished, and a new TEB starting from T2 will be decided by the runtime system.

  11. Execution continues until the program is completed.

Fig. 4. A running example of the runtime system.

Algorithm 2 Runtime System

1:  Let currTEB be a newly created TEB
2:  Let T0 be the leading task of currTEB
3:  if recovered from power failure then
4:      RESTORE(T0)
5:      currTEB = DECISION()
6:  while not end of the program do
7:      Ti = T0
8:      if no power failure then
9:          SAVE(Ti)
10:     while currTEB is not finished do
11:         if Ti is a breaking task then
12:             T0 = Ti
13:             SAVE(Ti)
14:         EXECUTE(Ti)
15:         UPDATE_HISTORY()
16:         Ti = NEXT(Ti)
17:     currTEB = DECISION()

B. Core Components

LATICS allocates local variables that are only accessed in a single task on SRAM and allocates the variables shared among tasks on FRAM. LATICS maintains two buffers for the shared data: 1) a working buffer and 2) a backup buffer. The working buffer serves as part of the main memory and stores all the shared data. The backup buffer stores a backup copy of the set union of the PWSs of all leading tasks, which is a subset of the shared data. Note that the PWS for each leading task is computed from the shared data.

The SAVE() function conducts state saving. It is invoked when the program starts for the first time, and when a TEB successfully finishes execution and the runtime system decides a new TEB. The main operation is to save the variables in the memory copy range covering the PWS of the leading task from the working buffer to the backup buffer.

The RESTORE() function restores states and is only invoked after the system recovers from a power failure. The states saved at the beginning of the aborted TEB are restored. The main operation is to copy the variables in the memory copy range covering the PWS of the leading task from the backup buffer to the working buffer. SAVE() and RESTORE() are the key operations to ensure memory consistency for the system.
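A minimal sketch of SAVE() and RESTORE() as described above; memcpy stands in for the DMA bulk copy used on the real board, the buffer layout is simplified so the backup buffer mirrors the working buffer's offsets, and pws_range_of() is an assumed lookup of the precomputed memory copy range.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    size_t off;                  /* offset of the copy range within the shared data */
    size_t len;                  /* length of the copy range in bytes               */
} buf_range;

extern uint8_t working_buf[];                     /* all shared data (on FRAM)              */
extern uint8_t backup_buf[];                      /* backup copy of the union of the PWSs   */
extern buf_range pws_range_of(int leading_task);  /* assumed: precomputed per leading task  */

void SAVE(int leading_task)          /* invoked at the start of a TEB or at a breaking point */
{
    buf_range r = pws_range_of(leading_task);
    memcpy(&backup_buf[r.off], &working_buf[r.off], r.len);   /* working -> backup */
}

void RESTORE(int leading_task)       /* invoked only after recovery from a power failure */
{
    buf_range r = pws_range_of(leading_task);
    memcpy(&working_buf[r.off], &backup_buf[r.off], r.len);   /* backup -> working */
}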

DECISION() is the function that realizes adaptive execution by deciding the TEBs. The function is invoked either after a TEB successfully finishes or after the system recovers from a power failure. In LATICS, DECISION() predicts the available energy considering the execution history recorded by UPDATE_HISTORY(), and then makes a decision on the number of tasks to be coalesced into the next TEB.
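The paper specifies only that DECISION() derives the next TEB size from the recorded execution history; the grow-on-success, back-off-on-failure policy sketched below is a hypothetical placeholder for illustration, not LATICS' actual prediction scheme.

#include <stdbool.h>

#define MAX_COALESCE 8

static int teb_size = 1;                 /* number of tasks to coalesce into the next TEB */

int DECISION(bool last_teb_completed)    /* hypothetical policy, for illustration only    */
{
    if (last_teb_completed) {
        if (teb_size < MAX_COALESCE)
            teb_size *= 2;               /* the last TEB survived: energy seems sufficient     */
    } else {
        teb_size = (teb_size > 1) ? teb_size / 2 : 1;   /* a power failure occurred: back off  */
    }
    return teb_size;
}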

Enforcing the Breaking Point Mechanism: During execution, if the runtime system identifies a breaking point before a task T, it ends the current TEB and conducts state saving by invoking SAVE(T). From then on, the rest of the original TEB forms a new TEB and continues execution. If power fails in the new TEB, the system will later resume from T, instead of from the leading task of the original TEB.

Note that the state save/restore mechanism in LATICS differs from that of the state-of-the-art task-based system InK [18]. In both LATICS and InK, task execution operates on the working buffer. After a task is finished, InK exchanges the roles of its working buffer and backup buffer by a pointer swap, and right before starting the next task, the runtime system copies all data in the new backup buffer to the new working buffer for memory consistency. In contrast, LATICS maintains only a subset of the data (the set union of the PWSs of all leading tasks) in the backup buffer. By this means, LATICS reduces memory usage and avoids unnecessary state saving.

SECTION V.

Experiments and Evaluation

A. Experimental Setup

LATICS provides a solution for efficient adaptive execution of task-based intermittent computing systems. Currently, LATICS is implemented based on the state-of-the-art task-based system InK [18]. In principle, it can be applied to other task-based systems as well. The LATICS runtime system and the application program are compiled into the binary executable using the TI v18.12.4 compiler by TI Code Composer Studio.

We deploy LATICS on TI’s MSP430FR5994 LaunchPad, whose microcontroller provides 8 KB of volatile memory (SRAM) and 256 KB of nonvolatile memory (FRAM), all clocked at their highest speeds. We allocate the local variables of each task on SRAM and the data shared by tasks on FRAM. Bulk copying between the memories uses DMA. The hardware board is driven by a programmable power supply with which we can generate different power traces to evaluate LATICS under different power conditions.
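
The article does not spell out the DMA configuration, so the following is only a minimal sketch of a block copy between SRAM and FRAM on the MSP430FR5994 using DMA channel 0, modeled on TI’s standard DMA examples; the register names come from TI’s device header, while the helper function itself is illustrative and not part of LATICS.

/* Sketch of a DMA block copy on the MSP430FR5994 (channel 0, word transfers).
 * Modeled on TI's DMA examples; the configuration actually used in LATICS is
 * not specified in the article. */
#include <msp430.h>

static void dma_bulk_copy(void *dst, const void *src, unsigned int words)
{
    __data16_write_addr((unsigned short)&DMA0SA, (unsigned long)src);   /* source      */
    __data16_write_addr((unsigned short)&DMA0DA, (unsigned long)dst);   /* destination */
    DMA0SZ  = words;                                  /* transfer size in 16-bit words  */
    DMA0CTL = DMADT_1 | DMASRCINCR_3 | DMADSTINCR_3   /* block transfer, increment both */
            | DMAEN;                                  /* arm the channel                */
    DMA0CTL |= DMAREQ;                                /* software trigger; the CPU is
                                                         halted until the block is done */
}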

LATICS is compared against two state-of-the-art task-based intermittent computing systems: 1) InK [18] and 2) Coala [22]. InK is a task-based system that saves states at all task boundaries; a working buffer and a backup buffer are implemented to manage the data allocated on NVM. LATICS differs from InK in that LATICS is an adaptive task-based intermittent computing system that may skip state saving when the power supply is sufficient, and more importantly, with the support of static analysis, LATICS saves only the PWS of the task that leads a TEB, which is a small subset of the data on NVM. Coala is a recently proposed adaptive task-based intermittent computing system. At runtime, Coala dynamically predicts the number of tasks to coalesce from the execution history. Coala implements a paging system that copies used data pages on demand to VM and, at the end of a TEB, saves all modified data pages to a backup buffer. In contrast, LATICS adopts the breaking point mechanism to reduce unnecessary state saving and saves only the PWS of the leading task of a TEB.

We use eight benchmark programs adopted in InK and Coala to evaluate system performance. The benchmarks are cold-chain equipment monitoring (CEM), cuckoo filter (CK), cyclic redundancy check (CRC), bitcount (BC), Dijkstra shortest path algorithm (DIJ), selection sort algorithm (SRT), RSA encryption (RSA), and activity recognition (AR).

In the benchmarks, reads and writes to the global variables shared among tasks are explicitly programmed with two macros, __GET(v) and __SET(v), where v is a global variable. Transitions between tasks are programmed with the NEXT(Ti) macro, indicating that the next task is Ti. The CFG of each task is extracted from the task’s source code, and the program graph G is constructed by connecting the tasks’ CFGs with the task transitions. WAR dependencies are obtained by exploring G and tracking the read/write operations.
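
For illustration, a task in this programming model might look like the sketch below. The task body, read_sensor(), min_value, and the successor task task_report are invented; __GET/__SET/NEXT are provided by the runtime and their definitions are not reproduced here, and the two-argument form of __SET is an assumption made for readability.

/* Illustrative task written with the macros described above. Everything in
 * the body is invented; only the use of __GET/__SET/NEXT follows the text. */
extern int read_sensor(void);

void task_update_min(void)
{
    int sample = read_sensor();              /* local computation (SRAM)        */
    if (sample < __GET(min_value)) {         /* read of a shared global (FRAM)  */
        __SET(min_value, sample);            /* write of a shared global (FRAM) */
    }
    NEXT(task_report);                       /* transition to the next task     */
}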

B. Evaluation Methods

LATICS’ performance is evaluated and compared with InK and Coala on the above benchmarks. We run each benchmark program 20 times and report the average execution time of a single run. The total execution time of each benchmark is recorded from the start to the end of its execution and is further divided into state saving time and task execution time. The state saving time is the time spent on state saving at the beginning of TEBs. The task execution time refers to the time spent executing the task code. Both parts include the partial executions of TEBs aborted by power failures.

For task coalescing, we adopt Coala’s weighted energy-guided coalescing strategy for a fair comparison. The strategy collects the execution history to obtain an energy capacity indicating how many tasks can be coalesced within it. Once a TEB successfully finishes, the capacity is halved, and once a power failure occurs, the capacity is updated according to the past execution history. All parameters are set to the same values as in Coala.
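
As a rough illustration only, the decision logic could be sketched as below; the history handling is simplified to a single counter, the weighting used by Coala is omitted, and all names and types are invented.

/* Sketch of the weighted energy-guided coalescing idea as described above:
 * a capacity (in tasks) bounds the next TEB, is halved after a successful
 * TEB, and is re-estimated from the execution history after a power
 * failure. The real strategy weights the history; this sketch does not. */
static unsigned int capacity = 1;            /* tasks allowed in the next TEB */

unsigned int decide_coalescing(int power_failed, unsigned int tasks_done_before_failure)
{
    if (power_failed) {
        /* re-estimate from past progress (a weighted history in the real strategy) */
        capacity = (tasks_done_before_failure > 0) ? tasks_done_before_failure : 1;
    } else {
        capacity = (capacity > 1) ? capacity / 2 : 1;   /* halve after a successful TEB */
    }
    return capacity;                         /* number of tasks to coalesce next */
}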

To test the compared systems under different power supplies, we conduct experiments with periodic power traces, in which power failures occur periodically during system execution with a given period called the power cycle. Table II reports the results under periodic power traces, where the second column is the power cycle. Experiments are conducted for eight power cycle configurations, 1, 2, 3, …, 8 ms, which are in the typical range of power cycles for intermittent systems powered by capacitors [17]. Due to limited space, only the results for power cycles of 1, 2, 4, and 8 ms are listed in Table II. We also conducted experiments on random power traces that reflect real-life scenarios; in these traces, the time gap between two consecutive power failures is randomly chosen from the range [1 ms, 8 ms]. Fig. 5 reports the experimental results under random power traces.

TABLE I. State Saving Size for All Tasks (in Bytes)
TABLE II. Execution Time Results Under Periodic Power Traces With Different Power Cycles
Fig. 5. Execution time results under random power traces (normalized to InK’s total execution time).

We further discuss: 1) the effectiveness of our proposed approach by comparing it with an exhaustive search method; 2) the impact of α on the breaking point decision; and 3) the performance of different state copying methods.

C. Empirical Results and Evaluation

For all experiments under all power traces, LATICS outperforms InK and Coala with respect to the state saving time and the total execution time, which shows that LATICS has low state saving overhead and high execution efficiency. We now give a detailed evaluation of the results.

First, we investigate how much state saving can be reduced by our proposed approach. Table I lists the results for InK and LATICS. The numbers for InK are the total state saving sizes for all tasks, i.e., the product of the size of the system states and the number of tasks in the program. For LATICS, we sum up the sizes of the memory copy ranges covering the PWS of all tasks (assuming each task can serve as a leading task) computed by our proposed method. The results show that, by inserting breaking points and by computing the PWS, very little data needs to be saved at the leading tasks of TEBs, so state saving overhead is significantly reduced in LATICS.

Second, the performance of LATICS under random power traces is evaluated, with the results shown in Fig. 5. The figure illustrates the total execution times of LATICS, InK, and Coala, normalized to that of InK. The gray part of each bar is the state saving time, and the slashed part is the task execution time. For all benchmarks, LATICS has the smallest total execution time. InK spends a large portion of its time on state saving, as it pessimistically saves all system states at all task boundaries. Coala saves, at the end of a TEB, the data pages that have been modified since the beginning of the TEB. Coala’s state saving is not efficient because: 1) writes to non-WAR variables may also cause a data page to be saved and 2) as Coala copies memory in pages, the actual amount of data copied during state saving is larger than the total size of all modified variables. LATICS outperforms InK and Coala because it adaptively groups tasks into TEBs, and by adopting the breaking point mechanism and the PWS computation, unnecessary state saving at the beginning of a TEB is avoided as much as possible. Similar comparison results can also be observed in the experiments on periodic power traces, shown in Table II.

To further explain the results, we examine in detail the CEM benchmark, which shows a significant performance improvement. The program graph of CEM (tasks’ CFGs omitted, only intertask transitions shown) is given in Fig. 6 and contains 12 tasks. There are read operations on a large array A in T5, T6, and T7 and write operations on A in T8. Breaking points were inserted on the incoming edges of T8, which breaks the WAR dependency on A and thus considerably reduces state saving. Some benchmarks, such as RSA, do not exhibit significant improvement in Fig. 5 because they spend their execution time in loops that do not access the variables that are part of InK’s saved states but are excluded by the breaking points in LATICS.

Fig. 6. Program graph of CEM.

Third, the experimental results for periodic power traces are shown in Table II (dashes in the table mean the program failed to make progress). Regarding task execution time, LATICS provides performance comparable to InK, only slightly higher for some benchmarks. As tasks in LATICS are coalesced into larger TEBs, LATICS may, in the presence of power failures, incur more reexecution time due to the larger atomic execution blocks (i.e., TEBs). Note that this is a general phenomenon for all adaptive task-based intermittent computing systems. Coala performs worst in task execution time: to avoid the WAR problem, Coala adopts an inefficient paging system that copies the requested data pages on demand to VM, which considerably sacrifices execution efficiency. As also shown in Table II, when the power cycle increases, LATICS performs better with less state saving overhead, because the benefit of skipping state saving within TEBs grows with larger power cycles. We also list the number of tasks executed in one power cycle for all benchmarks in Table III. LATICS executes more tasks per power cycle than InK and Coala, which again shows that LATICS has less state saving overhead and thus generally makes better progress.

TABLE III. Number of Tasks Finished in Each Power Cycle

D. Further Discussion

Effectiveness of the Breaking Point Insertion Approach: To justify the effectiveness of our proposed breaking point insertion approach (Algorithm 1), we implemented an exhaustive search method (abbreviated ES) to find the “optimal” solution and compared it with our approach.

In fact, the optimal breaking point locations depend on the program paths and the power traces, so an optimal solution only exists for a known program path and power trace. As the executed program path and the actual power trace are only known at runtime, a clairvoyant method is required to search for the optimal solution.

To this end, we run each benchmark program once with a periodic power trace (1 ms power cycle) and obtain an execution history. Then, an induced program graph is generated by excluding the paths not taken in the execution history. An exhaustive search is conducted on the induced program graph by enumerating, for each edge of the graph, whether a breaking point is inserted on it. Note that, as the data inputs of all benchmarks are fixed in our experiments, a program is guaranteed to take the same path each time it is executed. In each experiment, we run the benchmark program with the same power trace and record the total runtime. If the search does not finish within 20 hours, we stop it and report the best solution found so far. Table IV gives the results for ES and our approach.
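
Conceptually, ES enumerates every subset of edges of the induced program graph as a breaking-point assignment and keeps the configuration with the smallest measured runtime, roughly as in the sketch below; apply_breaking_points() and measure_runtime() (replaying the benchmark under the fixed 1 ms periodic power trace) are hypothetical helpers, and the enumeration is only feasible for small graphs.

/* Sketch of the exhaustive search (ES): bit i of 'mask' decides whether a
 * breaking point is inserted on edge i of the induced program graph. The
 * helpers are hypothetical; only the enumeration structure is shown. */
#include <stdint.h>

extern void   apply_breaking_points(uint32_t edge_mask);
extern double measure_runtime(void);          /* replay under the fixed power trace */

double exhaustive_search(int num_edges)       /* assumes num_edges < 32 */
{
    double best = -1.0;
    for (uint32_t mask = 0; mask < (1UL << num_edges); mask++) {
        apply_breaking_points(mask);
        double t = measure_runtime();
        if (best < 0.0 || t < best)
            best = t;
    }
    return best;                              /* minimal total runtime found */
}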

TABLE IV. Minimal Execution Times (in ms) Obtained by ES and LATICS Under a Periodic Power Trace With a 1 ms Power Cycle

We compare the minimal total runtime obtained by ES with the total runtime of LATICS. The exhaustive searches for CRC, DIJ, and SRT finished, so the results obtained for them are optimal. It can be seen that the total runtimes of LATICS for these three benchmarks are slightly larger than those of ES. The reasons are twofold: first, ES is clairvoyant while LATICS is not; second, our approach, i.e., Algorithm 1, is a heuristic method that considers only a subset of the problem state space. The exhaustive searches for the other five benchmarks did not finish within 20 hours. This is because, even when the program path is known, searching for an optimal breaking point assignment still needs to explore a huge state space. Note that 20 hours is still not enough for ES to find sufficiently good results, so the execution times obtained by ES are larger than those of LATICS for some benchmarks.

Impact of α on Breaking Point Insertion: Parameter α is involved in deciding whether a breaking point is inserted when Algorithm 1 explores the unrolled graph (line 10). A breaking point is inserted only when the difference in the size of the memory copy range exceeds α×Cost. We conducted experiments with α = 2, 5, 10 under a power trace with a 1 ms power cycle to explore how α affects the results.
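
For reference, the threshold test itself amounts to the small check below, a sketch of the condition on line 10 of Algorithm 1 as described above; the parameter names are illustrative.

#include <stddef.h>

/* Insert a breaking point only if the reduction in the memory copy range
 * outweighs alpha times the cost of the extra state saving it introduces. */
int should_insert_breaking_point(size_t range_without_bp, size_t range_with_bp,
                                 size_t cost, unsigned int alpha)
{
    size_t diff = range_without_bp - range_with_bp;   /* difference in range size */
    return diff > (size_t)alpha * cost;
}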

The total runtime and backup time are shown in Table V. First, the total runtimes of all three configurations are very close, which shows that values of α within the explored range do not significantly affect the results for these benchmarks. The total runtime is lowest for the α = 5 configuration. If the value of α is too small, e.g., α = 2, too many breaking points are inserted, so the benefit of task coalescing decreases. If the value of α is too large, e.g., α = 10, too few breaking points are inserted, which compromises the benefit of breaking points.

TABLE V. Results With Different α Values Under a Periodic Power Trace With a 1 ms Power Cycle

Performance of State Copying: In LATICS, the PWS is copied to a backup buffer at the beginning of a leading task. Instead of copying the exact PWS, we copy a memory copy range that covers the PWS. The main motivation is that bulk copying by DMA is much faster than copying the variables one by one. As the variables of a PWS may not be stored at contiguous addresses in memory, a memory copy range may contain variables beyond the PWS to be saved at that point. Even so, copying the data in the memory copy range in bulk is more efficient in most cases.
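
A minimal sketch of how such a covering range could be derived is shown below: the range simply spans from the lowest to the highest address occupied by any PWS variable, which is why unrelated variables lying in between may be copied as well. The var_t layout and the helper are illustrative, not the actual LATICS analysis.

/* Sketch: compute one contiguous memory copy range [lo, hi) that covers all
 * variables of a PWS, so a single DMA block copy can save it. */
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t addr; size_t size; } var_t;   /* address/size of one variable */

void covering_range(const var_t *pws, int n, uintptr_t *start, size_t *len)
{
    uintptr_t lo = pws[0].addr;
    uintptr_t hi = pws[0].addr + pws[0].size;
    for (int i = 1; i < n; i++) {
        if (pws[i].addr < lo)                lo = pws[i].addr;
        if (pws[i].addr + pws[i].size > hi)  hi = pws[i].addr + pws[i].size;
    }
    *start = lo;
    *len   = hi - lo;                        /* may include non-PWS variables in between */
}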

In Table VI, we report the time overhead of three copying methods running on our MSP430 board for different data sizes: bulk copying by DMA (DMA bulk), variable-by-variable copying by DMA (DMA copy), and variable-by-variable copying by the CPU [CPU copy, implemented with memcpy()]. We provide results for copying 2 B to 512 B of data. For variable copying, we set the variable size to 2 B, the size of an integer on the MSP430.
Table VI 中,我们报告了在我们的 MSP430 板上运行的三种复制方法在不同数据大小下的时间开销,即通过 DMA 进行批量复制(DMA 批量复制)、通过 DMA 进行变量复制(DMA 复制)和通过 CPU 进行变量复制[CPU 复制,由 memcpy() 实现]。我们提供了复制 2 到 512B 数据的结果。对于变量复制,我们将变量大小设置为 2B,即 MSP430 上的整数大小。

TABLE VI. Time Cost (in μs) for Different State Copying Methods Under Different Data Sizes (in Bytes)

For variable copying, the time overhead is linearly proportional to the copied data size, with DMA slightly faster than the CPU. For bulk copying by DMA, the copy speed keeps increasing as the data size grows from 2 B to 512 B. From a data size of 16–32 B, bulk copying by DMA is one order of magnitude faster than the variable-copying approaches. Seventy memory copy ranges exist across all benchmarks: 13 are sized 2–15 B, 43 are sized 16–64 B, and 14 are sized 65–284 B. So, in most cases, bulk copying is more efficient. These results explain why we choose bulk copying in LATICS even though extra variables may be included in a memory copy range. In general, large programs with large state copying sizes benefit more from bulk copying.

The actual memory layout of the state variables may impact the size of the memory copy ranges and thus the state copying overhead. In general, an optimal layout does not exist in practice, as the efficiency of a memory layout depends on which tasks actually become leading tasks, which is in turn affected by the actual power traces. We leave a systematic investigation of memory layout optimization for future work.

SECTION VI.

Related Work

Intermittent Computing: Intermittent computing systems are powered by energy harvested from their working environment [1]–​[4]. As ambient energy sources are highly unstable, a system may experience frequent power failures and has to execute intermittently. The general requirements are to ensure program progress and logical correctness in the presence of power failures [23], which are achieved by decomposing a program into small atomic execution blocks and frequently saving system states to NVM.

Computing Paradigms: Checkpointing has been adopted since early research to decompose a program into atomic blocks and conduct state saving [5]–[15]. When power fails, the system rolls back to the latest checkpoint after energy is replenished. Checkpointing is well known for its large state saving overhead: the execution context, including all or part of main memory and the register file, is saved at each checkpoint for future rollback. Task-based intermittent computing systems [16]–[20] were proposed as an alternative to overcome the problems of the checkpoint-based paradigm. Programmers write programs as explicit tasks, and system states are saved at task boundaries to survive power failures. The task-based paradigm gives programmers more flexibility in decomposing programs and generally saves less memory content at task boundaries. LATICS pushes state-of-the-art task-based intermittent computing further with a significant reduction in state saving overhead.

Dynamic Task Coalescing: As tasks are conservatively designed to be very small to ensure system progress, coalescing multiple tasks reduces state saving overhead as long as the energy is sufficient. A recently proposed system, Coala [22], coalesces tasks at runtime to reduce unnecessary state saving. However, coalescing tasks introduces new WAR variables, and precisely identifying them is a major challenge. To avoid this problem, Coala implements an inefficient paging system that copies used data pages on demand to VM and pessimistically saves all modified data pages to NVM, which impacts the overall execution efficiency. LATICS improves on the state of the art with two main contributions: 1) a novel breaking point mechanism to reduce unnecessary state saving for TEBs and 2) a static analysis to compute the PWS of each leading task of a TEB, which further reduces the amount of state saving.

Peripherals and Concurrency: Cyber–physical systems typically interact with their environment through peripherals. For such systems, maintaining consistent states across power failures is complex. Without persistent peripheral states, a program may get stuck in an infinite loop of state polling due to power failures [13]. Reoperating peripherals across power failures may produce inconsistent system states [24]. Timekeeping is also a problem [13]: without knowledge of the power downtime, timely tasks may behave incorrectly. Interrupts may also violate the atomicity of tasks, and techniques [20] were proposed to either delay the interrupt service routine or discard interrupted tasks to ensure atomicity.

In intermittent systems with concurrent task-based applications, new correctness challenges emerge. First, synchronous communication among concurrent tasks may violate their atomicity. In InK [18], concurrent applications communicate asynchronously, and switching between applications is only allowed at task boundaries. Second, the correctness of result commits (to NVM) by multiple concurrent tasks depends on the commit sequence. In [25], a method was proposed to track conflicts among concurrent tasks during commit, at the price of considerable runtime overhead and the abortion of tasks due to conflicting data accesses.

SECTION VII.

Conclusion

This article presents LATICS, a low-overhead adaptive task-based intermittent computing system. The major challenge addressed by LATICS is how to reduce the amount of saved state in the presence of dynamic decisions about how many tasks are grouped together for atomic execution. We observed that skipping state saving at task boundaries is not always beneficial, as it may sometimes cause more states to be saved at other places and thus lead to higher overall runtime overhead. LATICS is implemented on top of the state-of-the-art task-based intermittent computing system InK [18] and evaluated on hardware based on the MSP430. The experimental results show that LATICS significantly reduces state saving overhead and improves execution efficiency compared to the state of the art.

References

1.
X. Bao, H. Liang, Y. Liu and F. Zhang, "A stochastic game approach for collaborative beamforming in SDN-based energy harvesting wireless sensor networks", IEEE Internet Things J., vol. 6, no. 6, pp. 9583-9595, Dec. 2019.
2.
M.-L. Ku, W. Li, Y. Chen and K. J. R. Liu, "Advances in energy harvesting communications: Past present and future challenges", IEEE Commun. Surveys Tuts., vol. 18, no. 2, pp. 1384-1412, 2nd Quart. 2016.
3.
R. V. Prasad, S. Devasenapathy, V. S. Rao and J. Vazifehdan, "Reincarnation in the ambiance: Devices and networks with energy harvesting", IEEE Commun. Surveys Tuts., vol. 16, no. 1, pp. 195-213, 1st Quart. 2014.
4.
S. N. Patel and J. R. Smith, "Powering pervasive computing systems", IEEE Pervasive Comput., vol. 16, no. 3, pp. 32-38, Jul. 2017.
5.
B. Ransford, J. Sorber and K. Fu, "Mementos: System support for long-running computation on RFID-scale devices", Proc. 16th ASPLOS, pp. 159-170, 2011.
6.
B. Lucia and B. Ransford, "A simpler safer programming and execution model for intermittent systems", Proc. 36th PLDI, pp. 575-585, 2015.
7.
J. van der Woude and M. Hicks, "Intermittent computation without hardware support or programmer intervention", Proc. 12th OSDI, pp. 17-32, 2016.
8.
H. Jayakumar, A. Raha and V. Raghunathan, "QUICKRECALL: A low overhead HW/SW approach for enabling computations across power cycles in transiently powered computers", Proc. 27th VLSI, pp. 330-335, 2014.
9.
D. Balsamo et al., "Hibernus++: A self-calibrating and adaptive system for transiently-powered embedded devices", IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 35, no. 12, pp. 1968-1980, Mar. 2016.
10.
M. Hicks, "Clank: Architectural support for intermittent computation", Proc. 44th ISCA, pp. 228-240, 2017.
11.
N. A. Bhatti and L. Mottola, "HarvOS: Efficient code instrumentation for transiently-powered embedded sensing", Proc. 16th IPSN, pp. 209-219, 2017.
12.
K. Maeng and B. Lucia, "Adaptive dynamic checkpointing for safe efficient intermittent computing", Proc. 13th OSDI, pp. 129-144, 2018.
13.
M. Kiwan and L. Brandon, "Supporting peripherals in intermittent systems with just-in-time checkpoints", Proc. 40th PLDI, pp. 1101-1116, 2019.
14.
J. Choi, H. Joe, Y. Kim and C. Jung, "Achieving stagnation-free intermittent computation with boundary-free adaptive execution", Proc. 25th RTAS, pp. 331-344, 2019.
15.
V. Kortbeek, K. S. Yildirim, A. Bakar, J. Sorber, J. Hester and P. Pawelczak, "Time-sensitive intermittent computing meets legacy software", Proc. 25th ASPLOS, pp. 85-99, 2020.
16.
A. Colin and B. Lucia, "Chain: Tasks and channels for reliable intermittent programs", Proc. OOPSLA, pp. 514-530, 2016.
17.
K. Maeng, A. Colin and B. Lucia, "Alpaca: Intermittent Execution without Checkpoints", Proc. OOPSLA, pp. 1-30, 2017.
18.
K. S. Yildirim, A. Y. Majid, D. Patoukas, K. Schaper, P. Pawelczak and J. D. Hester, "InK: Reactive kernel for tiny batteryless sensors", Proc. 16th SenSys, pp. 41-53, 2018.
19.
J. D. Hester, K. M. Storer and J. Sorber, "Timely execution on intermittently powered batteryless sensors", Proc. 15th SenSys, pp. 1-13, 2017.
20.
E. Ruppel and B. Lucia, "Transactional concurrency control for intermittent energy-harvesting computing systems", Proc. 40th PLDI, pp. 1085-1100, 2019.
21.
MSP430 Ultra-Low-Power Sensing and Measurement MCUs, Apr. 2020, [online] Available: http://www.ti.com/microcontrollers/msp430-ultra-low-power-mcus/overview.html.
22.
A. Y. Majid et al., "Dynamic task-based intermittent execution for energy-harvesting devices", ACM Trans. Sensor Netw., vol. 16, no. 1, pp. 1-24, 2020.
23.
B. Lucia, V. Balaji, A. Colin, K. Maeng and E. Ruppel, "Intermittent computing: Challenges and opportunities", Proc. 2nd Summit Adv. Program. Lang. (SNAPL), pp. 1-14, 2017.
24.
M. Surbatovich, L. Jia and B. Lucia, "I/O dependent idempotence bugs in intermittent systems", Proc. OOPSLA, pp. 183, 2019.
25.
W.-M. Chen, P.-C. Hsiu and T.-W. Kuo, "Enabling failure-resilient intermittently-powered systems without runtime checkpointing", Proc. 56th DAC, pp. 104, 2019.