Introduction 导言
Energy-harvesting systems are powered by energy collected from their environment, and thus can sustain for a long time not restricted by battery life [1]–[4]. As the energy output of harvesters is typically tiny and unstable, the device may experience frequent power failures. Therefore, energy-harvesting systems must be able to timely store program states and keep making progress in the presence of frequent power failures.
能量收集系统的动力来自从环境中收集的能量,因此不受电池寿命的限制,可以维持很长时间 [1] - [4] 。由于能量收集器的能量输出通常很小且不稳定,设备可能会经常出现断电现象。因此,能量收集系统必须能够及时存储程序状态,并在频繁断电的情况下继续前进。
To this end, checkpointing, a widely used technique in fault-tolerant computing, has been applied to energy-harvesting systems [5]–[15]. When program execution reaches a checkpoint, system states are saved to nonvolatile memory (NVM) for future restoration. However, checkpointing is generally expensive, as all volatile states, including processor context and data on volatile memory (VM), have to be saved to NVM.
为此,容错计算中广泛使用的检查点技术已被应用于能量收集系统 [5] - [15] 。当程序执行到检查点时,系统状态会被保存到非易失性存储器(NVM)中,以便将来恢复。然而,检查点的成本通常很高,因为所有易失性状态,包括处理器上下文和易失性内存(VM)上的数据,都必须保存到非易失性内存中。
The task-based paradigm [16]–[20] [16], [17], [18], [19], [20] offers a lightweight alternative to checkpointing. Take the state-of-the-art task-based intermittent computing systems InK [18] for example, an application is programmed as a collection of atomic execution blocks called tasks. Each task is written as a function by the programmer, and the control flow between tasks is also explicitly defined by the programmer. System states are saved at task boundaries, and when recovering from a power failure occurred during the execution of a task, the system resumes execution from the beginning of that task. At task boundaries, the system only needs to store the global data, i.e., the variables whose lifetimes cross task boundaries and shared by multiple tasks, but not the local variables accessed inside each task. Essentially, task boundaries play a similar role as checkpoints. However, as the programmer has full control to decide how to group related functionalities and variables into a task, typically, much fewer data need to be saved at task boundaries than the standard checkpointing approach.
基于任务的范例 [16] - [20] [16] , [17] , [18] , [19] , [20] 为检查点提供了一种轻量级的替代方案。以最先进的基于任务的间歇计算系统 InK [18] 为例,应用程序被编程为一系列称为任务的原子执行块。每个任务都由程序员编写为一个函数,任务之间的控制流也由程序员明确定义。系统状态保存在任务边界,当从任务执行过程中发生的断电中恢复时,系统会从该任务的起点重新开始执行。在任务边界,系统只需存储全局数据,即生命周期跨越任务边界并由多个任务共享的变量,而无需存储每个任务内部访问的局部变量。从本质上讲,任务边界的作用与检查点类似。不过,由于程序员完全可以决定如何将相关功能和变量分组到任务中,因此任务边界需要保存的数据通常比标准检查点方法要少得多。
Many modern ultralow-power processors are equipped with NVM, such as FRAM, which has good enough read/write speed and lifetime to serve as the main memory [21]. Therefore, the system can allocate data on NVM and programs can directly address them, which is more efficient than copying them back and forth between VM and NVM. However, directly operating data on NVM may cause memory inconsistency due to the so-called write-after-read (WAR) problem [6] in the presence of power failure (detailed in Section II). To solve the problem, the system should maintain a backup version for the WAR variables. When recovering from a power failure occurred in some task, the system restores the WAR variables from the backup version to main memory, and then correctly resumes execution from the beginning of the aborted task.
许多现代超低功耗处理器都配备了 NVM,如 FRAM,它具有足够好的读写速度和寿命,可以作为主存储器 [21] 。因此,系统可以在 NVM 上分配数据,程序可以直接寻址这些数据,这比在虚拟机和 NVM 之间来回复制数据更有效。不过,直接在 NVM 上操作数据可能会在断电情况下因所谓的 "写入后读取(WAR)"问题 [6] 而导致内存不一致(详见 Section II )。为了解决这个问题,系统应为 WAR 变量保留一个备份版本。当从某个任务的断电中恢复时,系统会将备份版本中的 WAR 变量恢复到主内存中,然后正确地从中止的任务开始重新执行。
In the task-based paradigm, task granularity plays an important role in system performance. In general, the larger size is the tasks, the higher amount of incomplete execution is aborted at power failures. In the extreme case, if a task is too large to be finished by the energy harvested until the next power failure, the system will trap in the reexecution of this task and never make any progress (the so-called “Sisyphus effect”). On the other hand, if tasks are too small, the execution flow will cross task boundaries and thus save program states frequently, which leads to high overhead. To address this dilemma, some systems adopt an adaptive execution approach, which can skip state saving at task boundaries under good energy conditions.
在基于任务的模式中,任务粒度对系统性能起着重要作用。一般来说,任务规模越大,断电时中止的未完成执行量就越高。在极端情况下,如果任务规模过大,在下一次断电之前所收集的能量无法完成,系统就会陷入重新执行该任务的困境,永远无法取得任何进展(即所谓的 "西西弗斯效应")。另一方面,如果任务太小,执行流程就会跨越任务边界,从而频繁保存程序状态,导致高开销。为了解决这一难题,一些系统采用了自适应执行方法,在良好的能耗条件下,可以跳过任务边界的状态保存。
Skipping state saving at task boundaries makes ensuring memory consistency a complex problem. On the one hand, the overhead is reduced as state saving is done less frequently; on the other hand, skipping state saving may introduce new WAR variables. To keep the memory consistent, newly introduced WAR variables have to be identified and saved, which may adversely increase state saving overhead. A related work, Coala [22], addressed the problem by a very conservative approach: copying used data pages on-demand to VM in case operating the variables in the pages may cause the WAR problem. During state saving, Coala pessimistically saves all modified data pages to NVM. As shown in our experiments (Section V), this approach makes program execution very inefficient.
在任务边界跳过状态保存会使确保内存一致性成为一个复杂的问题。一方面,由于状态保存的频率降低,开销减少;另一方面,跳过状态保存可能会引入新的 WAR 变量。为了保持内存一致性,必须识别并保存新引入的 WAR 变量,这可能会增加状态保存的开销。相关工作 Coala [22] 通过一种非常保守的方法解决了这个问题:在操作数据页中的变量可能导致 WAR 问题的情况下,按需将使用过的数据页复制到虚拟机中。在保存状态时,Coala 会悲观地将所有修改过的数据页保存到 NVM 中。正如我们的实验 ( Section V )所示,这种方法使得程序执行效率非常低。
In this work, we present LATICS, a low-overhead adaptive task-based intermittent computing system that saves much fewer states at task boundaries than existing approaches. It is generally believed that skipping state saving at task boundaries is always beneficial, so related systems try to skip as many as possible if the energy condition allows. However, our work discloses that skipping state saving at one task boundary may cause the system to save more data at other places, and thus leads to higher overall overhead. In light of this observation, we propose a novel task-based intermittent computing system that enforces mandatory state saving at certain task boundaries, regardless of the current energy conditions, and conducts analysis to decide the breaking points and the states to be saved to ensure correct execution. Experiments conducted on an MSP430 platform show that LATICS drastically reduces the amount of states to save at task transitions and improves execution efficiency, compared with existing solutions.
在这项工作中,我们提出了基于任务的低开销自适应间歇计算系统 LATICS,它在任务边界保存的状态比现有方法要少得多。一般认为,在任务边界跳过状态保存总是有益的,因此相关系统会在能量条件允许的情况下尽可能多地跳过状态保存。然而,我们的工作发现,在一个任务边界跳过状态保存可能会导致系统在其他地方保存更多数据,从而导致整体开销增加。有鉴于此,我们提出了一种新颖的基于任务的间歇计算系统,无论当前的能量条件如何,该系统都会在某些任务边界强制保存状态,并通过分析来决定断点和需要保存的状态,以确保正确执行。在 MSP430 平台上进行的实验表明,与现有解决方案相比,LATICS 大幅减少了任务转换时需要保存的状态数量,提高了执行效率。
Overview 概述
The main objective of LATICS is to reduce state saving overhead and improve the execution efficiency of adaptive task-based intermittent computing systems.
LATICS 的主要目标是减少状态保存开销,提高基于任务的自适应间歇计算系统的执行效率。
At runtime, LATICS will coalesce the execution of several consecutive tasks into a large execution block called transactional execution block (
在运行时,LATICS 会将多个连续任务的执行合并为一个大的执行块,称为事务执行块(
We begin by explaining the WAR problem [6]. Fig. 1(a) shows three sequentially executed tasks,
我们首先解释 WAR 问题 [6] 。 Fig. 1(a) 显示了三个顺序执行的任务,分别是
To avoid the WAR problem, a system should save all WAR variables at the beginning of each task. As shown in Fig. 1(b), if the value of
为避免 WAR 问题,系统应在每个任务开始时保存所有 WAR 变量。如 Fig. 1(b) 所示,如果在
Now, we consider an adaptive task-based intermittent computing system that only saves states at the beginning of a
现在,我们考虑一个基于任务的自适应间歇计算系统,该系统只在
Reduce State Saving by Breaking Point Insertion: Assume
通过插入断点减少状态保存:假设
The rest of this article details the two main components of LATICS: 1) a static analysis to decide the breaking points and compute a set of variables to save for each task if it leads a
本文的其余部分将详细介绍 LATICS 的两个主要组成部分:1)静态分析,用于决定断点,并计算一组变量,以便在每个任务导致
Analysis 分析
Saving states at
在
A. Rationale A.理由
To tightly identify a set of variables to save at the leading task of a
要在
To precisely identify the WAR variables in a
为了精确识别
In adaptive task-based intermittent computing systems, each
在基于任务的自适应间歇计算系统中,每个
Computing the States to Save at the Leading Task of a
计算在
Deciding Breaking Points for the Leading Task of a
决定
Consider an example in Fig. 2 with three tasks
考虑 Fig. 2 中有三个任务
In order to reduce state saving overhead for
为了减少
To summarize, to reduce state saving overhead, we propose to decide an individual set of breaking points for each leading task and also compute the variables to be saved at the leading task. The analysis is detailed in the following sections.
总之,为了减少状态保存开销,我们建议为每个主导任务确定一组单独的中断点,同时计算主导任务需要保存的变量。下文将详细分析。
B. Analysis Target and the Definition of Breaking Point
B.分析目标和突破点的定义
The analysis target is the WAR variables that are either local to a task or introduced by task coalescing (inserting breaking points may exclude WAR variables). A program is a directed graph, denoted by
分析目标是任务本地的或由任务凝聚引入的 WAR 变量(插入断点可能会排除 WAR 变量)。程序是一个有向图,用
The WAR variables in a task can be defined and identified over the
任务中的 WAR 变量可在任务的
Definition 1 (WAR Variable in a Task):
If a variable
定义 1(任务中的 WAR 变量):如果变量
Condition 1:
is reachable from the starting basic block ofBBR ;CFGTi
条件 1: 可以从BBR 的起始基本区块到达;CFGTi Condition 2: along one of the paths satisfying condition 1 there is no basic block that writes
;x
条件 2:在满足条件 1 的一条路径上,没有写入 的基本区块;x Condition 3: there exists a basic block
that writesBBW andx is reachable fromBBW inBBR .CFGTi
条件 3:存在一个写入 的基本区块x ,并且BBW 可以从BBW 到达BBR 。CFGTi
By the above definition, WAR variables in a task can be obtained by analyzing the reachability between the basic blocks in
根据上述定义,任务中的 WAR 变量可通过分析
As discussed above, given a task
如上文所述,给定一个任务
Definition 2 (Possible WAR Set):
If variable
定义 2(可能的 WAR 集):如果变量
Condition 1:
in reachable from the starting basic block ofBBR inTi ;G
条件 1:从 的起始基本区块Ti 可以到达G ;BBR Condition 2: along one of the paths satisfying condition 1 there is no basic block of any task that writes
;x
条件 2:在满足条件 1 的一条路径上,没有任何任务的基本区块写入 ;x Condition 3: there exists a basic block
in some task that writesBBW andx is reachable fromBBW onBBR .G
条件 3:在某个任务中存在一个基本区块 ,该区块会写入BBW ,且x 在BBW 上可从G 到达。BBR
Note that
注意
Definition 3 (Breaking Point)1:
A breaking point is associated with an edge on
定义 3(断点) 1 :断点与
An important feature is, inserting breaking points does not introduce new WAR variables. First, the WAR variables in the tasks before the breaking point are not changed and should be saved at the beginning of the
一个重要的特点是,插入中断点不会引入新的 WAR 变量。首先,断点前任务中的 WAR 变量不会改变,应保存在
C. Computing PWS
and Inserting Breaking Points
C.计算 PWS
和插入断点
Now, we present an algorithm to compute the
现在,我们提出一种算法来计算每个领先任务的
Program Transformation: If a program graph
程序转换:如果程序图
An example to explain
举例说明
As for the structure of task-based programs, there can be doubts that “goto” style control flows may be produced as programmers are allowed to freely specify the control flow among tasks by task transition directives. But this situation will rarely occur. A program is a set of instructions to implement the functionality of an application, regardless of whether the program progresses continuously or intermittently. The task is only a construct allowing a programmer to specify places where system states are saved to survive power failures. Thus, a task-based program should be naturally programmed by inserting task boundaries into a correctly implemented ordinary program. As most programs are well structured, the task-based version obtained as above will remain well structured.
至于基于任务的程序结构,由于允许程序员通过任务转换指令自由指定任务间的控制流,因此可能会产生 "goto "式的控制流。但这种情况很少发生。无论程序是连续进行还是间歇进行,程序都是实现应用程序功能的一组指令。任务只是一种结构,允许程序员指定保存系统状态的位置,以便在断电后继续运行。因此,在正确执行的普通程序中插入任务边界,就能自然地编制出基于任务的程序。由于大多数程序都具有良好的结构,因此上述基于任务的版本也将保持良好的结构。
Definition 4 (Image and Image Set):
In
定义 4(映像和映像集):在
For example,
例如, Fig. 3 中的
From now on, we compute the
从现在起,我们将计算每个主导任务的
Definition 5 (Reachability Over Image Set):
In
定义 5(图像集上的可达性):在
Lemma 1 (Reachability Maintenance):
If node
定理 1(可达性维护):如果节点
Proof:
We prove this lemma by distinguishing two cases.
证明:我们通过区分两种情况来证明这个 Lemma。
Case 1:
If
andA are in the same loop body inB , there are two subcases: 1)G is reachable fromB by a path that does not include the back edge of the loop body, this path belongs to the loop body and is always retained inA according to our unroll method and 2)G′ is reachable fromB by a path that includes the back edge, then there must be a path fromA (at level 1) toA (at level 2) inB′ . So, we can concludeG′ is reachable fromISB inISA .G′
情况 1:如果 和A 在B 的同一个循环体中,则有两种子情况:1)根据我们的展开方法,这条路径属于循环体,并且总是保留在G 中;2)G′ 可以通过一条包含后边缘的路径从B 到达,那么在A 中一定有一条路径从G′ (第 1 层)到A (第 2 层)。因此,我们可以得出结论,在B′ 中,从G′ 可以到达ISA 。ISB Case 2:
If
andA are in different loop bodies inB , without loss of generality, we assumeG is in the outer loop andA is in the inner loop. There must be a pathB fromp1 to the entry nodeA ofNB ’s loop body and a pathB fromp2 toNB . Note thatB belongs to the outer loop andp1 belongs to the inner loop. Both paths are retained inp2 , soG′ is reachable fromISB . If otherwiseISA is in the inner loop andA is in the outer loop, thenB ’s exit node will be used to bridge the reachability fromA toA . The above property can be extended to nodes that span across more than two levels of loop structures, by repeating the above reasoning for adjacent loop levels.B
情况 2:如果 和A 在B 的不同循环体中,在不失一般性的前提下,我们假设G 在外层循环中,而A 在内层循环中。从B 到A 循环体的入口节点B 必须有一条路径NB ,从p1 到NB 必须有一条路径B 。注意p2 属于外循环,而p1 属于内循环。这两条路径都保留在p2 中,因此G′ 可以从ISB 到达。 如果ISA 在内层循环中,而A 在外层循环中,那么B 的出口节点将被用来桥接从A 到A 的可达性。通过对相邻循环层重复上述推理,上述特性可扩展到跨越两层以上循环结构的节点。B
Lemma 2 (Reachability Maintenance With Node Removal):
By removing node
定理 2(删除节点后的可达性维护):通过从
Proof:
If node
证明:如果节点
Theorem 1 (PWS
Maintenance):
Any variable which is in
定理 1 (
Proof:
Recall that if a variable
证明:回想一下,如果一个变量
PWS Computation and Breaking Point Insertion: Before introducing the algorithm to compute
PWS 计算和断点插入:在介绍计算
Definition 6 (Read-Through Variable and Write-Break Variable):
A variable
定义 6(直读变量和断写变量):如果存在一条从
Computing PWS: Algorithm 1 gives the proposed analysis which takes the graph
计算 PWS: Algorithm 1 给出了建议的分析,该分析将图
Algorithm 1 Compute PWS
and Insert Breaking Points for a Leading Task
算法 1 计算 PWS
并插入领先任务的突破点
Program graph
程序图
for Each direct preceding node
对于
Ready queue
就绪队列
while
for each incoming edge
对
let
让
if Range(
Insert a dummy task
在
else 不然
mark
for each node
对于
if all its direct preceding nodes are VISITED then
如果其所有直接前节点都被 VISITED,那么
Enqueue(
In the evaluation of each task
在对每个任务
: If a variable is already a WAR variable in(WART−WTE) , then it is potentially a WAR variable in aT led byTEB . If the variable is also a write-break variable in the task precedingTi , the variable is no longer a WAR variable in theT , which has been explained in Section III-A.2TEB
:如果一个变量在(WART−WTE) 中已经是一个 WAR 变量,那么它在T 引导的Ti 中就有可能是一个 WAR 变量。 如果该变量在TEB 之前的任务中也是一个断写变量,那么该变量在T 中就不再是一个 WAR 变量,这在 Section III-A 中已经解释过了。 2TEB : If there is a read to a variable in the preceding node ofRTE∩WINT and a write to the variable inT , a new WAR variable resulted by coalescing tasks is identified.T
:如果在RTE∩WINT 的前一个节点中存在对变量的读取,而在T 中存在对该变量的写入,则会识别出一个由合并任务产生的新 WAR 变量。T
The above evaluation is performed for all the incoming edges of
上述评估针对
Inserting Breaking Points: Once the
插入断点:一旦
Note that when a breaking point is decided, it is inserted on all edges in the image set of
请注意,当断点决定后,它将被插入
The breaking point operation may bring a new issue. Consider an edge
断点操作可能会带来新的问题。考虑
Not considering a breaking point for an edge
不考虑边
Lemma 3:
If a variable
定理 3:如果变量
Proof:
By the definition of
证明:根据
Then, we discuss the cost evaluation. Whether to insert a breaking point is decided by comparing the benefit of reducing state saving and the cost of conducting a state saving operation.
然后,我们讨论成本评估。通过比较减少状态保存的收益和进行状态保存操作的成本,来决定是否插入断点。
The benefit is reduced memory copying by the breaking point, i.e., Range(
这样做的好处是减少了断点处的内存复制,即 Range(
In our implementation, we add a dummy task
在我们的实现过程中,我们在边缘添加了一个虚拟任务
Updating Read and Write Sets: After the
更新读写集:计算
The update of
The algorithm iterates until all nodes reachable from
该算法不断迭代,直到探索完
A Running Example: To illustrate how Algorithm 1 works, we use an example in Fig. 3 to show the computation of
运行示例:为了说明 Algorithm 1 的工作原理,我们用 Fig. 3 中的一个例子来说明
Step 1: Dequeue
第 1 步: 删除
In this step, there is a path from
在这一步中,有一条从
Step 2: Dequeue
第 2 步: 删除
Similar to step 1,
与步骤 1 类似,更新
Step 3: Dequeue
步骤 3:删除
Then, evaluate the edge from
然后,评估从
Since Range(
由于 Range(
After all the nodes in
在探索了
Runtime System 运行系统
We build a runtime system to implement the core functionalities of the proposed approach. The runtime system makes dynamic decisions on task coalescing, manages data buffers, conducts state saving and restoration, and most importantly enforces mandatory state saving at breaking points. We will first give the workflow of LATICS’ runtime system and then detail its main components.
我们建立了一个运行时系统来实现所建议方法的核心功能。运行时系统会对任务凝聚做出动态决策,管理数据缓冲区,进行状态保存和恢复,最重要的是在中断点强制保存状态。我们将首先介绍 LATICS 运行时系统的工作流程,然后详细介绍其主要组成部分。
A. Main Work Flow A.主要工作流程
The behavior of the runtime system is specified by Algorithm 2, and we use the example in Fig. 4 to explain how it works. A possible execution trace may be as follows.
运行时系统的行为由 Algorithm 2 规定,我们用 Fig. 4 中的例子来解释它是如何工作的。可能的执行轨迹如下
The program starts and the runtime system decides a
led by taskTEB and containing three tasks.T1
程序启动后,运行系统判定由任务 领导的T1 包含三个任务。TEB System states are saved by copying the data in memory copy range [0, 5] covering
from the working buffer to the backup buffer.PWST1
保存系统状态的方法是,将内存复制范围 [0, 5] 内涵盖 的数据从工作缓冲区复制到备份缓冲区。PWST1 ,T1 , andT2 execute untilT3 is finished.T3
执行 、T1 和T2 直到T3 结束。T3 As the current
is done, the runtime system creates a newTEB at this point and makes a decision to let theTEB coalesce six tasks.TEB
由于当前的 已完成,运行时系统会在此时创建一个新的TEB 并决定让TEB 凝聚六个任务。TEB As
is the leading task of the newT5 , data in the memory copy range [2, 5] coveringTEB are copied from the working buffer to the backup buffer.PWST5
由于 是新的T5 的领导任务,因此内存拷贝范围 [2, 5] 中涵盖TEB 的数据会从工作缓冲区拷贝到备份缓冲区。PWST5 A power failure occurs in the execution of
(afterT2 ).T5
在执行 时(T2 之后)发生断电。T5 After recovery, the runtime system identifies the power failure, therefore, it restores the system states at the beginning of the aborted
by copying data in the memory copy range [2, 5] from the backup buffer to the working buffer.TEB
恢复后,运行时系统会识别出电源故障,因此会通过将内存复制范围 [2, 5] 内的数据从备份缓冲区复制到工作缓冲区来恢复中断的 开始时的系统状态。TEB The runtime system creates a new
, theTEB coalesces four tasks, and execution resumes fromTEB . The newT5 continues and the program takes the right branch.TEB
运行时系统会创建一个新的 ,TEB 聚合四个任务,然后从TEB 开始继续执行。新的T5 继续执行,程序进入右分支。TEB After
is finished, the runtime system encounters a breaking point on the incoming edge toT2 . Although the currentT4 is not finished yet, a state saving is forced at this point, saving the data in memory copy range [2, 5] coveringTEB to the backup buffer. As a result, if a power failure occurs later, the system will resume execution fromPWST4 instead of the original leading taskT4 .T5
在 完成后,运行时系统会在T2 的入边上遇到一个断点。虽然当前的T4 尚未完成,但此时会强制保存状态,将覆盖TEB 的内存拷贝范围 [2, 5] 中的数据保存到备份缓冲区。因此,如果稍后发生断电,系统将从PWST4 而不是原来的主导任务T4 开始恢复执行。T5 is finished, and a newT5 starting fromTEB will be decided by the runtime system.T2
结束,新的T5 将从TEB 开始,由运行系统决定。T2 Execution continues until the program is completed.
程序将继续执行,直至结束。
Algorithm 2 Runtime System
算法 2 运行时系统
Let
让
Let
让
if recovered from power failure then
如果从断电中恢复,则
RESTORE(
while not end of the program do
如果程序未结束,则执行
if no power failure then
如果没有断电,则
SAVE(
while
当
if
如果
SAVE(
EXECUTE(
UPDATE_HISTORY()
B. Core Components B.核心组成部分
LATICS allocates local variables that are only accessed in a single task on SRAM and allocates the variables shared among tasks on FRAM. LATICS maintains two buffers for the shared data: 1) a working buffer and 2) a backup buffer. The working buffer serves as part of the main memory and stores all the shared data. The backup buffer stores a backup copy for the set union of the
LATICS 在 SRAM 上分配仅在单个任务中访问的局部变量,在 FRAM 上分配任务之间共享的变量。LATICS 为共享数据维护两个缓冲区:1) 工作缓冲区和 2) 备份缓冲区。工作缓冲区作为主内存的一部分,存储所有共享数据。备份缓冲区存储所有主导任务的
The SAVE() function conducts state saving. It is invoked when the program starts for the first time, and when a
SAVE() 函数用于保存状态。当程序首次启动时,以及当
The RESTORE() function restores states and is only invoked after the system recovers from a power failure. States saved at the beginning of the aborted
RESTORE() 函数用于恢复状态,只有在系统从断电中恢复后才会调用。在中断的
DECISION() is the function to realize adaptive execution by deciding the
DECISION() 是通过决定
Enforcing the Breaking Point Mechanism: During execution, if the runtime system identifies a breaking point before a task
执行中断点机制:在执行过程中,如果运行时系统在任务
Note that the state save/restore mechanism in LATICS differs from the state-of-the-art task-based system InK [18]. In both LATICS and InK, task execution operates on the working buffer. After a task is finished, InK changes the role of its working buffer and backup buffer by a pointer swap, and right before starting the next task, the runtime system copies all data in the new backup buffer to the new working buffer for memory consistency. While LATICS maintains only a subset of the data (the set union of the
需要注意的是,LATICS 的状态保存/恢复机制不同于最先进的基于任务的系统 InK [18] 。 在 LATICS 和 InK 中,任务执行都是在工作缓冲区上进行的。任务完成后,InK 会通过指针交换改变工作缓冲区和备份缓冲区的角色,并在启动下一个任务前,运行时系统会将新备份缓冲区中的所有数据复制到新工作缓冲区,以保持内存一致性。而 LATICS 只在备份缓冲区中保留数据的一个子集(所有前导任务的
Experiments and Evaluation
实验与评估
A. Experimental Setup A.实验装置
LATICS provides a solution for efficient adaptive execution of task-based intermittent computing systems. Currently, LATICS is implemented based on the state-of-the-art task-based system InK [18]. In principle, it can be applied to other task-based systems as well. The LATICS runtime system and the application program are compiled into the binary executable using the TI v18.12.4 compiler by TI Code Composer Studio.
LATICS 为基于任务的间歇计算系统的高效自适应执行提供了解决方案。目前,LATICS 是基于最先进的基于任务的系统 InK [18] 实现的。 原则上,它也可应用于其他基于任务的系统。LATICS 运行时系统和应用程序由 TI Code Composer Studio 使用 TI v18.12.4 编译器编译成二进制可执行文件。
We deploy LATICS on TI’s MSP430FR5994 launchpad equipped with a microcontroller, 8KB VM (SRAM) and 256KB NVM (FRAM) all clocked at their highest speed. We allocate local variables of each task on SRAM, and the data that are shared by tasks on FRAM. Bulk-copying between memories uses DMA. The hardware board is powered by a programmable power supplier by which we can generate different power traces to evaluate LATICS under different power conditions.
我们在 TI 的 MSP430FR5994 发射板上部署了 LATICS,该发射板配备了微控制器、8KB VM(SRAM)和 256KB NVM(FRAM),均以最高速度运行。我们在 SRAM 上分配每个任务的局部变量,在 FRAM 上分配任务共享的数据。存储器之间的批量复制使用 DMA。硬件电路板由可编程电源供应器供电,通过它我们可以生成不同的功率轨迹,以评估不同功率条件下的 LATICS。
LATICS is compared against two state-of-the-art task-based intermittent computing systems: 1) InK [18] and 2) Coala [22]. InK is a task-based system that saves states at all task boundaries. A working buffer and a backup buffer are implemented to manage the data allocated on NVM. LATICS differs with InK, in that LATICS is an adaptive task-based intermittent computing system which may skip state saving when the power supply is sufficient, and more importantly, with the support of static analysis, LATICS saves only the
LATICS 与两个最先进的基于任务的间歇计算系统进行了比较:InK 是一个基于任务的系统,可在所有任务边界保存状态。系统采用工作缓冲区和备份缓冲区来管理分配到 NVM 上的数据。LATICS 与 InK 的不同之处在于,LATICS 是一种基于任务的自适应间歇计算系统,当电力供应充足时,它可以跳过状态保存,更重要的是,在静态分析的支持下,LATICS 只保存任务的
We use eight benchmark programs adopted in InK and Coala to evaluate system performance. The benchmarks are cold-chain equipment monitoring (CEM), cuckoo filter (CK), cyclic redundancy check (CRC), bitcount (BC), Dijkstra shortest path algorithm (DIJ), selection sort algorithm (SRT), RSA encryption (RSA), and activity recognition (AR).
我们使用 InK 和 Coala 中采用的八个基准程序来评估系统性能。这些基准程序包括:冷链设备监控 (CEM)、布谷鸟过滤器 (CK)、循环冗余校验 (CRC)、比特计数 (BC)、Dijkstra 最短路径算法 (DIJ)、选择排序算法 (SRT)、RSA 加密 (RSA) 和活动识别 (AR)。
In the benchmarks, the read/write to the global variables shared among tasks is explicitly programmed by two macros, __GET(
在基准测试中,任务间共享全局变量的读/写是通过两个宏__GET(
B. Evaluation Methods B.评估方法
LATICS’ performance is evaluated and compared with InK and Coala with the above benchmarks. We run each benchmark program 20 times and report the averaged execution time for a single run. The total execution time for each benchmark is recorded from the start to the end of its execution and is further divided into state saving time and task execution time. The state saving time is the time spent on state saving at the beginning of
我们用上述基准对 LATICS 的性能进行了评估,并与 InK 和 Coala 进行了比较。我们将每个基准程序运行 20 次,并报告单次运行的平均执行时间。每个基准程序从开始执行到结束的总执行时间被记录下来,并进一步分为状态保存时间和任务执行时间。状态保存时间是指
For task coalescing, we adopt Coala’s weighted energy-guided coalescing strategy for a fair comparison. The strategy essentially collects the execution history to obtain an energy capacity indicating how many tasks can be coalesced within the energy capacity. Once a
在任务聚合方面,我们采用 Coala 的加权能量引导聚合策略进行公平比较。该策略主要是通过收集执行历史记录来获得一个能量容量,表明在能量容量范围内可以凝聚多少个任务。一旦
To test the compared systems under different power supplies, we conduct experiments with periodic power traces in which power failure occurs periodically during system execution with a given period called power cycle. Table II reports the results under periodic power traces, where the second column is the power cycle. Experiments are conducted for eight power cycle configurations
为了测试比较系统在不同电源下的运行情况,我们使用周期性电源轨迹进行了实验,即在系统执行过程中定期发生电源故障,给定周期称为电源周期。 Table II 报告了周期性电源跟踪下的结果,其中第二列是电源周期。实验针对
表 II 不同功率周期的周期性功率跟踪下的执行时间结果
Execution time results under random power traces (normalized to InK’s total execution time).
随机功率轨迹下的执行时间结果(与 InK 的总执行时间进行归一化)。
We also further discuss: 1) the effectiveness of our proposed approach by comparing it with an exhaustive search method; 2) the impact of
我们还进一步讨论了1) 将我们提出的方法与穷举搜索方法进行比较,以确定其有效性;2)
C. Empirical Results and Evaluation
C.实证结果与评估
For all experiments under all power traces, LATICS outperforms InK and Coala with respect to the state saving time and the total execution time, which shows that LATICS has low state saving overhead and high execution efficiency. We now give a detailed evaluation of the results.
在所有功率跟踪下的所有实验中,LATICS 的状态保存时间和总执行时间都优于 InK 和 Coala,这表明 LATICS 的状态保存开销低,执行效率高。现在我们对结果进行详细评估。
First, we investigate how much state saving can be reduced by our proposed approach. Table I lists the results of InK and LATICS. The numbers for InK are the total state saving size for all tasks, i.e., a product of the size of system states and the number of tasks in the program. For LATICS, we sum up the sizes of the memory copy ranges covering the
首先,我们研究了我们提出的方法可以减少多少状态节省。 Table I 列出了 InK 和 LATICS 的结果。InK 的数字是所有任务的总状态保存大小,即系统状态大小与程序中任务数量的乘积。对于 LATICS,我们将用我们提出的方法计算出的覆盖所有任务
Second, the performance of LATICS under random power traces is evaluated, the results of which are shown in Fig. 5. The figure illustrates the total execution time for LATICS, InK, and Coala, normalized to that of InK. The gray part in each bar refers to the state saving time, and the slashed part refers to the task execution time. For all benchmarks, LATICS has the smallest total execution time. InK consumes a large portion of time on state saving as it pessimistically saves all system states at all task boundaries. Coala saves at the end of a
其次,评估了 LATICS 在随机功率轨迹下的性能,结果如 Fig. 5 所示。图中显示了 LATICS、InK 和 Coala 的总执行时间,并将 InK 的执行时间归一化。每个条形图中的灰色部分指的是状态保存时间,斜线部分指的是任务执行时间。在所有基准中,LATICS 的总执行时间最小。InK 悲观地在所有任务边界保存所有系统状态,因此在状态保存上消耗了大量时间。Coala 会在
To further explain the results, we explore in detail the CEM benchmark which has significant performance improvement. The program graph of CEM (tasks’ CFG omitted, only showing intertask transitions) is given in Fig. 6, which contains 12 tasks. There are read operations to a large-size array
为了进一步解释结果,我们详细探讨了性能显著提高的 CEM 基准。CEM 的程序图(省略任务的 CFG,只显示任务间的转换)如 Fig. 6 所示,其中包含 12 个任务。在
Third, experimental results for periodic power traces are shown in Table II (dashes in the table mean the program failed to progress). Regarding task execution time, LATICS provides comparable performance to InK, only slightly larger in some benchmarks. As tasks in LATICS are coalesced into larger
第三,周期性功率跟踪的实验结果显示在 Table II 中(表中的破折号表示程序未能取得进展)。在任务执行时间方面,LATICS 的性能与 InK 相当,只是在某些基准测试中略微超出。由于 LATICS 中的任务被凝聚成更大的
D. Further Discussion D.进一步讨论
Effectiveness of the Breaking Point Insertion Approach: To justify the effectiveness of our proposed breaking point insertion approach (Algorithm 1), we implemented an exhaustive search method (abbreviated ES) to find the “optimal” solution and compared it with our approach.
断点插入法的有效性:为了证明我们提出的断点插入法 ( Algorithm 1 ) 的有效性,我们采用了穷举搜索法(简称 ES)来寻找 "最优 "解决方案,并与我们的方法进行了比较。
In fact, the optimal breaking point locations depend on the program paths and the power traces, and an optimal solution only exists with known program path and power trace. As the executed program path and the actual power trace is only known at runtime, a clairvoyant method is required to search the optimal solution.
事实上,最佳断点位置取决于程序路径和功率轨迹,只有在已知程序路径和功率轨迹的情况下才存在最佳解决方案。由于只有在运行时才能知道已执行的程序路径和实际功率轨迹,因此需要一种 "千里眼 "方法来搜索最优解。
To this end, we run each benchmark program once with a periodic power trace (1ms power cycle) and obtain an execution history. Then, an induced program graph is generated by excluding the paths not taken in the execution history. An exhaustive search is conducted on the induced program graph by enumerating whether a breaking point is inserted on each edge of the graph. Note that as data inputs for all benchmarks are fixed in our experiment, the program is guaranteed to take the same path each time it is executed. In each experiment, we run the benchmark program with the same power trace and record the total runtime. If the search does not finish after 20 hours, we stop the search and report the best solution so far. Table IV gives the results for ES and our approach.
为此,我们用周期性功率跟踪(1 毫秒功率周期)运行每个基准程序一次,并获得执行历史记录。然后,通过排除执行历史中未走过的路径,生成一个诱导程序图。在诱导程序图上进行穷举搜索,枚举图中每条边是否插入了断点。请注意,在我们的实验中,所有基准的数据输入都是固定的,因此可以保证程序每次执行时都会走相同的路径。在每次实验中,我们都以相同的功率跟踪运行基准程序,并记录总运行时间。如果搜索在 20 小时后仍未结束,我们就会停止搜索,并报告迄今为止的最佳解决方案。 Table IV 给出了 ES 和我们方法的结果。
表 IV ES 和 LATICS 在 1 毫秒功率周期的周期性功率跟踪下获得的最短执行时间(单位:毫秒
We compare the minimal total runtime obtained by ES and the total runtime by LATICS. The exhaustive search for CRC, DIJ, and SRT were finished and the results obtained are optimal. It can be seen that the total runtimes of LATICS for the three benchmarks are slightly larger than those of ES. The reasons are twofold: first, ES is clairvoyant while LATICS is not; second, our approach, i.e., Algorithm 1, is a heuristic method which only considers a subset of the problem state space. The exhaustive search for the other 5 benchmarks did not finish within 20 hours. This is because even the program path is known, searching for an optimal breaking point assignment still needs to explore a huge state space. Note that 20 hours is still not enough for ES to find sufficiently good results so the execution times obtained by ES are larger than those of LATICS for some benchmarks.
我们比较了 ES 获得的最小总运行时间和 LATICS 获得的总运行时间。我们完成了对 CRC、DIJ 和 SRT 的穷举搜索,得到的结果均为最优。可以看出,LATICS 对三个基准的总运行时间略大于 ES。原因有二:第一,ES 是千里眼,而 LATICS 不是;第二,我们的方法,即 Algorithm 1 ,是一种启发式方法,只考虑了问题状态空间的一个子集。其他 5 个基准的穷举搜索并没有在 20 小时内完成。这是因为,即使程序路径已知,搜索最优断点分配仍然需要探索巨大的状态空间。需要注意的是,对于 ES 来说,20 个小时仍然不足以找到足够好的结果,因此对于某些基准,ES 所获得的执行时间要大于 LATICS 所获得的执行时间。
Impact of
The total runtime and backup time are shown in Table V. First, the total runtimes for all three configurations are very close, which shows that the values of
总运行时间和备份时间以 Table V 表示。首先,所有三种配置的总运行时间都非常接近,这表明在探索范围内的
表 V 在 1 毫秒功率周期的周期性功率跟踪下不同
Performance of State Copying: In LATICS, the
状态复制的性能:在 LATICS 中,
In Table VI, we report the time overhead of three copying methods running on our MSP430 board, bulk copying by DMA (DMA bulk), variable copying by DMA (DMA copy), and variable copying by CPU [CPU copy, implemented by memcpy()], for different data sizes. We provide results for copying 2 to 512B data. For variable copying, we set the variable size to be 2B, the size of an integer on MSP430.
在 Table VI 中,我们报告了在我们的 MSP430 板上运行的三种复制方法在不同数据大小下的时间开销,即通过 DMA 进行批量复制(DMA 批量复制)、通过 DMA 进行变量复制(DMA 复制)和通过 CPU 进行变量复制[CPU 复制,由 memcpy() 实现]。我们提供了复制 2 到 512B 数据的结果。对于变量复制,我们将变量大小设置为 2B,即 MSP430 上的整数大小。
表 VI 不同数据大小(以字节为单位)下不同状态复制方法的时间成本(以
For variable copying, the time overhead is linearly proportionally to the copied data size, with DMA slightly faster than CPU. For bulk copying by DMA, the copy speed continues to increase from 2 to 512B. From 16–32B data size, bulk copying by DMA is one order of magnitude faster than variable-copying approaches. Seventy memory copy ranges exist in all benchmarks, 13 of them are sized 2–15B, 43 of them are sized 16–64B, and 14 of them are sized 65–284B. So in most cases, conducting bulk copying is more efficient. The results explain why we choose bulk copying in LATICS even if extra variables may be included in a memory copy range. In general, large programs with large state copying size will benefit more from bulk copying.
对于可变复制,时间开销与复制的数据大小成线性比例,DMA 略快于 CPU。对于 DMA 的批量复制,复制速度从 2B 到 512B 持续增加。在 16-32B 数据大小范围内,DMA 的批量复制速度比可变复制方法快一个数量级。所有基准测试中存在 70 个内存复制范围,其中 13 个为 2-15B 大小,43 个为 16-64B 大小,14 个为 65-284B 大小。因此,在大多数情况下,批量复制的效率更高。这些结果解释了为什么我们在 LATICS 中选择批量复制,即使内存复制范围中可能包含额外的变量。一般来说,状态复制量大的大型程序会从批量复制中获益更多。
The actual memory layout of the state variables may impact the size of the memory copy range and thus the state copying overhead. In general, there does not exist an optimal layout in reality, as the efficiency of a memory layout is related to the actual occurrence of leading tasks which is further affected by the actual power traces. We leave the systematic investigation in memory layout optimization for future work.
状态变量的实际内存布局可能会影响内存复制范围的大小,从而影响状态复制开销。一般来说,现实中并不存在最佳布局,因为内存布局的效率与主导任务的实际发生率有关,而主导任务的发生率又受到实际功率轨迹的影响。我们将在未来的工作中对内存布局优化进行系统研究。
Related Work 相关工作
Intermittent Computing: Intermittent computing systems are powered by energy harvested from their working environment [1]–[4]. As ambient energy sources are highly unstable, a system may experience frequent power failures and has to execute intermittently. The general requirements are to ensure program progress and logical correctness in the presence of power failures [23], which are achieved by decomposing a program into small atomic execution blocks and frequently saving system states to NVM.
间歇式计算:间歇式计算系统的能源来自其工作环境 [1] - [4] 。由于环境能源极不稳定,系统可能会频繁断电,因此必须间歇执行。一般要求是在断电情况下确保程序进度和逻辑正确性 [23] ,具体做法是将程序分解成小的原子执行块,并经常将系统状态保存到 NVM 中。
Computing Paradigms: Checkpointing is adopted since early research to decompose a program into atomic blocks and conduct state saving [5]–[15]. When power fails, the system rolls back to the latest checkpoint after energy is replenished. The checkpoint is well known for its large state saving overhead: execution context including all or part of the main memory and the register files will be saved at a checkpoint for future roll back. Task-based intermittent computing systems [16]–[20] were proposed as an alternative to overcome the problems of the checkpoint-based paradigm. Programmers write programs with explicit tasks, and system states are saved at task boundaries to survive power failures. The task-based paradigm provides programmers more flexibility to decompose programs and generally saves less memory contents at task boundaries. LATICS pushed the state-of-the-art task-based intermittent computing with a significant reduction in state saving overhead.
计算范式:检查点(Checkpointing)从早期研究开始就被采用,将程序分解成原子块,并进行状态保存 [5] - [15] 。当电源失效时,系统会在能量补充后回滚到最新的检查点。众所周知,检查点具有较大的状态保存开销:包括全部或部分主内存和寄存器文件在内的执行上下文将保存在检查点,以便将来回滚。基于任务的间歇计算系统 [16] - [20] 被作为一种替代方案提出,以克服基于检查点模式的问题。程序员通过明确的任务编写程序,系统状态保存在任务边界,以便在断电后继续运行。基于任务的范例为程序员提供了更多分解程序的灵活性,通常在任务边界保存的内存内容较少。LATICS 推动了最先进的基于任务的间歇计算,显著减少了状态保存开销。
Dynamic Task Coalescing: As tasks are conservatively designed to be very small to ensure system progress, coalescing multiple tasks will reduce state saving overhead as long as the energy is sufficient. A recently proposed system, Coala [22], coalesces tasks at runtime to reduce unnecessary state saving. However, coalescing tasks introduces new WAR variables, and precisely identifying them is a major challenge. To avoid this problem, Coala implements an inefficient paging system which copies used data pages on-demand to VM, and pessimistically saves all modified data pages to NVM, which impacts the overall execution efficiency. LATICS improves the state-of-the-art by two main contributions: 1) a novel breaking point mechanism to reduce unnecessary state saving for
动态任务凝聚:由于任务被保守地设计得非常小,以确保系统的进度,因此只要能量充足,将多个任务凝聚在一起就能减少状态保存开销。最近提出的一个系统 Coala [22] 在运行时凝聚任务,以减少不必要的状态保存。然而,凝聚任务会引入新的 WAR 变量,而精确识别这些变量是一大挑战。为了避免这个问题,Coala 实现了一个低效的分页系统,按需将使用过的数据页复制到虚拟机,并悲观地将所有修改过的数据页保存到 NVM,这影响了整体执行效率。LATICS 通过两个主要贡献改进了最新技术:1)一种新颖的断点机制,可减少
Peripherals and Concurrency: Cyber–physical systems typically interact with their environment through peripherals. For such systems, maintaining consistent states across power failures is complex. Without persistent peripheral states, a program may get stuck in an infinite loop of state polling due to power failures [13]. Reoperation of peripherals across power failures may produce inconsistent system states [24]. Timekeeping is also a problem [13]: without knowledge on the power downtime, timely tasks may behave incorrectly. Interrupts may also violate the atomicity of tasks, and techniques [20] were proposed to either delay the interrupt service routine or discard interrupted tasks to ensure atomicity.
外围设备和并发性:网络物理系统通常通过外设与环境互动。对于这类系统来说,在断电时保持状态一致非常复杂。如果没有持久的外设状态,程序可能会因断电而陷入状态轮询的无限循环 [13] 。断电后重新操作外设可能会产生不一致的系统状态 [24] 。计时也是一个问题 [13] :如果不知道断电时间,及时任务的行为可能不正确。中断也可能会破坏任务的原子性,因此有人提出了一些技术 [20] 来延迟中断服务例程或丢弃被中断的任务,以确保任务的原子性。
In intermittent systems with concurrent task-based applications, new correctness challenges emerge. First, synchronous communication among concurrent tasks may violate their atomicity. In InK [18], concurrent applications communicate asynchronously and switching between applications is only allowed at task boundaries. Second, the correctness of result commits (to NVM) by multiple concurrent tasks depends on the commit sequence. In [25], a method was proposed to track conflicts among concurrent tasks during commit, the price of which is considerable runtime overhead and abortion of tasks due to conflict data accesses.
在基于并发任务应用的间歇系统中,出现了新的正确性挑战。首先,并发任务间的同步通信可能会违反它们的原子性。在 InK [18] 中,并发应用以异步方式通信,应用之间的切换只允许在任务边界进行。其次,多个并发任务将结果提交(到 NVM)的正确性取决于提交顺序。在 [25] 中,提出了一种在提交过程中跟踪并发任务间冲突的方法,其代价是相当大的运行时开销和因冲突数据访问导致的任务流产。
Conclusion 结论
This article presents LATICS, a low-overhead adaptive task-based intermittent computing system. The major challenge addressed by LATICS is how to reduce the amount of saved states in the presence of dynamic decision of how many tasks are grouped together for atomic execution. We observed that it is not always beneficial to skip state saving at task boundaries as it may sometime cause a higher amount of saved states at other places and thus leads to higher overall runtime overhead. LATICS is implemented on top of the state-of-the-art task-based intermittent computing system InK [18] and evaluated on hardware based on MSP430. The experimental results show that LATICS significantly reduces state saving overhead and improves execution efficiency compared to the state of the art.
本文介绍了基于任务的低开销自适应间歇计算系统 LATICS。LATICS 面临的主要挑战是如何在动态决定将多少任务组合在一起进行原子执行的情况下减少保存状态的数量。我们注意到,在任务边界跳过状态保存并不总是有益的,因为这有时会导致其他地方保存的状态量增加,从而导致整体运行时开销增加。LATICS 是在最先进的基于任务的间歇计算系统 InK [18] 上实现的,并在基于 MSP430 的硬件上进行了评估。实验结果表明,与最新技术相比,LATICS 显著减少了状态保存开销,提高了执行效率。