Leaky Abstractions and a Rusty Pin

Razieh Behjati, PhD
Published in ITNEXT
17 min read, Apr 3, 2024

TL;DR: This article is a deep dive into the concept of pinning in Rust. I argue that while ownership and transfer of ownership are powerful abstractions, they are somewhat leaky. This should not come as a surprise, as all non-trivial abstractions are leaky! However, knowing where an abstraction is leaky helps us use it more effectively. I further explain why Pin is such an effective construct for dealing with the leak!

If you have any experience with Async Rust, you must have come across Pin and pinning. Pinning is a complex topic and many Rust programmers find it confusing. If you feel uncomfortable around pinning, or like me, have had too many “why” questions about it, this article is for you!

All the articles (e.g., [1], [2], [3]) I have read on the topic of pinning state that the need for pinning is due to the existence of self-referential objects. However, that cannot be the only reason, as we see self-referential objects in other programming languages too, but nothing similar to Rust’s pinning.

Some other factor, unique to Rust, must be at play. In this article, we are aiming to understand this other factor. Along the way, we will answer the following questions:

  1. How are mutability, move semantics, and movability related?
  2. How does Pin work, and what guarantees does it provide?
  3. When do you need to use Pin in your API?
  4. How do other languages deal with self-referential objects?

Pin in a nutshell

Pin is a smart pointer. It is often said that Pin “prevents an object from being moved in memory, unless it is address-insensitive”. This article is all about deciphering this sentence, as it is not quite accurate, or rather, does not give the right impression when you first read it. Pin, alone, does not have a mechanism to keep an object in its place in memory, but is the way to express a guarantee about the object’s immovability. You, as the developer, may need to use other language constructs to achieve the stated guarantee.

The following diagram shows the structure of a Pin. We will come back to this diagram after clarifying some concepts, and will have a closer look at the three players in it when discussing Pin’s guarantees.

Movability

If I ask you what movability is, chances are you'll think of Rust's move semantics and transfer of ownership. Here is a brief summary:

Rust manages memory through a system of ownership with a set of rules that the compiler checks. Each value (e.g., an object) in a Rust program is owned by a variable. Ownership of a value can be transferred from one variable to another, for instance in an assignment statement. The Rust compiler calls this moving out of one variable into another. Ownership transfer is necessary for guaranteeing single ownership, a rule that the compiler relies on for ensuring memory safety.

Question: Is ownership transfer the type of move that Pin expresses a guarantee about?

Not really. In the context of pinning, “move” and “movability” mean that the value has moved in the mechanical sense of being located at a new place in memory. In the following, I am going to refer to this as a mechanical move.

Two types of move

Question: Does a “move”, or transfer of ownership of a value, involve a mechanical move?

The short answer is “yes”. The longer answer is “yes, but sometimes only partially”.

Question: Can the location of a value in memory change without a transfer of ownership?

This one is trickier to answer. This is where the boundaries between transfer of ownership, mechanical move, and even mutability get blurred. To answer this question, it is best to look at some examples.

In the first example (playground), we create a vector with capacity 2, then push 4 elements to it. The elements are of type SelfPoint:

// Example 1:

struct SelfPoint {
    data: u32,
    ptr: *const u32,
}

impl SelfPoint {
    fn new(data: u32) -> Self {
        SelfPoint {
            data,
            ptr: std::ptr::null(),
        }
    }

    // Makes the instance self-pointing: `ptr` now holds the address of `data`.
    fn init(&mut self) {
        let ptr: *const u32 = &self.data;
        self.ptr = ptr;
    }
}

fn main() {
    let mut v = Vec::with_capacity(2);

    for i in 0..4 {
        let x = SelfPoint::new(i);
        v.push(x);
        v.last_mut().expect("vec is empty").init();
        // `&v` is the stack address of the Vec itself; `&v[0]` is the start of its heap buffer.
        println!("Vector addr & cap: {:p}, {}; heap location: {:p}",
            &v, v.capacity(), &v[0]);
    }
    ...
}

Each time we push an element to the vector, a move happens. This move involves both a transfer of ownership and a mechanical move (from the stack frame of main, to the stack frame of push, and then to the heap). After pushing each element to the vector, we call init on it to set the value of ptr to the address of data. Since we initialized the vector with capacity 2, pushing the third element will result in allocating a larger buffer, and moving the elements to the new buffer. The following diagram shows the memory layout of the vector, before and after pushing the third element to it.

Notice how after the third push, the ptr fields of the first two elements of the vector point to memory locations they no longer own. This is exactly the kind of problem that Pin is designed to prevent.

In this example, the memory allocated to the vector on the heap is moved, mechanically, but no explicit transfer of ownership has happened. Deep in the implementation of Vec, the vector’s ptr is updated and mutated, but even this is not something that I can call a transfer of ownership.
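
One way to observe the effect is to count the elements whose ptr no longer matches the address of their own data. The following is a minimal sketch that reuses SelfPoint from Example 1; count_stale and push_and_count are hypothetical helpers that are not part of the original example. The pointers are only compared, never dereferenced. Whether growing the buffer actually changes its address is up to the allocator, so the count for the small initial capacity may vary from run to run.

```rust
struct SelfPoint {
    data: u32,
    ptr: *const u32,
}

impl SelfPoint {
    fn new(data: u32) -> Self {
        SelfPoint { data, ptr: std::ptr::null() }
    }

    fn init(&mut self) {
        let ptr: *const u32 = &self.data;
        self.ptr = ptr;
    }
}

// Hypothetical helper: counts elements whose `ptr` no longer holds the
// current address of their own `data` field. Pointers are compared only.
fn count_stale(v: &[SelfPoint]) -> usize {
    v.iter().filter(|p| !std::ptr::eq(p.ptr, &p.data)).count()
}

// Hypothetical helper: repeats the loop of Example 1 and reports how many
// elements ended up with a stale pointer.
fn push_and_count(initial_capacity: usize, n: u32) -> usize {
    let mut v = Vec::with_capacity(initial_capacity);
    for i in 0..n {
        v.push(SelfPoint::new(i));
        v.last_mut().expect("just pushed").init();
    }
    count_stale(&v)
}

fn main() {
    // The buffer has to grow here, so earlier elements may have been relocated.
    println!("stale pointers with capacity 2: {}", push_and_count(2, 4));
    // No growth is needed here, so no element is relocated after `init`.
    println!("stale pointers with capacity 4: {}", push_and_count(4, 4));
}
```

With enough initial capacity no reallocation happens and every pointer stays consistent, which is the situation Example 2 sets up.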

What happens if you write a similar program in Python or Java? Surely new memory has to be allocated there too as the vector expands, but will SelfPoint instances be relocated? How about in C++?

In the next example (playground), we initialize the vector with a larger capacity to prevent memory reallocation. We also pass the vector to a function causing a transfer of ownership.

// Example 2:
// SelfPoint is the type defined in Example 1.

fn validate(v: Vec<SelfPoint>) {
    for x in &v {
        // Every element must still point at its own `data` field.
        let data = &x.data as *const u32;
        assert!(data == x.ptr);
    }
    println!("Validated vector at {:p}; buffer at: {:p}", &v, v.as_ptr());
}

fn main() {
    let mut v = Vec::with_capacity(4);

    for i in 0..4 {
        let x = SelfPoint::new(i);
        v.push(x);
        v.last_mut().expect("vec empty").init();
    }

    println!("Created vector\t at {:p}; buffer at: {:p}", &v, v.as_ptr());
    // Ownership of `v` moves into `validate`; the heap buffer stays where it is.
    validate(v);
}

The following diagram shows the memory layout in this case.

In both of the examples, vector v is partially moved to a new location in memory. In the first example, only the heap-allocated memory is moved, without any transfer of ownership. In the second example, only the stack-allocated memory is moved, involving a transfer of ownership.

With these examples in mind, we can now answer the question “Can the location of a value in memory change without a transfer of ownership?”

The answer is yes. In other words, mechanical move is a broader concept than ownership transfer.
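
The other half of the picture, an ownership transfer that leaves the heap-allocated part of a value where it is, can be checked directly. This is a minimal sketch with a plain Vec<u32>; take_and_return and buffer_survives_move are hypothetical names used only for illustration.

```rust
// Takes ownership of the vector, records where its heap buffer lives,
// and hands the vector back to the caller.
fn take_and_return(v: Vec<u32>) -> (Vec<u32>, usize) {
    let addr = v.as_ptr() as usize;
    (v, addr)
}

// Returns true if the heap buffer has the same address before, during,
// and after the two ownership transfers.
fn buffer_survives_move() -> bool {
    let v = vec![1, 2, 3, 4];
    let before = v.as_ptr() as usize;
    let (v, inside) = take_and_return(v);
    let after = v.as_ptr() as usize;
    before == inside && inside == after
}

fn main() {
    println!("heap buffer stayed in place: {}", buffer_survives_move());
}
```

Moving a Vec copies only its pointer, length, and capacity, so the buffer address is unchanged even though ownership changed hands twice.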

Movability and mutability

The mechanical move in Example 1 was caused by a mutation. In fact, there is a very close connection between mutability and movability. More specifically, mutations often entail a mechanical move (see the classic example with mem::swap). However, they are not necessarily accompanied by a transfer of ownership. Transfer of ownership involves marking a memory location as invalid, but in the case of mem::swap for instance, both memory locations remain valid after the operation.
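
The mem::swap case can be sketched with two stack-allocated instances of the SelfPoint type from Example 1. The helper is_consistent and the function swap_breaks_invariant are hypothetical additions for illustration; pointers are only compared, never dereferenced.

```rust
use std::mem;

struct SelfPoint {
    data: u32,
    ptr: *const u32,
}

impl SelfPoint {
    fn new(data: u32) -> Self {
        SelfPoint { data, ptr: std::ptr::null() }
    }

    fn init(&mut self) {
        let ptr: *const u32 = &self.data;
        self.ptr = ptr;
    }

    // Hypothetical helper: does `ptr` still hold the address of our own `data`?
    fn is_consistent(&self) -> bool {
        std::ptr::eq(self.ptr, &self.data)
    }
}

// Returns (both consistent before the swap, any consistent after the swap).
fn swap_breaks_invariant() -> (bool, bool) {
    let mut a = SelfPoint::new(1);
    let mut b = SelfPoint::new(2);
    a.init();
    b.init();
    let before = a.is_consistent() && b.is_consistent();

    // No ownership changes hands: `a` and `b` both remain valid variables.
    // Their contents, however, are mechanically moved to the other location.
    mem::swap(&mut a, &mut b);

    let after = a.is_consistent() || b.is_consistent();
    (before, after)
}

fn main() {
    let (before, after) = swap_breaks_invariant();
    println!("consistent before swap: {before}; consistent after swap: {after}");
}
```

After the swap, each variable holds the pointer that was computed for the other location, so neither instance points at its own data any more.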

Due to this close connection between mutability and movability, in the description of Pin, we see many references to mutability, despite Pin being all about movability!

Leaky Abstractions

While ownership and transfer of ownership are powerful abstractions, they are somewhat leaky, particularly when it comes to capturing the notion of mechanical moves. This is hinted at in the explanation of ownership in the Rust book. Moreover, as we saw in the use of Vec in Example 1, we may need to study the internal details of a type when deciding whether its manipulation can result in a mechanical move or not.

Having a leaky abstraction or only partially hiding details in a programming language is nothing new. Niklaus Wirth, in his 1974 paper “On the Design of Programming Languages”, writes:

I found a large number of programs perform poorly because of the language’s tendency to hide “what is going on” with the misguided intention of “not bothering the programmer with details”.

In my view, Rust programs are anything but “programs that perform poorly”, but knowing that an abstraction is leaky allows us to use it more effectively.

Address sensitivity

So far, we’ve learned that objects in Rust are movable in a mechanical sense, and that this kind of move is different from ownership transfer.

In some cases, the validity of an object’s invariants and the soundness of its behavior depend on its location in memory. We say that such an object is in an address-sensitive state. Self-referential objects (i.e., objects with pointers to themselves, or to fields in themselves) are the archetype of address-sensitivity. In the examples above, calling init on a SelfPoint instance made it self-referential and address-sensitive.

Pointing vs referencing

The field ptr in SelfPoint is a pointer, not a reference! If we change ptr to be a reference, we get a compiler error (playground)! Moving a truly self-referential (as opposed to a self-pointing!) object violates Rust’s borrow-checking rules.

Strictly speaking, address-sensitivity is a state of an object, not necessarily a property that can be specified via its type (e.g., instances of SelfPoint are not address-sensitive before calling init on them). However, it is often good practice to associate a different type with each state of a stateful entity. Following this practice, a self-referential object becomes address-sensitive at the beginning of its lifetime, and remains address-sensitive until the end of its lifetime. In other words, once your object is no longer address-sensitive, it is good practice to consume it and convert it into an instance of another type.

As a result, in practice, address-sensitivity can be specified via types. This allows the compiler to reason about the address-sensitivity of objects at compile time.

Why is pinning needed?

Once an object is in an address-sensitive state, the compiler can guarantee memory-safe interaction with it, only if the object stays in its location in memory. If an object is not address-sensitive, it can freely move around in memory, without violating any of the compiler’s safety guarantees.

To tell the compiler that instances of a type will not be address-sensitive, the type implements the Unpin auto trait. Because Unpin is an auto trait, almost all types are Unpin by default. To announce potential address-sensitivity of instances of a type, we make it !Unpin. We will learn more about Unpin below.

To promise to the compiler that an object has a fixed location in memory, or otherwise is address-insensitive, we wrap a pointer (i.e., pinning pointer) to that object (i.e., pointee) in a Pin. As we will see below, Pin restricts mutable access to a pointee that is !Unpin.

In short, pinning (i.e., the use of Pin and Unpin) is the means that you as a developer use to help the compiler provide its memory-safety guarantees, and avoid undefined behavior when it comes to address-sensitive types and objects.

The Unpin Trait

Unpin is a marker trait, and is automatically implemented for almost all types. It is implemented even for type SelfPoint in Example 1. This is because the compiler has no reason to assume that at runtime you are going to set SelfPoint.ptr to point to SelfPoint.data.

To tell the compiler that instances of SelfPoint will be self-referential and address-sensitive, you have to explicitly make it !Unpin. Stable Rust does not let you write a negative impl directly, so the conventional way of doing this is to add a field of type PhantomPinned to your type. PhantomPinned is !Unpin, and a type that contains a PhantomPinned does not get the automatic Unpin implementation. We change SelfPoint as follows:

// Example 3:

use std::marker::PhantomPinned;

struct SelfPoint {
    data: u32,
    ptr: *const u32,
    _pinned: PhantomPinned,
}

If your type has a field that is !Unpin (e.g., a tokio::time::Sleep field), and you want to make your type Unpin, you have to explicitly implement Unpin for it. Doing so is as simple as writing a single line of code.

impl Unpin for MyType {}

But you have to be very careful, because when you implement Unpin, you are promising to the compiler that instances of your type will be address-insensitive and freely movable.
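
The three situations described in this section (a plain struct, a struct with a PhantomPinned field, and a wrapper that opts back in) can be compared side by side. The sketch below is only an illustration: Plain, Pinned, OptedBackIn, Fallback, and Probe are hypothetical names, and the probe relies on a common stable-Rust trick in which an inherent associated constant takes priority over a trait-provided one whenever its bound is satisfied.

```rust
use std::marker::{PhantomData, PhantomPinned};

// Same shape as SelfPoint in Example 1: automatically `Unpin`.
#[allow(dead_code)]
struct Plain {
    data: u32,
    ptr: *const u32,
}

// Same shape as SelfPoint in Example 3: `!Unpin` because of `PhantomPinned`.
#[allow(dead_code)]
struct Pinned {
    data: u32,
    ptr: *const u32,
    _pinned: PhantomPinned,
}

// Contains a `!Unpin` field, but explicitly promises address-insensitivity.
#[allow(dead_code)]
struct OptedBackIn {
    inner: Pinned,
}

impl Unpin for OptedBackIn {}

// Hypothetical probe. The trait supplies the fallback answer `false` for every type...
trait Fallback {
    const IS_UNPIN: bool = false;
}
impl<T: ?Sized> Fallback for T {}

#[allow(dead_code)]
struct Probe<T: ?Sized>(PhantomData<T>);

// ...and this inherent constant wins whenever `T: Unpin` holds.
impl<T: ?Sized + Unpin> Probe<T> {
    const IS_UNPIN: bool = true;
}

fn main() {
    println!("Plain is Unpin:       {}", <Probe<Plain>>::IS_UNPIN);
    println!("Pinned is Unpin:      {}", <Probe<Pinned>>::IS_UNPIN);
    println!("OptedBackIn is Unpin: {}", <Probe<OptedBackIn>>::IS_UNPIN);
}
```

Note that the compiler accepts the impl Unpin for OptedBackIn line without checking anything; the promise behind it is entirely yours to keep.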

Pin and its guarantees

Now that we know why pinning is needed, we can take a closer look at Pin, its API, and its guarantees.

The structure of a Pin

As mentioned above, to promise to the compiler that an object either has a fixed location in memory (i.e., is immovable) or is address-insensitive (i.e., is Unpin), we wrap a pointer to it in a Pin. This is shown in the diagram below (repeated from above).

  • The object that we want to express the promise about is called the pointee.
  • The pointer to the pointee is called the pinning pointer.
  • The pin itself is a smart pointer that wraps around the pinning pointer, and takes ownership of it.

We say that the pin pins the pointee.

Mutability, Movability, and Pin’s guarantees

Once you express your promise about the immovability or address-insensitivity of an object, by pinning it, Pin guarantees that it will maintain your promise. To achieve this, Pin won’t easily give mutable access to the pointee. This is important, because as we saw above, mutable access to a value can result in moving it in memory. For address-sensitive objects, this may violate the invariant of the object, or cause memory-safety issues.

APIs utilizing Pin, with Future::poll being a prominent example, are concerned with integrity and validity of the pointee's invariants rather than its specific memory location. These APIs need to mutate the pointee, and need to know that doing so won't result in undefined behavior or a violation of the pointee's invariants. To establish this proposition, immovability is used as an overapproximation: a sufficient and easier-to-verify condition that ensures the object's integrity as it is being mutated.

By relying on immovability, Pin provides a simple and generic API that allows safe interaction with address-sensitive objects in arbitrary contexts without having to make any additional assumptions about the internal invariants of those objects.

Pinning address-insensitive objects

Instances of an Unpin type are promised to be address-insensitive, and therefore allowed to move freely in memory. Pin has a safe API for working with Unpin objects. The following table lists the methods in Pin’s safe API. Those in green require the pointee to be Unpin. The table shows the API from stable Rust 1.77.0.

If the pointee is Unpin, you can easily wrap it in a Pin, using the new method, without any additional restrictions. Moreover, if the pointee is Unpin, you can use the get_mut method to get mutable access to it. Mutating the pointee is safe because it is address-insensitive, and can move and mutate freely.
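
As a small illustration of the safe path, here is a sketch that pins a String (which is Unpin) and then moves it out through get_mut. The function name replace_through_pin is hypothetical.

```rust
use std::mem;
use std::pin::Pin;

// Hypothetical helper: swaps a new value into `slot` through a Pin and
// returns the previous value.
fn replace_through_pin(slot: &mut String, new_value: String) -> String {
    // `Pin::new` is safe, and only available because `String: Unpin`.
    let pinned: Pin<&mut String> = Pin::new(slot);
    // `get_mut` also requires `Unpin`; it hands back the plain `&mut String`.
    let inner: &mut String = pinned.get_mut();
    // Moving the old value out is fine for an address-insensitive type.
    mem::replace(inner, new_value)
}

fn main() {
    let mut s = String::from("old");
    let previous = replace_through_pin(&mut s, String::from("new"));
    println!("{previous} -> {s}");
}
```

For a pointee that is Unpin, the pin adds no restriction at all: the value can still be replaced, swapped, or moved.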

The methods in blue do not give direct mutable access to the pointee, and cannot be used to mutate or move the object, without additional, possibly unsafe, code. Therefore they are safe.

Pinning address-sensitive objects

If instances of a type can enter an address-sensitive state, then that type must be declared !Unpin.

To soundly pin instances of an !Unpin type, you have to promise that the pointee's data will neither be moved nor have its storage invalidated until it gets dropped. The compiler cannot check these promises, so when working with !Unpin types, you have to write unsafe code. Pin offers a number of unsafe methods particularly for working with !Unpin types.

To create an instance of Pin in this case, you have to use the unsafe new_unchecked method. The documentation for this method states a few additional promises that you have to make. These promises allow Pin to guarantee that your use of its safe API (colored in blue in the previous table) will be sound.
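
A minimal sketch of new_unchecked, assuming the !Unpin SelfPoint from Example 3 and an init method that takes Pin<&mut Self>: the value is pinned on the stack, and the original binding is shadowed so that safe code can no longer reach it to move it. The function name pin_on_stack is hypothetical.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct SelfPoint {
    data: u32,
    ptr: *const u32,
    _pinned: PhantomPinned,
}

impl SelfPoint {
    fn new(data: u32) -> Self {
        SelfPoint { data, ptr: std::ptr::null(), _pinned: PhantomPinned }
    }

    fn init(self: Pin<&mut Self>) {
        let ptr: *const u32 = &self.data;
        // SAFETY: we only overwrite a field in place; nothing is moved out.
        let this = unsafe { self.get_unchecked_mut() };
        this.ptr = ptr;
    }
}

// Hypothetical helper: pins a SelfPoint on the stack, initializes it, and
// reports whether `ptr` holds the address of the instance's own `data`.
fn pin_on_stack(data: u32) -> bool {
    let mut value = SelfPoint::new(data);
    // SAFETY: the new binding shadows the original one, so the SelfPoint can
    // only be reached through the pin from here on. It is never moved again
    // and is dropped in place at the end of this function.
    let mut value = unsafe { Pin::new_unchecked(&mut value) };
    value.as_mut().init();
    std::ptr::eq(value.ptr, &value.data)
}

fn main() {
    println!("pointer is consistent: {}", pin_on_stack(5));
}
```

The unsafe block is where you take on the promise: if the original binding were still accessible, safe code could move the value after the pin goes out of scope and break the guarantee.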

These additional promises are about the pinning pointer, which is the argument to new_unchecked. Needless to say, the pinning pointer must be Deref. You have to promise that in your implementation of Deref::deref, you do not “move [any data] out of self” (even a partial move violates this promise). You have to make a similar promise about your implementation of DerefMut if you have it. This is because Pin relies on Deref::deref and DerefMut::deref_mut in the implementation of some of its safe methods (e.g., as_ref and as_mut), and can only maintain the immovability promise, if Deref::deref and DerefMut::deref_mut guarantee it.

Note that inside DerefMut::deref_mut, you have mutable access to the pointee. A malicious implementation could simply use mem::swap to move a value out of self.

The methods map_unchecked and map_unchecked_mut are consuming methods that, among other things, are particularly useful for converting the pointee to an Unpin type, signaling its transition to an address-insensitive state. As an example, we could map instances of SelfPoint to address-insensitive u32 values holding only the content of the field data.
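
The mapping to u32 mentioned above could look roughly like the following sketch, which uses the shared-reference variant map_unchecked. The method project_data and the function read_through_projection are hypothetical names, and SelfPoint is assumed to be the !Unpin version from Example 3.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct SelfPoint {
    data: u32,
    ptr: *const u32,
    _pinned: PhantomPinned,
}

impl SelfPoint {
    fn new(data: u32) -> Self {
        SelfPoint { data, ptr: std::ptr::null(), _pinned: PhantomPinned }
    }

    fn init(self: Pin<&mut Self>) {
        let ptr: *const u32 = &self.data;
        // SAFETY: we only overwrite a field in place; nothing is moved out.
        let this = unsafe { self.get_unchecked_mut() };
        this.ptr = ptr;
    }

    // Hypothetical projection from the pinned struct to its `data` field.
    fn project_data<'a>(self: Pin<&'a Self>) -> Pin<&'a u32> {
        // SAFETY: `data` lives inside the pinned struct and stays in place
        // for as long as the struct does; the closure only borrows it.
        unsafe { self.map_unchecked(|this| &this.data) }
    }
}

// Hypothetical helper: pins a SelfPoint on the heap and reads `data`
// back through the projected pin.
fn read_through_projection(value: u32) -> u32 {
    let mut pinned: Pin<Box<SelfPoint>> = Box::pin(SelfPoint::new(value));
    pinned.as_mut().init();
    debug_assert!(std::ptr::eq(pinned.ptr, &pinned.data));

    let projected: Pin<&u32> = pinned.as_ref().project_data();
    // `u32` is Unpin, so the projected value is address-insensitive.
    *projected.get_ref()
}

fn main() {
    println!("data read through the projection: {}", read_through_projection(7));
}
```

The result type Pin<&u32> points at an Unpin value, so everything in Pin's safe API is available on it again.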

Putting it all together

We can use Pin and Unpin to fix the problem that we faced in Example 1. Here are the changes we have to make for the fix:

  1. Change SelfPoint to be !Unpin as in Example 3.
  2. Change init to take Pin<&mut Self> as its receiver. It only makes sense to make an instance of SelfPoint self-referential if we can guarantee that it is immovable. One way to do this is to move instances of SelfPoint to the heap.
  3. Change the vector to store pinned boxes (Pin<Box<SelfPoint>>) instead of SelfPoint values, so that each instance lives in its own heap allocation.

Here is the final code, with these changes applied (playground).

// Example 4:

use std::marker::PhantomPinned;
use std::pin::Pin;

struct SelfPoint {
    data: u32,
    ptr: *const u32,
    _pinned: PhantomPinned,
}

impl SelfPoint {
    fn new(data: u32) -> Self {
        SelfPoint {
            data,
            ptr: std::ptr::null(),
            _pinned: PhantomPinned,
        }
    }

    fn init(self: Pin<&mut Self>) {
        let ptr: *const u32 = &self.data;
        // SAFETY: we only overwrite a field in place; nothing is moved out.
        let this = unsafe { self.get_unchecked_mut() };
        this.ptr = ptr;
    }
}

fn validate(v: Vec<Pin<Box<SelfPoint>>>) {
    for x in &v {
        let data = &x.data as *const u32;
        assert!(data == x.ptr);
    }
    println!("Validated vec:\t{:p}; buffer addr: {:p}; v[0] addr: {:p}",
        &v, v.as_ptr(), v[0]);
}

fn main() {
    let mut v = Vec::with_capacity(2);

    for i in 0..4 {
        // Each instance gets its own heap allocation and is pinned there.
        let mut x = Box::pin(SelfPoint::new(i));
        x.as_mut().init();
        // Pushing moves only the pinned box (a pointer), not the SelfPoint.
        v.push(x);
        println!("{i} Vector addr:\t{:p}; buffer addr: {:p}; v[0] addr: {:p}",
            &v, &v[0], v[0]);
    }

    validate(v);
}

Do I need to use Pin in my API?

When I started digging into pinning, I wanted to be able to comfortably decide whether to use Pin in my APIs or not. This decision becomes relevant when designing generic APIs or traits, where we don’t know the exact types of the objects that will interact with the API.

In these cases, you might want to use Pin, if:

  1. Your API is likely to work with address-sensitive types and objects. For instance, types that implement the Future trait are likely to be self-referential, and therefore address-sensitive. APIs that work with a Future need to Pin it. On the other hand, elements stored in a Vec don’t have that tendency, and more often than not are address-insensitive.
  2. Your API needs to mutate objects. For instance, Future::poll is called to make progress on a Future object. This requires changing, or mutating, the state of the Future object. On the other hand, in its implementation, Vec does not mutate the elements in its buffer.
  3. The safety of your API depends on the object’s integrity and the validity of its invariants. For instance, as a safe method, calls to Future::poll “must never cause undefined behavior (memory corruption, incorrect use of unsafe functions, or the like), regardless of the future’s state.” On the other hand, while most of the Vec’s methods are safe, its contract is not affected by the invariants of the elements in its buffer. The presence of address-sensitive objects in a Vec does not cause undefined behavior when invoking any of its methods.

If your API passes all these checks, you’ll need Pin in your API. Passing these checks implies that your API depends tightly on the inner working of the types that it interacts with. Using Pin hides those details behind an immovability promise. This promise, although possibly more conservative than what you want, is easier to reason about.
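
As a rough sketch of what such an API can look like, here is a hypothetical trait, Advance, whose method takes Pin<&mut Self> in the same way Future::poll does. Advance, Countdown, and drain are invented for illustration and are not taken from any library.

```rust
use std::pin::Pin;

// Hypothetical trait: implementors may be address-sensitive, so the mutating
// method takes a pinned receiver instead of a plain `&mut self`.
trait Advance {
    fn advance(self: Pin<&mut Self>) -> Option<u32>;
}

// An address-insensitive implementor: it is Unpin, so it can use the safe API.
struct Countdown(u32);

impl Advance for Countdown {
    fn advance(self: Pin<&mut Self>) -> Option<u32> {
        // `get_mut` is available because `Countdown: Unpin`.
        let this = self.get_mut();
        if this.0 == 0 {
            None
        } else {
            this.0 -= 1;
            Some(this.0)
        }
    }
}

// Generic code only relies on the immovability promise and never needs to
// know whether `T` is actually self-referential.
fn drain<T: Advance>(mut item: Pin<&mut T>) -> Vec<u32> {
    let mut out = Vec::new();
    // `as_mut` reborrows the pin so that it can be used again in the next iteration.
    while let Some(value) = item.as_mut().advance() {
        out.push(value);
    }
    out
}

fn main() {
    let mut counter = Countdown(3);
    println!("{:?}", drain(Pin::new(&mut counter)));
}
```

Callers with Unpin types pay almost nothing for the pinned receiver, since Pin::new is safe for them, while address-sensitive implementors get the guarantee they need.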

Otherwise, even if you have a general purpose API like Vec, you won’t need Pin in your API; and your API will work just fine with pinned objects too.

Before wrapping up this section, I want to emphasize again that Pin does not force the pointee to stay in its location. It merely expresses an immovability promise about it. While all objects in Rust are movable, most objects never leave their original locations in memory throughout their lifetimes. For instance, most heap-allocated objects never move, as we saw in Example 4.

How do other languages deal with self-referential objects?

To answer this question, let’s start with Java, as a representative of garbage-collected languages.

In Java, all objects are heap-allocated. In your programs, you work with references to these objects, and can easily make self-referential objects. Consider the following example, which declares class SelfRef.

// Example 5:
class SelfRef {
    SelfRef self;

    void init() {
        this.self = this;
    }
}

You can make instances of SelfRef self-referential by calling init on them. But how about moving instances of SelfRef in memory? Well, the language does not provide you with a mechanism to do that. The notion of movability, in the sense of manually relocating objects in memory, does not exist in Java or similar garbage-collected languages. In these languages, objects reside safely in the heap, each hiding behind a reference. However, this does not mean that objects in these languages never relocate in memory. The runtime, more specifically the garbage collector, does move objects around. This happens specifically when defragmenting or compacting the heap after removing the garbage-collected objects. The garbage collector manages this process by tracking references to ensure their validity.

The same is true for other garbage-collected languages, such as Python. An exception worth mentioning is Golang, which (as opposed to Java and Python) allows pointer types, and pass-by-value. In Golang, you can declare a type similar to SelfPoint in Example 1, create self-referential instances of it, and move them around in memory. The garbage collector does not release the old memory as long as a pointer to it exists, but you have to be careful with invariants as the objects move.

In C++, as far as I understand, you can move objects around. However, the language allows you to custom-implement move constructors. I suppose, move constructors are where you can ensure that invariants of your type are maintained as your objects move around in memory.

Conclusion

In my quest for understanding Pin, I came to the conclusion that it is a construct that sits at the intersection of a few leaky abstractions, and provides a simple and generic mechanism for expressing invariants about address-sensitive objects without depending on their internal details.

The main leaky abstraction is the concept of moving a value. You want to think of it as a transfer of ownership, but it is more complex than that. Sometimes you don’t have an explicit transfer of ownership, but parts of the value move to another location in memory. Sometimes you have an explicit transfer of ownership, but parts of the value (e.g., the heap-allocated parts) stay in their memory location.

These subtleties become important when working with address-sensitive objects, whose invariants may be violated if their location in memory changes.

It is important to note that it is not the location of these objects that we care about, but their integrity and the validity of their invariants. However, it is generally easier to reason about the movability of an object rather than its invariants. Pin and Unpin are the constructs that we use for expressing movability guarantees.

Despite some unsoundness issues in earlier versions of Pin, in my assessment, Pin and Unpin form a neat and effective abstraction that conceals the complexities of memory management that are leaked via the concepts of ownership and move semantics. I am curious about their origin and evolution. The story is certainly somewhere in the discussions or on GitHub, if you go looking!

This article is a follow-up to my earlier article titled AsyncWrite and a Tale of Four Implementations, which goes into an in-depth discussion of the decisions you have to make when working with async and Futures.

References

  1. Pinning in the Async Rust book
  2. Pin and Suffering
  3. Pin, Unpin, and why Rust needs them
  4. The vision for Rust specification from the Inside Rust Blog
  5. Nightly Rust documentation of pinning
  6. Rust internals

Razieh Behjati, PhD
Writer for ITNEXT

Software Engineer & ML Researcher. I write about the topics that I am interested in, in an attempt to bring more clarity and higher-resolution to my knowledge.
