What are CRDTs 什么是 CRDT
CRDT (conflict-free replicated data type) is a data structure that can be
replicated across multiple computers in a network, where replicas can be updated
independently and in parallel, without the need for coordination between
replicas, and with a guarantee that no conflicts will occur.
CRDT(无冲突复制数据类型)是一种可以在网络中的多台计算机上复制的数据结构,其中副本可以独立并行更新,不需要副本之间的协调,并且保证不会发生冲突发生。
CRDT is often used in collaborative software, such as scenarios where multiple
users need to work together to edit/read a shared document, database, or state.
It can be used in database software, text editing software, chat software, etc.
CRDT 通常用于协作软件,例如多个用户需要一起编辑/读取共享文档、数据库或状态的场景。可用于数据库软件、文本编辑软件、聊天软件等。
What problems does CRDT solve?
CRDT解决什么问题?
For example, a scenario where multiple users edit the same document online at
the same time.
例如,多个用户同时在线编辑同一个文档的场景。
This scenario requires that each user sees the same content, even after
concurrent edits by different users (e.g. two users changing the title at the
same time), which is known as consistency. (To be precise, CRDT satisfies
the eventual consistency, see below for more details)
这种情况要求每个用户看到相同的内容,即使是在不同用户并发编辑之后(例如两个用户同时更改标题),这称为一致性。 (准确来说CRDT满足最终一致性,更多详情见下文)
Users can use CRDT even when they are offline. They can be back on sync with
others the network is restored. It also supports collaboratively editing with
other users via P2P. It is known as partitioning fault tolerance. This
allows CRDT to support decentralized applications very well: synchronization
can be done even without a centralized server.
用户即使在离线状态下也可以使用CRDT。当网络恢复时,他们可以重新与其他人同步。它还支持通过P2P与其他用户协作编辑。这称为分区容错。这使得 CRDT 能够很好地支持去中心化应用:即使没有中心化服务器也可以完成同步。
The Emergence of CRDTs
CRDT 的出现
The formal concept of Conflict-free Replicated Data Types (CRDTs) was first
introduced in Marc Shapiro's 2011 paper,
Conflict-Free Replicated Data Types (opens in a new tab).
However, it can be argued that the groundwork for CRDTs was laid earlier, in the
2006 study Woot (opens in a new tab): An Algorithm for
Collaborative Real-time Editing. The primary motivation behind developing CRDTs
was to address the challenges associated with designing conflict resolution
mechanisms for eventual consistency. Prior to the introduction of CRDTs,
literature on the subject offered limited guidance, and ad hoc solutions were
often prone to errors. Consequently, Shapiro's paper presented a simple,
theoretically sound approach to achieving eventual consistency through the use
of CRDTs.
无冲突复制数据类型 (CRDT) 的正式概念首次在 Marc Shapiro 的 2011 年论文《无冲突复制数据类型》中引入。然而,可以说 CRDT 的基础是在 2006 年的研究 Woot:协作实时编辑算法中更早奠定的。开发 CRDT 的主要动机是解决与设计冲突解决机制以实现最终一致性相关的挑战。在引入 CRDT 之前,有关该主题的文献提供的指导有限,而且临时解决方案通常容易出错。因此,Shapiro 的论文提出了一种简单的、理论上合理的方法,通过使用 CRDT 来实现最终一致性。
(PS: Marc Shapiro actually wrote a paper
Designing a commutative replicated data type (opens in a new tab)
in 2007. In 2011, he reworded commutative into conflict-free in 2011, expanding
the definition of commutative to include state-based CRDT)
(PS:Marc Shapiro 实际上在 2007 年写了一篇论文《设计交换复制数据类型》。2011 年,他将交换性改写为无冲突,并在 2011 年将交换性的定义扩展为包括基于状态的 CRDT)
According to CAP theorem (opens in a new tab), it is
impossible for a distributed computing system to perfectly satisfy the following
three points at the same time.
根据CAP定理,分布式计算系统不可能同时完美满足以下三点。
- Consistency: each read receives the result of the most recent write or
reports an error; it behaves as if it is accessing the same piece of data
一致性:每次读都收到最近一次写的结果或者报错;它的行为就好像正在访问同一条数据 - Availability: every request gets a non-error response - but there is no
guarantee that the data fetched is up-to-date
可用性:每个请求都会得到无错误响应 - 但不能保证获取的数据是最新的 - Partition tolerance: the ability of a distributed system to continue
functioning properly even when communication between its different components
is lost or delayed, resulting in a partition or network failure.
分区容错性:即使不同组件之间的通信丢失或延迟,导致分区或网络故障,分布式系统也能继续正常运行。
If the system cannot achieve data consistency within the time limit, it means
that partitioning has occurred and a choice must be made between C and A for the
current operation, so "perfect consistency" is in conflict with "perfect
availability".
如果系统在时限内无法实现数据一致性,则意味着发生了分区,当前操作必须在C和A之间做出选择,因此“完美一致性”与“完美可用性”是冲突的。
CRDTs do not provide "perfect consistency", but
Strong Eventual Consistency (SEC). This
means that site A may not immediately reflect the state changes from site B, but
when A and B synchronize their messages they both regain consistency and do not
need to resolve potential conflicts (CRDT mathematically prevents conflicts from
occurring). Strong Eventual Consistency does not conflict with Availability
and Partition Tolerance. CRDTs provide a good CAP tradeoff.
CRDT 不提供“完美一致性”,而是提供强最终一致性(SEC)。这意味着站点 A 可能不会立即反映站点 B 的状态更改,但是当 A 和 B 同步消息时,它们都会恢复一致性,并且不需要解决潜在的冲突(CRDT 从数学上防止冲突发生)。强最终一致性与可用性和分区容错性并不冲突。 CRDT 提供了良好的 CAP 权衡。
CRDT satisfies A + P + Eventual Consistency; a good tradeoff under CAP
CRDT满足A+P+最终一致性; CAP 下的良好权衡
(PS: In 2012, Eric Brewer, author of the CAP theorem, wrote an article
CAP Twelve Years Later: How the "Rules" Have Changed (opens in a new tab),
explaining that the description of the "two out of three CAP features" is
actually misleading, and that the CAP actually prohibits perfect availability
and consistency in a very small part of the design space, i.e., in the presence
of partitions; in fact, the design of the tradeoff between C and A is very
flexible. A good example is CRDT.)
(PS:2012年,CAP定理的作者Eric Brewer写了一篇文章《CAP十二年后:“规则”如何改变》,解释“CAP特征三分之二”的描述实际上是误导性的,并且CAP 实际上禁止在设计空间的很小一部分中实现完美的可用性和一致性,即在存在分区的情况下;事实上,C 和 A 之间的权衡设计是非常灵活的。)
A simple CRDT case
一个简单的 CRDT 案例
We can use a few simple examples to get a general idea of how CRDTs achieve
Strong Eventual Consistency.
我们可以通过几个简单的例子来大致了解 CRDT 如何实现强最终一致性。
Grow-only Counter 只种植柜台
How can we count the number of times something happens in a distributed system
without locking?
在没有锁的情况下,我们如何计算分布式系统中某件事发生的次数?
- Let each copy increments only its own counter => no locking synchronization &
no conflicts
让每个副本仅递增自己的计数器 => 无锁定同步且无冲突 - Each copy keeps the count values of all other copies at the same time
每个副本同时保留所有其他副本的计数值 - Number of occurrences = sum of count values of all copies
出现次数 = 所有副本计数值之和 - Since each copy only updates its own count and does not conflict with other
counters, this type satisfies consistency after message synchronization
由于每个副本只更新自己的计数,不会与其他计数器冲突,因此该类型满足消息同步后的一致性
Grow-only Set 仅生长套装
- The elements in a Grow-only Set can only be increased and not decreased
Grow-only Set 中的元素只能增加,不能减少 - To merge two such states, you only need to do a merge set
要合并两个这样的状态,只需要做一个合并集 - This type satisfies consistency after message synchronization because there
are no conflicting operations since the elements only grow and do not
decrease.
这种类型满足消息同步后的一致性,因为元素只增长不减少,不存在冲突操作。
Both of these methods are CRDTs, and they both satisfy the following properties
这两种方法都是CRDT,并且都满足以下性质
- They can both be updated independently and concurrently, without coordination
(locking) between replicas
它们都可以独立并发更新,无需副本之间的协调(锁定) - There is no possibility of conflict between multiple updates
多个更新之间不存在冲突的可能性 - Final consistency can always be guaranteed
总能保证最终一致性
Introduction to the Principle
原理介绍
There are two types of CRDTs: Op-based CRDTs and State-based CRDTs. This article
focuses on the concept of Op-based CRDTs.
CRDT 有两种类型:基于 Op 的 CRDT 和基于状态的 CRDT。本文重点讨论基于 Op 的 CRDT 概念。
Op-based CRDTs operate on the principle that if two users perform identical
sequences of operations, the final state of the document should also be
identical. To achieve this, each user saves all the operations performed on the
data (Operations) and synchronizes these Operations with other users to ensure a
consistent final state. A critical challenge in this approach is ensuring the
order of Operations remains consistent, especially when parallel modification
operations occur. To address this, Op-based CRDTs require that all possible
parallel Operations be commutative, satisfying the final consistency
requirement.
基于 Op 的 CRDT 的运行原理是,如果两个用户执行相同的操作序列,则文档的最终状态也应该相同。为了实现这一点,每个用户都会保存对数据执行的所有操作(Operations),并将这些操作与其他用户同步,以确保最终状态一致。这种方法的一个关键挑战是确保操作顺序保持一致,尤其是在发生并行修改操作时。为了解决这个问题,基于 Op 的 CRDT 要求所有可能的并行操作都是可交换的,满足最终的一致性要求。
Comparison of CRDT and OT
CRDT 与 OT 的比较
Both CRDT and Operation Transformation(OT) (opens in a new tab) can be used in online
collaborative applications, with the following differences
CRDT和Operation Transformation(OT)都可以用于在线协作应用,但有以下区别
OT | CRDT |
---|---|
OT relies on a centralized server for collaboration; it is extremely difficult to make it work in a distributed environment (opens in a new tab) OT依靠集中式服务器进行协作;让它在分布式环境中工作是极其困难的 | CRDT algorithm can be used to synchronize data through a P2P approach synchronization 可以使用CRDT算法来通过P2P方式同步数据 |
The earliest paper on OT was presented in 1989 最早关于 OT 的论文发表于 1989 年 | The earliest paper on CRDT appeared in 2006 最早关于CRDT的论文出现于2006年 |
The OT algorithm is designed with higher complexity to ensure consistency OT算法设计复杂度较高,保证一致性 | The CRDT algorithm is designed to be simpler to ensure consistency CRDT算法设计得更简单,保证一致性 |
It is easier to design OT to preserve user intent 更容易设计 OT 来保留用户意图 | It is more difficult to design a CRDT algorithm that preserves user intent 设计保留用户意图的 CRDT 算法更加困难 |
OT does not affect document size OT 不影响文档大小 | CRDT documents are larger than the original document data CRDT文档比原始文档数据大 |