Mo的算法|黑客地球

Introduction 介绍

Mo’s algorithm is a generic idea. It applies to the following class of problems:
Mo的算法是一个通用的想法。它适用于以下类别的问题：

You are given array Arr of length N and Q queries. Each query is represented by two numbers L and R, and it asks you to compute some function Func with subarray Arr[L..R] as its argument.
给定长度为N的数组Arr和Q 个查询。每个查询由两个数字 L 和 R 表示，它要求您计算某个以子数组 Arr[L..R] 作为参数的函数Func 。

For the sake of brevity we will denote Func([L, R]) as the value of Func on subarray Arr[L..R].
为了简洁起见，我们将 Func([L, R]) 表示为子数组 Arr[L..R] 上 Func 的值。
If this sounds too abstract, let’s look at specific example:
如果这听起来太抽象，让我们看一下具体的例子：

There is an integer array Arr of length N and Q queries. For each i, query #i asks you to output the sum of numbers on subarray [L_i, R_i], i.e. Arr[L_i] + Arr[L_i + 1] + … + Arr[R_i].
有一个长度为 N 的整数数组 Arr 和 Q 个查询。对于每个 i，查询 #i 要求您输出子数组 [L, R] 上的数字之和，即 Arr[L] + Arr[L + 1] + … + Arr[R]。

Here we have Func([L, R]) = Arr[L] + Arr[L + 1] + ... + Arr[R].
这里我们有 Func([L, R]) = Arr[L] + Arr[L + 1] + ... + Arr[R]。
This does not sound so scary, does it? You’ve probably heard of solutions to this problem using Segment Trees or Binary Indexed Trees, or even prefix sums.
这听起来并不那么可怕，不是吗？您可能听说过使用线段树或二叉索引树，甚至前缀和来解决此问题。

Mo’s algorithm provides a way to answer all queries in O((N + Q) * sqrt(N) * F) time with at least O(Q) additional memory. Meaning of F is explained below.
Mo 的算法提供了一种在 O((N + Q) * sqrt(N) * F) 时间内回答所有查询的方法，并且至少需要 O(Q) 额外内存。 F的含义解释如下。

The algorithm is applicable if all following conditions are met:
如果满足以下所有条件，则该算法适用：

Arr is not changed by queries;
Arr 不会被查询改变；
All queries are known beforehand (techniques requiring this property are often called “offline algorithms”);
所有查询都是事先已知的（需要此属性的技术通常称为“离线算法”）；
If we know Func([L, R]), then we can compute Func([L + 1, R]), Func([L - 1, R]), Func([L, R + 1]) and Func([L, R - 1]), each in O(F) time.
如果我们知道 Func([L, R])，那么我们可以计算 Func([L + 1, R])、Func([L - 1, R])、Func([L, R + 1]) 和 Func ([L, R - 1])，每个都需要O(F)时间。

Due to constraints on array immutability and queries being known, Mo’s algorithm is inapplicable most of the time. Thus, knowing the method can help you on rare occasions. But. Due to property #3, the algorithm can solve problems that are unsolvable otherwise. If the problem was meant to be solved using Mo’s algorithm, then you can be 90% sure that it can not be accepted without knowing it. Since the approach is not well-known, in situations where technique is appropriate, you will easily overcome majority of competitors.
由于数组不变性和查询已知的限制，Mo 的算法大多数时候不适用。因此，了解该方法在极少数情况下可以为您提供帮助。但。由于属性#3，该算法可以解决其他方式无法解决的问题。如果问题是要使用Mo的算法来解决，那么你可以90%确定它在不知情的情况下无法被接受。由于该方法并不为人所知，因此在技术合适的情况下，您将轻松克服大多数竞争对手。

Basic overview 基本概述

We have Q queries to answer. Suppose we answer them in order they are asked in the following manner:
我们有 Q 个问题需要解答。假设我们按照以下方式回答问题：

for i = 0..Q-1:
  L, R = query #i
  for j = L..R:
    do some work to compute Func([L, R])

This can take Ω(N * Q) time. If N and Q are of order 10⁵, then this would lead to time limit exceeded.
这可能需要 Ω(N * Q) 时间。如果N和Q的数量级为10 ⁵ ，那么这将导致超出时间限制。

But what if we answer queries in different order? Can we do better then?
但是如果我们以不同的顺序回答查询怎么办？那我们能做得更好吗？

Definition #1: 定义#1：
Segment [L, R] is a continuous subarray Arr[L..R], i.e. array formed by elements Arr[L], Arr[L + 1], …, Arr[R]. We call L left endpoint and R right endpoint of segment [L, R]. We say that index i belongs to segment [L, R] if L ≤ i ≤ R.
段[L, R]是一个连续子数组Arr[L..R]，即由元素Arr[L]、Arr[L + 1]、…、Arr[R]组成的数组。我们称线段[L,R]的L为左端点，R为右端点。如果 L ≤ i ≤ R，则我们说索引 i 属于段 [L, R]。

Notation 符号

Throughout this tutorial “^x⁄_y” will mean “integer part of x divided by y”. For instance, ¹⁰⁄₄ = 2, ¹⁵⁄₃ = 5, ²⁷⁄₈ = 3;
在本教程中，“ ^x ⁄ _y ”表示“x 除以 y 的整数部分”。例如， ¹⁰ ⁄ ₄ = ^2、15 ⁄ ₃ = ^5、27 ⁄ ₈ = 3；
By “sqrt(x)” we will mean “largest integer less or equal to square root of x”. For example, sqrt(16) = 4, sqrt(39) = 6;
“sqrt(x)”表示“小于或等于 x 的平方根的最大整数”。例如，sqrt(16) = 4，sqrt(39) = 6；
Suppose a query asks to calculate Func([L, R]). We will denote this query as [L, R] - the same way as the respective argument to Func;
假设查询要求计算 Func([L, R])。我们将此查询表示为 [L, R] - 与 Func 的相应参数相同的方式；
Everything is 0-indexed. 一切都是 0 索引。

We will describe Mo’s algorithm, and then prove its running time.
我们将描述Mo的算法，然后证明其运行时间。

The approach works as follows:
该方法的工作原理如下：

Denote BLOCK_SIZE = sqrt(N);
表示 BLOCK_SIZE = sqrt(N);
Rearrange all queries in a way we will call “Mo’s order”. It is defined like this: [L₁, R₁] comes earlier than [L₂, R₂] in Mo’s order if and only if:
以我们称之为“莫的顺序”的方式重新排列所有查询。它的定义如下：在 Mo 的顺序中，[L ₁ , R ₁ ] 早于 [L ₂ , R ₂ ] 当且仅当：
a) ^L₁⁄_{BLOCK_SIZE} < ^L₂⁄_{BLOCK_SIZE}
a) ^{L ₁} ⁄ _{BLOCK_SIZE} < ^{L ₂} ⁄ _{BLOCK_SIZE}
b) ^L₁⁄_{BLOCK_SIZE} == ^L₂⁄_{BLOCK_SIZE} && R₁ < R₂
b) ^{L ₁} ⁄ _{BLOCK_SIZE} == ^{L ₂} ⁄ _{BLOCK_SIZE} && R ₁ < R ₂
Maintain segment [mo_left, mo_right] for which we know Func([mo_left, mo_right]). Initially, this segment is empty. We set mo_left = 0 and mo_right = -1;
维护段 [mo_left, mo_right]，我们知道 Func([mo_left, mo_right])。最初，该段是空的。我们设置 mo_left = 0 和 mo_right = -1；
Answer all queries following Mo’s order. Suppose the next query you want to answer is [L, R]. Then you perform these steps:
按照莫的命令回答所有问题。假设您要回答的下一个查询是 [L, R]。然后执行以下步骤：
   a) while mo_right is less than R, extend current segment to [mo_left, mo_right + 1];
a) 当mo_right小于R时，将当前段扩展到[mo_left, mo_right + 1]；
   b) while mo_right is greater than R, cut current segment to [mo_left, mo_right - 1];
b) 当mo_right大于R时，将当前段剪切到[mo_left, mo_right - 1]；
   c) while mo_left is greater than L, extend current segment to [mo_left - 1, mo_right];
c) 当mo_left大于L时，将当前段扩展至[mo_left - 1, mo_right]；
   d) while mo_left is less than L, cut current segment to [mo_left + 1, mo_right].
d) 当mo_left小于L时，将当前段剪切到[mo_left + 1, mo_right]。
This will take O( (|left - L| + |right - R|) * F) time, because we required that each extension\deletion is performed in O(F) steps. After all transitions, you will have mo_left = L and mo_right = R, which means that you have successfully computed Func([L, R]).
这将花费 O( (|left - L| + |right - R|) * F) 时间，因为我们要求每个扩展\删除都在 O(F) 步骤中执行。在所有转换之后，您将得到 mo_left = L 和 mo_right = R，这意味着您已成功计算 Func([L, R])。

Time complexity 时间复杂度

Let’s view Arr as a union of disjoint segments of size BLOCK_SIZE, which we will call “blocks”. Take a look at the picture for better understanding:
让我们将 Arr 视为大小为 BLOCK_SIZE 的不相交段的并集，我们将其称为“块”。看一下图片以便更好地理解：
Blocks
Let K be the index of last block. Then there are K + 1 blocks, because we number them from zero. Notice than K’th block can have less than BLOCK_SIZE elements.
令 K 为最后一个块的索引。然后有 K + 1 个块，因为我们从零开始编号。请注意，第 K 个块的元素可以少于 BLOCK_SIZE。

Proposition #1: 提案#1：
K = O(sqrt(N)). K = O(sqrt(N))。
Proof: 证明：
If sqrt(N) * sqrt(N) = N (which may be false due to our definition of sqrt(N)), then K = sqrt(N) - 1, because we have K + 1 blocks, each of size sqrt(N). Otherwise, K = sqrt(N), because we need one additional block to store N - sqrt(N) * sqrt(N) elements.
如果 sqrt(N) * sqrt(N) = N（由于我们对 sqrt(N) 的定义，这可能是错误的），则 K = sqrt(N) - 1，因为我们有 K + 1 个块，每个块的大小为 sqrt （N）。否则，K = sqrt(N)，因为我们需要一个额外的块来存储 N - sqrt(N) * sqrt(N) 个元素。
UPD 19 june 2017: As pointed out in the comments, this does not hold for small values of N. Alternative proof (less intuitive, more formal): we need the smallest K such that K * sqrt(N) ≥ N - when this is true, our space is sufficient to hold all N elements. If we divide by sqrt(N), we get K ≥ N / sqrt(N). sqrt(N) is either real_sqrt(N) (the one without truncating value after decimal point) or real_sqrt(N) - 1. So K ≥ N / (real_sqrt(N) - 1).
UPD 2017 年 6 月 19 日：正如评论中所指出的，这不适用于 N 的小值。替代证明（不太直观，更正式）：我们需要最小的 K 使得 K * sqrt(N) ≥ N - 当这个是的，我们的空间足以容纳所有 N 个元素。如果除以 sqrt(N)，我们会得到 K ≥ N / sqrt(N)。 sqrt(N) 是 real_sqrt(N) （小数点后不截断值的那个）或 real_sqrt(N) - 1。因此 K ≥ N / (real_sqrt(N) - 1)。
So, K = N / (real_sqrt(N) - 1) + 1 suffices. Let's look at the difference between this value and real_sqrt(N). Denote this difference as x: N / (real_sqrt(N) - 1) + 1 - real_sqrt(N) ≥ x. Put everything under common denominator: (N + real_sqrt(N) - 1 - N + real_sqrt(N)) / (real_sqrt(N) - 1) ≥ x. This means that (2 * real_sqrt(N) - 1) / (real_sqrt(N) - 1) ≥ x. As N approaches infinity, the bound on the right approaches 2. So, for very large N, x ≤ 2. Otherwise, it is bounded by a constant (you can find it, for example, by taking the first 1000 values of N and taking maximum value of the bound). So K - real_sqrt(N) ≤ constant, which implies K = O(sqrt(N)).
因此，K = N / (real_sqrt(N) - 1) + 1 就足够了。我们来看看这个值和real_sqrt(N)的区别。将此差异表示为 x：N / (real_sqrt(N) - 1) + 1 - real_sqrt(N) ≥ x。将所有内容放在公分母下：(N + real_sqrt(N) - 1 - N + real_sqrt(N)) / (real_sqrt(N) - 1) ≥ x。这意味着 (2 * real_sqrt(N) - 1) / (real_sqrt(N) - 1) ≥ x。当 N 接近无穷大时，右侧的界限接近 2。因此，对于非常大的 N，x ≤ 2。否则，它的界限是一个常数（例如，您可以通过取 N 的前 1000 个值和取边界的最大值）。所以 K - real_sqrt(N) ≤ 常量，这意味着 K = O(sqrt(N))。

Proposition #2: 提案#2：
Block #r is a segment [r * BLOCK_SIZE, min(N - 1, (r + 1) * BLOCK_SIZE - 1)].
块 #r 是一个段 [r * BLOCK_SIZE, min(N - 1, (r + 1) * BLOCK_SIZE - 1)]。
Proof (by induction): 证明（通过归纳法）：
For block #0 statement is true — it is a segment [0, BLOCK_SIZE - 1], containing BLOCK_SIZE elements.
For block #0 语句为 true — 它是一个段 [0, BLOCK_SIZE - 1]，包含 BLOCK_SIZE 元素。
Suppose first T ≤ K blocks satisfy the above statement. Then the last of those blocks is a segment [(T - 1) * BLOCK_SIZE, T * BLOCK_SIZE - 1].
假设前 T ≤ K 块满足上述语句。那么最后一个块是一个段 [(T - 1) * BLOCK_SIZE, T * BLOCK_SIZE - 1]。
Then we form the next, T+1’th block. First element of this block will have array index T * BLOCK_SIZE. We have at most BLOCK_SIZE elements to add to block (there may be less if T + 1 = K + 1). So the last index in T+1’th block will be min(N - 1, (T + 1) * BLOCK_SIZE - 1).
然后我们形成下一个，第 T+1 个区块。该块的第一个元素将具有数组索引 T * BLOCK_SIZE。我们最多有 BLOCK_SIZE 个元素要添加到块中（如果 T + 1 = K + 1，则可能会更少）。因此第 T+1 个块中的最后一个索引将为 min(N - 1, (T + 1) * BLOCK_SIZE - 1)。

Corollary #1: Two indices i and j belong to same block #r if and only if ⁱ⁄_{BLOCK_SIZE} = ^j⁄_{BLOCK_SIZE} = r.
推论 #1：当且仅当 ⁄ _{BLOCK_SIZE} = ⁄ _{BLOCK_SIZE} = r 时，两个索引 i 和 j 属于同一块 #r。

Definition #2: 定义#2：
Q_r = { query [L, R] | ^L⁄_{BLOCK_SIZE} = r }. Informally, Q_r is a set of queries from input, whose left endpoints belong to block #r. Notice that this set may be empty. We denote the size of Q_r as |Q_r|.
Q _r = { 查询 [L, R] | ^L ⁄ _{BLOCK_SIZE} = r }。非正式地，Q _r是来自输入的一组查询，其左端点属于块#r。请注意，该集合可能是空的。我们将 Q _r的大小表示为 |Q _r |。

Proposition #3: 提案#3：
For each r, queries from Q_r lie continuously in Mo’s order and they appear in it in non-decreasing order of right endpoints.
对于每个 r，来自 Q _r的查询连续地位于 Mo 的顺序中，并且它们以右端点的非递减顺序出现在其中。
Queries from Q_r come earlier than queries from Q_{r + 1} for every r = 0..K-1.
对于每个 r = 0..K-1，来自 Q _r的查询早于来自 Q _{r + 1}的查询。
Proof follows from definition of Mo’s order.
证明来自莫阶的定义。

Corollary #2: When we are processing queries following Mo’s order, we firstly process all queries from Q₀, then all queries from Q₁, and so on, up to Q_K.
推论 #2：当我们按照 Mo 的顺序处理查询时，我们首先处理来自 Q ₀的所有查询，然后处理来自 Q _{1 的}所有查询，依此类推，直到 Q _K 。

Theorem #1: 定理#1：

Mo_right changes its value O(N * sqrt(N)) times throughout the run of Mo’s algorithm.
Mo_right 在 Mo 算法的运行过程中更改其值 O(N * sqrt(N)) 次。
Proof: 证明：

Suppose we’ve just started processing queries from Q_r for some r, and we’ve already answered first (following Mo’s order) query from it. Let that query be [L, R₀]. This means that now mo_right = R₀.
假设我们刚刚开始处理来自 Q _r的某个 r 的查询，并且我们已经回答了其中的第一个查询（按照 Mo 的顺序）。令该查询为 [L, R ₀ ]。这意味着现在 mo_right = R ₀ 。
Let R₀, R₁, ..., R_{|Q_r| - 1} be right endpoints of queries from Q_r, in order they appear in Mo’s order. From proposition #3 we know that
令 R ₀ , R ₁ , ..., R _{|Q _r | - 1}是来自 Q _r的查询的右端点，按它们出现的顺序排列按照莫的命令。从命题#3我们知道
   R₀ ≤ R₁ ≤ R₂ ≤ … ≤ R_{|Q_r| - 1}
R ₀ ≤ R ₁ ≤ R ₂ ≤ … ≤ R _{|Q _r | - 1}

From proposition #2 and definition of Q_r we know that
从命题 #2 和 Q _r的定义我们知道
   r * BLOCK_SIZE ≤ L
Since right endpoint is not lower that left endpoint (i.e. L ≤ R), we conclude that
由于右端点不低于左端点（即 L ≤ R），我们得出结论：
   r * BLOCK_SIZE ≤ R₀
r * 块大小 ≤ R ₀
We have N elements in Arr, so any query has right endpoint less than N. Therefore,
我们在 Arr 中有 N 个元素，因此任何查询的右端点都小于 N。因此，
   R_{|Q_r| - 1} ≤ N - 1
R _{|Q _r | - 1} ≤ N - 1
Since R’s are not decreasing, the total amount of times mo_right changes is
由于 R 没有减少，因此 mo_right 变化的总次数为
   R_{|Q_r| - 1} - R₀ ≤ N - 1 - r * BLOCK_SIZE = O(N)
R _{|Q _r | - 1} - R ₀ ≤ N - 1 - r * BLOCK_SIZE = O(N)
(we’ve substituted R_{|Q_r| - 1} with its highest possible value, and R₀ with its lowest possible value to maximize the subtraction).
（我们将 R _{|Q _r | - 1}替换为其可能的最高值，并将 R ₀替换为其最小的可能值以最大化减法）。
There are O(sqrt(N)) different values of r (proposition #1), so in total we will have O(N * sqrt(N)) changes, assuming that we’ve already started from the first query of each Q_r.
r 有 O(sqrt(N)) 个不同的值（命题#1），所以总共我们将有 O(N * sqrt(N)) 变化，假设我们已经从每个的第一个查询开始 Q _r 。
Now suppose we’ve just ended processing queries from Q_r and we must process first query from Q_{r + 1} (assuming that it is not empty. If it is, then choose next non-empty set). Currently, mo_right is constrained to be:
现在假设我们刚刚结束处理来自 Q _{r 的}查询，并且必须处理来自 Q _{r + 1}的第一个查询（假设它不为空。如果是，则选择下一个非空集合）。目前，mo_right 被限制为：
   r * BLOCK_SIZE ≤ mo_right ≤ N - 1
Let the first query from Q_{r + 1} be [L', R']. We know (similarly to previous paragraph) that
令 Q _{r + 1}中的第一个查询为 [L', R']。我们知道（与上一段类似）
   (r + 1) * BLOCK_SIZE ≤ R' ≤ N - 1
(r + 1) * 块大小 ≤ R' ≤ N - 1
Hence, mo_right must be changed at most
因此，mo_right最多必须改变
   max(|r * BLOCK SIZE - (N - 1)|, N - 1 - (r + 1) * BLOCK SIZE)
max(|r * 块大小 - (N - 1)|, N - 1 - (r + 1) * 块大小)
times (we took lowest value of mo_right with highest value of R’ and vice-versa). This is clearly O(N) (and it does not matter whether it is r+1’th set of r+k’th for some k > 1).
次（我们采用 mo_right 的最低值和 R' 的最高值，反之亦然）。这很明显 O(N)（并且对于某些 k > 1 是否是第 r+k' 组的第 r+1 组并不重要）。
There are O(sqrt(N)) switches from r to r + 1 (and it's true even if we skip some empty sets), so in total we will have O(N * sqrt(N)) mo_right changes to do this.
从 r 到 r + 1 有 O(sqrt(N)) 次切换（即使我们跳过一些空集也是如此），所以总共我们将有 O(N * sqrt(N)) mo_right 进行更改以执行此操作。

There are no more cases when mo_right changes. Overall, we have O(N * sqrt(N)) + O(N * sqrt(N)) = O(N * sqrt(N)) changes of mo_right.
mo_right 发生变化的情况不再发生。总的来说，我们对 mo_right 进行了 O(N * sqrt(N)) + O(N * sqrt(N)) = O(N * sqrt(N)) 更改。

Corollary #3: All mo_right changes combined take O(N * sqrt(N) * F) time (because each change is done in O(F) time).
推论 #3：所有 mo_right 更改组合起来需要 O(N * sqrt(N) * F) 时间（因为每个更改都是在 O(F) 时间内完成的）。

Theorem #2: 定理#2：

Mo_left changes its value O(Q * sqrt(N)) times throughout the run of Mo’s algorithm.
Mo_left 在 Mo 算法的运行过程中改变其值 O(Q * sqrt(N)) 次。
Proof: 证明：

Suppose, as in the proof of Theorem #1, we’ve just started processing queries from Q_r for some r, and currently mo_left = L, mo_right = R₀, where [L, R₀] ∈ Q_r. For every query [L', R'] ∈ Q_r by definition #2 and proposition #2 we have:
假设，如定理#1 的证明，我们刚刚开始处理来自 Q _r的某个 r 的查询，当前 mo_left = L, mo_right = R ₀ ，其中 [L, R ₀ ] ∈ Q _r 。对于每个查询 [L', R'] ∈ Q _r根据定义 #2 和命题 #2，我们有：
   r * BLOCK_SIZE ≤ L' ≤ (r + 1) * BLOCK_SIZE - 1
r * BLOCK_SIZE ≤ L' ≤ (r + 1) * BLOCK_SIZE - 1
So, when we change mo_left from one query to other (both queries ∈ Q_r), we must do at most
因此，当我们将 mo_left 从一个查询更改为另一个查询时（两个查询 ∈ Q _r ），我们最多必须做
   (r + 1) * BLOCK_SIZE - 1 - r * BLOCK_SIZE = BLOCK_SIZE - 1 = O(sqrt(N))
(r + 1) * 块大小 - 1 - r * 块大小 = 块大小 - 1 = O(sqrt(N))
changes to mo_left. At the picture below we represented leftpoints that lie in block #r, where the subscript shows relative order of queries:
更改为 mo_left。在下图中，我们表示位于#r块中的左点，其中下标显示查询的相对顺序：
There are |Q_r| queries in Q_r. That means, that we can estimate upper bound on number of changes for single r as O(|Q_r| * sqrt(N)). Let’s sum it over all r:
有 |Q _r | Q _r中的查询。这意味着，我们可以估计上界单个 r 的变化次数为 O(|Q _r | * sqrt(N))。让我们对所有 r 进行总结：
   O(sqrt(N) * (|Q₀| + |Q₁| + ... + |Q_K|)) = O(sqrt(N) * Q)
O(sqrt(N) * (|Q ₀ | + |Q ₁ | + ... + |Q _K |)) = O(sqrt(N) * Q)
(because each query's leftpoint belongs to exactly one block, and we have Q queries in total).
（因为每个查询的左点恰好属于一个块，并且我们总共有 Q 个查询）。
Now suppose we’re done with queries from Q_r and want to process queries from non-empty set Q_{r + k} (for some k > 0). Any query [L', R'] ∈ Q_{r + k} has
现在假设我们已经完成了来自 Q _r的查询，并且想要处理来自非空集合 Q _{r + k}的查询（对于某些 k > 0）。任何查询 [L', R'] ∈ Q _{r + k}有
   (r + k) * BLOCK_SIZE ≤ L' ≤ (r + k + 1) * BLOCK_SIZE - 1
(r + k) * 块大小 ≤ L' ≤ (r + k + 1) * 块大小 - 1
Similarly, any query [L, R] ∈ Q_r has
类似地，任何查询 [L, R] ∈ Q _r有
   r * BLOCK_SIZE ≤ L ≤ (r + 1) * BLOCK_SIZE - 1
r * 块大小 ≤ L ≤ (r + 1) * 块大小 - 1
We can see that the maximum number of changes needed to transit from some query from Q_r to some query from Q_{r + k} is
我们可以看到，从 Q _{r 的}某个查询中传输所需的最大更改数来自 Q _{r + k}的某个查询是
   (r + k + 1) * BLOCK_SIZE - 1 - r * BLOCK_SIZE = k * BLOCK_SIZE - 1
(r + k + 1) * 块大小 - 1 - r * 块大小 = k * 块大小 - 1
Now if we sum this over all r, we will get at most
现在，如果我们对所有 r 求和，我们最多会得到
   K * BLOCK_SIZE = O(sqrt(N)) * O(sqrt(N)) = O(N),
K * 块大小 = O(sqrt(N)) * O(sqrt(N)) = O(N),
because sum of all possible k’s does not exceed K.
因为所有可能的 k 的总和不超过 K。

There are no more cases when mo_left changes. Overall, we have O(sqrt(N) * Q) + O(N) = O(sqrt(N) * Q) changes of mo_left.
mo_left 发生变化的情况不再发生。总的来说，我们有 O(sqrt(N) * Q) + O(N) = O(sqrt(N) * Q) 的 mo_left 变化。

Corollary #4: All mo_left changes combined take O(Q * sqrt(N) * F) time (because each change is done in O(F) time).
推论 #4：所有 mo_left 更改组合起来需要 O(Q * sqrt(N) * F) 时间（因为每个更改都是在 O(F) 时间内完成的）。

Corollary #5: Time complexity of Mo’s algorithm is O((N + Q) * sqrt(N) * F).
推论 #5： Mo 算法的时间复杂度为 O((N + Q) * sqrt(N) * F)。

Example problem 示例问题

Let’s look at an example problem (idea taken from here):
让我们看一个示例问题（想法取自此处）：

You have an array Arr of N numbers ranging from 0 to 99. Also you have Q queries [L, R]. For each such query you must print
您有一个由 N 个数字组成的数组 Arr，范围从 0 到 99。此外您还有 Q 查询 [L, R]。对于每个此类查询，您必须打印
V([L, R]) = ∑_i=0..99 count(i)² * i
V([L, R]) = Σ _i=0..99计数(i) ² * i
where count(i) is the number of times i occurs in Arr[L..R].
其中 count(i) 是 i 出现的次数到达[L..R]。

Constraints are N ≤ 10⁵, Q ≤ 10⁵.
约束条件为 N ≤ 10 ⁵ 、Q ≤ 10 ⁵ 。
To apply Mo’s algorithm, you must ensure of three properties:
要应用 Mo 算法，必须确保三个属性：

Arr is not modified by queries;
Arr 不会被查询修改；
Queries are known beforehand;
查询是预先知道的；
If you know V([L, R]), then you can compute V([L + 1, R]), V([L - 1, R]), V([L, R - 1]) and V([L, R + 1]), each in O(F) time.
如果您知道 V([L, R])，则可以计算 V([L + 1, R])、V([L - 1, R])、V([L, R - 1]) 和 V ([L, R + 1])，每个都需要 O(F) 时间。

First two properties obviously hold. The third property depends on the time bound — O(F).
Surely, we can compute V([L + 1, R]) from scratch in Ω(R - L) = Ω(N) in the worst case. But looking at complexity of Mo’s algorithm, you can deduce that this will surely time out, because we multiply O(F) with O((Q + N) * sqrt(N)).
Typically, you want O(F) to be O(1) or O(log(n)). Choosing a way to achieve appropriate time bound O(F) is programmer’s main concern when solving problems using Mo’s algorithm.

I will describe a way to achieve O(F) = O(1) for this problem.
Let’s maintain variable current_answer (initialized by zero) to store V([mo_left, mo_right]) and integer array cnt of size 100, where cnt[x] will be the number of times x occurs in [mo_left, mo_right].
From the definition of current_answer we can see that
current answer = V([mo_left, mo_right]) =∑_i=0..99 count'(i)² * i
where count’(x) is the number of times x occurs in [mo_left, mo_right].
By the definition of cnt we see that count’(x) = cnt[x], so
current answer = ∑_i=0..99 cnt[i]² * i.

Suppose we want to change mo_right to mo_right + 1. It means we want to add number p = Arr[mo_right + 1] into consideration. But we also want to retain current_answer and cnt’s properties.
Maintaining cnt is easy: just increase cnt[p] by 1. It is a bit trickier to deal with current_answer.
We know (from the definitions) that current_answer contains summand cnt[p]² * p before addition. Let’s subtract this value from the answer. Then, after we perform addition to cnt, add again cnt[p]² * p to the answer (make no mistake: this time it will contain updated value of cnt[p]). Both updates take O(1) time.
All other transitions (mo_left to mo_left + 1, mo_left to mo_left - 1 and mo_right to mo_right - 1) can be done in the same way, so we have O(F) = O(1). You can refer to the code below for clarity.

Now let’s look into detail on one sample test case:
Input:
Arr = [0, 1, 1, 0, 2, 3, 4, 1, 3, 5, 1, 5, 3, 5, 4, 0, 2, 2] of 18 elements
Queries (0-indexed): [0, 8], [2, 5], [2, 11], [16, 17], [13, 14], [1, 17], [17, 17]

The algorithm works as follows:
Firstly, set BLOCK_SIZE = sqrt(18) = 4. Notice that we have 5 blocks: [0, 3], [4, 7], [8, 11], [12, 15], [16, 17]. The last block contains less than BLOCK_SIZE elements.
Then, set mo_left = 0, mo_right = -1, current_answer = 0, cnt = [0, 0, 0, 0, 0, 0] (I will use only first 6 elements out of 100 for the sake of simplicity).

Then sort queries. The Mo’s order will be:
[2,5], [0, 8], [2, 11], [1, 17] (here ends Q₀) [13, 14] (here ends Q₃) [16, 17], [17, 17] (here ends Q₄).

Now, when everything is set up, we can answer queries:

We need to process query [2, 5]. Currently, our segment is [0, -1]. So we need to move mo_right to 5 and mo_left to 2.
Let’s move mo_right first:
mo_right = 0, current_answer = 0, cnt = [1, 0, 0, 0, 0, 0]
mo_right = 1, current_answer = 1, cnt = [1, 1, 0, 0, 0, 0]
mo_right = 2, current_answer = 4, cnt = [1, 2, 0, 0, 0, 0]
mo_right = 3, current_answer = 4, cnt = [2, 2, 0, 0, 0, 0]
mo_right = 4, current_answer = 6, cnt = [2, 2, 1, 0, 0, 0]
mo_right = 5, current_answer = 9, cnt = [2, 2, 1, 1, 0, 0]
Now we must move mo_left:
mo_left = 1, current_answer = 9, cnt = [1, 2, 1, 1, 0, 0]
mo_left = 2, current_answer = 6, cnt = [1, 1, 1, 1, 0, 0]
Thus, the answer for query [2, 5] is 6.
Our next query is [0, 8]. Current segment [mo_left, mo_right] is [2, 5]. We need to move mo_right to 8 and mo_left to 0.
Again, let’s move mo_right first:
mo_right = 6, current_answer = 10, cnt = [1, 1, 1, 1, 1, 0]
mo_right = 7, current_answer = 13, cnt = [1, 2, 1, 1, 1, 0]
mo_right = 8, current_answer = 22, cnt = [1, 2, 1, 2, 1, 0]
Then we move mo_left:
mo_left = 1, current_answer = 27, cnt = [1, 3, 1, 2, 1, 0]
mo_left = 0, current_answer = 27, cnt = [2, 3, 1, 2, 1, 0]
So, the answer for query [0, 8] is 27.
Next query is [2, 11]. Current segment is [0, 8]. We need to move mo_right to 11 and mo_left to 2.
mo_right = 9, current_answer = 32, cnt = [2, 3, 1, 2, 1, 1]
mo_right = 10, current_answer = 39, cnt = [2, 4, 1, 2, 1, 1]
mo_right = 11, current_answer = 54, cnt = [2, 4, 1, 2, 1, 2]
mo_left = 1, current_answer = 54, cnt = [1, 4, 1, 2, 1, 2]
mo_left = 2, current_answer = 47, cnt = [1, 3, 1, 2, 1, 2]
Answer for query [2, 11] is 47.
Next query is [1, 17]. Current segment is [2, 11]. We need to move mo_right to 17 and mo_left to 1.
mo_right = 12, current_answer = 62, cnt = [1, 3, 1, 3, 1, 2]
mo_right = 13, current_answer = 87, cnt = [1, 3, 1, 3, 1, 3]
mo_right = 14, current_answer = 99, cnt = [1, 3, 1, 3, 2, 3]
mo_right = 15, current_answer = 99, cnt = [2, 3, 1, 3, 2, 3]
mo_right = 16, current_answer = 105, cnt = [2, 3, 2, 3, 2, 3]
mo_right = 17, current_answer = 115, cnt = [2, 3, 3, 3, 2, 3]
mo_left = 1, current_answer = 122, cnt = [2, 4, 3, 3, 2, 3]
Answer for query [1, 17] is 122.
Our next goal is query [13, 14]. Notice that it starts in different block from the previous query [1, 17]. Consequently, this is the first time mo_right will move to the left. We need to move mo_left to 13 and mo_right to 14.
mo_right = 16, current_answer = 112, cnt = [2, 4, 2, 3, 2, 3]
mo_right = 15, current_answer = 106, cnt = [2, 4, 1, 3, 2, 3]
mo_right = 14, current_answer = 106, cnt = [1, 4, 1, 3, 2, 3]
mo_left = 2, current_answer = 99, cnt = [1, 3, 1, 3, 2, 3]
mo_left = 3, current_answer = 94, cnt = [1, 2, 1, 3, 2, 3]
mo_left = 4, current_answer = 94, cnt = [0, 2, 1, 3, 2, 3]
mo_left = 5, current_answer = 92, cnt = [0, 2, 0, 3, 2, 3]
mo_left = 6, current_answer = 77, cnt = [0, 2, 0, 2, 2, 3]
mo_left = 7, current_answer = 65, cnt = [0, 2, 0, 2, 1, 3]
mo_left = 8, current_answer = 62, cnt = [0, 1, 0, 2, 1, 3]
mo_left = 9, current_answer = 53, cnt = [0, 1, 0, 1, 1, 3]
mo_left = 10, current_answer = 28, cnt = [0, 1, 0, 1, 1, 2]
mo_left = 11, current_answer = 27, cnt = [0, 0, 0, 1, 1, 2]
mo_left = 12, current_answer = 12, cnt = [0, 0, 0, 1, 1, 1]
mo_left = 13, current_answer = 9, cnt = [0, 0, 0, 0, 1, 1]
Answer for query [13,14] is 9.
Next query is [16, 17]. Notice, however, that now we do not need to move mo_right to the left. We need to move mo_left to 16 and mo_right to 17.
mo_right = 15, current_answer = 9, cnt = [1, 0, 0, 0, 1, 1]
mo_right = 16, current_answer = 11, cnt = [1, 0, 1, 0, 1, 1]
mo_right = 17, current_answer = 17, cnt = [1, 0, 2, 0, 1, 1]
mo_left = 14, current_answer = 12, cnt = [1, 0, 2, 0, 1, 0]
mo_left = 15, current_answer = 8, cnt = [1, 0, 2, 0, 0, 0]
mo_left = 16, current_answer = 8, cnt = [0, 0, 2, 0, 0, 0]
Answer for [16, 17] is 8.
The last query is [17, 17]. It requires us to move mo_left one unit to the right.
mo_left = 17, current_answer = 2, cnt = [0, 0, 1, 0, 0, 0]
Answer for this query is 2.

Now the important part comes: we must output answers not in order we’ve achieved them, but in order they were asked.

Output:
27
6
47
8
9
122
2

Implementation

Here is the C++ implementation for the above problem:

#include <bits/stdc++.h>
using namespace std;

int N, Q;

// Variables, that hold current "state" of computation
long long current_answer;
long long cnt[100];

// Array to store answers (because the order we achieve them is messed up)
long long answers[100500];
int BLOCK_SIZE;
int arr[100500];

// We will represent each query as three numbers: L, R, idx. Idx is 
// the position (in original order) of this query.
pair< pair<int, int>, int> queries[100500];


// Essential part of Mo's algorithm: comparator, which we will
// use with std::sort. It is a function, which must return True
// if query x must come earlier than query y, and False otherwise.
inline bool mo_cmp(const pair< pair<int, int>, int> &x,
        const pair< pair<int, int>, int> &y)
{
    int block_x = x.first.first / BLOCK_SIZE;
    int block_y = y.first.first / BLOCK_SIZE;
    if(block_x != block_y)
        return block_x < block_y;
    return x.first.second < y.first.second;
}

// When adding a number, we first nullify it's effect on current
// answer, then update cnt array, then account for it's effect again.
inline void add(int x)
{
    current_answer -= cnt[x] * cnt[x] * x;
    cnt[x]++;
    current_answer += cnt[x] * cnt[x] * x;
}

// Removing is much like adding.
inline void remove(int x)
{
    current_answer -= cnt[x] * cnt[x] * x;
    cnt[x]--;
    current_answer += cnt[x] * cnt[x] * x;
}

int main()
{
    cin.sync_with_stdio(false);
    cin >> N >> Q;
    BLOCK_SIZE = static_cast<int>(sqrt(N));

    // Read input array
    for(int i = 0; i < N; i++)
        cin >> arr[i];

    // Read input queries, which are 0-indexed. Store each query's 
    // original position. We will use it when printing answer.
    for(int i = 0; i < Q; i++) {
        cin >> queries[i].first.first >> queries[i].first.second;
        queries[i].second = i;
    }

    // Sort queries using Mo's special comparator we defined.
    sort(queries, queries + Q, mo_cmp);

    // Set up current segment [mo_left, mo_right].
    int mo_left = 0, mo_right = -1;

    for(int i = 0; i < Q; i++) {
        // [left, right] is what query we must answer now.
        int left = queries[i].first.first;
        int right = queries[i].first.second;

        // Usual part of applying Mo's algorithm: moving mo_left
        // and mo_right.
        while(mo_right < right) {
            mo_right++;
            add(arr[mo_right]);
        }
        while(mo_right > right) {
            remove(arr[mo_right]);
            mo_right--;
        }

        while(mo_left < left) {
            remove(arr[mo_left]);
            mo_left++;
        }
        while(mo_left > left) {
            mo_left--;
            add(arr[mo_left]);
        }

        // Store the answer into required position.
        answers[queries[i].second] = current_answer;
    }

    // We output answers *after* we process all queries.
    for(int i = 0; i < Q; i++)
        cout << answers[i] << "\n";
    return 0;
}

Same solution without global variables (the way I like to implement it):

#include <bits/stdc++.h>
using std::vector;
using std::tuple;

/*
 * Take out adding\removing logic into separate class.
 * It provides functions to add and remove numbers from
 * our structure, while maintaining cnt[] and current_answer.
 * 
 */
struct Mo
{
    static constexpr int MAX_VALUE = 1005000;
    vector<long long> cnt;
    long long current_answer;

    void process(int number, int delta)
    {
        current_answer -= cnt[number] * cnt[number] * number;
        cnt[number] += delta;
        current_answer += cnt[number] * cnt[number] * number;
    }
public:
    Mo()
    {
        cnt = vector<long long>(MAX_VALUE, 0);
        current_answer = 0;
    }

    long long get_answer() const
    {
        return current_answer;
    }

    void add(int number)
    {
        process(number, 1);
    }

    void remove(int number)
    {
        process(number, -1);
    }
};

int main()
{
    int N, Q, BLOCK_SIZE;
    std::cin.sync_with_stdio(false);
    std::cin >> N >> Q;
    BLOCK_SIZE = static_cast<int>(sqrt(N));

    // No global variables, everything inside.
    vector<int> arr(N);
    vector<long long> answers(Q);
    vector< tuple<int, int, int> > queries;
    queries.reserve(Q);

    for(int i = 0; i < N; i++)
        std::cin >> arr[i];

    for(int i = 0; i < Q; i++) {
        int le, rg;
        std::cin >> le >> rg;
        queries.emplace_back(le, rg, i);
    }

    // Comparator as a lambda-function, which catches BLOCK_SIZE
    // from the above definition.
    auto mo_cmp = [BLOCK_SIZE](const tuple<int, int, int> &x,
            const tuple<int, int, int> &y) -> bool {
        int block_x = std::get<0>(x) / BLOCK_SIZE;
        int block_y = std::get<0>(y) / BLOCK_SIZE;
        if(block_x != block_y)
            return block_x < block_y;
        return std::get<1>(x) < std::get<1>(y);
    };

    std::sort(queries.begin(), queries.end(), mo_cmp);

    Mo solver;
    int mo_left = 0, mo_right = -1;

    // Usual Mo's algorithm stuff.
    for(const auto &q: queries) {
        int left, right, answer_idx;
        std::tie(left, right, answer_idx) = q;

        while(mo_right < right) {
            mo_right++;
            solver.add(arr[mo_right]);
        }
        while(mo_right > right) {
            solver.remove(arr[mo_right]);
            mo_right--;
        }

        while(mo_left < left) {
            solver.remove(arr[mo_left]);
            mo_left++;
        }
        while(mo_left > left) {
            mo_left--;
            solver.add(arr[mo_left]);
        }

        answers[answer_idx] = solver.get_answer();
    }

    for(int i = 0; i < Q; i++)
        std::cout << answers[i] << "\n";
    return 0;
}

Practice problems

In case you want to try out implementing Mo’s technique for yourself, check out these problems:

Kriti and her birthday gift
Difficulty: easy.
Prerequisites: string hashing.
Comment: tests for this problem are flawed (you can get at most 33 out of 100 for this problem), because both setter’s and tester’s implementations from the editorial have the same bug in hashing function. Can you see what it is?
Reference implementation: here.
SUBSTRINGS COUNT
Difficulty: easy.
Prerequisites: string hashing.
Comment: almost the same as previous problem, but this one asks slightly different question and has well-formed tests.
Reference implementation: here.
Sherlock and inversions
Difficulty: medium.
Prerequisites: segment tree\binary indexed tree, coordinate compression.
Comment: nice and clean problem on Mo’s algorithm. Might seem difficult for people not familiar with prerequisites.
Reference implementation: here.