This lecture and the next one are concerned with the representation of the message emitted by an information source, as illustrated in Figure 1.1 of Lecture 1. Suppose that the message emitted by an information source belongs to a finite set $\mathcal{S} = \{a_1, a_2, \ldots, a_{|\mathcal{S}|}\}$. Then, if one wishes to represent the message using a string of binary digits, i.e., bits, in an unambiguous way, clearly the length of such a string should be at least $\lceil \log_2 |\mathcal{S}| \rceil$.
But can we use a shorter string for this task? The answer is affirmative, if we allow some ambiguity in the representation. For example, consider $\mathcal{S} = \{1, 2, 3, 4, 5, 6, 7, 8\}$, and suppose that the destination, which wants to reproduce the message, does not care as long as the reproduced message does not differ from the source message by more than one. Then, we may safely represent any of $1, 2, 3$ by $2$, any of $4, 5, 6$ by $5$, and any of $7, 8$ by $8$. So a binary string of length $\lceil \log_2 3 \rceil = 2$ suffices, rather than the $\lceil \log_2 8 \rceil = 3$ required if no ambiguity is tolerated.
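To make the example concrete, here is a minimal Python sketch (our own illustration, not part of the original development) that implements this three-point representation and verifies that the reproduction never differs from the source by more than one:

```python
import math

# Represent any s in {1,...,8} by the nearest of the three reproduction
# points {2, 5, 8}; each group {1,2,3}, {4,5,6}, {7,8} shares one point.
def reproduce(s: int) -> int:
    return min((2, 5, 8), key=lambda c: abs(c - s))

# Three reproduction points need only ceil(log2(3)) = 2 bits,
# instead of ceil(log2(8)) = 3 bits for exact representation.
assert math.ceil(math.log2(3)) == 2
assert all(abs(reproduce(s) - s) <= 1 for s in range(1, 9))
```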
From the simple example above, we clearly see that there exists a tradeoff between the amount of resource (e.g., the length of the binary string) for representing a source message and the quality (e.g., the degree of ambiguity) of the reproduced message. The tradeoff becomes even more interesting when the probabilistic nature of the source is taken into consideration. Rate-distortion theory characterizes the fundamental limit of this tradeoff, and is the subject of this lecture. Its extreme case, where the distortion is zero or almost zero, deserves special treatment, and will be investigated in the next lecture.
4.1 Problem Formulation
In the general communication system model in Figure 1.1 of Lecture 1, we model an information source as a probabilistic device generating a stochastic process $S_1, S_2, \ldots$. A message then corresponds to a segment of the stochastic process of a prescribed length $n$, i.e., $\underline{S} = [S_1, S_2, \ldots, S_n]$.
In this lecture, we focus on the case where the information source is a discrete memoryless source (DMS); that is, $S_1, S_2, \ldots$ is a sequence of i.i.d. random variables, each with pmf $P_S(s)$ and alphabet $\mathcal{S}$.
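For concreteness, a DMS is straightforward to simulate; the following sketch draws a length-$n$ message from an illustrative pmf (the alphabet and probabilities here are assumptions chosen for the example, not from the notes):

```python
import random

# A toy DMS: the alphabet and pmf P_S are illustrative choices.
alphabet = ['a', 'b', 'c']
pmf = [0.5, 0.3, 0.2]

def draw_message(n: int, seed: int = 0) -> list:
    """Draw S = [S_1, ..., S_n] with S_i i.i.d. according to pmf."""
    rng = random.Random(seed)
    return rng.choices(alphabet, weights=pmf, k=n)

message = draw_message(10)
```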
Figure 4.1: Illustration of source encoding and decoding.
In order to represent a source message, we need to assign to each possible value of $\underline{S}$, $\underline{s} \in \mathcal{S}^n$, an index selected from a certain finite set, which, without loss of generality, may be fixed as $\{1, 2, \ldots, M_n\}$. This assignment is accomplished by a mapping
$$f_n^{(s)}: \mathcal{S}^n \rightarrow \{1, 2, \ldots, M_n\}, \tag{4.1}$$
which we call a source encoder. Here the subscript $n$ is used to emphasize the dependency upon the length of the source message, and the superscript $(s)$ is used to indicate that the mapping is for source coding, to be distinguished from channel coding in Lecture 6. Given $f_n^{(s)}$, the index $W = f_n^{(s)}(\underline{S})$ is then a random variable induced by the source message random vector $\underline{S}$.
Suppose that the index $W$ is revealed to the destination. The destination then needs to reproduce the source message as $\underline{\hat{S}} = [\hat{S}_1, \hat{S}_2, \ldots, \hat{S}_n]$, which is also a length-$n$ random vector. But we allow $\hat{S}_i$, $i = 1, 2, \ldots, n$, to take values in an alphabet $\hat{\mathcal{S}}$ possibly different from the source alphabet $\mathcal{S}$ in general. This reproduction is accomplished by a mapping
$$g_n^{(s)}: \{1, 2, \ldots, M_n\} \rightarrow \hat{\mathcal{S}}^n, \tag{4.2}$$
which we call a source decoder. Since $\underline{\hat{S}} = g_n^{(s)}(W) = g_n^{(s)}(f_n^{(s)}(\underline{S}))$, the source message, the index, and the reproduced message form a Markov chain
$$\underline{S} \leftrightarrow W \leftrightarrow \underline{\hat{S}}. \tag{4.3}$$
See Figure 4.1 for an illustration of the source encoding and decoding process.
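As a concrete (purely illustrative) instance of such a pair, the sketch below realizes $f_n^{(s)}$ and $g_n^{(s)}$ for the earlier example with $\mathcal{S} = \{1, \ldots, 8\}$: each symbol is quantized to one of three groups, the group indices are packed into a single index $W$ with $M_n = 3^n$, and the decoder maps $W$ back to reproduction points, so that $\underline{\hat{S}}$ depends on $\underline{S}$ only through $W$, as in (4.3). Indices run from $0$ here for programming convenience.

```python
# A toy source encoder/decoder pair (illustrative, not from the notes).
# Source alphabet S = {1,...,8}; each symbol is quantized to one of the
# three groups {1,2,3}, {4,5,6}, {7,8}, so M_n = 3**n indices suffice.

def f_n(s_vec: list) -> int:
    """Source encoder: map a block s in S^n to an index W in {0,...,3^n - 1}."""
    w = 0
    for s in s_vec:
        w = 3 * w + (s - 1) // 3  # group index in {0, 1, 2}
    return w

def g_n(w: int, n: int) -> list:
    """Source decoder: map the index W back to a reproduction in {2,5,8}^n."""
    points = [2, 5, 8]  # one reproduction point per group
    s_hat = []
    for _ in range(n):
        s_hat.append(points[w % 3])
        w //= 3
    return s_hat[::-1]

s_vec = [1, 4, 8, 6]
assert g_n(f_n(s_vec), len(s_vec)) == [2, 5, 8, 5]
```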
Remark 4.1 The key feature of the problem formulation is the Markov chain relationship (4.3). Indeed, one may replace the deterministic mappings $f_n^{(s)}$ and $g_n^{(s)}$ by some conditional probability distributions $P_{W \mid \underline{S}}$ and $P_{\underline{\hat{S}} \mid W}$, respectively, and the central result of this lecture, Shannon's fundamental theorem for source coding in the next section, still holds.
Remark 4.2 Allowing the reproduction alphabet $\hat{\mathcal{S}}$ to be different from the source alphabet $\mathcal{S}$ may seem odd at first glance. But this considerably increases the applicability of the problem formulation, by enabling the destination to accomplish different tasks related to the source. For example, in an image classification task, the source is an image, and the destination is interested in deciding which category (e.g., animal, people, or landscape) the image belongs to. In this example, the source message (i.e., the image) is an array of pixels, and the reproduced message is simply a label indicating the category of the source message.
A source encoder represents each length-$n$ segment of the source message as one of $M_n$ indices, which can be stored as a binary string of length $\lceil \log_2 M_n \rceil$. On average, each source symbol is thus represented by $\lceil \log_2 M_n \rceil / n$ bits. We clarify that the term "bit" here corresponds to a unit of storage in equipment such as a computer, and it should not be confused with the measure of information introduced in Lecture 2.
We thus define the rate of a source encoder/decoder pair as
$$R_n = \frac{\lceil \log_2 M_n \rceil}{n} \text{ bits per source symbol}.$$
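Continuing the illustrative quantizer above (with $M_n = 3^n$), the rate is simple to evaluate numerically:

```python
import math

# Rate of the toy pair above: M_n = 3**n indices for blocks of length n,
# so the rate is ceil(log2(M_n)) / n bits per source symbol.
def rate(n: int) -> float:
    M_n = 3 ** n
    return math.ceil(math.log2(M_n)) / n

# As n grows, the rate approaches log2(3) ~ 1.585 bits per symbol,
# below the 3 bits per symbol needed for exact representation.
print(rate(1), rate(10), rate(100))  # 2.0, 1.6, 1.59
```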
As we allow a certain degree of ambiguity in the reproduced message at the destination, we introduce the notion of distortion here. A distortion measure $d$ is a mapping that assigns to each pair $(s, \hat{s}) \in \mathcal{S} \times \hat{\mathcal{S}}$ a non-negative number $d(s, \hat{s})$, which quantifies the cost incurred by reproducing $s$ as $\hat{s}$. In these lecture notes, unless otherwise specified, we consider bounded distortion measures; that is, $d_{\max} := \max_{(s, \hat{s}) \in \mathcal{S} \times \hat{\mathcal{S}}} d(s, \hat{s}) < \infty$.
Example 4.1 For Hamming distortion, we have $\mathcal{S} = \hat{\mathcal{S}}$, and $d(s, \hat{s}) = 0$ if $s = \hat{s}$ and $d(s, \hat{s}) = 1$ otherwise. Note that for Hamming distortion, $\mathrm{E}[d(S, \hat{S})] = P(S \neq \hat{S})$.
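A minimal sketch of the Hamming distortion measure, together with an empirical check of the identity $\mathrm{E}[d(S, \hat{S})] = P(S \neq \hat{S})$ (the joint distribution used here is an arbitrary assumption for the check):

```python
import random

def hamming(s, s_hat) -> int:
    """Hamming distortion: 0 if the symbols agree, 1 otherwise."""
    return 0 if s == s_hat else 1

# The average Hamming distortion over sampled pairs equals the fraction
# of mismatches, i.e., an empirical estimate of P(S != S_hat).
rng = random.Random(0)
pairs = [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(10_000)]
avg_distortion = sum(hamming(s, s_hat) for s, s_hat in pairs) / len(pairs)
mismatch_freq = sum(s != s_hat for s, s_hat in pairs) / len(pairs)
assert avg_distortion == mismatch_freq
```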
For a source encoder/decoder pair $(f_n^{(s)}, g_n^{(s)})$, we further impose an additive structure on the distortion between $\underline{S}$ and $\underline{\hat{S}}$, as
$$d(\underline{s}, \underline{\hat{s}}) = \frac{1}{n} \sum_{i=1}^{n} d(s_i, \hat{s}_i);$$
that is, the distortion between a source message and its corresponding reproduced message is the average of the pairwise distortions between each source symbol and its corresponding reproduced symbol. For Hamming distortion in Example 4.1, $d(\underline{s}, \underline{\hat{s}})$ is then the fraction of "errors" in the reproduced message $\underline{\hat{s}}$.
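The additive structure translates directly into a one-line average; a sketch, using the Hamming measure from Example 4.1:

```python
def block_distortion(s_vec, s_hat_vec, d) -> float:
    """d(s, s_hat) = (1/n) * sum_i d(s_i, s_hat_i) for blocks of length n."""
    assert len(s_vec) == len(s_hat_vec)
    n = len(s_vec)
    return sum(d(s, s_hat) for s, s_hat in zip(s_vec, s_hat_vec)) / n

# Under Hamming distortion, this is the fraction of symbol "errors":
d_hamming = lambda s, s_hat: 0 if s == s_hat else 1
assert block_distortion([1, 2, 3, 4], [1, 2, 0, 0], d_hamming) == 0.5
```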
It is certainly desirable to have a source encoder/decoder pair that has a low rate as well as a small distortion. Induced by the Markov chain $\underline{S} \leftrightarrow W \leftrightarrow \underline{\hat{S}}$, the distortion $d(\underline{S}, \underline{\hat{S}})$ is a random variable. This consideration leads to the following definition of an achievable rate-distortion pair.
Definition 4.1 A rate-distortion pair $(R, D)$ is said to be achievable, if there exists a sequence of source encoder/decoder pairs,