这是用户在 2024-12-11 17:24 为 https://app.immersivetranslate.com/pdf-pro/5be26a02-6df8-4b07-bfc0-75779ca3c272 保存的双语快照页面,由 沉浸式翻译 提供双语支持。了解如何保存?
用于顺序数据压缩的通用算法

A Universal Algorithm for Sequential Data Compression
用于顺序数据压缩的通用算法

JACOB ZIV, FELLOW, IEEE, AND ABRAHAM LEMPEL, MEMBER, IEEE
雅各布·齐夫, IEEE 院士, 亚伯拉罕·伦普尔, IEEE 会员

Abstract 摘要

A universal algorithm for sequential data compression is presented. Its performance is investigated with respect to a nonprobabilistic model of constrained sources. The compression ratio achieved by the proposed universal code uniformly approaches the lower bounds on the compression ratios attainable by block-to-variable codes and variable-to-block codes designed to match a completely specified source.
呈现了一种用于序列数据压缩的通用算法。其性能是根据受约束源的非概率模型进行的调查。所提出的通用代码所获得的压缩率均匀地接近了块到可变长编码和可变长到块编码所能达到的压缩率下限,这些编码旨在匹配完全指定的源。

I. INTRODUCTION 一、 引言

IN MANY situations arising in digital communications and data processing, the encountered strings of data display various structural regularities or are otherwise subject to certain constraints, thereby allowing for storage and time-saving techniques of data compression. Given a discrete data source, the problem of data compression is first to identify the limitations of the source, and second to devise a coding scheme which, subject to certain performance criteria, will best compress the given source.
在数字通信和数据处理中的许多情况下,遇到的数据串显示各种结构性规则或受到某些约束,从而允许使用数据压缩的存储和节时技术。给定一个离散数据源,数据压缩的问题首先是识别该源的局限性,其次是设计一个在某些性能标准下最佳压缩给定源的编码方案。
Once the relevant source parameters have been identified, the problem reduces to one of minimum-redundancy coding. This phase of the problem has received extensive treatment in the literature [1]-[7].
一旦确定了相关的源参数,问题就简化为最小冗余编码的问题。这个问题的这一阶段在文献[1]-[7]中已经得到了广泛的研究。
When no a priori knowledge of the source characteristics is available, and if statistical tests are either impossible or unreliable, the problem of data compression becomes considerably more complicated. In order to overcome these difficulties one must resort to universal coding schemes whereby the coding process is interlaced with a learning process for the varying source characteristics [8], [9]. Such coding schemes inevitably require a larger working memory space and generally employ performance criteria that are appropriate for a wide variety of sources.
当没有先验源特征知识可用,且统计检验要么不可能要么不可靠时,数据压缩问题将变得更加复杂。为克服这些困难,必须依赖通用编码方案,其中编码过程与对变化源特征的学习过程交织在一起。这种编码方案必然需要更大的工作内存空间,并通常采用适合多种源的性能准则。
In this paper, we describe a universal coding scheme which can be applied to any discrete source and whose performance is comparable to certain optimal fixed code book schemes designed for completely specified sources. For lack of adequate criteria, we do not attempt to rank the proposed scheme with respect to other possible universal coding schemes. Instead, for the broad class of sources defined in Section III, we derive upper bounds on the compression efficiency attainable with full a priori knowledge of the source by fixed code book schemes, and
在本文中,我们描述了一种通用编码方案,可应用于任何离散源,其性能可与为特定源设计的某些最优定长码本方案相媲美。由于缺乏适当的标准,我们没有尝试将所提出的方案与其他可能的通用编码方案进行排名比较。相反,对于第三节中定义的广泛源类,我们推导了利用定长码本方案在完全事先知道源的情况下可以达到的最大压缩效率的上界。
then show that the efficiency of our universal code with no a priori knowledge of the source approaches those bounds.
然后证明我们的无先验知识的通用编码的效率接近于这些界限。
The proposed compression algorithm is an adaptation of a simple copying procedure discussed recently [10] in a study on the complexity of finite sequences. Basically, we employ the concept of encoding future segments of the source-output via maximum-length copying from a buffer containing the recent past output. The transmitted codeword consists of the buffer address and the length of the copied segment. With a predetermined initial load of the buffer and the information contained in the codewords, the source data can readily be reconstructed at the decoding end of the process.
该提议的压缩算法是对最近[10]在关于有限序列复杂性的研究中讨论的简单复制过程的改编。基本上,我们采用通过从包含最近过去输出的缓冲区进行最大长度复制来对源输出的未来段进行编码的概念。传输的代码字由缓冲区地址和复制段的长度组成。通过缓冲区的预定初始负载和代码字中包含的信息,可以在解码过程的末端轻松地重构源数据。
The main drawback of the proposed algorithm is its susceptibility to error propagation in the event of a channel error.
所提出算法的主要缺点是易受信道错误传播的影响。

II. The Compression Algorithm
压缩算法

The proposed compression algorithm consists of a rule for parsing strings of symbols from a finite alphabet A A AA into substrings, or words, whose lengths do not exceed a prescribed integer L s L s L_(s)L_{s}, and a coding scheme which maps these substrings sequentially into uniquely decipherable codewords of fixed length L c L c L_(c)L_{c} over the same alphabet A A AA.
提议的压缩算法由以下两部分组成:一个将来自有限字母表 A A AA 的符号串解析为长度不超过预定整数 L s L s L_(s)L_{s} 的子串(或单词)的规则,以及一个将这些子串按顺序映射到同一字母表 A A AA 上的固定长度 L c L c L_(c)L_{c} 的唯一可解密编码字的编码方案。
The word-length bounds L s L s L_(s)L_{s} and L c L c L_(c)L_{c} allow for boundeddelay encoding and decoding, and they are related by
词长界限 L s L s L_(s)L_{s} L c L c L_(c)L_{c} 允许有界延迟编码和解码,它们之间有如下关系
L c = 1 + log ( n L s ) + log L s , L c = 1 + log n L s + log L s , L_(c)=1+|~log(n-L_(s))~|+|~log L_(s)~|,L_{c}=1+\left\lceil\log \left(n-L_{s}\right)\right\rceil+\left\lceil\log L_{s}\right\rceil,
where x x |~x~|\lceil x\rceil is the least integer not smaller than x x xx, the logarithm base is the cardinality α α alpha\alpha of the alphabet A A AA, and n n nn is the length of a buffer, employed at the encoding end of the process, which stores the latest n n nn symbols emitted by the source. The exact relationship between n n nn and L s L s L_(s)L_{s} is discussed in Section III. Typically, n L s α h L s n L s α h L s n≃L_(s)alpha^(hL_(s))n \simeq L_{s} \alpha^{h L_{s}}, where 0 < h < 1 < h < 1 < h < 1<h<1. For on-line decoding, a buffer of similar length has to be employed at the decoding end also.
其中 x x |~x~|\lceil x\rceil 是不小于 x x xx 的最小整数,对数底数是字母表 A A AA 的基数 α α alpha\alpha , n n nn 是编码端使用的缓冲区长度,存储源最近输出的 n n nn 个符号。 n n nn L s L s L_(s)L_{s} 之间的确切关系在第三节中讨论。通常, n L s α h L s n L s α h L s n≃L_(s)alpha^(hL_(s))n \simeq L_{s} \alpha^{h L_{s}} ,其中 0 < h < 1 < h < 1 < h < 1<h<1 。对于在线解码,解码端也需要使用类似长度的缓冲区。
To describe the exact mechanics of the parsing and coding procedures, we need some preparation by way of notation and definitions.
为了描述解析和编码过程的确切机制,我们需要通过符号和定义进行一些准备工作。
Consider a finite alphabet A A AA of α α alpha\alpha symbols, say A = A = A=A= { 0 , 1 , , α 1 } { 0 , 1 , , α 1 } {0,1,cdots,alpha-1}\{0,1, \cdots, \alpha-1\}. A string, or word, S S SS of length ( S ) = k ( S ) = k ℓ(S)=k\ell(S)=k over A A AA is an ordered k k kk-tuple S = s 1 s 2 s k S = s 1 s 2 s k S=s_(1)s_(2)cdotss_(k)S=s_{1} s_{2} \cdots s_{k} of symbols from A A AA. To indicate a substring of S S SS which starts at position i i ii and ends at position j j jj, we write S ( i , j ) S ( i , j ) S(i,j)S(i, j). When i j , S ( i , j ) = i j , S ( i , j ) = i <= j,S(i,j)=i \leq j, S(i, j)= s i s i + 1 s j s i s i + 1 s j s_(i)s_(i+1)cdotss_(j)s_{i} s_{i+1} \cdots s_{j}, but when i > j i > j i > ji>j, we take S ( i , j ) = Λ S ( i , j ) = Λ S(i,j)=LambdaS(i, j)=\Lambda, the null string of length zero.
考虑一个有限字母表 A A AA ,其中有 α α alpha\alpha 个符号,如 A = A = A=A= { 0 , 1 , , α 1 } { 0 , 1 , , α 1 } {0,1,cdots,alpha-1}\{0,1, \cdots, \alpha-1\} 。长度为 ( S ) = k ( S ) = k ℓ(S)=k\ell(S)=k 的字符串或单词 S S SS 是从 A A AA 中选择的符号的有序 k k kk 元组 S = s 1 s 2 s k S = s 1 s 2 s k S=s_(1)s_(2)cdotss_(k)S=s_{1} s_{2} \cdots s_{k} 。为了表示 S S SS 中从位置 i i ii 开始到位置 j j jj 结束的子字符串,我们写为 S ( i , j ) S ( i , j ) S(i,j)S(i, j) 。当 i j , S ( i , j ) = i j , S ( i , j ) = i <= j,S(i,j)=i \leq j, S(i, j)= s i s i + 1 s j s i s i + 1 s j s_(i)s_(i+1)cdotss_(j)s_{i} s_{i+1} \cdots s_{j} 时,但是当 i > j i > j i > ji>j 时,我们取 S ( i , j ) = Λ S ( i , j ) = Λ S(i,j)=LambdaS(i, j)=\Lambda ,即长度为零的空字符串。
The concatenation of strings Q Q QQ and R R RR forms a new string S = Q R S = Q R S=QRS=Q R; if ( Q ) = k ( Q ) = k ℓ(Q)=k\ell(Q)=k and ( R ) = m ( R ) = m ℓ(R)=m\ell(R)=m, then ( S ) = k + m , Q ( S ) = k + m , Q ℓ(S)=k+m,Q\ell(S)=k+m, Q = S ( 1 , k ) = S ( 1 , k ) =S(1,k)=S(1, k), and R = S ( k + 1 , k + m ) R = S ( k + 1 , k + m ) R=S(k+1,k+m)R=S(k+1, k+m). For each j , 0 j j , 0 j j,0 <= j <=j, 0 \leq j \leq
字符串 Q Q QQ R R RR 的连接形成一个新的字符串 S = Q R S = Q R S=QRS=Q R ;如果 ( Q ) = k ( Q ) = k ℓ(Q)=k\ell(Q)=k ( R ) = m ( R ) = m ℓ(R)=m\ell(R)=m ,则 ( S ) = k + m , Q ( S ) = k + m , Q ℓ(S)=k+m,Q\ell(S)=k+m, Q = S ( 1 , k ) = S ( 1 , k ) =S(1,k)=S(1, k) , R = S ( k + 1 , k + m ) R = S ( k + 1 , k + m ) R=S(k+1,k+m)R=S(k+1, k+m) 。对于每个 j , 0 j j , 0 j j,0 <= j <=j, 0 \leq j \leq

( S ) , S ( 1 , j ) ( S ) , S ( 1 , j ) ℓ(S),S(1,j)\ell(S), S(1, j) is called a prefix of S ; S ( 1 , j ) S ; S ( 1 , j ) S;S(1,j)S ; S(1, j) is a proper prefix of S S SS if j < ( S ) j < ( S ) j < ℓ(S)j<\ell(S).
( S ) , S ( 1 , j ) ( S ) , S ( 1 , j ) ℓ(S),S(1,j)\ell(S), S(1, j) S ; S ( 1 , j ) S ; S ( 1 , j ) S;S(1,j)S ; S(1, j) 的前缀,如果 j < ( S ) j < ( S ) j < ℓ(S)j<\ell(S) ,则 ( S ) , S ( 1 , j ) ( S ) , S ( 1 , j ) ℓ(S),S(1,j)\ell(S), S(1, j) S ; S ( 1 , j ) S ; S ( 1 , j ) S;S(1,j)S ; S(1, j) 的真前缀。
Given a proper prefix S ( 1 , j ) S ( 1 , j ) S(1,j)S(1, j) of a string S S SS and a positive integer i i ii such that i j i j i <= ji \leq j, let L ( i ) L ( i ) L(i)L(i) denote the largest nonnegative integer ( S ) j ( S ) j ℓ <= ℓ(S)-j\ell \leqslant \ell(S)-j such that
给定一个字符串 S S SS 的合适前缀 S ( 1 , j ) S ( 1 , j ) S(1,j)S(1, j) 和一个正整数 i i ii ,使得 i j i j i <= ji \leq j ,令 L ( i ) L ( i ) L(i)L(i) 表示最大的非负整数 ( S ) j ( S ) j ℓ <= ℓ(S)-j\ell \leqslant \ell(S)-j ,使得
S ( i , i + 1 ) = S ( j + 1 , j + ) S ( i , i + 1 ) = S ( j + 1 , j + ) S(i,i+ℓ-1)=S(j+1,j+ℓ)S(i, i+\ell-1)=S(j+1, j+\ell)
and let p p pp be a position of S ( 1 , j ) S ( 1 , j ) S(1,j)S(1, j) for which
并且让 p p pp 成为 S ( 1 , j ) S ( 1 , j ) S(1,j)S(1, j) 的一个位置
L ( p ) = max 1 i j { L ( i ) } L ( p ) = max 1 i j { L ( i ) } L(p)=max_(1 <= i <= j){L(i)}L(p)=\max _{1 \leq i \leq j}\{L(i)\}
The substring S ( j + 1 , j + L ( p ) ) S ( j + 1 , j + L ( p ) ) S(j+1,j+L(p))S(j+1, j+L(p)) of S S SS is called the reproducible extension of S ( 1 , j ) S ( 1 , j ) S(1,j)S(1, j) into S S SS, and the integer p p pp is called the pointer of the reproduction. For example, if S S SS = 00101011 = 00101011 =00101011=00101011 and j = 3 j = 3 j=3j=3, then L ( 1 ) = 1 L ( 1 ) = 1 L(1)=1L(1)=1 since S ( j + 1 , j + 1 ) = S ( j + 1 , j + 1 ) = S(j+1,j+1)=S(j+1, j+1)= S ( 1 , 1 ) S ( 1 , 1 ) S(1,1)S(1,1) but S ( j + 1 , j + 2 ) S ( 1 , 2 ) S ( j + 1 , j + 2 ) S ( 1 , 2 ) S(j+1,j+2)!=S(1,2)S(j+1, j+2) \neq S(1,2). Similarly, L ( 2 ) = 4 L ( 2 ) = 4 L(2)=4L(2)=4 and L ( 3 ) = 0 L ( 3 ) = 0 L(3)=0L(3)=0. Hence, S ( 3 + 1 , 3 + 4 ) = 0101 S ( 3 + 1 , 3 + 4 ) = 0101 S(3+1,3+4)=0101S(3+1,3+4)=0101 is the reproducible extension of S ( 1 , 3 ) = 001 S ( 1 , 3 ) = 001 S(1,3)=001S(1,3)=001 into S S SS with pointer p = 2 p = 2 p=2p=2.
S ( j + 1 , j + L ( p ) ) S ( j + 1 , j + L ( p ) ) S(j+1,j+L(p))S(j+1, j+L(p)) S S SS 的可重复扩展到 S S SS ,整数 p p pp 是该重复的指针。例如,如果 S S SS = 00101011 = 00101011 =00101011=00101011 j = 3 j = 3 j=3j=3 ,那么 L ( 1 ) = 1 L ( 1 ) = 1 L(1)=1L(1)=1 ,因为 S ( j + 1 , j + 1 ) = S ( j + 1 , j + 1 ) = S(j+1,j+1)=S(j+1, j+1)= S ( 1 , 1 ) S ( 1 , 1 ) S(1,1)S(1,1) S ( j + 1 , j + 2 ) S ( 1 , 2 ) S ( j + 1 , j + 2 ) S ( 1 , 2 ) S(j+1,j+2)!=S(1,2)S(j+1, j+2) \neq S(1,2) 。同样地, L ( 2 ) = 4 L ( 2 ) = 4 L(2)=4L(2)=4 L ( 3 ) = 0 L ( 3 ) = 0 L(3)=0L(3)=0 。因此, S ( 3 + 1 , 3 + 4 ) = 0101 S ( 3 + 1 , 3 + 4 ) = 0101 S(3+1,3+4)=0101S(3+1,3+4)=0101 S ( 1 , 3 ) = 001 S ( 1 , 3 ) = 001 S(1,3)=001S(1,3)=001 S S SS 的可重复扩展,指针为 p = 2 p = 2 p=2p=2
Now, to describe the encoding process, let S = s 1 s 2 S = s 1 s 2 S=s_(1)s_(2)dotsS=s_{1} s_{2} \ldots denote the string of symbols emitted by the source. The sequential encoding of S S SS entails parsing S S SS into successive source words, S = S 1 S 2 S = S 1 S 2 S=S_(1)S_(2)cdotsS=S_{1} S_{2} \cdots, and assigning a codeword C i C i C_(i)C_{i} for each S i S i S_(i)S_{i}. For bounded-delay encoding, the length i i ℓ_(i)\ell_{i} of each S i S i S_(i)S_{i} is at most equal to a predetermined parameter L s L s L_(s)L_{s}, while each C i C i C_(i)C_{i} is of fixed length L c L c L_(c)L_{c} as given by (1).
现在,为了描述编码过程,让 S = s 1 s 2 S = s 1 s 2 S=s_(1)s_(2)dotsS=s_{1} s_{2} \ldots 表示源发出的符号串。 S S SS 的顺序编码涉及将 S S SS 解析为连续的源字, S = S 1 S 2 S = S 1 S 2 S=S_(1)S_(2)cdotsS=S_{1} S_{2} \cdots ,并为每个 S i S i S_(i)S_{i} 分配一个代码字 C i C i C_(i)C_{i} 。对于有界延迟编码,每个 S i S i S_(i)S_{i} 的长度 i i ℓ_(i)\ell_{i} 最多等于预先确定的参数 L s L s L_(s)L_{s} ,而每个 C i C i C_(i)C_{i} 的长度 L c L c L_(c)L_{c} 是固定的,由(1)式给出。
To initiate the encoding process, we assume that the output S S SS of the source was preceded by a string Z Z ZZ of n n n-n- L s L s L_(s)L_{s} zeros, and we store the string B 1 = Z S ( 1 , L s ) B 1 = Z S 1 , L s B_(1)=ZS(1,L_(s))B_{1}=Z S\left(1, L_{s}\right) in the buffer. If S ( 1 , j ) S ( 1 , j ) S(1,j)S(1, j) is the reproducible extension of Z Z ZZ into Z S ( 1 , L s 1 ) Z S 1 , L s 1 ZS(1,L_(s)-1)Z S\left(1, L_{s}-1\right), then S 1 = S ( 1 , j + 1 ) S 1 = S ( 1 , j + 1 ) S_(1)=S(1,j+1)S_{1}=S(1, j+1) and 1 = j + 1 1 = j + 1 ℓ_(1)=j+1\ell_{1}=j+1. To determine the next source word, we shift out the first 1 1 ℓ_(1)\ell_{1} symbols from the buffer and feed into it the next 1 1 ℓ_(1)\ell_{1} symbols of S S SS to obtain the string B 2 = B 1 ( 1 + 1 , n ) S ( L s + 1 B 2 = B 1 1 + 1 , n S L s + 1 B_(2)=B_(1)(ℓ_(1)+1,n)S(L_(s)+1:}B_{2}=B_{1}\left(\ell_{1}+1, n\right) S\left(L_{s}+1\right., L s + 1 L s + 1 L_(s)+ℓ_(1)L_{s}+\ell_{1} ). Now we look for the reproducible extension E E EE of B 2 ( 1 , n L s ) B 2 1 , n L s B_(2)(1,n-L_(s))B_{2}\left(1, n-L_{s}\right) into B 2 ( 1 , n 1 ) B 2 ( 1 , n 1 ) B_(2)(1,n-1)B_{2}(1, n-1), and set S 2 = E s S 2 = E s S_(2)=EsS_{2}=E s, where s s ss is the symbol next to E E EE in B 2 B 2 B_(2)B_{2}. In general, if B i B i B_(i)B_{i} denotes the string of n n nn source symbols stored in the buffer when we are ready to determine the i i ii th source word S i S i S_(i)S_{i}, the successive encoding steps can be formally described as follows.
为了启动编码过程,我们假设源输出 S S SS 是由长度为 n n n-n- L s L s L_(s)L_{s} 个 0 组成的字符串 Z Z ZZ 前缀的,并将字符串 B 1 = Z S ( 1 , L s ) B 1 = Z S 1 , L s B_(1)=ZS(1,L_(s))B_{1}=Z S\left(1, L_{s}\right) 存储在缓冲区中。如果 S ( 1 , j ) S ( 1 , j ) S(1,j)S(1, j) Z Z ZZ 延伸到 Z S ( 1 , L s 1 ) Z S 1 , L s 1 ZS(1,L_(s)-1)Z S\left(1, L_{s}-1\right) 的可复制扩展,那么 S 1 = S ( 1 , j + 1 ) S 1 = S ( 1 , j + 1 ) S_(1)=S(1,j+1)S_{1}=S(1, j+1) 1 = j + 1 1 = j + 1 ℓ_(1)=j+1\ell_{1}=j+1 。为了确定下一个源词,我们从缓冲区中移出前 1 1 ℓ_(1)\ell_{1} 个符号,并将 S S SS 的下一个 1 1 ℓ_(1)\ell_{1} 个符号输入到其中,以获得字符串 B 2 = B 1 ( 1 + 1 , n ) S ( L s + 1 B 2 = B 1 1 + 1 , n S L s + 1 B_(2)=B_(1)(ℓ_(1)+1,n)S(L_(s)+1:}B_{2}=B_{1}\left(\ell_{1}+1, n\right) S\left(L_{s}+1\right. , L s + 1 L s + 1 L_(s)+ℓ_(1)L_{s}+\ell_{1} )。现在我们寻找 B 2 ( 1 , n L s ) B 2 1 , n L s B_(2)(1,n-L_(s))B_{2}\left(1, n-L_{s}\right) 延伸到 B 2 ( 1 , n 1 ) B 2 ( 1 , n 1 ) B_(2)(1,n-1)B_{2}(1, n-1) 的可重复扩展 E E EE ,并设置 S 2 = E s S 2 = E s S_(2)=EsS_{2}=E s ,其中 s s ss B 2 B 2 B_(2)B_{2} 中紧挨着 E E EE 的符号。一般而言,如果 B i B i B_(i)B_{i} 表示在我们准备确定第 i i ii 个源词 S i S i S_(i)S_{i} 时存储在缓冲区中的 n n nn 个源符号的字符串,则可以将连续的编码步骤形式化地描述如下。
  1. Initially, set B 1 = 0 n L s S ( 1 , L s ) B 1 = 0 n L s S 1 , L s B_(1)=0^(n-L_(s))S(1,L_(s))B_{1}=0^{n-L_{s}} S\left(1, L_{s}\right), i.e., the all-zero string of length n L s n L s n-L_(s)n-L_{s} followed by the first L s L s L_(s)L_{s} symbols of S S SS.
    最初,设置 B 1 = 0 n L s S ( 1 , L s ) B 1 = 0 n L s S 1 , L s B_(1)=0^(n-L_(s))S(1,L_(s))B_{1}=0^{n-L_{s}} S\left(1, L_{s}\right) ,即长度为 n L s n L s n-L_(s)n-L_{s} 的全零字符串,后跟 S S SS 的前 L s L s L_(s)L_{s} 个符号。
  2. Having determined B i , i 1 B i , i 1 B_(i),i >= 1B_{i}, i \geq 1, set
    确定 B i , i 1 B i , i 1 B_(i),i >= 1B_{i}, i \geq 1 之后,设置
S i = B i ( n L s + 1 , n L s + i ) S i = B i n L s + 1 , n L s + i S_(i)=B_(i)(n-L_(s)+1,n-L_(s)+ℓ_(i))S_{i}=B_{i}\left(n-L_{s}+1, n-L_{s}+\ell_{i}\right)
where the prefix of length i 1 i 1 ℓ_(i)-1\ell_{i}-1 of S i S i S_(i)S_{i} is the reproducible extension of B i ( 1 , n L s ) B i 1 , n L s B_(i)(1,n-L_(s))B_{i}\left(1, n-L_{s}\right) into B i ( 1 , n 1 ) B i ( 1 , n 1 ) B_(i)(1,n-1)B_{i}(1, n-1).
其中长度为 i 1 i 1 ℓ_(i)-1\ell_{i}-1 S i S i S_(i)S_{i} 前缀是将 B i ( 1 , n L s ) B i 1 , n L s B_(i)(1,n-L_(s))B_{i}\left(1, n-L_{s}\right) 扩展到 B i ( 1 , n 1 ) B i ( 1 , n 1 ) B_(i)(1,n-1)B_{i}(1, n-1) 的可重复扩展。

3) If p i p i p_(i)p_{i} is the reproduction pointer used to determine S i S i S_(i)S_{i}, then the codeword C i C i C_(i)C_{i} for S i S i S_(i)S_{i} is given by
如果 p i p i p_(i)p_{i} 是用于确定 S i S i S_(i)S_{i} 的复制指针,那么 S i S i S_(i)S_{i} 的代码字 C i C i C_(i)C_{i}
C i = C i 1 C i 2 C i 3 C i = C i 1 C i 2 C i 3 C_(i)=C_(i1)C_(i2)C_(i3)C_{i}=C_{i 1} C_{i 2} C_{i 3}
where C i 1 C i 1 C_(i1)C_{i 1} is the radix- α α alpha\alpha representation of p i 1 p i 1 p_(i)-1p_{i}-1 with ( C i 1 ) = log ( n L s ) , C i 2 C i 1 = log n L s , C i 2 ℓ(C_(i1))=|~log(n-L_(s))~|,C_(i2)\ell\left(C_{i 1}\right)=\left\lceil\log \left(n-L_{s}\right)\right\rceil, C_{i 2} is the radix- α α alpha\alpha representation of i 1 i 1 ℓ_(i)-1\ell_{i}-1 with ( C i 2 ) = log L s C i 2 = log L s ℓ(C_(i2))=|~log L_(s)~|\ell\left(C_{i 2}\right)=\left\lceil\log L_{s}\right\rceil, and C i 3 C i 3 C_(i3)C_{i 3} is the last symbol of S i S i S_(i)S_{i}, i.e., the symbol occupying position n L s + i n L s + i n-L_(s)+ℓ_(i)n-L_{s}+\ell_{i} of B i B i B_(i)B_{i}. The total length of C i C i C_(i)C_{i} is given by
其中 C i 1 C i 1 C_(i1)C_{i 1} p i 1 p i 1 p_(i)-1p_{i}-1 的基数- α α alpha\alpha 表示形式, ( C i 1 ) = log ( n L s ) , C i 2 C i 1 = log n L s , C i 2 ℓ(C_(i1))=|~log(n-L_(s))~|,C_(i2)\ell\left(C_{i 1}\right)=\left\lceil\log \left(n-L_{s}\right)\right\rceil, C_{i 2} i 1 i 1 ℓ_(i)-1\ell_{i}-1 的基数- α α alpha\alpha 表示形式, ( C i 2 ) = log L s C i 2 = log L s ℓ(C_(i2))=|~log L_(s)~|\ell\left(C_{i 2}\right)=\left\lceil\log L_{s}\right\rceil , C i 3 C i 3 C_(i3)C_{i 3} S i S i S_(i)S_{i} 的最后一个符号,即 B i B i B_(i)B_{i} n L s + i n L s + i n-L_(s)+ℓ_(i)n-L_{s}+\ell_{i} 个位置的符号。 C i C i C_(i)C_{i} 的总长度由
( C i ) = log ( n L s ) + log L s + 1 C i = log n L s + log L s + 1 ℓ(C_(i))=|~log(n-L_(s))~|+|~log L_(s)~|+1\ell\left(C_{i}\right)=\left\lceil\log \left(n-L_{s}\right)\right\rceil+\left\lceil\log L_{s}\right\rceil+1
in accordance with (1). 根据(1)。
4) To update the contents of the buffer, shift out the symbols occupying the first i i ℓ_(i)\ell_{i} positions of the buffer while feeding in the next i i ℓ_(i)\ell_{i} symbols from the source to obtain
更新缓冲区内容时,先移出缓冲区前 i i ℓ_(i)\ell_{i} 个位置占用的符号,然后从源中输入下一个 i i ℓ_(i)\ell_{i} 个符号
B i + 1 = B i ( i + 1 , n ) S ( h i + 1 , h i + i ) B i + 1 = B i i + 1 , n S h i + 1 , h i + i B_(i+1)=B_(i)(ℓ_(i)+1,n)S(h_(i)+1,h_(i)+ℓ_(i))B_{i+1}=B_{i}\left(\ell_{i}+1, n\right) S\left(h_{i}+1, h_{i}+\ell_{i}\right)
where h i h i h_(i)h_{i} is the position of S S SS occupied by the last symbol of B i B i B_(i)B_{i}.
其中 h i h i h_(i)h_{i} S S SS 占用 B i B i B_(i)B_{i} 最后一个符号的位置。
This completes the description of the encoding process. It is easy to verify that the parsing rule defined by (2) guarantees a bounded, positive source word length in each iteration; in fact, 1 i L s 1 i L s 1 <= ℓ_(i) <= L_(s)1 \leq \ell_{i} \leq L_{s} for each i i ii thus allowing for a radix- α α alpha\alpha representation of i 1 i 1 ℓ_(i)-1\ell_{i}-1 with log L s log L s |~log L_(s)~|\left\lceil\log L_{s}\right\rceil symbols from A A AA. Also, since 1 p i n L s 1 p i n L s 1 <= p_(i) <= n-L_(s)1 \leq p_{i} \leq n-L_{s} for each i i ii, it is possible to represent p i 1 p i 1 p_(i)-1p_{i}-1 with log ( n L s ) log n L s |~log(n-L_(s))~|\left\lceil\log \left(n-L_{s}\right)\right\rceil symbols from A A AA.
这完成了编码过程的描述。很容易验证由(2)定义的解析规则保证了每次迭代中源词长度的有界性和正性; 事实上, 1 i L s 1 i L s 1 <= ℓ_(i) <= L_(s)1 \leq \ell_{i} \leq L_{s} 对于每一个 i i ii 从而允许用 α α alpha\alpha 进制表示 i 1 i 1 ℓ_(i)-1\ell_{i}-1 log L s log L s |~log L_(s)~|\left\lceil\log L_{s}\right\rceil 个符号来自 A A AA 。此外,由于 1 p i n L s 1 p i n L s 1 <= p_(i) <= n-L_(s)1 \leq p_{i} \leq n-L_{s} 对于每一个 i i ii ,因此可用 log ( n L s ) log n L s |~log(n-L_(s))~|\left\lceil\log \left(n-L_{s}\right)\right\rceil 个来自 A A AA 的符号表示 p i 1 p i 1 p_(i)-1p_{i}-1
Decoding can be performed simply by reversing the encoding process. Here we employ a buffer of length n n n-n- L s L s L_(s)L_{s} to store the latest decoded source symbols. Initially, the buffer is loaded with n L s n L s n-L_(s)n-L_{s} zeros. If D i = d 1 d 2 d n L s D i = d 1 d 2 d n L s D_(i)=d_(1)d_(2)cdotsd_(n-L_(s))D_{i}=d_{1} d_{2} \cdots d_{n-L_{s}} denotes the contents of the buffer after C i 1 C i 1 C_(i-1)C_{i-1} has been decoded into S i 1 S i 1 S_(i-1)S_{i-1}, then
解码只需简单地反转编码过程。这里我们使用长度为 n n n-n- 的缓冲区 L s L s L_(s)L_{s} 来存储最近解码的源符号。初始时,缓冲区被加载为 n L s n L s n-L_(s)n-L_{s} 个零。如果 D i = d 1 d 2 d n L s D i = d 1 d 2 d n L s D_(i)=d_(1)d_(2)cdotsd_(n-L_(s))D_{i}=d_{1} d_{2} \cdots d_{n-L_{s}} 表示在 C i 1 C i 1 C_(i-1)C_{i-1} 被解码为 S i 1 S i 1 S_(i-1)S_{i-1} 之后缓冲区的内容,那么
S i 1 = D i ( n L s i 1 + 1 , n L s ) S i 1 = D i n L s i 1 + 1 , n L s S_(i)quad_(1)=D_(i)(n-L_(s)-ℓ_(i-1)+1,n-L_(s))S_{i} \quad{ }_{1}=D_{i}\left(n-L_{s}-\ell_{i-1}+1, n-L_{s}\right)
where i 1 = ( S i 1 ) i 1 = S i 1 ℓ_(i-1)=ℓ(S_(i-1))\ell_{i-1}=\ell\left(S_{i-1}\right), and where D i + 1 D i + 1 D_(i+1)D_{i+1} can be obtained from D i D i D_(i)D_{i} and C i C i C_(i)C_{i} as follows.
i 1 = ( S i 1 ) i 1 = S i 1 ℓ_(i-1)=ℓ(S_(i-1))\ell_{i-1}=\ell\left(S_{i-1}\right) ,且可以从 D i D i D_(i)D_{i} C i C i C_(i)C_{i} 获得 D i + 1 D i + 1 D_(i+1)D_{i+1} ,如下所示。
Determine p i 1 p i 1 p_(i)-1p_{i}-1 and i 1 i 1 ℓ_(i)-1\ell_{i}-1 from the first log ( n L s ) log n L s |~log(n-L_(s))~|\left\lceil\log \left(n-L_{s}\right)\right\rceil and the next log L s log L s |~log L_(s)~|\left\lceil\log L_{s}\right\rceil symbols of C i C i C_(i)C_{i}. Then, apply i 1 i 1 ℓ_(i)-1\ell_{i}-1 shifts while feeding the contents of stage p i p i p_(i)p_{i} into stage n n n-n- L s L s L_(s)L_{s}. The first of these shifts will change the buffer contents from D i D i D_(i)D_{i} to
C i C i C_(i)C_{i} 的前 log ( n L s ) log n L s |~log(n-L_(s))~|\left\lceil\log \left(n-L_{s}\right)\right\rceil 个和后 log L s log L s |~log L_(s)~|\left\lceil\log L_{s}\right\rceil 个符号确定 p i 1 p i 1 p_(i)-1p_{i}-1 i 1 i 1 ℓ_(i)-1\ell_{i}-1 。然后在将 p i p i p_(i)p_{i} 阶段的内容输送到 n n n-n- 阶段时应用 i 1 i 1 ℓ_(i)-1\ell_{i}-1 个位移操作。第一次位移将把缓冲区内容从 D i D i D_(i)D_{i} 改变为
D i ( 1 ) = d 2 d 3 d n L s d p i = d 1 ( 1 ) d 2 ( 1 ) d n L n ( 1 ) D i ( 1 ) = d 2 d 3 d n L s d p i = d 1 ( 1 ) d 2 ( 1 ) d n L n ( 1 ) D_(i)^((1))=d_(2)d_(3)cdotsd_(n-L_(s))d_(p_(i))=d_(1)^((1))d_(2)^((1))cdotsd_(n-L_(n))^((1))D_{i}^{(1)}=d_{2} d_{3} \cdots d_{n-L_{s}} d_{p_{i}}=d_{1}^{(1)} d_{2}^{(1)} \cdots d_{n-L_{n}}^{(1)}
Similarly, if j i 1 j i 1 j <= ℓ_(i)-1j \leq \ell_{i}-1, the j j jj th shift will transform D i ( j 1 ) D i ( j 1 ) D_(i)^((j-1))D_{i}^{(j-1)} = d 1 ( j 1 ) d 2 ( j 1 ) d n L s ( j 1 ) = d 1 ( j 1 ) d 2 ( j 1 ) d n L s ( j 1 ) =d_(1)^((j-1))d_(2)^((j-1))cdotsd_(n-L_(s))^((j-1))=d_{1}^{(j-1)} d_{2}^{(j-1)} \cdots d_{n-L_{s}}^{(j-1)} into D i ( j ) = d 2 ( j 1 ) d 3 ( j 1 ) D i ( j ) = d 2 ( j 1 ) d 3 ( j 1 ) D_(i)^((j))=d_(2)^((j-1))d_(3)^((j-1))cdotsD_{i}^{(j)}=d_{2}^{(j-1)} d_{3}^{(j-1)} \cdots d n L s ( j 1 ) d p i ( j 1 ) = d 1 ( j ) d 2 ( j ) d n L s ( j ) d n L s ( j 1 ) d p i ( j 1 ) = d 1 ( j ) d 2 ( j ) d n L s ( j ) d_(n-L_(s))^((j-1))d_(p_(i))^((j-1))=d_(1)^((j))d_(2)^((j))cdotsd_(n-L_(s))^((j))d_{n-L_{s}}^{(j-1)} d_{p_{i}}^{(j-1)}=d_{1}^{(j)} d_{2}^{(j)} \cdots d_{n-L_{s}}^{(j)}. After these i 1 i 1 ℓ_(i)-1\ell_{i}-1 shifts are completed, shift once more, while feeding the last symbol of C i C i C_(i)C_{i} into stage n L s n L s n-L_(s)n-L_{s} of the buffer. It is easy to verify that the resulting load of the buffer contains S i S i S_(i)S_{i} in its last i = ( S i ) i = S i ℓ_(i)=ℓ(S_(i))\ell_{i}=\ell\left(S_{i}\right) positions.
类似地,如果 j i 1 j i 1 j <= ℓ_(i)-1j \leq \ell_{i}-1 ,第 j j jj 次移位将 D i ( j 1 ) D i ( j 1 ) D_(i)^((j-1))D_{i}^{(j-1)} = d 1 ( j 1 ) d 2 ( j 1 ) d n L s ( j 1 ) = d 1 ( j 1 ) d 2 ( j 1 ) d n L s ( j 1 ) =d_(1)^((j-1))d_(2)^((j-1))cdotsd_(n-L_(s))^((j-1))=d_{1}^{(j-1)} d_{2}^{(j-1)} \cdots d_{n-L_{s}}^{(j-1)} 转换为 D i ( j ) = d 2 ( j 1 ) d 3 ( j 1 ) D i ( j ) = d 2 ( j 1 ) d 3 ( j 1 ) D_(i)^((j))=d_(2)^((j-1))d_(3)^((j-1))cdotsD_{i}^{(j)}=d_{2}^{(j-1)} d_{3}^{(j-1)} \cdots d n L s ( j 1 ) d p i ( j 1 ) = d 1 ( j ) d 2 ( j ) d n L s ( j ) d n L s ( j 1 ) d p i ( j 1 ) = d 1 ( j ) d 2 ( j ) d n L s ( j ) d_(n-L_(s))^((j-1))d_(p_(i))^((j-1))=d_(1)^((j))d_(2)^((j))cdotsd_(n-L_(s))^((j))d_{n-L_{s}}^{(j-1)} d_{p_{i}}^{(j-1)}=d_{1}^{(j)} d_{2}^{(j)} \cdots d_{n-L_{s}}^{(j)} 。在这些 i 1 i 1 ℓ_(i)-1\ell_{i}-1 次移位完成后,再移位一次,同时将 C i C i C_(i)C_{i} 的最后一个符号输入缓冲区的 n L s n L s n-L_(s)n-L_{s} 级。很容易验证,缓冲区的最终负载在其最后 i = ( S i ) i = S i ℓ_(i)=ℓ(S_(i))\ell_{i}=\ell\left(S_{i}\right) 个位置包含 S i S i S_(i)S_{i}
The following example will serve to illustrate the mechanics of the algorithm. Consider the ternary ( α = 3 α = 3 alpha=3\alpha=3 ) input string
下面的示例将说明该算法的机制。考虑三进制( α = 3 α = 3 alpha=3\alpha=3 )输入字符串
S = 001010210210212021021200 S = 001010210210212021021200 S=001010210210212021021200 cdotsS=001010210210212021021200 \cdots
and an encoder with parameters L s = 9 L s = 9 L_(s)=9L_{s}=9 and n = 18 n = 18 n=18n=18. (These parameters were chosen to simplify the illustration; they do not reflect the design considerations to be discussed in Section III.) According to (1), the corresponding codeword length is given by
以及一个参数为 L s = 9 L s = 9 L_(s)=9L_{s}=9 n = 18 n = 18 n=18n=18 的编码器。(这些参数是为了简化说明而选择的;它们并不反映第三节中要讨论的设计考虑因素。)根据(1),相应的码字长度由
L c = 1 + log 3 ( 18 9 ) + log 3 9 = 5 . L c = 1 + log 3 ( 18 9 ) + log 3 9 = 5 . L_(c)=1+log_(3)(18-9)+log_(3)9=5.L_{c}=1+\log _{3}(18-9)+\log _{3} 9=5 .
Initially, the buffer is loaded with n L s = 9 n L s = 9 n-L_(s)=9n-L_{s}=9 zeros, followed by the first L s = 9 L s = 9 L_(s)=9L_{s}=9 digits of S S SS, namely,
最初,缓冲区被加载了 n L s = 9 n L s = 9 n-L_(s)=9n-L_{s}=9 个零,然后是 S S SS 的前 L s = 9 L s = 9 L_(s)=9L_{s}=9 个数字,即
B 1 = 000000000 n L s = 9 001010210 L s = 9 . B 1 = 000000000 n L s = 9 001010210 L s = 9 . B_(1)=ubrace(000000000ubrace)_(n-L_(s)=9)ubrace(001010210ubrace)_(L_(s)=9).B_{1}=\underbrace{000000000}_{n-L_{s}=9} \underbrace{001010210}_{L_{s}=9} .
To determine the first source word S 1 S 1 S_(1)S_{1}, we have to find the longest prefix B 1 ( 10 , 9 + 1 1 ) B 1 10 , 9 + 1 1 B_(1)(10,9+ℓ_(1)-1)B_{1}\left(10,9+\ell_{1}-1\right) of
要确定第一个源词语 S 1 S 1 S_(1)S_{1} ,我们必须找到最长的前缀 B 1 ( 10 , 9 + 1 1 ) B 1 10 , 9 + 1 1 B_(1)(10,9+ℓ_(1)-1)B_{1}\left(10,9+\ell_{1}-1\right)
B 1 ( 10 , 17 ) = 00101021 B 1 ( 10 , 17 ) = 00101021 B_(1)(10,17)=00101021B_{1}(10,17)=00101021
which matches a substring of B 1 B 1 B_(1)B_{1} that starts in position p 1 p 1 p_(1)p_{1} 9 9 <= 9\leq 9 and then set S 1 = B 1 ( 10 , 9 + 1 ) S 1 = B 1 10 , 9 + 1 S_(1)=B_(1)(10,9+ℓ_(1))S_{1}=B_{1}\left(10,9+\ell_{1}\right). It is easily seen that the longest match in this case is B 1 ( 10 , 11 ) = 00 B 1 ( 10 , 11 ) = 00 B_(1)(10,11)=00B_{1}(10,11)=00, and hence S 1 = 001 S 1 = 001 S_(1)=001S_{1}=001 and 1 = 3 1 = 3 ℓ_(1)=3\ell_{1}=3. The pointer p 1 p 1 p_(1)p_{1} for this step can be any integer between one and nine; we choose to set p 1 = 9 p 1 = 9 p_(1)=9p_{1}=9. The two-digit radix-3 representation of p 1 1 p 1 1 p_(1)-1p_{1}-1 is C 11 = 22 C 11 = 22 C_(11)=22C_{11}=22, and that of 1 1 1 1 ℓ_(1)-1\ell_{1}-1 is C 12 = 02 C 12 = 02 C_(12)=02C_{12}=02. Since C i 3 C i 3 C_(i3)C_{i 3} is always equal to the last symbol of S i S i S_(i)S_{i}, the codeword for S 1 S 1 S_(1)S_{1} is given by C 1 = C 1 = C_(1)=C_{1}= 22021.
该匹配 B 1 B 1 B_(1)B_{1} 的子字符串,从位置 p 1 p 1 p_(1)p_{1} 9 9 <= 9\leq 9 开始,然后设置 S 1 = B 1 ( 10 , 9 + 1 ) S 1 = B 1 10 , 9 + 1 S_(1)=B_(1)(10,9+ℓ_(1))S_{1}=B_{1}\left(10,9+\ell_{1}\right) 。很容易看出,在这种情况下最长的匹配是 B 1 ( 10 , 11 ) = 00 B 1 ( 10 , 11 ) = 00 B_(1)(10,11)=00B_{1}(10,11)=00 ,因此 S 1 = 001 S 1 = 001 S_(1)=001S_{1}=001 1 = 3 1 = 3 ℓ_(1)=3\ell_{1}=3 。这一步中指针 p 1 p 1 p_(1)p_{1} 可以是 1 到 9 之间的任意整数;我们选择设置 p 1 = 9 p 1 = 9 p_(1)=9p_{1}=9 p 1 1 p 1 1 p_(1)-1p_{1}-1 的二进制三进位表示是 C 11 = 22 C 11 = 22 C_(11)=22C_{11}=22 , 1 1 1 1 ℓ_(1)-1\ell_{1}-1 的二进制三进位表示是 C 12 = 02 C 12 = 02 C_(12)=02C_{12}=02 。由于 C i 3 C i 3 C_(i3)C_{i 3} 始终等于 S i S i S_(i)S_{i} 的最后一个符号, S 1 S 1 S_(1)S_{1} 的码字是由 C 1 = C 1 = C_(1)=C_{1}= 22021 给出的。
To obtain the buffer load B 2 B 2 B_(2)B_{2} for the second step, we shift out the first 1 = 3 1 = 3 ℓ_(1)=3\ell_{1}=3 digits of B 1 B 1 B_(1)B_{1} and feed in the next 3 digits S ( 10 , 12 ) = 210 S ( 10 , 12 ) = 210 S(10,12)=210S(10,12)=210 of the input string S S SS. The details of steps 2,3 , and 4 are tabulated below, where pointer positions are indicated by arrows and where the source words S i S i S_(i)S_{i} are indicated by the italic substring of the corresponding buffer load B i B i B_(i)B_{i}
要获得第二步的缓冲区负载 B 2 B 2 B_(2)B_{2} ,我们将 B 1 B 1 B_(1)B_{1} 的前 1 = 3 1 = 3 ℓ_(1)=3\ell_{1}=3 位移出并输入输入字符串 S S SS 的下 3 位 S ( 10 , 12 ) = 210 S ( 10 , 12 ) = 210 S(10,12)=210S(10,12)=210 。下表列出了步骤 2、3 和 4 的详细信息,其中指针位置由箭头指示,源词 S i S i S_(i)S_{i} 由相应缓冲区负荷 B i B i B_(i)B_{i} 的斜体子字符串表示。
B 2 = 000000001010210210 , C 2 = 21102 B 3 = 000010102102102120 , C 3 = 20212 . B 2 = 000000001010210210 , C 2 = 21102 B 3 = 000010102102102120 , C 3 = 20212 . {:[darr],[B_(2),=000000001010210210","],[darr,C_(2)=21102],[B_(3),=000010102102102120","]:}C_(3)=20212.\begin{array}{cc} \downarrow \\ B_{2} & =000000001010210210, \\ \downarrow & C_{2}=21102 \\ B_{3} & =000010102102102120, \end{array} C_{3}=20212 .

III. Compression of CONSTrAined Sources
受约束源的压缩

In this section, we investigate the performance of the proposed compression algorithm with respect to a nonprobabilistic model of constrained information sources. After defining the source model, we derive lower bounds on the compression ratios attainable by block-to-variable and variable-to-block codes under full knowledge of the source, and then show that the compression ratio achieved by our universal code approaches these bounds.
在本节中,我们研究了所提出的压缩算法在受约束信息源的非概率模型下的性能。在定义源模型后,我们推导出在完全了解源的情况下,块变长码和变长块码所能达到的压缩率下限,然后证明我们的通用码所实现的压缩率接近这些下限。

A. Definition of the Source Model
源模型的定义

Let A = { 0 , 1 , , α 1 } A = { 0 , 1 , , α 1 } A={0,1,cdots,alpha-1}A=\{0,1, \cdots, \alpha-1\} be the given α α alpha\alpha-symbol alphabet, and let A A A^(**)A^{*} denote the set of all finite strings over A A AA. Given a string S A S A S inA^(**)S \in A^{*} and a positive integer m ( S ) m ( S ) m <= ℓ(S)m \leq \ell(S), let S { m } S { m } S{m}S\{m\} denote the set of all substrings of length m m mm contained in S S SS, and let S ( m ) S ( m ) S(m)S(m) denote the cardinality of S { m } S { m } S{m}S\{m\}. That is,
A = { 0 , 1 , , α 1 } A = { 0 , 1 , , α 1 } A={0,1,cdots,alpha-1}A=\{0,1, \cdots, \alpha-1\} 为给定的 α α alpha\alpha 字符集, 令 A A A^(**)A^{*} 表示 A A AA 上的所有有限字符串的集合. 给定一个字符串 S A S A S inA^(**)S \in A^{*} 和一个正整数 m ( S ) m ( S ) m <= ℓ(S)m \leq \ell(S) , 令 S { m } S { m } S{m}S\{m\} 表示 S S SS 中长度为 m m mm 的所有子串的集合, 令 S ( m ) S ( m ) S(m)S(m) 表示 S { m } S { m } S{m}S\{m\} 的基数.
S { m } = i = 0 ( S ) m S ( i + 1 , i + m ) S { m } = i = 0 ( S ) m S ( i + 1 , i + m ) S{m}=uuu_(i=0)^(ℓ(S)-m)S(i+1,i+m)S\{m\}=\bigcup_{i=0}^{\ell(S)-m} S(i+1, i+m)
and 
S ( m ) = | S { m } | S ( m ) = | S { m } | S(m)=|S{m}|S(m)=|S\{m\}|
Given a subset σ σ sigma\sigma of A A A^(**)A^{*}, let
给定 σ σ sigma\sigma A A A^(**)A^{*} 的子集, 令
σ { m } = { S σ ( S ) = m } σ { m } = { S σ ( S ) = m } sigma{m}={S in sigma∣ℓ(S)=m}\sigma\{m\}=\{S \in \sigma \mid \ell(S)=m\}
and let σ ( m ) σ ( m ) sigma(m)\sigma(m) denote the cardinality of σ { m } σ { m } sigma{m}\sigma\{m\}.
并且让 σ ( m ) σ ( m ) sigma(m)\sigma(m) 表示 σ { m } σ { m } sigma{m}\sigma\{m\} 的基数。

A subset σ σ sigma\sigma of A A A^(**)A^{*} is called a source if the following three properties hold:
子集 σ σ sigma\sigma 称为源,如果满足以下三个属性:
  1. A σ A σ A sub sigmaA \subset \sigma (i.e., σ σ sigma\sigma contains all the unit length strings),
    A σ A σ A sub sigmaA \subset \sigma (即, σ σ sigma\sigma 包含所有单位长度的字符串),
  2. S σ S σ S in sigmaS \in \sigma implies S S σ S S σ SS in sigmaS S \in \sigma,
    S σ S σ S in sigmaS \in \sigma 意味着 S S σ S S σ SS in sigmaS S \in \sigma
  3. S σ S σ S in sigmaS \in \sigma implies S { m } σ { m } S { m } σ { m } S{m}sub sigma{m}S\{m\} \subset \sigma\{m\}.
    S σ S σ S in sigmaS \in \sigma 意味着 S { m } σ { m } S { m } σ { m } S{m}sub sigma{m}S\{m\} \subset \sigma\{m\}
Typically, such a source σ σ sigma\sigma is defined by specifying a finite set of strings over A A AA which are forbidden to appear as. substrings of elements belonging to σ σ sigma\sigma, and therefore σ ( m ) σ ( m ) sigma(m)\sigma(m) < α m < α m < alpha^(m)<\alpha^{m} for all m m mm exceeding some m 0 m 0 m_(0)m_{0}.
通常,这种来源 σ σ sigma\sigma 是通过指定一组有限的字符串,它们不能作为 σ σ sigma\sigma 中元素的子串而被定义的,因此 σ ( m ) σ ( m ) sigma(m)\sigma(m) < α m < α m < alpha^(m)<\alpha^{m} 所有 m m mm 超过某个 m 0 m 0 m_(0)m_{0}
With every source σ σ sigma\sigma, we associate a sequence h ( 1 ) h ( 1 ) h(1)h(1), h ( 2 ) , h ( 2 ) , h(2),cdotsh(2), \cdots of parameters, called the h h hh-parameters of σ σ sigma\sigma, where 1 1 ^(1){ }^{1}
对于每个源 σ σ sigma\sigma ,我们都会关联一个参数序列 h ( 1 ) h ( 1 ) h(1)h(1) , h ( 2 ) , h ( 2 ) , h(2),cdotsh(2), \cdots ,称为 h h hh 参数,其中 1 1 ^(1){ }^{1}
h ( m ) = 1 m log σ ( m ) , m = 1 , 2 , h ( m ) = 1 m log σ ( m ) , m = 1 , 2 , h(m)=(1)/(m)log sigma(m),quad m=1,2,cdotsh(m)=\frac{1}{m} \log \sigma(m), \quad m=1,2, \cdots
It is clear that 0 h ( m ) 1 0 h ( m ) 1 0 <= h(m) <= 10 \leq h(m) \leq 1 for all m m mm and, by 2 ) it is also clear that m h ( m ) m h ( m ) mh(m)m h(m) is a nondecreasing function of m m mm. The sequence of h h hh-parameters, however, is usually nonincreasing in m m mm. To avoid any possible confusion in the sequel, we postulate this property as an additional defining property of a source. Namely, we require
很明显, 0 h ( m ) 1 0 h ( m ) 1 0 <= h(m) <= 10 \leq h(m) \leq 1 对于所有 m m mm ,并且,根据 2)也很明显, m h ( m ) m h ( m ) mh(m)m h(m) m m mm 的非递减函数。但是, h h hh 参数的序列通常在 m m mm 中是非递增的。为了避免后续可能产生的任何困惑,我们将此特性作为源的额外定义属性来要求。

4) h ( m ) = 1 / m log σ ( m ) h ( m ) = 1 / m log σ ( m ) h(m)=1//m log sigma(m)h(m)=1 / m \log \sigma(m) is a nonincreasing function of m m mm.
4) h ( m ) = 1 / m log σ ( m ) h ( m ) = 1 / m log σ ( m ) h(m)=1//m log sigma(m)h(m)=1 / m \log \sigma(m) m m mm 的非递增函数。

B. Some Lower Bounds on the Compression Ratio
压缩率的一些下限

Consider a compression coding scheme for a source σ σ sigma\sigma which employs a block-to-variable ( B V B V BVB V ) code book of M M MM pairs ( X i , Y i ) X i , Y i (X_(i),Y_(i))\left(X_{i}, Y_{i}\right) of words over A A AA, with ( X i ) = L X i = L ℓ(X_(i))=L\ell\left(X_{i}\right)=L for i = i = i=i= 1 , 2 , , M 1 , 2 , , M 1,2,cdots,M1,2, \cdots, M. The encoding of an infinitely long string S S S inS \in σ σ sigma\sigma by such a code is carried out by first parsing S S SS into blocks of length L L LL, and then replacing each block X i X i X_(i)X_{i} by the corresponding codeword Y i Y i Y_(i)Y_{i}. It is assumed, of course, that the code book is exhaustive with respect to σ σ sigma\sigma and uniquely decipherable [2]. Hence, we must have
考虑一个源码 σ σ sigma\sigma 的压缩编码方案,它使用了一个块到变长( B V B V BVB V )代码本,其中包括 M M MM ( X i , Y i ) X i , Y i (X_(i),Y_(i))\left(X_{i}, Y_{i}\right) 单词,这些单词来自 A A AA ,且 ( X i ) = L X i = L ℓ(X_(i))=L\ell\left(X_{i}\right)=L i = i = i=i= 1 , 2 , , M 1 , 2 , , M 1,2,cdots,M1,2, \cdots, M 。通过这种编码方式,无限长的字符串 S S S inS \in σ σ sigma\sigma 会被编码成由长度为 L L LL 的块 X i X i X_(i)X_{i} 组成,每个块被替换为相应的代码字 Y i Y i Y_(i)Y_{i} 。当然,假设代码本针对 σ σ sigma\sigma 是充分的,并且是唯一可解的[2]。因此,我们必须有:
{ X i } i = 1 M = σ { L } X i i = 1 M = σ { L } {X_(i)}_(i=1)^(M)=sigma{L}\left\{X_{i}\right\}_{i=1}^{M}=\sigma\{L\}
or 
M = σ ( L ) = α L h ( L ) M = σ ( L ) = α L h ( L ) M=sigma(L)=alpha^(Lh(L))M=\sigma(L)=\alpha^{L h(L)}
and 
max 1 i M { ( Y i ) } log M = L h ( L ) max 1 i M Y i log M = L h ( L ) max_(1 <= i <= M){ℓ(Y_(i))} >= log M=Lh(L)\max _{1 \leq i \leq M}\left\{\ell\left(Y_{i}\right)\right\} \geq \log M=L h(L)
The compression ratio ρ i ρ i rho_(i)\rho_{i} associated with the i i ii th wordpair of the code is given by
相关于代码的第 i i ii 对单词对的压缩比为
ρ i = ( Y i ) L ρ i = Y i L rho_(i)=(ℓ(Y_(i)))/(L)\rho_{i}=\frac{\ell\left(Y_{i}\right)}{L}
The B V B V BVB V compression ratio, ρ B V ( σ , M ) ρ B V ( σ , M ) rho_(BV)(sigma,M)\rho_{B V}(\sigma, M), of the source σ σ sigma\sigma. is defined as the minimax value of ρ i ρ i rho_(i)\rho_{i}, where the maximization is over all word-pairs of a given code, and the minimization is over the set C B V ( σ , M ) C B V ( σ , M ) C_(BV)(sigma,M)C_{B V}(\sigma, M) of all B V B V BVB V code books consisting of M M MM word-pairs. Thus,
源码 σ σ sigma\sigma B V B V BVB V 压缩比 ρ B V ( σ , M ) ρ B V ( σ , M ) rho_(BV)(sigma,M)\rho_{B V}(\sigma, M) 定义为 ρ i ρ i rho_(i)\rho_{i} 的极小极大值,其中最大化是针对给定编码的所有词对,而最小化是针对所有由 M M MM 个词对组成的 B V B V BVB V 编码本集合 C B V ( σ , M ) C B V ( σ , M ) C_(BV)(sigma,M)C_{B V}(\sigma, M)
ρ B V ( σ , M ) = min C B V ( σ , M ) max 1 i M { ( Y i ) L } log M L = L h ( L ) L = h ( L ) 1 Throughout this paper, log x means the base- α logarithm of x . ρ B V ( σ , M ) = min C B V ( σ , M ) max 1 i M Y i L log M L = L h ( L ) L = h ( L ) 1  Throughout this paper, log  x  means the base-  α  logarithm of  x . {:[{:[rho_(BV)(sigma","M)=min_(C_(BV)(sigma,M))max_(1 <= i <= M){(ℓ(Y_(i)))/(L)}],[ >= (log M)/(L)=(Lh(L))/(L)=h(L)]:}],[^(1)" Throughout this paper, log "x" means the base- "alpha" logarithm of "x.]:}\begin{aligned} & \begin{aligned} \rho_{B V}(\sigma, M)= & \min _{C_{B V}(\sigma, M)} \max _{1 \leq i \leq M}\left\{\frac{\ell\left(Y_{i}\right)}{L}\right\} \\ & \geq \frac{\log M}{L}=\frac{L h(L)}{L}=h(L) \end{aligned} \\ & { }^{1} \text { Throughout this paper, log } x \text { means the base- } \alpha \text { logarithm of } x . \end{aligned}
For later reference, we record this result in the following lemma.
为了后续参考,我们在下面的引理中记录了这个结果。

Lemma 1: 引理 1:

ρ B V ( σ , M ) h ( L ) ρ B V ( σ , M ) h ( L ) rho_(BV)(sigma,M) >= h(L)\rho_{B V}(\sigma, M) \geq h(L)
where 哪里
L h ( L ) = log M L h ( L ) = log M Lh(L)=log ML h(L)=\log M
Now, consider a compression scheme which employs a variable-to-block ( V B V B VBV B ) code book of M M MM word-pairs ( X i , Y i X i , Y i X_(i),Y_(i)X_{i}, Y_{i} ), with ( Y i ) = L Y i = L ℓ(Y_(i))=L\ell\left(Y_{i}\right)=L for all i = 1 , 2 , , M i = 1 , 2 , , M i=1,2,cdots,Mi=1,2, \cdots, M. In this case, the compression ratio associated with the i i ii th word-pair is given by
现在,考虑一个使用可变到块(v-b)代码簿的压缩方案,其中包含 n 个单词对,其中所有单词的长度均为 m。在这种情况下,与第 k 个单词对相关的压缩比给出为
ρ i = L ( X i ) ρ i = L X i rho_(i)=(L)/(ℓ(X_(i))^('))\rho_{i}=\frac{L}{\ell\left(X_{i}\right)^{\prime}}
and similarly, the V B V B VBV B compression ratio ρ V B ( σ , M ) ρ V B ( σ , M ) rho_(VB)(sigma,M)\rho_{V B}(\sigma, M) of σ σ sigma\sigma is defined as the minimax value of ρ i ρ i rho_(i)\rho_{i} over all word-pairs and over the set C V B ( σ , M ) C V B ( σ , M ) C_(VB)(sigma,M)C_{V B}(\sigma, M) of V B V B VBV B code books with M M MM wordpairs.
同样地, V B V B VBV B 压缩比 ρ V B ( σ , M ) ρ V B ( σ , M ) rho_(VB)(sigma,M)\rho_{V B}(\sigma, M) 被定义为 σ σ sigma\sigma 上的极值,其中涉及所有单词对以及 C V B ( σ , M ) C V B ( σ , M ) C_(VB)(sigma,M)C_{V B}(\sigma, M) 中的 V B V B VBV B 个代码本,每个代码本包含 M M MM 个单词对。
Lemma 2: 引理 2:
ρ V B ( σ , M ) h ( L M ) ρ V B ( σ , M ) h L M rho_(VB)(sigma,M) >= h(L_(M))\rho_{V B}(\sigma, M) \geq h\left(L_{M}\right)
where 哪里
L M = max { M σ ( ) } L M = max { M σ ( ) } L_(M)=max{ℓ∣M >= sigma(ℓ)}L_{M}=\max \{\ell \mid M \geq \sigma(\ell)\}
Proof: We may assume, without loss of generality, that in every code under consideration
我们可以设无损概括地说,在每个考虑的代码中
( X 1 ) ( X 2 ) ( X M ) X 1 X 2 X M ℓ(X_(1)) <= ℓ(X_(2)) <= cdots <= ℓ(X_(M))\ell\left(X_{1}\right) \leq \ell\left(X_{2}\right) \leq \cdots \leq \ell\left(X_{M}\right)
and hence for each C C V B ( σ , M ) C C V B ( σ , M ) C inC_(VB)(sigma,M)C \in C_{V B}(\sigma, M),
因此对于每个 C C V B ( σ , M ) C C V B ( σ , M ) C inC_(VB)(sigma,M)C \in C_{V B}(\sigma, M)
max ρ i ( C ) = L ( C ) ( X 1 ) max ρ i ( C ) = L ( C ) X 1 maxrho_(i)(C)=(L(C))/(ℓ(X_(1)))\max \rho_{i}(C)=\frac{L(C)}{\ell\left(X_{1}\right)}
Since C C CC is exhaustive with respect to σ σ sigma\sigma, we have
由于 C C CC σ σ sigma\sigma 是广泛的,我们有
M σ ( 1 ) M σ 1 M >= sigma(ℓ_(1))M \geq \sigma\left(\ell_{1}\right)
where 1 = ( X 1 ) 1 = X 1 ℓ_(1)=ℓ(X_(1))\ell_{1}=\ell\left(X_{1}\right); and since C C CC is uniquely decipherable, we have
1 = ( X 1 ) 1 = X 1 ℓ_(1)=ℓ(X_(1))\ell_{1}=\ell\left(X_{1}\right) ;由于 C C CC 是唯一可解码的,我们有
L ( C ) log M L ( C ) log M L(C) >= log ML(C) \geq \log M
From the definition of L M L M L_(M)L_{M}, inequality (6), and the nondecreasing property of σ ( ) σ ( ) sigma(ℓ)\sigma(\ell), we obtain
L M L M L_(M)L_{M} 的定义、不等式(6)和 σ ( ) σ ( ) sigma(ℓ)\sigma(\ell) 的非递减性质可以得出
1 L M 1 L M ℓ_(1) <= L_(M)\ell_{1} \leq L_{M}
From (5), (7), and (8), we have
从(5)、(7)和(8)式中,我们有
max ρ i ( C ) log M L M max ρ i ( C ) log M L M maxrho_(i)(C) >= (log M)/(L_(M))\max \rho_{i}(C) \geq \frac{\log M}{L_{M}}
and since 而且
M σ ( L M ) = α L M h ( L M ) M σ L M = α L M h L M M >= sigma(L_(M))=alpha^(L_(M)h(L_(M)))M \geq \sigma\left(L_{M}\right)=\alpha^{L_{M} h\left(L_{M}\right)}
it follows that for each C C V B ( σ , M ) C C V B ( σ , M ) C inC_(VB)(sigma,M)C \in C_{V B}(\sigma, M),
因此对于每个 C C V B ( σ , M ) C C V B ( σ , M ) C inC_(VB)(sigma,M)C \in C_{V B}(\sigma, M) 而言,
max ρ i ( C ) h ( L M ) max ρ i ( C ) h L M maxrho_(i)(C) >= h(L_(M))\max \rho_{i}(C) \geq h\left(L_{M}\right)
Q.E.D. 证明完毕。

Remarks 评论

  1. Since the value of L L LL in the context of Lemma 1 satisfies the definition of L M L M L_(M)L_{M} as given in Lemma 2, it follows that the bounds of both lemmas are essentially the same, despite the basic difference between the respective coding schemes.
    鉴于在引理 1 的上下文中 L L LL 的值满足引理 2 给出的 L M L M L_(M)L_{M} 的定义,因此两个引理的界限本质上是相同的,尽管它们各自的编码方案存在基本差异。
  2. By 2), the second defining property of a source, the per-word bounds derived above apply to indefinitely long messages as well, since the whole message may consist of repeated appearances of the same worst case word.
    根据第二个定义的源特性,上述推导的逐词界限也适用于无限长的消息,因为整个消息可能由同一个最坏情况词的重复出现组成。
  3. By 4), the nonincreasing property of the h h hh-parameters, the form of the derived bounds confirms the intuitive expectation that an increase in the size M M MM of the employed code book causes a decrease in the lower bound on the attainable compression ratio.
    根据 4)中 h h hh 参数的非增性质,衍生界限的形式确认了直观预期,即所采用的代码本尺寸 M M MM 增加会导致可实现压缩比的下界降低。

C. Performance of the Proposed Algorithm
提议算法的性能

We proceed now to derive an upper bound on the compression ratio attainable by the algorithm of Section II. To this end, we consider the worst case source message of length n L s n L s n-L_(s)n-L_{s}, where n n nn is the prospective buffer length and L s L s L_(s)L_{s} is the maximal word-length. The bound obtained for this case will obviously apply to all messages of length n n nn - L s L s L_(s)L_{s} or greater.
我们现在要推导第二节算法所能达到的最大压缩比。为此,我们考虑长度为 n L s n L s n-L_(s)n-L_{s}
First, we assume that only the h h hh-parameters of the source under consideration are known to the designer of the encoder. Later, when we discuss the universal performance of the proposed algorithm, we will show that even this restricted a a aa priori knowledge of the source is actually unessential.
首先,我们假设只有 h h hh 参数对于编码器的设计者是已知的。当我们讨论所提出算法的整体性能时,我们将证明即使这种受限的 a a aa 先验知识也是不必要的。
We begin by choosing the buffer length n n nn to be an integer of the form
我们首先选择缓冲区长度 n n nn 为以下形式的整数
n = m = 1 λ m α m + m = λ + 1 m σ ( ) + ( + 1 ) ( N + 1 + 1 ) n = m = 1 λ m α m + m = λ + 1 m σ ( ) + ( + 1 ) N + 1 + 1 n=sum_(m=1)^(lambda)malpha^(m)+sum_(m=lambda+1)^(ℓ)m sigma(ℓ)+(ℓ+1)(N_(ℓ+1)+1)n=\sum_{m=1}^{\lambda} m \alpha^{m}+\sum_{m=\lambda+1}^{\ell} m \sigma(\ell)+(\ell+1)\left(N_{\ell+1}+1\right)
where 哪里
N + 1 = m = 1 λ ( m ) α m + m = λ + 1 ( m ) σ ( ) N + 1 = m = 1 λ ( m ) α m + m = λ + 1 ( m ) σ ( ) N_(ℓ+1)=sum_(m=1)^(lambda)(ℓ-m)alpha^(m)+sum_(m=lambda+1)^(ℓ)(ℓ-m)sigma(ℓ)N_{\ell+1}=\sum_{m=1}^{\lambda}(\ell-m) \alpha^{m}+\sum_{m=\lambda+1}^{\ell}(\ell-m) \sigma(\ell)
λ = h ( ) λ = h ( ) lambda=|__ℓh(ℓ)__|\lambda=\lfloor\ell h(\ell)\rfloor, the integer part of log σ ( ) = h ( ) log σ ( ) = h ( ) log sigma(ℓ)=ℓh(ℓ)\log \sigma(\ell)=\ell h(\ell), and
λ = h ( ) λ = h ( ) lambda=|__ℓh(ℓ)__|\lambda=\lfloor\ell h(\ell)\rfloor log σ ( ) = h ( ) log σ ( ) = h ( ) log sigma(ℓ)=ℓh(ℓ)\log \sigma(\ell)=\ell h(\ell) 的整数部分,和
= L s 1 = L s 1 ℓ=L_(s)-1\ell=L_{s}-1
The specific valuc of the parameter L s L s L_(s)L_{s} is left for later determination (see the first remark following the proof of Theorem 1). The reasoning that motivates the given form of n n nn will become clear from subsequent derivations.
参数 L s L s L_(s)L_{s} 的具体值将留待以后确定(见定理 1 证明后的第一条备注)。给定 n n nn 形式的推导过程将在后续推导中阐明。
Consider a string S σ { n L s } S σ n L s S in sigma{n-L_(s)}S \in \sigma\left\{n-L_{s}\right\}, and let N ( S ) N ( S ) N(S)N(S) denote the number of words into which S S SS is parsed by the algorithm during the encoding process. Recalling that each of these words is mapped into a codeword of fixed length L c L c L_(c)L_{c} (see (1)), it follows that the compression ratio ρ ( S ) ρ ( S ) rho(S)\rho(S) associated with the string S S SS is given by
考虑一个字符串 S σ { n L s } S σ n L s S in sigma{n-L_(s)}S \in \sigma\left\{n-L_{s}\right\} ,令 N ( S ) N ( S ) N(S)N(S) 表示在编码过程中算法将其解析为的字词数。回顾每个字词都被映射为固定长度 L c L c L_(c)L_{c} 的码字(见(1)),则与字符串 S S SS 相关的压缩比 ρ ( S ) ρ ( S ) rho(S)\rho(S)
ρ ( S ) = L c n L s N ( S ) . ρ ( S ) = L c n L s N ( S ) . rho(S)=(L_(c))/(n-L_(s))N(S).\rho(S)=\frac{L_{c}}{n-L_{s}} N(S) .
Hence, the compression ratio ρ ρ rho\rho attainable by the algorithm for a given source σ σ sigma\sigma is
因此,算法对于给定的源代码 σ σ sigma\sigma 所能实现的压缩率为 ρ ρ rho\rho
ρ = L c n L s N ρ = L c n L s N rho=(L_(c))/(n-L_(s))N\rho=\frac{L_{c}}{n-L_{s}} N
where 哪里
N = max S σ { n L s } N ( S ) N = max S σ n L s N ( S ) N=max_(S in sigma{n-L_(s)})N(S)N=\max _{S \in \sigma\left\{n-L_{s}\right\}} N(S)
Let Q σ { n L s } Q σ n L s Q in sigma{n-L_(s)}Q \in \sigma\left\{n-L_{s}\right\} be such that N ( Q ) = N N ( Q ) = N N(Q)=NN(Q)=N, and suppose that the algorithm parses Q Q QQ into Q = Q 1 Q 2 Q N Q = Q 1 Q 2 Q N Q=Q_(1)Q_(2)cdotsQ_(N)Q=Q_{1} Q_{2} \cdots Q_{N}. From step 2) of the encoding cycle it follows that if ( Q i ) = ( Q j ) Q i = Q j ℓ(Q_(i))=ℓ(Q_(j))\ell\left(Q_{i}\right)=\ell\left(Q_{j}\right) < L s < L s < L_(s)<L_{s}, for some i < j < N i < j < N i < j < Ni<j<N, then Q i Q j Q i Q j Q_(i)!=Q_(j)Q_{i} \neq Q_{j}. (Note that when Q j Q j Q_(j)Q_{j} is being determined at the j j jj th cycle of the encoding process, all of Q i Q i Q_(i)Q_{i} is still stored in the buffer, and since ( Q j ) Q j ℓ(Q_(j))\ell\left(Q_{j}\right) < L s < L s < L_(s)<L_{s}, the longest substring in the buffer that precedes Q j Q j Q_(j)Q_{j} and is a prefix of Q j Q j Q_(j)Q_{j} musi be of length ( Q j ) 1 Q j 1 ℓ(Q_(j))-1\ell\left(Q_{j}\right)-1.)
Q σ { n L s } Q σ n L s Q in sigma{n-L_(s)}Q \in \sigma\left\{n-L_{s}\right\} 满足 N ( Q ) = N N ( Q ) = N N(Q)=NN(Q)=N ,假设算法将 Q Q QQ 解析为 Q = Q 1 Q 2 Q N Q = Q 1 Q 2 Q N Q=Q_(1)Q_(2)cdotsQ_(N)Q=Q_{1} Q_{2} \cdots Q_{N} 。根据编码循环的第 2 步可知,如果 ( Q i ) = ( Q j ) Q i = Q j ℓ(Q_(i))=ℓ(Q_(j))\ell\left(Q_{i}\right)=\ell\left(Q_{j}\right) < L s < L s < L_(s)<L_{s} ,对于某些 i < j < N i < j < N i < j < Ni<j<N ,则 Q i Q j Q i Q j Q_(i)!=Q_(j)Q_{i} \neq Q_{j} 。(注意,当在编码过程的第 j j jj 个循环中确定 Q j Q j Q_(j)Q_{j} 时,所有 Q i Q i Q_(i)Q_{i} 仍存储在缓冲区中,并且由于 ( Q j ) Q j ℓ(Q_(j))\ell\left(Q_{j}\right) < L s < L s < L_(s)<L_{s} ,缓冲区中最长的前导 Q j Q j Q_(j)Q_{j} 的前缀子串长度必须为 ( Q j ) 1 Q j 1 ℓ(Q_(j))-1\ell\left(Q_{j}\right)-1 。)
Denoting by K m K m K_(m)K_{m} the number of Q i , 1 i N 1 Q i , 1 i N 1 Q_(i),1 <= i <= N-1Q_{i}, 1 \leq i \leq N-1, of length m , 1 m L s m , 1 m L s m,1 <= m <= L_(s)m, 1 \leq m \leq L_{s}, we have
K m K m K_(m)K_{m} 表示 Q i , 1 i N 1 Q i , 1 i N 1 Q_(i),1 <= i <= N-1Q_{i}, 1 \leq i \leq N-1 的数量,长度为 m , 1 m L s m , 1 m L s m,1 <= m <= L_(s)m, 1 \leq m \leq L_{s} ,那么
N = 1 + m = 1 L s K m N = 1 + m = 1 L s K m N=1+sum_(m=1)^(L_(s))K_(m)N=1+\sum_{m=1}^{L_{\mathrm{s}}} K_{m}
By the above argument, and by property 3) of the source, we have
通过上述论证以及根据源的属性 3),我们有
K m σ ( m ) , for 1 m = L s 1 K m σ ( m ) ,  for  1 m = L s 1 K_(m) <= sigma(m),quad" for "1 <= m <= ℓ=L_(s)-1K_{m} \leq \sigma(m), \quad \text { for } 1 \leq m \leq \ell=L_{s}-1
Since 自从
n L s = ( Q N ) + m = 1 + 1 m K m n L s = Q N + m = 1 + 1 m K m n-L_(s)=ℓ(Q_(N))+sum_(m=1)^(ℓ+1)mK_(m)n-L_{s}=\ell\left(Q_{N}\right)+\sum_{m=1}^{\ell+1} m K_{m}
and n n nn and L s L s L_(s)L_{s} are both fixed, it is clear that by overestimating the values of K m K m K_(m)K_{m} for 1 m 1 m 1 <= m <= ℓ1 \leq m \leq \ell at the expense of K + 1 K + 1 K_(ℓ+1)K_{\ell+1}, we can only overestimate the value of N N NN. Therefore, since σ ( m ) σ ( m + 1 ) σ ( m ) σ ( m + 1 ) sigma(m) <= sigma(m+1)\sigma(m) \leq \sigma(m+1) and σ ( m ) = α m h ( m ) α m σ ( m ) = α m h ( m ) α m sigma(m)=alpha^(mh(m)) <= alpha^(m)\sigma(m)=\alpha^{m h(m)} \leq \alpha^{m}, we obtain
n n nn L s L s L_(s)L_{s} 都是固定的,很明显,通过高估 1 m 1 m 1 <= m <= ℓ1 \leq m \leq \ell K m K m K_(m)K_{m} 值来牺牲 K + 1 K + 1 K_(ℓ+1)K_{\ell+1} ,我们只能高估 N N NN 的值。因此,既然 σ ( m ) σ ( m + 1 ) σ ( m ) σ ( m + 1 ) sigma(m) <= sigma(m+1)\sigma(m) \leq \sigma(m+1) σ ( m ) = α m h ( m ) α m σ ( m ) = α m h ( m ) α m sigma(m)=alpha^(mh(m)) <= alpha^(m)\sigma(m)=\alpha^{m h(m)} \leq \alpha^{m} ,我们得到
N K + 1 + m = 1 K m = N N K + 1 + m = 1 K m = N N <= K_(ℓ+1)^(')+sum_(m=1)^(ℓ)K_(m)^(')=N^(')N \leq K_{\ell+1}^{\prime}+\sum_{m=1}^{\ell} K_{m}^{\prime}=N^{\prime}
where 哪里
K m = { α m , for 1 m λ = h ( ) σ ( ) , for λ < m K m = α m ,       for  1 m λ = h ( ) σ ( ) ,       for  λ < m K_(m)^(')={[alpha^(m)","," for "1 <= m <= lambda=|__ℓh(ℓ)__|],[sigma(ℓ)","," for "lambda < m <= ℓ]:}K_{m}^{\prime}= \begin{cases}\alpha^{m}, & \text { for } 1 \leq m \leq \lambda=\lfloor\ell h(\ell)\rfloor \\ \sigma(\ell), & \text { for } \lambda<m \leq \ell\end{cases}
and 
K + 1 = 1 + 1 ( n L s m = 1 m K m ) . K + 1 = 1 + 1 n L s m = 1 m K m . K_(ℓ+1)^(')=|~(1)/(ℓ+1)(n-L_(s)-sum_(m=1)^(ℓ)mK_(m)^('))~|.K_{\ell+1}^{\prime}=\left\lceil\frac{1}{\ell+1}\left(n-L_{s}-\sum_{m=1}^{\ell} m K_{m}^{\prime}\right)\right\rceil .
From (14), (15), and (9), we obtain K + 1 = N + 1 K + 1 = N + 1 K_(ℓ+1)^(')=N_(ℓ+1)K_{\ell+1}^{\prime}=N_{\ell+1}, and
从(14)、(15)和(9)式中得到 K + 1 = N + 1 K + 1 = N + 1 K_(ℓ+1)^(')=N_(ℓ+1)K_{\ell+1}^{\prime}=N_{\ell+1}
N = N + 1 + m = 1 λ α m + m = λ + 1 σ ( ) N = N + 1 + m = 1 λ α m + m = λ + 1 σ ( ) N^(')=N_(ℓ+1)+sum_(m=1)^(lambda)alpha^(m)+sum_(m=lambda+1)^(ℓ)sigma(ℓ)N^{\prime}=N_{\ell+1}+\sum_{m=1}^{\lambda} \alpha^{m}+\sum_{m=\lambda+1}^{\ell} \sigma(\ell)
which, together with (9) and (10), yields
与(9)和(10)一起产生的
n L s N = m = 1 λ m α m + m = λ + 1 m σ ( ) + N + 1 [ m = 1 λ α m + m = λ + 1 σ ( ) ] = 0 n L s N = m = 1 λ m α m + m = λ + 1 m σ ( ) + N + 1 m = 1 λ α m + m = λ + 1 σ ( ) = 0 {:[n-L_(s)-ℓN^(')=sum_(m=1)^(lambda)malpha^(m)+sum_(m=lambda+1)^(ℓ)m sigma(ℓ)],[+N_(ℓ+1)-ℓ[sum_(m=1)^(lambda)alpha^(m)+sum_(m=lambda+1)^(ℓ)sigma(ℓ)]=0]:}\begin{aligned} n-L_{s}-\ell N^{\prime} & =\sum_{m=1}^{\lambda} m \alpha^{m}+\sum_{m=\lambda+1}^{\ell} m \sigma(\ell) \\ & +N_{\ell+1}-\ell\left[\sum_{m=1}^{\lambda} \alpha^{m}+\sum_{m=\lambda+1}^{\ell} \sigma(\ell)\right]=0 \end{aligned}
or 
N N = n L s = n L s L s 1 . N N = n L s = n L s L s 1 . N <= N^(')=(n-L_(s))/(ℓ)=(n-L_(s))/(L_(s)-1).N \leq N^{\prime}=\frac{n-L_{s}}{\ell}=\frac{n-L_{s}}{L_{s}-1} .
Hence, from (12) and (16), we have
因此,根据(12)和(16),我们得到
ρ L c = L c L s 1 . ρ L c = L c L s 1 . rho <= (L_(c))/(ℓ)=(L_(c))/(L_(s)-1).\rho \leq \frac{L_{c}}{\ell}=\frac{L_{c}}{L_{s}-1} .
Note that despite the rater rudimentary overestimation of N N NN by N N N^(')N^{\prime}, the upper bound of (17) is quite tight, since the fact that no source word is longer than L s L s L_(s)L_{s} immediately implies ρ L c / L s ρ L c / L s rho >= L_(c)//L_(s)\rho \geq L_{c} / L_{s}.
尽管 N N N^(')N^{\prime} N N NN 有相当粗糙的过度估计,但式(17)的上界是相当紧的,因为没有源词长于 L s L s L_(s)L_{s} 这一事实意味着 ρ L c / L s ρ L c / L s rho >= L_(c)//L_(s)\rho \geq L_{c} / L_{s}
We can state now the following result.
我们现在可以陈述以下结果。

Theorem 1: If the buffer length n n nn for a source with known h h hh-parameters is chosen according to (9), then
定理 1:如果对于已知参数的源,缓冲区长度 n n nn 按照公式 (9) 选择,那么
ρ h ( L s 1 ) + ϵ ( L s ) ρ h L s 1 + ϵ L s rho <= h(L_(s)-1)+epsilon(L_(s))\rho \leq h\left(L_{s}-1\right)+\epsilon\left(L_{s}\right)
where 哪里
ϵ ( L s ) = 1 L s 1 ( 3 + 3 log ( L s 1 ) + log L s 2 ) . ϵ L s = 1 L s 1 3 + 3 log L s 1 + log L s 2 . epsilon(L_(s))=(1)/(L_(s)-1)(3+3log(L_(s)-1)+log((L_(s))/(2))).\epsilon\left(L_{s}\right)=\frac{1}{L_{s}-1}\left(3+3 \log \left(L_{s}-1\right)+\log \frac{L_{s}}{2}\right) .
Proof: From (1) we have
证明:从(1)式可得
L c = 1 + log L s + log ( n L s ) 3 + log ( L s 1 ) + log ( n L s ) L c = 1 + log L s + log n L s 3 + log L s 1 + log n L s {:[L_(c)=1+|~log L_(s)~|+|~{: log(n-L_(s))~|],[ <= 3+log(L_(s)-1)+log(n-L_(s))]:}\begin{aligned} L_{c}=1+\left\lceil\log L_{s}\right\rceil+\lceil & \left.\log \left(n-L_{s}\right)\right\rceil \\ & \leq 3+\log \left(L_{s}-1\right)+\log \left(n-L_{s}\right) \end{aligned}
From (9) and (10) we obtain
根据(9)和(10)式,我们得到
n L s = [ m = 1 λ ( m ) α m + m = λ + 1 ( m ) σ ( ) + m = 1 λ α m + m = λ + 1 σ ( ) ] n L s = m = 1 λ ( m ) α m + m = λ + 1 ( m ) σ ( ) + m = 1 λ α m + m = λ + 1 σ ( ) {:[n-L_(s)=ℓ[sum_(m=1)^(lambda)(ℓ-m)alpha^(m):}+sum_(m=lambda+1)^(ℓ)(ℓ-m)sigma(ℓ)],[{:+sum_(m=1)^(lambda)alpha^(m)+sum_(m=lambda+1)^(ℓ)sigma(ℓ)]]:}\begin{aligned} n-L_{s}=\ell\left[\sum_{m=1}^{\lambda}(\ell-m) \alpha^{m}\right. & +\sum_{m=\lambda+1}^{\ell}(\ell-m) \sigma(\ell) \\ & \left.+\sum_{m=1}^{\lambda} \alpha^{m}+\sum_{m=\lambda+1}^{\ell} \sigma(\ell)\right] \end{aligned}
and since α m σ ( ) α m σ ( ) alpha^(m) <= sigma(ℓ)\alpha^{m} \leq \sigma(\ell), for 1 m λ 1 m λ 1 <= m <= lambda1 \leq m \leq \lambda, we have
并且自从 α m σ ( ) α m σ ( ) alpha^(m) <= sigma(ℓ)\alpha^{m} \leq \sigma(\ell) ,为了 1 m λ 1 m λ 1 <= m <= lambda1 \leq m \leq \lambda ,我们有
n L s σ ( ) m = 1 ( m + 1 ) = 1 2 2 ( + 1 ) σ ( ) n L s σ ( ) m = 1 ( m + 1 ) = 1 2 2 ( + 1 ) σ ( ) n-L_(s) <= ℓsigma(ℓ)sum_(m=1)^(ℓ)(ℓ-m+1)=(1)/(2)ℓ^(2)(ℓ+1)sigma(ℓ)n-L_{s} \leq \ell \sigma(\ell) \sum_{m=1}^{\ell}(\ell-m+1)=\frac{1}{2} \ell^{2}(\ell+1) \sigma(\ell)
or 
log ( n L s ) 2 log + log + 1 2 + h ( ) log n L s 2 log + log + 1 2 + h ( ) log(n-L_(s)) <= 2logℓ+log((ℓ+1)/(2))+ℓh(ℓ)\log \left(n-L_{s}\right) \leq 2 \log \ell+\log \frac{\ell+1}{2}+\ell h(\ell)
Since = L s 1 = L s 1 ℓ=L_(s)-1\ell=L_{s}-1, we obtain
自从 = L s 1 = L s 1 ℓ=L_(s)-1\ell=L_{s}-1 以来,我们得到
L c 3 + 3 log ( L s 1 ) + log L s 2 + ( L s 1 ) h ( L s 1 ) L c 3 + 3 log L s 1 + log L s 2 + L s 1 h L s 1 L_(c) <= 3+3log(L_(s)-1)+log((L_(s))/(2))+(L_(s)-1)h(L_(s)-1)L_{c} \leq 3+3 \log \left(L_{s}-1\right)+\log \frac{L_{s}}{2}+\left(L_{s}-1\right) h\left(L_{s}-1\right)
or 
L c ( L s 1 ) [ h ( L s 1 ) + ϵ ( L s ) ] . L c L s 1 h L s 1 + ϵ L s . L_(c) <= (L_(s)-1)[h(L_(s)-1)+epsilon(L_(s))].L_{c} \leq\left(L_{s}-1\right)\left[h\left(L_{s}-1\right)+\epsilon\left(L_{s}\right)\right] .
Substituting this result into (17), we obtain the bound of Theorem 1.
将这个结果代入(17)式,我们就得到定理 1 的界限。

Q.E.D. 证明完毕。

Remarks 评论

  1. The value of ϵ ( L s ) ϵ L s epsilon(L_(s))\epsilon\left(L_{s}\right) decreases with L s L s L_(s)L_{s} and, consequently, the compression ratio ρ ρ rho\rho approaches the value of h ( L s 1 ) h L s 1 h(L_(s)-1)h\left(L_{s}-1\right), the h h hh-parameter associated with the second largest word-length processed by the encoder. Given any δ > 0 δ > 0 delta > 0\delta>0, one can always find the least integer s s ℓ_(s)\ell_{s} such that ρ h ( s 1 ) δ ρ h s 1 δ rho-h(ℓ_(s)-1) <= delta\rho-h\left(\ell_{s}-1\right) \leq \delta. The magnitude of the acceptable de-
    ϵ ( L s ) ϵ L s epsilon(L_(s))\epsilon\left(L_{s}\right) L s L s L_(s)L_{s} 而降低,因此压缩率 ρ ρ rho\rho 接近 h ( L s 1 ) h L s 1 h(L_(s)-1)h\left(L_{s}-1\right) 的值,这是编码器处理的第二大单词长度相关的 h h hh 参数。对于任何给定的 δ > 0 δ > 0 delta > 0\delta>0 ,总可以找到最小整数 s s ℓ_(s)\ell_{s} 使得 ρ h ( s 1 ) δ ρ h s 1 δ rho-h(ℓ_(s)-1) <= delta\rho-h\left(\ell_{s}-1\right) \leq \delta 成立。可接受的解压缩幅度的大小

    viation δ δ delta\delta determines the operational range of L s L s L_(s)L_{s}, namely, L s s L s s L_(s) >= ℓ_(s)L_{s} \geq \ell_{s}.
    航空 δ δ delta\delta 决定了 L s L s L_(s)L_{s} 的操作范围,即 L s s L s s L_(s) >= ℓ_(s)L_{s} \geq \ell_{s}
  2. Since our code maps source words of variable length at most L s L s L_(s)L_{s} into codewords of fixed length L c L c L_(c)L_{c}, we adopt as a reference for comparison the best V B V B VBV B code C V B C V B C_(VB)C_{V B} discussed in Subsection III-B. The counterpart of our block-length L c L c L_(c)L_{c} is L ( C ) L ( C ) L(C)L(C) of (5), and by (7) we can write
    由于我们的代码将长度最长为 L s L s L_(s)L_{s} 的源词映射到长度固定为 L c L c L_(c)L_{c} 的码字,我们采用第 III-B 小节中讨论的最佳 V B V B VBV B C V B C V B C_(VB)C_{V B} 作为比较参考。我们的块长 L c L c L_(c)L_{c} 的对应项是(5)中的 L ( C ) L ( C ) L(C)L(C) ,根据(7)我们可以写为
L c log M L M h ( L M ) L c log M L M h L M L_(c)~~log M~~L_(M)h(L_(M))L_{c} \approx \log M \approx L_{M} h\left(L_{M}\right)
where M M MM is the code book size and h ( L M ) h L M h(L_(M))h\left(L_{M}\right) is the lower bound (see Lemma 2) on the compression ratio ρ V B ρ V B rho_(VB)\rho_{V B} attainable by C V B C V B C_(VB)C_{V B}. Since for sufficiently large L s L s L_(s)L_{s}, we have also
其中 M M MM 是码书大小, h ( L M ) h L M h(L_(M))h\left(L_{M}\right) 是可实现的压缩比下限 (见引理 2), ρ V B ρ V B rho_(VB)\rho_{V B} 是可实现的压缩比. 当 L s L s L_(s)L_{s} 足够大时, 还有
L c ( L s 1 ) ρ ( L s 1 ) h ( L s 1 ) , L c L s 1 ρ L s 1 h L s 1 , L_(c)~~(L_(s)-1)rho~~(L_(s)-1)h(L_(s)-1),L_{\mathrm{c}} \approx\left(L_{s}-1\right) \rho \approx\left(L_{s}-1\right) h\left(L_{s}-1\right),
it follows that ρ ρ V B ρ ρ V B rho~~rho_(VB)\rho \approx \rho_{V B}.
故此, ρ ρ V B ρ ρ V B rho~~rho_(VB)\rho \approx \rho_{V B}

We turn now to investigate the universal performance of the proposed algorithm, i.e., the performance when no a priori knowledge of the source to be compressed is available.
我们现在转向研究所提出算法的普遍性能,即在没有预先了解要压缩的源信息的情况下的性能。
Given δ 1 δ 1 delta_(1)\delta_{1} and h 1 h 1 h_(1)h_{1} such that 0 < δ 1 < h 1 < 1 0 < δ 1 < h 1 < 1 0 < delta_(1) < h_(1) < 10<\delta_{1}<h_{1}<1, let 1 1 ℓ_(1)\ell_{1} be the least positive integer satisfying
给定 δ 1 δ 1 delta_(1)\delta_{1} h 1 h 1 h_(1)h_{1} 满足 0 < δ 1 < h 1 < 1 0 < δ 1 < h 1 < 1 0 < delta_(1) < h_(1) < 10<\delta_{1}<h_{1}<1 条件,求 1 1 ℓ_(1)\ell_{1} 的最小正整数值
δ 1 ϵ ( 1 + 1 ) = 1 1 ( 3 + 3 log 1 + log 1 + 1 2 ) δ 1 ϵ 1 + 1 = 1 1 3 + 3 log 1 + log 1 + 1 2 delta_(1) >= epsilon(ℓ_(1)+1)=(1)/(ℓ_(1))(3+3log ℓ_(1)+log((ℓ_(1)+1)/(2)))\delta_{1} \geq \epsilon\left(\ell_{1}+1\right)=\frac{1}{\ell_{1}}\left(3+3 \log \ell_{1}+\log \frac{\ell_{1}+1}{2}\right)
and let K = α 1 h 1 K = α 1 h 1 K=alpha^(ℓ_(1)h_(1))K=\alpha^{\ell_{1} h_{1}} and λ 1 = 1 h 1 λ 1 = 1 h 1 lambda_(1)=|__ℓ_(1)h_(1)__|\lambda_{1}=\left\lfloor\ell_{1} h_{1}\right\rfloor. Consider the encoder E 1 E 1 E_(1)E_{1} which employs a buffer of length n 1 = n ( h 1 , 1 ) n 1 = n h 1 , 1 n_(1)=n(h_(1),ℓ_(1))n_{1}=n\left(h_{1}, \ell_{1}\right), obtained from (9) and (10) by setting = 1 , λ = λ 1 = 1 , λ = λ 1 ℓ=ℓ_(1),lambda=lambda_(1)\ell=\ell_{1}, \lambda=\lambda_{1}, and σ ( ) σ ( ) sigma(ℓ)\sigma(\ell) = K = K =K=K, and whose maximal word-length L s L s L_(s)L_{s} is equal to 1 + 1 + ℓ_(1)+\ell_{1}+ 1.
并让 K = α 1 h 1 K = α 1 h 1 K=alpha^(ℓ_(1)h_(1))K=\alpha^{\ell_{1} h_{1}} λ 1 = 1 h 1 λ 1 = 1 h 1 lambda_(1)=|__ℓ_(1)h_(1)__|\lambda_{1}=\left\lfloor\ell_{1} h_{1}\right\rfloor 。考虑采用长度为 n 1 = n ( h 1 , 1 ) n 1 = n h 1 , 1 n_(1)=n(h_(1),ℓ_(1))n_{1}=n\left(h_{1}, \ell_{1}\right) 的缓冲区的编码器 E 1 E 1 E_(1)E_{1} ,该缓冲区由(9)和(10)设置 = 1 , λ = λ 1 = 1 , λ = λ 1 ℓ=ℓ_(1),lambda=lambda_(1)\ell=\ell_{1}, \lambda=\lambda_{1} σ ( ) σ ( ) sigma(ℓ)\sigma(\ell) = K = K =K=K 获得,其最大字长 L s L s L_(s)L_{s} 等于 1 + 1 + ℓ_(1)+\ell_{1}+ 1。
It follows from Theorem 1 that if this encoder is applied to a source σ 1 σ 1 sigma_(1)\sigma_{1} such that h σ 1 ( 1 ) = h 1 h σ 1 1 = h 1 h_(sigma_(1))(ℓ_(1))=h_(1)h_{\sigma_{1}}\left(\ell_{1}\right)=h_{1}, then the resulting compression ratio ρ 1 ( σ 1 ) ρ 1 σ 1 rho_(1)(sigma_(1))\rho_{1}\left(\sigma_{1}\right) satisfies
根据定理 1,如果对源 σ 1 σ 1 sigma_(1)\sigma_{1} 应用此编码器,使得 h σ 1 ( 1 ) = h 1 h σ 1 1 = h 1 h_(sigma_(1))(ℓ_(1))=h_(1)h_{\sigma_{1}}\left(\ell_{1}\right)=h_{1} ,则所得压缩比 ρ 1 ( σ 1 ) ρ 1 σ 1 rho_(1)(sigma_(1))\rho_{1}\left(\sigma_{1}\right) 满足
ρ 1 ( σ 1 ) h σ 1 ( 1 ) + δ 1 = h 1 + δ 1 ρ 1 σ 1 h σ 1 1 + δ 1 = h 1 + δ 1 rho_(1)(sigma_(1)) <= h_(sigma_(1))(ℓ_(1))+delta_(1)=h_(1)+delta_(1)\rho_{1}\left(\sigma_{1}\right) \leq h_{\sigma_{1}}\left(\ell_{1}\right)+\delta_{1}=h_{1}+\delta_{1}
Suppose now that h 1 h 1 h_(1)h_{1} and δ 1 δ 1 delta_(1)\delta_{1} were chosen to fit the prescribed upper value ρ 1 = h 1 + δ 1 ρ 1 = h 1 + δ 1 rho_(1)=h_(1)+delta_(1)\rho_{1}=h_{1}+\delta_{1} of a prospective compression ratio range ( ρ 0 , ρ 1 ρ 0 , ρ 1 rho_(0),rho_(1)\rho_{0}, \rho_{1} ) with
假设现在 h 1 h 1 h_(1)h_{1} δ 1 δ 1 delta_(1)\delta_{1} 被选择来适应预期压缩比范围 ( ρ 0 , ρ 1 ρ 0 , ρ 1 rho_(0),rho_(1)\rho_{0}, \rho_{1} ) 的规定上限 ρ 1 = h 1 + δ 1 ρ 1 = h 1 + δ 1 rho_(1)=h_(1)+delta_(1)\rho_{1}=h_{1}+\delta_{1}
0 < δ 1 R < ρ 0 < ρ 1 < 1 , R > 1 0 < δ 1 R < ρ 0 < ρ 1 < 1 , R > 1 0 < delta_(1)R < rho_(0) < rho_(1) < 1,quad R > 10<\delta_{1} R<\rho_{0}<\rho_{1}<1, \quad R>1
where R R RR is an adjustment parameter to be determined later. As shown above, the encoder E 1 E 1 E_(1)E_{1} is then matched exactly to the upper value ρ 1 ρ 1 rho_(1)\rho_{1} of the prospective compression range. In order to accommodate the whole given range, we propose to employ a slightly larger encoder E 0 E 0 E_(0)E_{0} whose buffer is of length n 0 = n 1 1 + 0 n 0 = n 1 1 + 0 n_(0)=n_(1)-ℓ_(1)+ℓ_(0)n_{0}=n_{1}-\ell_{1}+\ell_{0}, where n 1 n 1 n_(1)n_{1} and 1 1 ℓ_(1)\ell_{1} are the parameters of E 1 E 1 E_(1)E_{1}, and where 0 0 ℓ_(0)\ell_{0} is an integer, greater than 1 1 ℓ_(1)\ell_{1}, for which the solution h 0 h 0 h_(0)h_{0} of the equation
其中 R R RR 是稍后确定的调整参数。如上所示,编码器 E 1 E 1 E_(1)E_{1} 精确地匹配于预期压缩范围的上限值 ρ 1 ρ 1 rho_(1)\rho_{1} 。为了容纳整个给定范围,我们建议使用略大的编码器 E 0 E 0 E_(0)E_{0} ,其缓冲区长度为 n 0 = n 1 1 + 0 n 0 = n 1 1 + 0 n_(0)=n_(1)-ℓ_(1)+ℓ_(0)n_{0}=n_{1}-\ell_{1}+\ell_{0} ,其中 n 1 n 1 n_(1)n_{1} 1 1 ℓ_(1)\ell_{1} E 1 E 1 E_(1)E_{1} 的参数,而 0 0 ℓ_(0)\ell_{0} 是一个大于 1 1 ℓ_(1)\ell_{1} 的整数,其方程的解为 h 0 h 0 h_(0)h_{0}
n 1 1 + 0 = n ( h 0 , 0 ) n 1 1 + 0 = n h 0 , 0 n_(1)-ℓ_(1)+ℓ_(0)=n(h_(0),ℓ_(0))n_{1}-\ell_{1}+\ell_{0}=n\left(h_{0}, \ell_{0}\right)
satisfies 满足
ρ 0 ϵ ( 0 ) < h 0 ρ 0 ϵ ( 0 + 1 ) ρ 0 ϵ 0 < h 0 ρ 0 ϵ 0 + 1 rho_(0)-epsilon(ℓ_(0)) < h_(0) <= rho_(0)-epsilon(ℓ_(0)+1)\rho_{0}-\epsilon\left(\ell_{0}\right)<h_{0} \leq \rho_{0}-\epsilon\left(\ell_{0}+1\right)
Noting that n 0 0 = n 1 1 n 0 0 = n 1 1 n_(0)-ℓ_(0)=n_(1)-ℓ_(1)n_{0}-\ell_{0}=n_{1}-\ell_{1} is fixed, it is clear from (9) and (10) that as 0 0 ℓ_(0)\ell_{0} increases h 0 h 0 h_(0)h_{0} decreases; also, (21) and the fact that 0 > 1 0 > 1 ℓ_(0) > ℓ_(1)\ell_{0}>\ell_{1} imply that ρ 0 ϵ ( 0 + 1 ) > ρ 0 ϵ ( 1 + 1 ) ρ 0 ϵ 0 + 1 > ρ 0 ϵ 1 + 1 rho_(0)-epsilon(ℓ_(0)+1) > rho_(0)-epsilon(ℓ_(1)+1)\rho_{0}-\epsilon\left(\ell_{0}+1\right)>\rho_{0}-\epsilon\left(\ell_{1}+1\right) ρ 0 δ 1 > 0 ρ 0 δ 1 > 0 >= rho_(0)-delta_(1) > 0\geq \rho_{0}-\delta_{1}>0. Hence, there exist h 0 > 0 h 0 > 0 h_(0) > 0h_{0}>0 and 0 > 1 0 > 1 ℓ_(0) > ℓ_(1)\ell_{0}>\ell_{1} that satisfy both (22) and (23).
注意到 n 0 0 = n 1 1 n 0 0 = n 1 1 n_(0)-ℓ_(0)=n_(1)-ℓ_(1)n_{0}-\ell_{0}=n_{1}-\ell_{1} 是固定的,从(9)和(10)可以清楚地看出,随着 0 0 ℓ_(0)\ell_{0} 的增加, h 0 h 0 h_(0)h_{0} 会减少;同时,(21)和 0 > 1 0 > 1 ℓ_(0) > ℓ_(1)\ell_{0}>\ell_{1} 的事实也意味着 ρ 0 ϵ ( 0 + 1 ) > ρ 0 ϵ ( 1 + 1 ) ρ 0 ϵ 0 + 1 > ρ 0 ϵ 1 + 1 rho_(0)-epsilon(ℓ_(0)+1) > rho_(0)-epsilon(ℓ_(1)+1)\rho_{0}-\epsilon\left(\ell_{0}+1\right)>\rho_{0}-\epsilon\left(\ell_{1}+1\right) ρ 0 δ 1 > 0 ρ 0 δ 1 > 0 >= rho_(0)-delta_(1) > 0\geq \rho_{0}-\delta_{1}>0 。因此,存在 h 0 > 0 h 0 > 0 h_(0) > 0h_{0}>0 0 > 1 0 > 1 ℓ_(0) > ℓ_(1)\ell_{0}>\ell_{1} 同时满足(22)和(23)。
In analogy with (20), it is also clear that if E 0 E 0 E_(0)E_{0} is applied to a source σ 0 σ 0 sigma_(0)\sigma_{0} such that h σ 0 ( 0 ) = h 0 h σ 0 0 = h 0 h_(sigma_(0))(ℓ_(0))=h_(0)h_{\sigma_{0}}\left(\ell_{0}\right)=h_{0}, then the resulting compression ratio ρ 0 ( σ 0 ) ρ 0 σ 0 rho_(0)(sigma_(0))\rho_{0}\left(\sigma_{0}\right) satisfies
根据(20)的类比,如果对源 σ 0 σ 0 sigma_(0)\sigma_{0} 应用 E 0 E 0 E_(0)E_{0} ,使得 h σ 0 ( 0 ) = h 0 h σ 0 0 = h 0 h_(sigma_(0))(ℓ_(0))=h_(0)h_{\sigma_{0}}\left(\ell_{0}\right)=h_{0} ,则所得的压缩比 ρ 0 ( σ 0 ) ρ 0 σ 0 rho_(0)(sigma_(0))\rho_{0}\left(\sigma_{0}\right) 满足
ρ 0 ( σ 0 ) h σ 0 ( 0 ) + δ 0 = h 0 + δ 0 ρ 0 σ 0 h σ 0 0 + δ 0 = h 0 + δ 0 rho_(0)(sigma_(0)) <= h_(sigma_(0))(ℓ_(0))+delta_(0)=h_(0)+delta_(0)\rho_{0}\left(\sigma_{0}\right) \leq h_{\sigma_{0}}\left(\ell_{0}\right)+\delta_{0}=h_{0}+\delta_{0}
where δ 0 = ϵ ( 0 + 1 ) δ 0 = ϵ 0 + 1 delta_(0)=epsilon(ℓ_(0)+1)\delta_{0}=\epsilon\left(\ell_{0}+1\right). 在哪里 δ 0 = ϵ ( 0 + 1 ) δ 0 = ϵ 0 + 1 delta_(0)=epsilon(ℓ_(0)+1)\delta_{0}=\epsilon\left(\ell_{0}+1\right)
From (23) and (24), we have ρ 0 ( σ 0 ) ρ 0 ϵ ( 0 + 1 ) + δ 0 ρ 0 σ 0 ρ 0 ϵ 0 + 1 + δ 0 rho_(0)(sigma_(0)) <= rho_(0)-epsilon(ℓ_(0)+1)+delta_(0)\rho_{0}\left(\sigma_{0}\right) \leq \rho_{0}-\epsilon\left(\ell_{0}+1\right)+\delta_{0} = ρ 0 = ρ 0 =rho_(0)=\rho_{0}, and
从(23)和(24)中,我们有 ρ 0 ( σ 0 ) ρ 0 ϵ ( 0 + 1 ) + δ 0 ρ 0 σ 0 ρ 0 ϵ 0 + 1 + δ 0 rho_(0)(sigma_(0)) <= rho_(0)-epsilon(ℓ_(0)+1)+delta_(0)\rho_{0}\left(\sigma_{0}\right) \leq \rho_{0}-\epsilon\left(\ell_{0}+1\right)+\delta_{0} = ρ 0 = ρ 0 =rho_(0)=\rho_{0}
δ 0 ρ 0 h 0 < ϵ ( 0 ) ϵ ( 1 + 1 ) δ 1 < 1 R ρ 0 δ 0 ρ 0 h 0 < ϵ 0 ϵ 1 + 1 δ 1 < 1 R ρ 0 delta_(0) <= rho_(0)-h_(0) < epsilon(ℓ_(0)) <= epsilon(ℓ_(1)+1) <= delta_(1) < (1)/(R)rho_(0)\delta_{0} \leq \rho_{0}-h_{0}<\epsilon\left(\ell_{0}\right) \leq \epsilon\left(\ell_{1}+1\right) \leq \delta_{1}<\frac{1}{R} \rho_{0}
Hence, h 0 h 0 h_(0)h_{0} can be made arbitrarily close to ρ 0 ρ 0 rho_(0)\rho_{0}, and consequently, the encoder E 0 E 0 E_(0)E_{0} matches as closely as desired the lower end ρ 0 ρ 0 rho_(0)\rho_{0} of the given compression range.
因此, h 0 h 0 h_(0)h_{0} 可以任意接近 ρ 0 ρ 0 rho_(0)\rho_{0} ,因此,编码器 E 0 E 0 E_(0)E_{0} 可以匹配给定压缩范围的下端 ρ 0 ρ 0 rho_(0)\rho_{0}
Theorem 2: Let σ σ sigma\sigma be a source for which a matched encoder E E EE achieves a compression ratio ρ ( σ ) ρ ( σ ) rho(sigma)\rho(\sigma) within the range ( ρ 0 , ρ 1 ) ρ 0 , ρ 1 (rho_(0),rho_(1))\left(\rho_{0}, \rho_{1}\right). Then the compression ratio ρ 0 ( σ ) ρ 0 ( σ ) rho_(0)(sigma)\rho_{0}(\sigma), achieved for σ σ sigma\sigma by E 0 E 0 E_(0)E_{0}, satisfies
定理 2:设 σ σ sigma\sigma 为一个源,其匹配编码器 E E EE 可在范围 ( ρ 0 , ρ 1 ) ρ 0 , ρ 1 (rho_(0),rho_(1))\left(\rho_{0}, \rho_{1}\right) 内实现压缩比 ρ ( σ ) ρ ( σ ) rho(sigma)\rho(\sigma) 。那么由 E 0 E 0 E_(0)E_{0} σ σ sigma\sigma 实现的压缩比 ρ 0 ( σ ) ρ 0 ( σ ) rho_(0)(sigma)\rho_{0}(\sigma) 满足
ρ 0 ( σ ) ρ ( σ ) + Δ , ρ 0 ( σ ) ρ ( σ ) + Δ , rho_(0)(sigma) <= rho(sigma)+Delta,\rho_{0}(\sigma) \leq \rho(\sigma)+\Delta,
where 哪里
Δ log d 1 d = max { h 1 h 0 , 1 1 h 0 } Δ log d 1 d = max h 1 h 0 , 1 1 h 0 Delta <= (|~log d~|)/(ℓ_(1))quad d=max{(h_(1))/(h_(0)),(1)/(1-h_(0))}\Delta \leq \frac{\lceil\log d\rceil}{\ell_{1}} \quad d=\max \left\{\frac{h_{1}}{h_{0}}, \frac{1}{1-h_{0}}\right\}
(Typically, ( h 1 / h 0 ) > ( 1 / 1 h 0 ) h 1 / h 0 > 1 / 1 h 0 (h_(1)//h_(0)) > (1//1-h_(0))\left(h_{1} / h_{0}\right)>\left(1 / 1-h_{0}\right) and d = ( h 1 / h 0 ) d = h 1 / h 0 d=(h_(1)//h_(0))d=\left(h_{1} / h_{0}\right).)
(通常情况下, ( h 1 / h 0 ) > ( 1 / 1 h 0 ) h 1 / h 0 > 1 / 1 h 0 (h_(1)//h_(0)) > (1//1-h_(0))\left(h_{1} / h_{0}\right)>\left(1 / 1-h_{0}\right) d = ( h 1 / h 0 ) d = h 1 / h 0 d=(h_(1)//h_(0))d=\left(h_{1} / h_{0}\right) 。)

Proof: To prove the theorem we shall consider the obviously worst case of applying E 0 E 0 E_(0)E_{0} to the source σ 1 σ 1 sigma_(1)\sigma_{1} whose matched encoder E 1 E 1 E_(1)E_{1} realizes ρ 1 ρ 1 rho_(1)\rho_{1}.
证明:为了证明该定理,我们将考虑应用 E 0 E 0 E_(0)E_{0} 到其编码器 E 1 E 1 E_(1)E_{1} 实现 ρ 1 ρ 1 rho_(1)\rho_{1} 的源 σ 1 σ 1 sigma_(1)\sigma_{1} 的显而易见的最坏情况。
Let ρ 0 ( σ 1 ) ρ 0 σ 1 rho_(0)(sigma_(1))\rho_{0}\left(\sigma_{1}\right) denote the compression ratio achievable by E 0 E 0 E_(0)E_{0} when applied to σ 1 σ 1 sigma_(1)\sigma_{1}. According to (12), we have
ρ 0 ( σ 1 ) ρ 0 σ 1 rho_(0)(sigma_(1))\rho_{0}\left(\sigma_{1}\right) 表示在应用于 σ 1 σ 1 sigma_(1)\sigma_{1} E 0 E 0 E_(0)E_{0} 可达到的压缩率。根据(12)式,我们有
ρ 0 ( σ 1 ) = L c 0 n 0 ( 0 + 1 ) N 0 ( σ 1 ) ρ 0 σ 1 = L c 0 n 0 0 + 1 N 0 σ 1 rho_(0)(sigma_(1))=(L_(c0))/(n_(0)-(ℓ_(0)+1))N_(0)(sigma_(1))\rho_{0}\left(\sigma_{1}\right)=\frac{L_{c 0}}{n_{0}-\left(\ell_{0}+1\right)} N_{0}\left(\sigma_{1}\right)
where 哪里
L c i = 1 + log ( i + 1 ) + log ( n i i 1 ) , i { 0 , 1 } L c i = 1 + log i + 1 + log n i i 1 , i { 0 , 1 } L_(ci)=1+|~log(ℓ_(i)+1)~|+|~log(n_(i)-ℓ_(i)-1)~|,quad i in{0,1}L_{c i}=1+\left\lceil\log \left(\ell_{i}+1\right)\right\rceil+\left\lceil\log \left(n_{i}-\ell_{i}-1\right)\right\rceil, \quad i \in\{0,1\}
and N i ( σ j ) , i , j { 0 , 1 } N i σ j , i , j { 0 , 1 } N_(i)(sigma_(j)),i,j in{0,1}N_{i}\left(\sigma_{j}\right), i, j \in\{0,1\}, is the maximum number of words into which a string S σ j { n i i 1 } S σ j n i i 1 S insigma_(j){n_(i)-ℓ_(i)-1}S \in \sigma_{j}\left\{n_{i}-\ell_{i}-1\right\} is parsed by E i E i E_(i)E_{i}.
N i ( σ j ) , i , j { 0 , 1 } N i σ j , i , j { 0 , 1 } N_(i)(sigma_(j)),i,j in{0,1}N_{i}\left(\sigma_{j}\right), i, j \in\{0,1\} ,是将字符串 S σ j { n i i 1 } S σ j n i i 1 S insigma_(j){n_(i)-ℓ_(i)-1}S \in \sigma_{j}\left\{n_{i}-\ell_{i}-1\right\} 通过 E i E i E_(i)E_{i} 解析的最大单词数。
Since n 0 0 = n 1 1 n 0 0 = n 1 1 n_(0)-ℓ_(0)=n_(1)-ℓ_(1)n_{0}-\ell_{0}=n_{1}-\ell_{1} and 0 > 1 0 > 1 ℓ_(0) > ℓ_(1)\ell_{0}>\ell_{1}, it is easy to verify that 2 N 0 ( σ 1 ) N 1 ( σ 1 ) 2 N 0 σ 1 N 1 σ 1 ^(2)N_(0)(sigma_(1)) <= N_(1)(sigma_(1)){ }^{2} N_{0}\left(\sigma_{1}\right) \leq N_{1}\left(\sigma_{1}\right). Also by (16),
自从 n 0 0 = n 1 1 n 0 0 = n 1 1 n_(0)-ℓ_(0)=n_(1)-ℓ_(1)n_{0}-\ell_{0}=n_{1}-\ell_{1} 0 > 1 0 > 1 ℓ_(0) > ℓ_(1)\ell_{0}>\ell_{1} ,很容易验证 2 N 0 ( σ 1 ) N 1 ( σ 1 ) 2 N 0 σ 1 N 1 σ 1 ^(2)N_(0)(sigma_(1)) <= N_(1)(sigma_(1)){ }^{2} N_{0}\left(\sigma_{1}\right) \leq N_{1}\left(\sigma_{1}\right) 。同时由式(16)可得,
N 1 ( σ 1 ) N 1 ( σ 1 ) = n 1 ( 1 + 1 ) 1 = n 0 ( 0 + 1 ) 1 N 1 σ 1 N 1 σ 1 = n 1 1 + 1 1 = n 0 0 + 1 1 N_(1)(sigma_(1)) <= N_(1)^(')(sigma_(1))=(n_(1)-(ℓ_(1)+1))/(ℓ_(1))=(n_(0)-(ℓ_(0)+1))/(ℓ_(1))N_{1}\left(\sigma_{1}\right) \leq N_{1}^{\prime}\left(\sigma_{1}\right)=\frac{n_{1}-\left(\ell_{1}+1\right)}{\ell_{1}}=\frac{n_{0}-\left(\ell_{0}+1\right)}{\ell_{1}}
Hence 因此
ρ 0 ( σ 1 ) L c 0 1 = L c 1 1 + L c 0 L c 1 1 ρ 1 + L c 0 L c 1 1 ρ 0 σ 1 L c 0 1 = L c 1 1 + L c 0 L c 1 1 ρ 1 + L c 0 L c 1 1 {:[rho_(0)(sigma_(1)) <= (L_(c0))/(ℓ_(1))=(L_(c1))/(ℓ_(1))+(L_(c0)-L_(c1))/(ℓ_(1))],[ <= rho_(1)+(L_(c0)-L_(c1))/(ℓ_(1))]:}\begin{aligned} \rho_{0}\left(\sigma_{1}\right) & \leq \frac{L_{c 0}}{\ell_{1}}=\frac{L_{c 1}}{\ell_{1}}+\frac{L_{c 0}-L_{c 1}}{\ell_{1}} \\ & \leq \rho_{1}+\frac{L_{c 0}-L_{c 1}}{\ell_{1}} \end{aligned}
and since 而且
L c 0 L c 1 = log ( 0 + 1 ) log ( 1 + 1 ) log k , L c 0 L c 1 = log 0 + 1 log 1 + 1 log k , L_(c0)-L_(c1)=|~log(ℓ_(0)+1)~|-|~log(ℓ_(1)+1)~| <= |~log k~|,L_{c 0}-L_{c 1}=\left\lceil\log \left(\ell_{0}+1\right)\right\rceil-\left\lceil\log \left(\ell_{1}+1\right)\right\rceil \leq\lceil\log k\rceil,
where k = ( 0 / 1 ) k = 0 / 1 k=(ℓ_(0)//ℓ_(1))k=\left(\ell_{0} / \ell_{1}\right), we obtain
其中 k = ( 0 / 1 ) k = 0 / 1 k=(ℓ_(0)//ℓ_(1))k=\left(\ell_{0} / \ell_{1}\right) ,我们得到

ρ 0 ( σ 1 ) ρ 1 + 1 1 log k . ρ 0 σ 1 ρ 1 + 1 1 log k . rho_(0)(sigma_(1)) <= rho_(1)+(1)/(ℓ_(1))|~log k~|.\rho_{0}\left(\sigma_{1}\right) \leq \rho_{1}+\frac{1}{\ell_{1}}\lceil\log k\rceil .
To obtain an upper bound on k k kk, we first observe that
要获得 k k kk 的上界,我们首先观察到
0 m = λ 0 + 1 0 ( 0 m + 1 ) α 0 h 0 n 0 0 = n 1 1 1 m = 1 1 ( 1 m + 1 ) α 1 h 1 0 m = λ 0 + 1 0 0 m + 1 ) α 0 h 0 n 0 0 = n 1 1 1 m = 1 1 1 m + 1 α 1 h 1 {:[ℓ_(0)sum_(m=lambda_(0)+1)^(ℓ_(0))(ℓ_(0)-m:}+1)alpha^(ℓ_(0)h_(0)) <= n_(0)-ℓ_(0)],[=n_(1)-ℓ_(1) <= ℓ_(1)sum_(m=1)^(ℓ_(1))(ℓ_(1)-m+1)alpha^(ℓ_(1)h_(1))]:}\begin{aligned} \ell_{0} \sum_{m=\lambda_{0}+1}^{\ell_{0}}\left(\ell_{0}-m\right. & +1) \alpha^{\ell_{0} h_{0}} \leq n_{0}-\ell_{0} \\ & =n_{1}-\ell_{1} \leq \ell_{1} \sum_{m=1}^{\ell_{1}}\left(\ell_{1}-m+1\right) \alpha^{\ell_{1} h_{1}} \end{aligned}
which reduces to 减为
k 2 ( 1 h 0 ) 2 α 1 h 1 ( 1 k ( h 0 / h 1 ) ) k 2 1 h 0 2 α 1 h 1 1 k h 0 / h 1 k^(2)(1-h_(0))^(2) <= alpha^(ℓ_(1)h_(1)(1-k(h_(0)//h_(1))))k^{2}\left(1-h_{0}\right)^{2} \leq \alpha^{\ell_{1} h_{1}\left(1-k\left(h_{0} / h_{1}\right)\right)}
Now, either k ( 1 h 0 ) < 1 k 1 h 0 < 1 k(1-h_(0)) < 1k\left(1-h_{0}\right)<1, or else the exponent on the right side of (26) must be nonnegative and thus k ( h 1 / k h 1 / k <= (h_(1)//:}k \leq\left(h_{1} /\right. h 0 h 0 h_(0)h_{0} ). In either case,
现在,要么 k ( 1 h 0 ) < 1 k 1 h 0 < 1 k(1-h_(0)) < 1k\left(1-h_{0}\right)<1 ,否则(26)右侧的指数必须是非负的,因此 k ( h 1 / k h 1 / k <= (h_(1)//:}k \leq\left(h_{1} /\right. h 0 h 0 h_(0)h_{0} ) 。在任何一种情况下,
k max { h 1 h 0 , 1 1 h 0 } = d k max h 1 h 0 , 1 1 h 0 = d k <= max{(h_(1))/(h_(0)),(1)/(1-h_(0))}=dk \leq \max \left\{\frac{h_{1}}{h_{0}}, \frac{1}{1-h_{0}}\right\}=d
Q.E.D. 证明完毕。
Theorems 1 and 2 demonstrate the efficiency and universality of the proposed algorithm. They show that an encoder designed to operate over a prescribed compression range performs, practically, as well as one designed to match a specific compression ratio within the given range.
定理 1 和 2 证明了所提出的算法的效率和广泛适用性。它们表明,设计用于指定压缩范围的编码器在实际应用中的表现与专门针对给定范围内的特定压缩比设计的编码器相当。
Moreover, for given complexity i.e., a given codeword length, the compression efficiency is comparable to that of an optimal variable-to-block code book designed to match a given source.
此外,对于给定的复杂性,即给定的码字长度,压缩效率与设计用于匹配给定源的最佳可变到块码本相当。

REFERENCES 参考文献

[1] D. A. Huffman, “A method for the construction of minimum-redundancy codes,” Proc. IRE, vol. 40, pp. 1098-1101, 1952.
[1] 赫夫曼, "最小冗余码构建方法",《IRE 杂志》,第 40 卷,第 1098-1101 页,1952 年。

[2] R. M. Karp, “Minimum-redundancy coding for the discrete noiseless channel,” IRE Trans. Inform. Theory, vol. IT-17, pp. 27-38, Jan. 1961.
[2] R. M. Karp, "最小冗余编码离散无噪信道," IRE 信息理论会议论文集, 第 IT-17 卷, 第 27-38 页, 1961 年 1 月.

[3] B. F. Varn, “Optimal variable length codes,” Inform. Contr., vol. 19, pp. 289-301, 1971.
[3] B. F. Varn, "可变长度码的最优编码,"《信息控制》,第 19 卷,289-301 页,1971 年。

[4] Y. Perl, M. R. Gary, and S. Even, “Efficient generation of optimal prefix code: Equiprobable words using unequal cost letters,” J. ACM, vol. 22, pp. 202-214, April 1975.
[4] 玉, M. R. Gary, 和 S. Even, "高效生成最佳前缀码: 使用不等成本字母的等概率单词," J. ACM, 第 22 卷, 第 202-214 页, 1975 年 4 月。

[5] A. Lempel, S. Even, and M. Cohn, “An algorithm for optimal prefix parsing of a noiseless and memoryless channel,” IEEE Trans. Inform. Theory, vol. IT-19, pp. 208-214, March 1973.
[5] A. 勒姆泊尔, S. 伊文, 以及 M. 科恩, "一种无噪音和无记忆信道最优前缀解析算法", IEEE 信息理论学报, 第 19 卷, 第 208-214 页, 1973 年 3 月。

[6] F. Jelinek and K. S. Schneider, “On variable length to block coding,” IEEE Trans. Inform. Theory, vol. IT-18, pp. 765-774, Nov. 1972.
[6] F. Jelinek 和 K. S. Schneider, "关于变长编码到定长编码的转换," IEEE 信息理论学报, 卷 IT-18, 页 765-774, 1972 年 11 月.

[7] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[7] R. G. 高勒,信息论与可靠通信。纽约:威利,1968。

[8] J. Ziv, “Coding of sources with unknown statistics—Part I; Probability of encoding error,” IEEE Trans. Inform. Theory, vol. IT-18, pp. 384-394, May 1972.
[8] J. Ziv, "未知统计源编码—第一部分;编码误差概率," IEEE 信息理论交易, vol. IT-18, 第 384-394 页, 1972 年 5 月.

[9] L. D. Davisson, “Universal noiseless coding,” IEEE Trans. Inform. Theory, vol. IT-19, pp. 783-795, Nov. 1973.
[9] L. D. Davisson, "通用无噪编码," IEEE 信息理论汇刊, vol. IT-19, pp. 783-795, 1973 年 11 月.

[10] A. Lempel and J. Ziv, “On the complexity of finite sequences,” IEEE Trans. Inform. Theory, vol. IT-22, pp. 75-81, Jan. 1976.
[10] A. 伦佩尔和 J. 齐夫,"论有限序列的复杂性,"IEEE 信息论交易杂志,第 IT-22 卷,第 75-81 页,1976 年 1 月。

[11] B. M. Fitingof, “Optimal coding in the case of unknown and changing message statistics,” Prob. Inform. Transm., vol. 2, pp. 3-11, 1966.
[11] B. M. 费廷诺夫, "在未知和变化的消息统计情况下的最优编码,"《信息传输问题》, 第 2 卷, 第 3-11 页, 1966 年。

  1. Manuscript received June 23, 1975; revised July 6, 1976. Paper previously presented at the IEEE International Symposium on Information Theory, Ronneby, Sweden, June 21-24, 1976.
    手稿于 1975 年 6 月 23 日收到;1976 年 7 月 6 日修订。该论文曾在 1976 年 6 月 21 日至 24 日于瑞典隆内比举行的 IEEE 国际信息理论研讨会上进行过演讲。

    J. Ziv was with the Department of Electrical Engineering, TechnionIsrael Institute of Technology, Haifa, Israel. He is now with the Bell Telephone Laboratories, Murray Hill, NJ 07974.
    齐郑六治已与以色列海法技术学院电气工程系工作,现任职于美国新泽西州默里山贝尔电话实验室。

    A. Lempel was with the Department of Electrical Engineering, Tech-nion-Israel Institute of Technology, Haifa, Israel. He is now with the Sperry Research Center, Sudbury, MA 01776.
    列佩尔先生曾在以色列海法技术学院电气工程系工作,目前他在马萨诸塞州苏德伯里斯皮里研究中心任职。
  2. 2 The assertion here is analogous to that of [ 10 [ 10 [10[10, theorem 1].
    这里的断言类似于[ [ 10 [ 10 [10[10 ,定理 1]。