Audio Engineering Society Convention Paper 10242

This Convention paper was selected based on a submitted abstract and 750-word precis that have been peer reviewed by at least two qualified anonymous reviewers. The complete manuscript was not peer reviewed.

This convention paper has been reproduced from the author's advance manuscript without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. This paper is available in the AES E-Library, http://www.aes.org/e-lib. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

Perceptual Assessment of Distortion in Low-Frequency Loudspeakers

A perceptually-driven distortion metric for loudspeakers is proposed which is based on a critical-band spectral comparison of the distortion and noise to an appropriate masking threshold. The loudspeaker is excited by a sine-wave signal composed of windowed 0.3 second bursts.

Loudspeaker masking curves for sine waves between

For each burst, the ratios of measured distortion and noise levels to the appropriate masking curve values are determined for each critical band starting at the second harmonic.

Once this is done, the audibility contributions from all of these bands are combined into various audibility values.

This study assesses the audible degradation due to nonlinear distortion for low-frequency loudspeakers, i.e. those reproducing sounds at or below […]. Unfortunately, simple nonlinear distortion metrics such as harmonic distortion, total harmonic distortion and noise, and intermodulation distortion do not correlate well with perceived quality.

Instead, this approach models the perceptual process of nonlinear distortion detection via a spectral comparison of the loudspeaker acoustic output at the second harmonic and above to auditory masking values on a critical-band basis, and then uses a critical-band combination model for overall audibility.

Sine-wave stimuli between […] and […], at a sample rate (SR) of either 48 or 44.1 kHz, are used for this test. For the purposes of this study, distortion will be used to designate nonlinear distortion.

The problem of finding meaningful distortion evaluation has resulted in a number of efforts to find perceptually relevant metrics for transducer distortion. Lin [1] proposed using subjective harmonic distortion audibility experiments to develop a weighting for a traditional distortion analyzer. Another paper by Boer et al. [2] investigated the audibility of harmonic distortion, various intermodulation distortions, and a loudspeaker nonlinearity model using subjective paired-comparison tests with music.

Lee and Geddes [3], [4] developed a metric based on evaluating the nonlinear characteristics of a transducer's transfer function and compared this metric to the results from subjective tests. Tan, Moore, and Zacharov [5] used artificially generated distortions and recordings of real transducers on speech and music to assess the effectiveness of a proposed spectral distortion measure.

Voishvillo [6] summarized the effectiveness of some of the above distortion metrics along with the PEAQ standard, "Method for Objective Measurement of Perceived Audio Quality" [7]. Temme, Brunet, and Keele [8] also applied the PEAQ model to the assessment of loudspeaker distortion, and Temme, Brunet, and Qarabaqi [9] adapted the PEAQ model for improved performance in distortion assessment by comparison to subjective tests.

Fielder and Benjamin [10] evaluated subwoofer nonlinear performance via 20 […] sine-wave stimuli and a comparison of third-octave spectral levels to masking curves.

The method presented here differs from these earlier methods in that the present model includes the use of measured masking curves and a method to combine critical-band audibility into an overall audibility number.

This method is very similar to what was recently proposed for the assessment of headphone distortion by Fielder [11], except that the masking curves have been adapted for loudspeaker listening, the acoustic signal is taken from a microphone located close to the loudspeaker, the acoustic level is referenced at a distance of 1 meter or greater, a shorter 16,384-sample FFT is employed, and sine-wave bursts are used instead of steady-state sine-wave signals. The sequence of sine-wave bursts is composed of windowed 0.3 second bursts with sequentially increasing amplitudes in 1 decibel increments, spaced apart at 2 second intervals.

The use of short bursts reduces the possibility of damage by limiting the duration of the test stimulus, and the sequence of bursts more quickly provides results over a range of drive levels.

This distortion-audibility approach analyzes the acoustic output of the loudspeaker for each sine-wave burst and produces the spectrum of the acoustic signals using a 16,384-sample FFT.

A spectral comparison of the acoustic signal to auditory-masking levels is made on a critical-band by critical-band basis, and then the audibility values for each critical band are combined into several audibility numbers.

The audibility of all distortion products, the combination of […] harmonics, and high-frequency products above […] or […] for drive frequencies of […] or […], respectively, is calculated.

This test method shares some similarities with a test of powered subwoofers, denoted as CEA-2010 [12]. Both tests use a sequence of short windowed sine-wave bursts, time-windowed spectral analysis, and comparison to a frequency-dependent threshold. The differences are the length of the bursts, the burst window shapes, the analysis window lengths and shapes, the range of applied frequencies, and the type of frequency-dependent threshold employed.

Additionally, the method proposed here generates distortion audibility values, rather than imposing a hard limit for acceptable performance. Another significant difference in this proposed method is that it more accurately focuses on the audibility of higher-frequency buzz and noise distortion products, which can significantly degrade sound quality yet be low in numerical level.

Further to CEA-2010 [12], whilst the thresholds for the acceptable level of harmonics have some similarity to perceptual masking in terms of fall-off with increasing frequency, their absolute levels are much higher than perceptual masking levels, and so subwoofers deemed acceptable under this method still exhibit quite audible distortion.

A number of distortion assessment examples are examined by this new method. It will be shown that nonlinear impairments in loudspeakers can be quite significant and dependent on the fundamental frequency.

Typically, distortion products become less and less significant as the frequency of the signal increases because of the combined effect of increased upward-frequency auditory masking and reduced excursion of the loudspeaker diaphragm.

2 Design of Sine-Wave Bursts

The design of the sine-wave bursts is a compromise between analysis time, reducing spectral splatter, and limiting damage at high drive levels.

The compromise chosen results in bursts composed of an integer number of cycles of the fundamental frequency, giving a time interval that is equal to or just less than 0.3 seconds. It is assumed that a 24-bit drive signal is used, with either a 48 or 44.1 kHz SR.

The burst is then windowed with a flat-top window of the same length, with fade up/down shapes based on the first and last halves of a 6000-sample, […] Kaiser-Bessel window. This window shape is chosen to reduce the effect of the spectral splatter of the sine-wave burst while maximizing the time spent at the desired level. The […] sine-wave bursts all have virtually the same length and window shaping. Figure 1 shows the shape of a 20 Hz sine-wave burst at a […] SR.
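The burst construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`bessel_i0`, `kaiser`, `make_burst`) are hypothetical, and the Kaiser shape parameter `beta=12.0` is a placeholder, because the window parameter is not legible in this copy of the paper.

```python
import math

def bessel_i0(x: float) -> float:
    """Zeroth-order modified Bessel function, summed from its power series."""
    term, total, k = 1.0, 1.0, 1
    while term > 1e-12 * total:
        term *= (x / (2.0 * k)) ** 2
        total += term
        k += 1
    return total

def kaiser(n: int, beta: float) -> list[float]:
    """n-point Kaiser-Bessel window; beta is an assumed shape parameter."""
    denom = bessel_i0(beta)
    out = []
    for i in range(n):
        r = 2.0 * i / (n - 1) - 1.0          # runs from -1 to +1
        out.append(bessel_i0(beta * math.sqrt(max(0.0, 1.0 - r * r))) / denom)
    return out

def make_burst(freq_hz: float, sample_rate: int, max_dur_s: float = 0.3,
               fade_win_len: int = 6000, beta: float = 12.0) -> list[float]:
    """Sine burst with an integer number of cycles lasting at most max_dur_s,
    shaped by a flat-top window whose fades are the two halves of a
    fade_win_len-sample Kaiser-Bessel window."""
    cycles = math.floor(max_dur_s * freq_hz + 1e-9)   # whole cycles only
    n = round(cycles * sample_rate / freq_hz)          # burst length in samples
    win = kaiser(fade_win_len, beta)
    half = fade_win_len // 2
    env = [1.0] * n                                    # flat top
    for i in range(half):
        env[i] = win[i]                                # fade up
        env[n - 1 - i] = win[i]                        # fade down (mirror image)
    return [env[i] * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]
```

With these assumptions, a 20 Hz burst contains 6 cycles, which is 14,400 samples at a 48 kHz SR and 13,230 samples at a 44.1 kHz SR. The sketch assumes the burst is longer than the 6000-sample fade window, which holds for 0.3 second bursts at both sample rates.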

Figure 1. 20 Hz windowed sine-wave burst

Examination of figure 1 shows that a burst is mostly composed of sine-wave cycles at full amplitude but smoothly fades to zero at the burst boundaries. Compared to a steady-state sine wave of the same length, the RMS level is reduced by 1.41 or […] dB for 48 or 44.1 kHz SR, respectively. This also means that the distortion products are slightly underestimated due to the shorter duration of the full-amplitude sine-wave signal portion.

The bursts are assembled into a sequence, spaced at 2 second intervals and sequentially increasing in level by 1 dB increments. The use of a sequence of bursts allows rapid testing of distortion audibility over a range of levels. The […] range of burst levels enables assessment of the loudspeaker performance from fairly linear operation at modest displacement up to the maximum anticipated or designed displacement.
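The timing and level stepping of such a sequence can be sketched as a simple schedule. The function name `burst_schedule` and the example level range are hypothetical; levels are expressed here in dB relative to digital full scale, which is an assumption made only for illustration.

```python
def burst_schedule(start_db: float, stop_db: float, step_db: float = 1.0,
                   spacing_s: float = 2.0, sample_rate: int = 48000):
    """Return a list of (start_sample, linear_gain) pairs, one per burst.

    Bursts start every spacing_s seconds and rise in level by step_db.
    """
    n_bursts = int(round((stop_db - start_db) / step_db)) + 1
    schedule = []
    for k in range(n_bursts):
        level_db = start_db + k * step_db
        start_sample = int(round(k * spacing_s * sample_rate))
        schedule.append((start_sample, 10.0 ** (level_db / 20.0)))
    return schedule
```

For example, `burst_schedule(-20.0, 0.0)` describes 21 bursts whose start times are 96,000 samples apart at a 48 kHz SR, each scaled by a factor of about 1.122 relative to the previous one.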

It is also possible to examine the quiet intervals between bursts to determine the effect of background noise on the measurement.

Figure 2 shows a sequence of […] bursts covering the level range of […] to […]. Often it is useful to include an additional low-level burst at the beginning to allow for start-up effects. This additional burst is not included in the analysis.

Figure 2. Sequence of […] bursts from […] to […]

Spectral splatter due to the burst duration and window shape produces spectral components other than the fundamental frequency. These spectral components can cause errors in the perceptual distortion assessment process. These errors are greatest for […] since the masking curve at […] is minimal. Figure 3 shows a comparison of the spectrum of a […] burst of figure 1 to the associated masking curve.

Figure 3. Worst-case spectral splatter ([…] sine-wave burst @ […])

Examination of this figure shows that the burst spectral components are significantly below the masking curve for frequencies […]. If the measurement of the […] harmonic level does not include frequencies below […], the spectral splatter of the burst will not create significant errors in this measurement. Frequencies at […] and above result in even less significant burst spectral-splatter errors, and the tests can employ a critical band centered at the second harmonic.

The assessment of loudspeaker distortion is performed by placing the loudspeaker in a half-plane environment, preferably an anechoic half plane, or in a fully anechoic environment with the level corrected to the half-plane one.

The acoustic level reference is typically measured at a distance of 1 meter or greater. Non-anechoic environments require a frequency correction factor for both the level-reference microphone and the more closely placed measurement microphone to account for room modes and effects. This is best measured using the same type of stimulus and analysis windowing. In the examples shown later, a high-quality listening room is used and the loudspeakers are placed inward from the wall boundaries, but otherwise the effect of room modes is ignored.

A Schoeps MK2 microphone and CMC6 microphone amplifier are used as the measurement microphone because of their low self-noise and high overload capability.

The sequence of bursts is used to drive the loudspeaker, and calibrated-level 24-bit recordings are made by placing the measurement microphone close to the loudspeaker ([…] meters), which feeds a wide-dynamic-range ADC as the recording device. The sine-wave burst sequence to the loudspeaker is supplied by a DAC driven by the 24-bit sine-wave burst-sequence file. Ideally, the ADC and DAC would use the same clocks, but this test does not require that. The test setup should be configured to maximize the SNR so as not to create errors due to environmental or measurement-device self-noise.

The distortion-assessment algorithm is also run during quiet intervals following each burst to determine the effect of this background and equipment noise.

Figure 4. Experimental Setup

4 Derivation of Masking Curves

The masking curves used for this analysis are from an earlier study of the perceptual assessment of headphone distortion by Fielder [11]. These are derived from masking tests for frequencies of 20, 50, […], and […], where the masker is a sine wave and the masked signal is narrow-band noise that is less than or equal in width to a critical bandwidth.

Typical masking curves in the psychoacoustic literature are different in that they use single sine waves as both the masker and masked signals; see Fastl and Zwicker [13]. Perceived beating effects occur due to the interaction of the sine-wave masker and masked sine wave with nonlinearities in the human auditory system. The use of narrow-band noise as the masked signal minimizes this and acts as a good compromise for the audibility of sine waves, noises, and combinations of the two. Additionally, the author [14] had found that lower-level sine waves and critical-bandwidth noises produced similar average masking levels.

The masking tests took place over a 16-month period and employed 25 listeners with an average age of 30.4 years. Test subjects were approximately a […] mix of male and female. Listeners with abnormal hearing were excluded. Each masking-frequency test included a threshold-in-quiet test. The […] masking test used 10 test subjects, while the remaining frequency tests used 6 subjects each. The masking tests employed Oppo PM3 headphones because of their very low distortion and high-output capability.

The masking curves for headphone distortion measurement at the ear drum are converted to ones appropriate for loudspeaker measurements by applying the average transfer function from the loudspeaker measurement point to the ear drum for frontally positioned loudspeakers.

Therefore, the acoustic gain of the head, torso, pinna, and ear canal in typical rooms is used to adjust the masking curves. Figure 5 shows the acoustic gain due to head, torso, pinna, and ear canal as determined in [11].

Figure 5. Acoustic gain for the head and ear

Headphone-masking thresholds referenced at the ear-drum position were specified at ISO […]-octave frequencies between one […]-octave frequency below the […] harmonic to […], and at levels between […] for […], […] for 50 Hz, […] for […], […] for 200, 315 & […], and […] for […]. These are converted to masking curves for loudspeaker listening conditions by the inverse of the curve shown in figure 5. Additionally, the designated drive levels for the masking curves at 315, 400, and 500 Hz need to be reduced by the acoustic gain of the head and ear. At 315, 400, and 500 Hz, these gains are 0.86, 1.35, and […] dB, respectively. Rounded to the nearest decibel, the drive levels for the masking curves are reduced by 1, 1, and […] dB for masking curves at 315, 400, and 500 Hz, respectively. For instance, the […] headphone masking curve at […] becomes the loudspeaker masking curve at […] after correction by the inverse of the curve in figure 5.
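The conversion just described amounts to two subtractions in decibels, sketched below. The function names and the numeric inputs in the usage note are hypothetical; only the gains of 0.86 and 1.35 dB at 315 and 400 Hz come from the text.

```python
def to_loudspeaker_curve(headphone_mask_db, head_ear_gain_db):
    """Apply the inverse of the head-and-ear gain: subtract the gain (dB)
    from the ear-drum-referenced masking value at each frequency."""
    return [m - g for m, g in zip(headphone_mask_db, head_ear_gain_db)]

def loudspeaker_drive_level(headphone_level_db, gain_at_fundamental_db):
    """Reduce the designated drive level by the head-and-ear gain at the
    fundamental, rounded to the nearest decibel."""
    return headphone_level_db - round(gain_at_fundamental_db)
```

For a hypothetical 100 dB headphone drive level, gains of 0.86 dB (315 Hz) and 1.35 dB (400 Hz) both round to 1 dB, so either curve would be relabeled as a 99 dB loudspeaker curve.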

The above process results in these masking curves:

The values for these masking curves are indicated in table form for […]-octave ISO frequencies in appendix 1. It should be noted that the hearing threshold below […] may be too high by as much as […] due to the amplification of physiologic noise of the test subjects by the closed volume of the Oppo PM3 headphones; see [11]. As a result, the hearing threshold and low-level masking curves may be too forgiving for low-level distortion components below […]. Fortunately, masking thresholds higher than the hearing threshold should not be significantly affected.

Next, an estimate for the masking curves at 1 decibel intervals between […] is obtained by linear interpolation and extrapolation. If it is assumed that low-level masking curves are never lower than the hearing threshold, downward extrapolation proceeds in a linear manner but is limited at the hearing-threshold values.

See appendix 2 for the appropriate equations.
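As a rough illustration of this step, the sketch below interpolates or extrapolates linearly in level between two measured masking curves and floors the result at the hearing threshold. It is an assumption about the general form only; the actual equations are those of appendix 2, which is not reproduced here, and the function name and numeric values are hypothetical.

```python
def mask_at_level(level_db, lvl_lo, curve_lo, lvl_hi, curve_hi, threshold):
    """Masking curve (dB per frequency point) for an arbitrary drive level.

    curve_lo and curve_hi are measured curves at drive levels lvl_lo and
    lvl_hi; threshold is the hearing threshold at the same frequencies.
    Values are never allowed to fall below the hearing threshold.
    """
    t = (level_db - lvl_lo) / (lvl_hi - lvl_lo)   # <0 or >1 means extrapolation
    return [max(thr, lo + t * (hi - lo))
            for lo, hi, thr in zip(curve_lo, curve_hi, threshold)]
```

With hypothetical curves of `[30, 20]` dB at 80 dB and `[50, 30]` dB at 90 dB and a threshold of `[10, 15]` dB, a drive level of 85 dB gives `[40, 25]`, while extrapolating down to 60 dB is limited to the threshold values `[10, 15]`.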

The masking curves for 20, 50, 100, 200, and 315 Hz fundamentals, plus interpolated curves, are shown in figures 6, 7, 8, 9, and 10, respectively.

Figure 6. 20 Hz masking curves

Figure 7. 50 Hz masking curves

Figure 8. 100 Hz masking curves

Figure 9. 200 Hz masking curves

Figure 10. 315 Hz masking curves

Figures 6-10 show the masking thresholds derived from the headphone data in [11] as heavy lines. The interpolated and extrapolated curves are shown as fine lines. Note that the masking curves at lower levels merge with the hearing threshold.

5 Distortion-Assessment Algorithm

The distortion-assessment process analyzes the results for each sine-wave burst individually. Additionally, it performs a duplicate analysis for the quiet region […] or […] after the start of each burst, for a […] or […] SR, respectively. This analysis allows the effect of background noise on the distortion assessment to be evaluated.

This results in two assessment processes occurring for each burst, one for the distortion-audibility values and the second for the effect of background-noise errors.

The analysis begins by centering the burst in the middle of the FFT interval of 16,384 samples, which is further windowed by a flat-top window with 1024-sample fade up/down regions having the shape of the first and last halves of a 2048-sample, […] Kaiser-Bessel window. At a 48 kHz SR the burst has a small overlap of 32 samples with the fade up/down regions, while at a 44.1 kHz SR a gap of 553 samples exists on each side of the burst before the start of the fade regions.
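The overlap and gap figures follow directly from the window geometry, as the short check below illustrates. The constant and function names are hypothetical, and the burst is assumed to last exactly 0.3 seconds (as it does for a 20 Hz fundamental).

```python
FFT_LEN = 16384     # analysis transform length in samples
FADE_LEN = 1024     # fade up/down length on each side
BURST_S = 0.3       # assumed burst duration in seconds

def burst_margin(sample_rate: int) -> int:
    """Samples between each burst edge and the adjacent fade region.

    A negative value means the burst overlaps the fade region by that amount.
    """
    burst_len = round(BURST_S * sample_rate)
    flat_len = FFT_LEN - 2 * FADE_LEN        # 14,336-sample flat top
    return (flat_len - burst_len) // 2
```

At 48 kHz the burst is 14,400 samples long, giving a margin of -32 (a 32-sample overlap); at 44.1 kHz it is 13,230 samples long, giving a 553-sample gap on each side.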

The FFT windowing for the transform does not affect the direct burst signal but does prevent spectral splatter of the low-frequency room-noise components upward in frequency from creating errors.

Because the transform is longer than the burst signal, the effect of background noise is amplified by the small amounts of 0.56 or […] dB for 48 or 44.1 kHz SR, respectively. This is true because the FFT analysis samples a longer interval for the background noise than the 0.3 second sine-wave burst.
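One simple way to reproduce the 0.56 dB figure is to take the power ratio of the transform length to the burst length. This is an assumption about how the value was obtained, and it ignores the small effect of the fade regions on the noise; the function name is hypothetical.

```python
import math

def noise_gain_db(sample_rate: int, fft_len: int = 16384,
                  burst_s: float = 0.3) -> float:
    """Extra background-noise energy captured by the transform, in dB,
    relative to an analysis interval exactly as long as the burst."""
    burst_len = round(burst_s * sample_rate)
    return 10.0 * math.log10(fft_len / burst_len)
```

Under this assumption the 48 kHz case evaluates to about 0.56 dB, matching the text, and the same formula gives about 0.93 dB at 44.1 kHz.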

The perceptual distortion assessment method is shown in figure 11.

Figure 11. Distortion-assessment algorithm

Examination of this figure shows that the distortion-assessment algorithm combines the values from a level-scaled and windowed FFT with frequency-interpolated masking-curve values to produce distortion-audibility numbers.

The FFT-level scaling is first done to match the level of the fundamental at the reference position. This level is found using the RMS sum of the five FFT-spectral components centered as closely as possible around the fundamental frequency. The scaled level is rounded to the nearest level for which a masking curve is available and used to select the appropriate masking curve. The scaled FFT-spectral values are indexed by the "bin" or spectral-component index. The selected masking curve, with values at the fractional-octave ISO frequencies, is linearly interpolated to create masking values at the FFT-basis frequencies associated with each bin index. Next, the ratio of each scaled spectral value to the interpolated masking value at the same bin is calculated, with these ratios designated as audibility components.
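
The scaling, interpolation, and ratio steps can be sketched as follows. This is a minimal illustration, not the paper's implementation: all function and parameter names are hypothetical, spectra are assumed to be held as power levels in dB, and the interpolation is assumed to be linear in dB over linear frequency, which the text does not specify.

```python
import numpy as np


def fundamental_level_db(spectrum_db, k0):
    """RMS (power) sum, in dB, of the five bins centered on the fundamental bin k0."""
    bins = spectrum_db[k0 - 2:k0 + 3]
    return 10.0 * np.log10(np.sum(10.0 ** (bins / 10.0)))


def audibility_components_db(spectrum_db, bin_freqs, k0, ref_level_db,
                             mask_freqs, mask_db):
    """Per-bin ratio of scaled spectrum to masking threshold, expressed in dB."""
    # Shift the whole spectrum so the fundamental matches the reference level.
    offset = ref_level_db - fundamental_level_db(spectrum_db, k0)
    scaled_db = spectrum_db + offset
    # Interpolate the masking curve onto the FFT bin frequencies (assumed axes).
    mask_at_bins = np.interp(bin_freqs, mask_freqs, mask_db)
    # A ratio of levels becomes a difference in dB.
    return scaled_db - mask_at_bins
```

Positive values mark bins whose level exceeds the interpolated masking threshold; negative values mark bins below it.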

These audibility components are first combined using RMS summation over critical bands to create audibility values in dB. These audibility values in dB are denoted sensation levels (SL). Each SL represents the number of dB relative to the threshold of audibility for each ERB band individually.
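
A minimal sketch of the per-band summation, assuming the audibility components are available in dB per FFT bin and that a band takes the bins from its lower edge up to, but not including, its upper edge. The function name and the edge convention are assumptions.

```python
import numpy as np


def band_sensation_level_db(component_db, bin_freqs, f_low, f_high):
    """RMS (power) sum of the audibility components inside one band, in dB."""
    in_band = (bin_freqs >= f_low) & (bin_freqs < f_high)
    if not np.any(in_band):
        return -np.inf                      # no bins fall inside this band
    power_ratios = 10.0 ** (component_db[in_band] / 10.0)
    return 10.0 * np.log10(np.sum(power_ratios))
```

For example, two bins that each sit exactly at threshold (0 dB) combine to an SL of about 3 dB, because their powers add.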

Moore's equivalent-rectangular-bandwidth (ERB) bands [15] are used to represent auditory critical bandwidths, which are used to assess the audible effect of the combination of noise and sine-wave components.

Within a critical band, all sound components sum together on a power basis. The bandwidth of ERB bands in Hz is given by equation 1:
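
For orientation, one widely used expression for the ERB is the Glasberg and Moore form below, with the center frequency $f_c$ in Hz. Whether this is the exact form of equation 1 in [15] is an assumption; it is shown only to make the later band construction concrete.

```latex
\mathrm{ERB}(f_c) = 24.7\left(\frac{4.37\,f_c}{1000} + 1\right)\ \text{Hz}
```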

The first ERB band is centered on the second harmonic. Succeeding ERB bands are then created adjacently, upward in frequency, using the lower band-edge frequency and the assumption that the ERB bandwidth is divided equally in Hz above and below the center frequency. This process continues until the uppermost band has a center frequency slightly less than or equal to the upper frequency limit of the analysis. The ERB as a function of lower band-edge frequency is given by equation 2:
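
The band construction can be sketched under one assumption about the ERB formula. If the bandwidth is linear in center frequency, ERB = a·f_c + b, and the band is split equally about its center so that f_c = f_low + ERB/2, then solving for the bandwidth gives ERB = (a·f_low + b) / (1 − a/2). The constants below are the Glasberg and Moore values, which may differ from those in the paper; `f_max` is a hypothetical stand-in for the upper frequency limit.

```python
ERB_SLOPE = 24.7 * 4.37 / 1000.0   # assumed: Glasberg-Moore slope, Hz per Hz
ERB_OFFSET = 24.7                  # assumed: Glasberg-Moore offset, Hz


def erb_from_center(f_center):
    """ERB in Hz as a function of band center frequency."""
    return ERB_SLOPE * f_center + ERB_OFFSET


def erb_from_lower_edge(f_low):
    """ERB in Hz as a function of lower band-edge frequency."""
    return (ERB_SLOPE * f_low + ERB_OFFSET) / (1.0 - ERB_SLOPE / 2.0)


def erb_bands(burst_freq, f_max):
    """Adjacent (low, center, high) bands, the first centered on the 2nd harmonic."""
    bands = []
    f_center = 2.0 * burst_freq
    width = erb_from_center(f_center)
    f_low = f_center - width / 2.0
    while f_center <= f_max:
        bands.append((f_low, f_center, f_low + width))
        f_low = f_low + width                  # next band starts at this upper edge
        width = erb_from_lower_edge(f_low)
        f_center = f_low + width / 2.0
    return bands
```

Calling `erb_bands(50.0, 1000.0)`, for instance, yields a first band centered at 100 Hz followed by contiguous bands whose widths grow with frequency.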

When the burst frequency is below a set limit, the dB level from the first ERB band centered on the second harmonic is replaced with the dB value of the RMS sum of the 3 spectral components centered as closely as possible around the second harmonic of the burst frequency. This reduces the effect of spectral splatter components just below the second harmonic, takes advantage of the fact that the only significant distortion product around the second-harmonic frequency is the second harmonic itself, and is within 0.25 dB of the level obtained by summation over a larger bandwidth.

The final step in the distortion-assessment method is determining the effect that the audibility of the multiple ERB or critical bands has on the overall perception of distortion.

Buus et al. [16] found that multiple critical-band components had increased audibility because of a statistical sharing of detection probabilities. Equation 3 gives the resulting increase in audibility as a function of the number of equally audible components.

In order to accommodate components with varying sensation levels and be accurate for a single component, equation 4 was derived in [11].
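
To illustrate the kind of rule described, the sketch below assumes that statistical combination raises the audibility of N equally audible components by 5·log10(N) dB, and generalizes this to unequal sensation levels so that a single component is returned unchanged. Both the 5·log10(N) increase and the generalized form are assumptions made for illustration; they are not necessarily equation 3 or the equation 4 derived in [11].

```python
import math


def combine_sensation_levels_db(levels_db):
    """Combine per-band sensation levels (dB) into one audibility value (dB).

    Assumed rule: 5*log10(sum(10**(SL_i/5))). For N equal levels this gives
    SL + 5*log10(N); for a single level it gives back that level.
    """
    return 5.0 * math.log10(sum(10.0 ** (sl / 5.0) for sl in levels_db))


single = combine_sensation_levels_db([10.0])          # stays at 10 dB
four_equal = combine_sensation_levels_db([10.0] * 4)  # 10 dB + 5*log10(4) dB
```

With this rule a band well below the others contributes little, so the combined value is dominated by the most audible bands.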

The Buus combination of individual audibility values is used to calculate a total-distortion audibility metric. Additionally, distortion-audibility metrics are generated for a combination of the lower harmonics, designated low-order harmonics (LOH), and for high-frequency distortion, which comprises the ERB bands above a frequency limit that depends on the sine-wave burst frequency. The noise analysis of the quiet regions uses the same calculation as the total-distortion audibility assessment. As a result, each burst in the sequence generates 4 audibility values, which are associated with the appropriate acoustic level.

6 Distortion Assessment Examples

Five examples are used to demonstrate the perceptual distortion assessment process: a subwoofer, a 3-way loudspeaker with a 15" woofer, another 15" woofer, a coaxial loudspeaker with a 12" woofer, and a final coaxial loudspeaker, each under its own test condition. Three listening rooms were used for the tests: the first two examples each used their own room, and the third room was used for the last three examples. The first two examples employed one SR, while the remaining three employed the other.
