Multi-Speaker Diarization
Multi-speaker diarization is the process of segmenting audio recordings by speaker. The goal of this project is to identify human speech and annotate who spoke and when. You will label speakers as Speaker 1, Speaker 2, and so on, because the speakers' true identities are unknown.
The intent of the project is to create speech-to-speech datasets of human-to-human conversations (namely, natural dialogues and interactions). For this reason, it is important to capture and focus on the speakers engaged in such natural dialogues and interactions, while ignoring speech that is not representative of conversational data (such as commercials, advertisements, and show-opening clips).
In this project you will perform the following actions:
1. Listen to the audio and identify segments where human sound is audible.
2. Use your best judgement to determine if the human sound is part of a conversation with another human or not.
• If yes, then identify each unique voice and annotate it as a Speaker. If you cannot identify to which speaker the voice belongs, annotate as Unknown.
• If not, then annotate as Ignore.
3. Indicate the total number of unique speakers in the audio.
4. Indicate if the audio file starts in the middle of human speech or sound.
5. Indicate if the audio file ends in the middle of human speech or sound.
Discard
Discard the task in the following scenarios if they apply to the entire audio clip:
• If no speech is present in the entire clip.
• If the audio is so badly distorted that it is impossible to determine whether speech is present.
• Tasks whose audio does not load will not be presented to you.
o If by any chance the audio on the platform does not load, DO NOT use the discard option for this scenario. Try refreshing the browser, check your connection, or contact the Centific team for more information.
End-to-End Process
To process a task in this project, follow the steps below:
Step 1: Listen to the audio
To listen to the audio recording, use the audio segmentation tool. Pay attention to the distinct speakers you can hear.
No reference audio is provided for the speakers, so replay the audio in the tool as often as needed to determine whether a voice speaks more than once in the recording.
Step 2: Label segments
Use the audio segmentation tool to label segments. Each segment should contain the speech of a single speaker turn.
If there is a dialogue between two or more speakers, each speaker turn goes in its own segment. This also applies when voices overlap: the segments overlap to reflect that. To label the speaker segments, follow the instructions below:
1. Timestamp Annotation: Identify the start and end times of each speaker's turn, including instances of overlapping speech.
· To start a segment, click where the speaker's audio begins and drag the cursor to the end of the segment, taking care to follow all the segment requirements in the guidelines.
▪ End the segment every time there is a pause of over 2 seconds between the end of a speaker’s last utterance and the beginning of their next utterance.
· Carefully note when speakers overlap, and indicate the start and end of the overlap by creating a segment. Following the process described above, pay attention to the point at which one speaker stops and another begins.
2. Speaker Labeling: Select a label from the following list:
· Speaker 1, Speaker 2, Speaker 3, Speaker 4, Speaker 5, Speaker 6…
· Unknown
· Ignore
To perform the Speaker Labeling step, follow the guidance below:
Assigning a Speaker
• Assign a speaker label (for example, Speaker 1, Speaker 2, Speaker 3) to each segment of speech.
· Assign a speaker label to the primary speaker or speakers (for example, the primary speakers of a conversation, the host or hosts of a podcast, the person being interviewed or a guest in a podcast or TV show episode).
• If the voice of the current speaker sounds distinctly different from the previous speaker, assign a new speaker label.
· Once you’ve assigned a specific speaker a label, that label should always be used for that voice.
• If at all possible, provide a speaker label. There may be cases where you are unsure if a segment is from a previous speaker or a new speaker. In these cases, go back and listen carefully to the previous speaker(s) to help determine the label for the current speaker you are segmenting. Additionally, use the context of the conversation, such as the names used and voice characteristics, to inform your decision.
Labeling as “Unknown”
• In extreme cases of uncertainty, where you are absolutely unable to determine if it is a previous or new speaker, select Unknown. This might occur due to noise or other audio events. Use the Notes section to add details about what caused or contributed to the Unknown evaluation. Here are some examples of comments you can use:
· speaker is unintelligible due to heavy noise or distortion
· overlapping speech makes it impossible to distinguish who is speaking
· speaker’s voice is too brief or unclear to assign to a known label
· technical issue causes part of the audio to be garbled or missing
· unknown speaker appears only once and cannot be matched to any prior or subsequent turn
· other: provide explanation
Labeling as “Ignore”
• Segment and select Ignore when there is a voice coming from a commercial, advertisement, any background speech, an agent (such as Siri, Google Home, and so on), or an opening segment or clip preceding the actual main content of the provided audio content, such as podcasts or TV shows, even if that segment contains speech or speakers.
· The defining characteristic of advertisements that should be ignored is that they come from an outside production, meaning they were clearly produced separately from the main content and inserted later. Advertisements or sponsorships from outside productions have external speakers who sound different from the speaker(s) in the main production, and may differ in other ways, such as volume or audio quality.
· In cases where context supports that the host of a podcast, for example, is doing an ad or sponsorship, do not select Ignore. Instead, give this segment a Speaker label. This includes cases where it is a monologue of an ad or sponsorship from the podcast host. These cases may be ambiguous and require you to use your best judgement to determine if it is the podcast host, or an outside speaker.
▪ In these cases, use the Notes field to describe the characteristics of the audio that led to this label.
· If there is an audio sequence that contains multiple speakers or voices, but it is all a part of one sequence such as background music vocals, intro segments, commercials, ads, and so on, create a single segment and mark as Ignore.
· As with background music and songs, ignore any pre-recorded segments (for example, [sports event (could contain player voices, crowd cheering) + commentators (1+)]) that do not fit into the main conversation. Such segments may be in the foreground or background.
▪ Even when a segment is pre-recorded, you should still segment and label it if it is part of the main line of exchange, as in Example 1 in the Examples section. This holds even if the voices are not those of the main hosts or narrators (for example, when the hosts refer to the pre-recorded segment and the speakers in it).
· In all these cases, you must specify in the comment box what type of element is being marked as Ignore:
▪ background music with vocals
▪ intro song/audio
▪ intro segment with various speech segments
▪ commercial with speech segments
▪ advertisement with speech segments
▪ other: provide explanation
· Do not select Ignore for characters (from movies, for example) whose voices belong to actual actors. Even if the character has a robotic voice, such as Sonny from the movie “I, Robot” or J.A.R.V.I.S from the Iron Man movies, these are still human voices. This means:
▪ All human speech should be labeled if it contributes to the main storyline, dialogue, or conversational context of the podcast or show.
▪ This includes voices performed by human actors—even if they represent fictional or robotic characters—as long as their dialogue or monologue is integrated into the content or referenced as part of the conversation.
▪ Do not use Ignore for these cases, even if the character is fictional or uses a robotic voice, as long as their speech is relevant to the storyline.
▪ Only label a segment as Ignore if the human speech occurs in a commercial or advertisement that is unrelated to the main storyline or the conversational content of the podcast or show.
▪ In summary: tag all speech that is part of the main narrative or conversation, regardless of whether the voice is natural, robotic, or fictional, and only Ignore isolated, unrelated commercial content.
NOTE: A segment can overlap completely with another speech segment (for example, Speaker 1 is talking and Speaker 2 backchannels, while Speaker 1 does not pause for longer than 2 seconds).
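As an illustration of the overlap rule in the note above, the following Python sketch finds every pair of segments that overlap in time, including a backchannel that sits entirely inside another speaker's turn. The `Segment` class, its field names, and the `find_overlaps` helper are hypothetical and are not part of the annotation tool; times are assumed to be in seconds.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the beginning of the audio (assumed unit)
    end: float    # seconds
    label: str    # e.g. "Speaker 1"


def find_overlaps(segments: List[Segment]) -> List[Tuple[Segment, Segment, float, float]]:
    """Return (a, b, overlap_start, overlap_end) for every pair overlapping in time."""
    found = []
    ordered = sorted(segments, key=lambda s: (s.start, s.end))
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                break  # later segments start even later, so none can overlap `a`
            # `b` starts inside `a`; the overlap ends when the first of them ends.
            found.append((a, b, b.start, min(a.end, b.end)))
    return found


segments = [
    Segment(0.0, 10.0, "Speaker 1"),   # long turn with no pause over 2 seconds
    Segment(4.2, 4.8, "Speaker 2"),    # backchannel fully inside Speaker 1's turn
    Segment(9.5, 12.0, "Speaker 2"),   # starts before Speaker 1 has stopped
]
result = find_overlaps(segments)
for a, b, start, end in result:
    print(f"{a.label} / {b.label}: {start}-{end}")
```

In this sketch the backchannel keeps its own segment and the long turn is left intact, which matches the note: the two segments simply coexist on the timeline.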
You must use careful judgement to decide whether a pause is significant. A pause is significant if the speaker pauses for over 2 seconds. If they simply speak slowly, or pause for under 2 seconds because of an interruption or to think about their next word, the pause is not significant and should not start a new segment.
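The pause rule above can be sketched in Python as follows. This is only an illustration: the `merge_utterances` helper and the `(start, end)` tuples in seconds are hypothetical, and treating a pause of exactly 2 seconds as not significant is an assumption, since the guideline only speaks of pauses over and under 2 seconds.

```python
from typing import List, Tuple

PAUSE_THRESHOLD_S = 2.0  # guideline: a pause of over 2 seconds ends the segment

Interval = Tuple[float, float]  # (start_seconds, end_seconds), assumed representation


def merge_utterances(utterances: List[Interval],
                     threshold: float = PAUSE_THRESHOLD_S) -> List[Interval]:
    """Group ONE speaker's utterances into segments, splitting on long pauses."""
    segments: List[Interval] = []
    for start, end in sorted(utterances):
        if segments and start - segments[-1][1] <= threshold:
            # The pause is not significant: extend the current segment.
            prev_start, prev_end = segments[-1]
            segments[-1] = (prev_start, max(prev_end, end))
        else:
            # First utterance, or a pause longer than the threshold: new segment.
            segments.append((start, end))
    return segments


# Pauses of 0.9 s and 2.7 s: only the second one starts a new segment.
utterances = [(0.0, 3.1), (4.0, 6.5), (9.2, 11.0)]
segments = merge_utterances(utterances)
print(segments)  # [(0.0, 6.5), (9.2, 11.0)]
```

The function handles a single speaker at a time; another speaker talking during the pause does not change whether the first speaker's pause counts as significant.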
For further details on the Timestamp Annotation and the Speaker Labeling steps, see the Best Practices section.
If you select Unknown or Ignore, you should add Notes in the appropriate text field in the UI. You are encouraged to use them for the following:
• For Unknown labels, input any relevant elements or details that impacted your ability to identify a speaker.
• For Ignore labels, identify the criteria which were met that resulted in this label.
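A simple self-check for the Notes requirement above might look like the Python sketch below. The dictionary keys (`label`, `notes`) and the `missing_notes` helper are hypothetical; the annotation tool's real export format is not described in these guidelines.

```python
from typing import Dict, List

LABELS_NEEDING_NOTES = {"Unknown", "Ignore"}


def missing_notes(segments: List[Dict[str, object]]) -> List[int]:
    """Return the indices of Unknown/Ignore segments whose notes are empty."""
    problems = []
    for index, segment in enumerate(segments):
        label = segment.get("label")
        notes = str(segment.get("notes") or "").strip()
        if label in LABELS_NEEDING_NOTES and not notes:
            problems.append(index)
    return problems


segments = [
    {"start": 0.0, "end": 12.4, "label": "Ignore", "notes": "intro song/audio"},
    {"start": 12.4, "end": 20.1, "label": "Speaker 1", "notes": ""},
    {"start": 20.3, "end": 21.0, "label": "Unknown", "notes": "  "},
]
flagged = missing_notes(segments)
print(flagged)  # [2]: the Unknown segment has only whitespace in its notes
```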
Step 3: Indicate the total number of unique speakers
After you have completed the audio segmentation, indicate the total number of distinct speakers labeled as “Speaker 1,” “Speaker 2,” “Speaker 3,” etc., across the entire audio. Only include segments that have been assigned a specific Speaker label. Do not count segments labeled as “Unknown” or “Ignore” in your total.
Enter the total number of distinct speakers in the text box labeled “How many unique speakers did you identify in the whole audio?” Enter whole numbers only.
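The counting rule in this step can be sketched in Python as follows. The `count_unique_speakers` helper is hypothetical, and the sketch assumes labels are plain strings of the form "Speaker N", "Unknown", or "Ignore".

```python
import re
from typing import Iterable

# Matches "Speaker 1", "Speaker 2", ... but not "Unknown" or "Ignore".
SPEAKER_LABEL = re.compile(r"Speaker \d+")


def count_unique_speakers(labels: Iterable[str]) -> int:
    """Count distinct Speaker labels, leaving out Unknown and Ignore segments."""
    return len({label for label in labels if SPEAKER_LABEL.fullmatch(label)})


labels = ["Ignore", "Speaker 1", "Speaker 2", "Speaker 1",
          "Unknown", "Speaker 3", "Speaker 2"]
total = count_unique_speakers(labels)
print(total)  # 3
```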
Step 4: Indicate if the audio file starts in the middle of speech
If the audio file for the task starts in the middle of speech, meaning someone is talking when the audio file begins, select Yes.
If the audio file for the task does not start in the middle of speech, meaning no one is talking when the audio file begins, select No.
This applies both when the audio cuts off a word and when the speaker is cut off mid-sentence.
- Word cut off example: “…nk this is an example of…”. Here, the word “think” is cut, so the audio begins in the middle of a word.
- Mid-sentence cut off example: “…is an example of…”. Here the sentence clearly began before “is an example”, but the audio starts after the sentence has begun.
Step 5: Indicate if the audio file ends in the middle of speech
If the audio file for the task ends in the middle of speech, meaning someone is talking when the audio file ends, select Yes.
If the audio file for the task does not end in the middle of speech, meaning no one is talking when the audio file ends, select No.
This applies both when the audio cuts off a word and when the speaker is cut off mid-sentence.
- Word cut off example: “I think this is an examp…”. Here, the word “example” is cut.
- Mid-sentence cut off example: “I think this is an example of…”. Here the sentence is clearly continuing, but the audio is cut before the sentence ends.
Step 6: Review and submit the task
The final task output should include the start and end timestamps for each speaker's turn (including overlaps) and the associated speaker labels, as well as any notes for the Unknown/Ignore labels.
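To make the expected output concrete, here is a minimal Python sketch of one way a completed task could be represented and serialized. Every name in it (`Segment`, `TaskOutput`, and all field names) is hypothetical, and the times are assumed to be in seconds; the platform's actual output format is not specified in these guidelines.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Segment:
    start: float     # seconds (assumed unit)
    end: float       # seconds
    label: str       # "Speaker N", "Unknown", or "Ignore"
    notes: str = ""  # expected for Unknown and Ignore segments


@dataclass
class TaskOutput:
    segments: List[Segment] = field(default_factory=list)
    unique_speakers: int = 0          # Step 3
    starts_mid_speech: bool = False   # Step 4
    ends_mid_speech: bool = False     # Step 5


output = TaskOutput(
    segments=[
        Segment(0.0, 8.5, "Ignore", "intro segment with various speech segments"),
        Segment(8.5, 15.2, "Speaker 1"),
        Segment(14.9, 16.0, "Speaker 2"),  # overlaps the end of Speaker 1's turn
    ],
    unique_speakers=2,
    starts_mid_speech=False,
    ends_mid_speech=True,
)
serialized = json.dumps(asdict(output), indent=2)
print(serialized)
```

Keeping overlapping turns as separate entries, each with its own start and end time, preserves the overlap without needing a dedicated overlap field.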
Best Practices
To annotate a task in this project, use the following best practices:
• Mark the speaker index for each segment. The first speaker in the audio is Speaker 1, the second unique speaker is Speaker 2, and so on.
• Pay close attention to instances of overlapping speech, where two or more speakers are talking simultaneously. In these cases, note when one speaker stops talking and another starts, even if the overlap occurs.
· When you hear overlapping speech between multiple speakers (for example, Speaker 1 and Speaker 2), note when Speaker 1 stops speaking and when Speaker 2 starts speaking. Speaker 2 may start speaking before Speaker 1 stops.
• Ensure that start and end times, as well as speaker labels, are applied accurately and consistently. For example, Speaker 1’s voice should always receive the Speaker 1 label.
• Provide clear and comprehensive annotations that accurately capture the conversational dynamics, including all speaker turns and overlaps.
• Speaker utterance boundaries (start and end time) should be accurate to 250 ms precision.
• Label the following verbal sounds as long as they are a part of a natural dialogue or interaction between speakers:
· Laughter from another person, as this is feedback for what another speaker is saying;
· Backchannel is when the person listening produces audible vocal activity to signal active listening, such as: uh-huh, mmmm, and all other sounds that show active listening by providing relevant feedback to what is being said, without using words (they can show approval, disagreement, encouragement, and so on).
• Label audible and obvious breathing with the corresponding speech when applicable. The main focus of segmenting is to capture conversation. If there is no audible and obvious breathing after speech, do not search for it. If there is audible and obvious breathing that impacts the conversation, such as a gasp, label it with the corresponding speech.
· The end of a segment should include audible non-verbal sounds (such as breath or the continuation of sibilant ‘s’ sounds); they belong to the same segment.
· Any vocalization that conveys meaning, functions as feedback or acknowledgment, or is clearly tied to the speaker’s intent or delivery (e.g., sighs, gasps, heavy inhales, “sarcastic” throat sounds) should be labeled as part of the speech.
• Do not label any of the following human sounds:
· Speech from crowds: an audience laughing or cheering;
· Crying;
· Coughing;
▪ If more than two people are simultaneously talking or making verbal sounds (laughter, "uh-huh," "mmmm," and so on), accurate labeling can be very difficult and time-consuming. Just label what you can, but do not spend too much time on it.
• Do not create multiple segments if an intro, ad, and so on (which should be marked as “Ignore”) contains multiple voices. Create a single segment.
• If a single speaker pauses for longer than 2 seconds and then resumes, mark the speech before the pause and the speech after it as separate segments.
· A pause of less than 2 seconds should not result in the creation of a new segment.
• To decide if a voice should be ignored, use the context present in the audio. For example, if a podcast presenter says a commercial will follow, and then you hear what sounds like a commercial, select Ignore.
· Also, select Ignore for any opening segment or clip that comes before the main content of the provided audio (for example, a podcast) if it includes outside speakers, meaning speakers other than the podcast host. Such segments often occur at the beginning of a podcast (or TV or radio show) episode, before the host actually starts speaking to introduce the episode.
▪ In these cases, if context supports that it is the podcast host speaking, or if the only audio is the podcast host giving a monologue that could be an advertisement or sponsorship, do not select Ignore. Give a Speaker label in these instances. These cases may be ambiguous and require you to use your best judgement to determine if it is the podcast host, or an external speaker from an outside production.
· Ignore pre-recorded segments (for example, [sports event (could contain player voices, crowd cheering) + commentators (1+)]) that are not a part of the conversation.
▪ Such segments may start in the foreground, pass into the background (for example, when podcast hosts talk over the recording), and then stop or fade away. The key is whether the speakers have been arranged to be part of the dialogue with the primary speaker or speakers.
▪ The guideline to ignore does not apply to all pre-recorded segments. If the pre-recorded segments are interleaved fairly naturally into the foreground conversation (for instance, a pre-recorded interview on a topic discussed in the audio), segment and label them as part of the conversation.
· Echoing voices—when they are part of the speaker’s original turn—should be labeled as part of the same segment, with the end time placed at the final audible echo rather than the end of the spoken word.
▪ E.g., in “Wondrium... um…um”, where the last two “um” pieces are echoes, the end of the segment should be on the last “um”, not at the end of the initial word, “Wondrium”.
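The best practices above ask for utterance boundaries accurate to 250 ms. One possible reading, sketched below in Python, is that each annotated boundary should lie within 250 ms of where the speech actually starts or ends. Both this reading and the `within_tolerance` helper are assumptions, and the "reference" boundaries here stand in for something like a reviewer's annotation, which the guidelines do not define.

```python
from typing import Tuple

TOLERANCE_S = 0.250  # 250 ms, from the best practices list

Interval = Tuple[float, float]  # (start_seconds, end_seconds), assumed representation


def within_tolerance(annotated: Interval, reference: Interval,
                     tolerance: float = TOLERANCE_S) -> bool:
    """True if both boundaries are no more than `tolerance` seconds off."""
    start_ok = abs(annotated[0] - reference[0]) <= tolerance
    end_ok = abs(annotated[1] - reference[1]) <= tolerance
    return start_ok and end_ok


# Start is 150 ms late and end is 200 ms early: acceptable under this reading.
close_enough = within_tolerance((12.40, 15.10), (12.25, 15.30))
# Start is 400 ms late: outside the tolerance.
too_far = within_tolerance((12.40, 15.10), (12.00, 15.10))
print(close_enough, too_far)  # True False
```

For a turn that ends in an echo, the reference end under this sketch would be the final audible echo, in line with the echo guidance above.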