
ElevenLabs is a great AI voice generator, but it comes with a hefty price tag and a fairly barebones user interface.

Here, you will find free, unlimited ElevenLabs alternatives:

  • Free text-to-speech tools, plus open-source options
  • Fast and unlimited voice generation
  • Easy to set up (no-code or low-code)

Here are my top 3 free AI voice generators. Keep reading for a detailed overview.

Tool | Best For
Coqui TTS | General-purpose voice generation; fantasy screenplays. Its xtts-v2 model (on Hugging Face) reaches ElevenLabs quality in voice cloning. Each model comes with its own usage terms (XTTS is non-commercial).
Mycroft Mimic 3 | Personal voice assistant; works offline
Tortoise | Best quality but slow (alternative: Play.ht Turbo, a faster freemium TTS)

Top 3 ElevenLabs Alternatives

For those who prioritize user interfaces and additional features, and don’t mind exploring plans that might come with costs, check out our roundup of the Best AI Voice Generators with free plans available.

If you are comfortable writing Python code, OpenAI TTS is 6x cheaper than ElevenLabs and just as good.
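
Here is a minimal sketch of what that Python code can look like, assuming the official `openai` package (version 1.x) and an `OPENAI_API_KEY` environment variable. The helper names `split_text` and `synthesize` are illustrative, and the 4096-character request limit is an assumption worth checking against the current API documentation.

```python
from pathlib import Path

# Assumed per-request input limit; verify against the current API docs.
MAX_CHARS = 4096


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text on whitespace into chunks of at most `limit` characters.

    A single word longer than `limit` is kept whole in its own chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > limit and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def synthesize(text: str, out_dir: str = "speech", voice: str = "alloy") -> list[Path]:
    """Write one MP3 per chunk of `text` and return the file paths."""
    from openai import OpenAI  # lazy import: pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(split_text(text)):
        path = out / f"part_{i:03d}.mp3"
        with client.audio.speech.with_streaming_response.create(
            model="tts-1", voice=voice, input=chunk
        ) as response:
            response.stream_to_file(path)
        paths.append(path)
    return paths
```

Calling `synthesize("Hello from a script.")` should leave a `speech/part_000.mp3` file behind. Splitting on whitespace keeps words intact, but chunk boundaries can fall mid-sentence, so a sentence-aware splitter may sound smoother for long texts.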

1. Coqui TTS

Meet Coqui TTS. It’s a simple tool that helps you turn text into speech. You can start for free with its Python library, which supports hundreds of TTS models.


Key Features

  • Easy to use: Available as a free Python library, plus a paid API and web app.
  • Multilingual: Supports 13 languages.
  • Multi-speaker TTS: Add multiple characters to a voiceover.
  • Advanced timeline editor: Adjust pitch, loudness, and emotion for each sentence, word, or character.
  • Voice cloning: Clone any voice from 3 seconds of audio and add it to your collection.
  • Prompt 2 Voice: Generate voices from a text prompt.
  • Support for a large number of TTS models, including:
    • Tortoise
    • Bark
    • Tacotron
    • xtts-v2
    • FastSpeech, and more
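
To make the Python library route concrete, here is a minimal voice-cloning sketch, assuming the `TTS` package from PyPI and the xtts-v2 model name as published by Coqui. The function name `clone_voice` and the file names are illustrative. Keep in mind that XTTS is licensed for non-commercial use.

```python
from pathlib import Path

# Model identifier for xtts-v2 as listed by the Coqui TTS library.
XTTS_MODEL = "tts_models/multilingual/multi-dataset/xtts_v2"


def clone_voice(
    text: str,
    speaker_wav: str,
    out_path: str = "cloned.wav",
    language: str = "en",
) -> str:
    """Speak `text` in the voice of the reference clip `speaker_wav`."""
    # Fail early, before the (large) model is loaded.
    if not Path(speaker_wav).is_file():
        raise FileNotFoundError(f"Reference clip not found: {speaker_wav}")

    from TTS.api import TTS  # lazy import: pip install TTS

    tts = TTS(XTTS_MODEL)  # downloads the model on first use
    tts.tts_to_file(
        text=text,
        speaker_wav=speaker_wav,
        language=language,
        file_path=out_path,
    )
    return out_path
```

A call such as `clone_voice("Once upon a time...", "my_voice.wav")` writes `cloned.wav` next to your script. A GPU speeds things up considerably, though the library also runs on CPU.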

Note: The Coqui TTS code base is released under the MPL 2.0 license, which allows commercial use. Each model, however, has its own license, chosen by the model’s creator, and that license may not allow commercial use.


Example: Models from Meta are under a Creative Commons non-commercial license. Likewise, the XTTS model cannot be used commercially without paying for a license. 😢 You can buy a commercial-use license from Coqui.

Pros

  • Easy-to-use Colab notebook.
  • Multiple emotional tones and styles.
  • You can generate your own voices from text prompts, plus fuse two voices using Voice Fusion.
  • Voice cloning is fast and high quality.
  • Best voices for fantasy and storytelling use cases.

Cons

  • The commercial license for the XTTS model is paid.

2. Bark by Suno.ai

Bark is like your personal studio for creating voices and music. You don’t need to pay anything to get started.

Key Features

  • Lots of Choices: Over 100 voice presets to pick from, plus new ones from other users on Discord.
  • Smart Language Handling: Bark can handle text in many languages, even when they’re mixed together.
  • Sings Too: Bark doesn’t just talk; it can create singing voices.
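
Here is a minimal sketch of the presets and the singing feature in code, assuming the `bark` package installed from the Suno repository plus `scipy`. The helper names are illustrative; the music-note cue for lyrics and the `v2/en_speaker_6` preset follow the conventions described in Bark’s README, so check there for the current list.

```python
def as_song(lyrics: str) -> str:
    """Wrap lyrics in music notes, the cue Bark's README gives for singing."""
    return f"♪ {lyrics.strip()} ♪"


def bark_to_wav(
    text: str,
    out_path: str = "bark.wav",
    voice_preset: str = "v2/en_speaker_6",
) -> str:
    """Generate audio for `text` with a voice preset and save it as a WAV file."""
    # Lazy imports so the helper above works without Bark installed.
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()  # downloads and caches the models on first run
    audio = generate_audio(text, history_prompt=voice_preset)
    write_wav(out_path, SAMPLE_RATE, audio)
    return out_path
```

For example, `bark_to_wav(as_song("Twinkle, twinkle, little star"))` asks Bark to sing the line rather than speak it. Results vary from run to run, so it is worth generating a few takes.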

Pros

  • Sounds Real: Whether it’s speaking in different languages or making music, Bark sounds like the real thing.
  • Expressive: It can laugh, sigh, and cry, just like a person.
  • Commercial Use: You can use Bark for your projects, even to make money.
  • Community Support: Join the Discord to meet others and find new voice presets.
  • Big Library: There’s a huge collection of voice prompts to explore.

Cons

  • No Web App Yet: You’ll need to use Colab or Discord to try it out for now, but it’s still free and easy.

3. Tortoise TTS

Tortoise TTS is all about making text sound as natural as it can get. It’s a text-to-speech model that James Betker designed to make voices that sound really true-to-life.

Key Features

  • High Fidelity Voice Cloning: Create voices that sound just like the input audio sample.
  • Realistic AI Voiceovers: Make your text come to life with voices that are hard to tell apart from real humans.
  • ElevenLabs reportedly uses a fine-tuned clone of Tortoise TTS.

Pros

  • Top-Notch Voices: The voices you can create are very high quality: super clear and great-sounding.
  • Master at Cloning: It’s really good at making new voices from just a small bit of someone’s audio. This is perfect for making lots of different voices, even famous ones.
  • Control How It Speaks: You can adjust how the voice talks (its tone, feeling, speed, and more) by changing the text prompt you give it. For example, typing “I am sad” in the text makes the AI voice sound sadder.
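
The prompt trick can be sketched in a few lines. This assumes the `tortoise-tts` package and `torchaudio`, and relies on the behaviour described in Tortoise’s README, where text inside square brackets shapes the tone but is not spoken. The function names are illustrative, and `tom` is one of the sample voices bundled with the repository.

```python
def emotion_prompt(cue: str, text: str) -> str:
    """Prefix `text` with a bracketed cue that sets the tone but is not read aloud."""
    return f"[{cue},] {text}"


def tortoise_to_wav(
    text: str,
    voice: str = "tom",
    preset: str = "fast",
    out_path: str = "tortoise.wav",
) -> str:
    """Synthesize `text` in a bundled or custom Tortoise voice and save it."""
    # Lazy imports so the helper above works without Tortoise installed.
    import torchaudio
    from tortoise.api import TextToSpeech
    from tortoise.utils.audio import load_voice

    tts = TextToSpeech()  # downloads model weights on first use
    voice_samples, conditioning_latents = load_voice(voice)
    audio = tts.tts_with_preset(
        text,
        voice_samples=voice_samples,
        conditioning_latents=conditioning_latents,
        preset=preset,  # slower presets trade speed for quality
    )
    torchaudio.save(out_path, audio.squeeze(0).cpu(), 24000)  # 24 kHz output
    return out_path
```

Something like `tortoise_to_wav(emotion_prompt("I am really sad", "Please feed me."))` should produce a sad-sounding “Please feed me.” Expect generation to take a while, even on a GPU.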

Cons

  • Just English: Right now, it can only make voices in English and can’t make sound effects.
  • It can be tough to set up, and it’s pretty slow.

James stopped working on Tortoise (at least publicly) because of ethical concerns: the model is really good, and he fears it could be used for fraud if optimized for faster output.

Still, it is a good model to try out, and the code is worth reading from an engineering standpoint.

4. Play.ht Playground

Play.ht gives you a world of voices: 907 AI voices in 142 languages and accents. It’s great for reaching a wide audience, from local dialects to global languages.

I’m including it because, at the time of writing, its free plan is pretty generous. Note that it is not open source.

Key Features

  • Lots of Voices: Play.ht has a huge library of 907 AI voices covering 142 languages and accents, so you can find the right voice for any audience, including local languages like Malayalam and Telugu.
  • Just Like Real: The voices are made to sound just like a person’s voice. This is great for audiobooks or learning content, where listeners should feel like someone real is talking to them.
  • Pick Your Voice Style: Whatever you’re making, whether a news report, a chat with customers, or anything else, there’s a voice style ready for you. Styles include Newscaster, Conversational, and Customer Support, among others.
  • Clone Voices Well: If you need a voice that sounds like a specific person, you can clone it with Play.ht. This is an add-on feature, and it does a really good job of copying voices.
  • SEO-Optimized Audio Articles: Improve your website’s accessibility and search engine presence by converting text articles into audio with Play.ht’s audio widget.
  • Custom Pronunciation Library: Fix the mispronunciations common to voice generators by building a custom pronunciation guide within Play.ht.
  • Direct Podcast Distribution: Distribute your audio directly to popular platforms such as iTunes, Spotify, and Google Podcasts from the Play.ht dashboard, with no extra upload/download steps.

Pros

  • Precision in Pronunciation: It excels at accurately pronouncing technical words and acronyms, which makes it useful for educational content.
  • Generous Free Tier: Dip your toes in with a free plan that includes 2,500 words.
  • Word Limit Flexibility: With basic plans offering 3 million characters per year, you won’t easily run out of capacity.
  • Authenticity in Voices: The ultra-realistic voices are fine-tuned to closely mimic human intonation and emotion.
  • Multilingual Voice Cloning: It not only clones voices but does so across multiple languages, a feature not commonly found elsewhere.
  • Diverse Language Support: An extensive collection of non-English language options, such as Hindi.

Cons

  • The starting plan is $30, which might be steep for users with minimal voiceover needs.

5. Mycroft Mimic 3

Mimic 3 is a tool that respects your privacy and is completely open-source, which means anyone can use or modify it.

It’s a neural text-to-speech (TTS) engine, designed to deliver high-quality voice output that you can use right from your own devices, without needing an internet connection.

They’re also working on a cloud version for those who prefer simplicity or have devices with limited processing power.
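
For offline use from a script, one option is to call the Mimic 3 command-line tool and capture the WAV data it prints. The sketch below assumes `mimic3` is installed and on your PATH, and that `en_UK/apope_low` is an available voice key; the function names are illustrative, so check the Mimic 3 documentation for the voices on your system.

```python
import subprocess


def mimic3_command(text: str, voice: str = "en_UK/apope_low") -> list[str]:
    """Build the argument list for a single Mimic 3 synthesis call."""
    return ["mimic3", "--voice", voice, text]


def mimic3_to_wav(
    text: str,
    out_path: str = "mimic3.wav",
    voice: str = "en_UK/apope_low",
) -> str:
    """Run Mimic 3 locally and write the WAV audio from its stdout to a file."""
    with open(out_path, "wb") as wav_file:
        # check=True raises CalledProcessError if mimic3 exits with an error.
        subprocess.run(mimic3_command(text, voice), stdout=wav_file, check=True)
    return out_path
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or other special characters.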

Pros

  • Quality Voices: The voice output is clear and natural-sounding.
  • It can run completely offline.
  • Suitable for low-end hardware.

Cons

  • Voices are not very expressive.

6. silero-models

Silero Models offers pre-trained models that make Speech-to-Text (STT) and Text-to-Speech (TTS) tasks straightforward for businesses.

They pride themselves on providing STT that is on par with, and sometimes even surpasses, the quality of Google’s offerings, all without the complexity typically associated with such technology.

Key Features

  • High-Quality STT: Their speech-to-text is refreshingly easy to use. Check their benchmarks on GitHub to see how they stack up against the competition.
  • Hassle-Free TTS: Silero provides text-to-speech models that are ready to use with just one line of code, with a broad selection of voices and a simple, dependency-free setup.
  • Efficient and Fast: These models are optimized for speed, running faster than real time on a single CPU thread, with support for both 16kHz and 8kHz audio.
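
Here is a minimal sketch of the TTS side, assuming PyTorch is installed and that the `snakers4/silero-models` hub entry point still exposes `silero_tts` with a `v3_en` model and an `en_0` speaker. The function names are illustrative, and the 48000 Hz default is an assumption: the sample rates a model accepts depend on its version, so check the repository if the call rejects it.

```python
import struct
import wave


def float_to_pcm16(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    clipped = (max(-1.0, min(1.0, float(s))) for s in samples)
    return b"".join(struct.pack("<h", int(c * 32767)) for c in clipped)


def silero_to_wav(
    text: str,
    out_path: str = "silero.wav",
    speaker: str = "en_0",
    sample_rate: int = 48000,
) -> str:
    """Synthesize `text` on the CPU with a Silero model and save a mono WAV."""
    import torch  # lazy import: pip install torch

    # The hub call returns the model plus an example sentence (unused here).
    model, _example_text = torch.hub.load(
        repo_or_dir="snakers4/silero-models",
        model="silero_tts",
        language="en",
        speaker="v3_en",
    )
    model.to(torch.device("cpu"))  # no GPU needed
    audio = model.apply_tts(text=text, speaker=speaker, sample_rate=sample_rate)

    with wave.open(out_path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(float_to_pcm16(audio.tolist()))
    return out_path
```

Writing the file with the standard-library `wave` module keeps the example free of extra audio dependencies.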

Pros

  • No Complex Setup: You won’t need to deal with Kaldi, compilations, or lengthy instructions to get started.
  • High-Performance Speech: The end-to-end pipeline ensures the speech sounds natural, and you don’t need a GPU or any training to begin.
  • Language Support: It supports Russian, English, German, and Spanish, and has the potential to be extended further.
  • Text Readability: Their model can insert punctuation and capitalization effectively, making texts more readable.

7. MockingBird

MockingBird is a Python-based project that specializes in cloning a voice in just 5 seconds and generating speech in real time. It’s built for working with Chinese, providing seamless real-time voice cloning.

Key Features

  • Chinese Language Support: Works with Mandarin and has been tested on several datasets (aidatatang_200zh, magicdata, aishell3, data_aishell).
  • PyTorch Compatibility: Works with PyTorch 1.9.0 and performs well on NVIDIA GPUs like the Tesla T4 and GTX 2060.
  • Cross-Platform Functionality: Runs on Windows, Linux, and even M1 macOS.
  • User-Friendly and High-Quality: Easy to get started, since you only need to train a new synthesizer and can reuse a pre-trained encoder and vocoder for quality voice cloning.
  • Webserver Integration: Ready for online use, letting you serve voice clones and manage them remotely.

MockingBird stands out for its rapid voice cloning capability, particularly for Chinese Mandarin, and its ease of use across different platforms and technologies.


8. Microsoft VALL-E-X

VALL-E-X is a Python project that implements Microsoft’s VALL-E X zero-shot TTS model, which can generate speech in a new speaker’s voice, including across languages, without any speaker-specific training data.

Key Features

  • Zero-shot TTS: Generates speech in a new speaker’s voice without speaker-specific training data.
  • In-context Learning: Adapts to new voices and languages swiftly using just a 3-second speech sample.
  • High Performance: Surpasses other systems in naturalness and speaker similarity.
  • Emotion and Environment Preservation: Retains the original speaker’s emotion and the recording’s acoustic quality.
  • Multilingual Capabilities: Enables cross-lingual synthesis and speech-to-speech translation while maintaining voice characteristics.

9. Pyttsx3

Pyttsx3 is a versatile text-to-speech library that works with both Python 2 and 3. It’s a reliable tool for offline speech generation and supports several TTS engines, so you can create speech without an internet connection.

Key Features

  • Offline Capability: Functions without the need for an internet connection.
  • Supports Multiple TTS Engines: Compatible with sapi5, nsss, and espeak.
  • Cross-Version Support: Works with older and newer versions of Python.
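
A minimal sketch of offline speech with pyttsx3, assuming the package is installed along with a working system speech engine. The `driver_for_platform` helper is illustrative only: `pyttsx3.init()` already picks a suitable driver when called without arguments, and the mapping here simply shows which engine usually belongs to which operating system.

```python
import sys


def driver_for_platform(platform: str = sys.platform) -> str:
    """Return the pyttsx3 driver name typically used on the given platform."""
    if platform.startswith("win"):
        return "sapi5"  # Windows speech API
    if platform == "darwin":
        return "nsss"  # macOS speech synthesizer
    return "espeak"  # Linux and other platforms


def speak(text: str, rate: int = 150, out_path=None) -> None:
    """Speak `text` aloud, or save it to `out_path` if one is given."""
    import pyttsx3  # lazy import: pip install pyttsx3

    engine = pyttsx3.init(driverName=driver_for_platform())
    engine.setProperty("rate", rate)  # speaking rate in words per minute
    if out_path:
        engine.save_to_file(text, out_path)
    else:
        engine.say(text)
    engine.runAndWait()  # blocks until the queued commands finish
```

Calling `speak("Hello, world")` should play audio through your speakers, while `speak("Hello, world", out_path="hello.wav")` queues a file write instead.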

10. NVIDIA NeMo

NVIDIA NeMo is a powerful toolkit for those in the field of conversational AI. This Python-based project offers resources for speech recognition, synthesis, natural language processing, and more, making it an essential tool for both researchers and developers.

Key Features

  • Conversational AI Focus: Designed specifically for speech and language models.
  • Comprehensive Toolkit: Includes support for ASR, TTS, LLMs, and NLP.
  • Research-Friendly: Facilitates the reuse and development of conversational AI models.
  • Pretrained Models: Offers a range of pretrained models to accelerate development.

11. DiffSinger

DiffSinger is a pioneering Python project that implements a neural model dedicated to creating synthetic singing voices. It’s designed to generate a singing voice that can be customized and controlled, offering new possibilities for digital music production.

Key Features

  • Neural Singing Voice Synthesis: Specializes in generating digital singing voices.
  • Model Control: Allows for fine-tuning and personalization of the synthetic voice.