claudio.uk

📚 Convert E-books into audiobooks with Kokoro

Posted on 14 Jan 2025 by Claudio Santini

Kokoro v0.19 is a recently published text-to-speech model with just 82M parameters and very high-quality output. It is released under the Apache licence and was trained on fewer than 100 hours of audio. It currently supports American English, British English, French, Korean, Japanese and Mandarin, with a selection of very good voices.

An example of the quality:

I've always dreamed of converting my ebook library into audiobooks, especially those niche books that you cannot find in audiobook format. Since Kokoro is pretty fast, I thought this might finally be doable. I've created a small tool called Audiblez (in honor of the popular audiobook platform) that parses .epub files and converts the body of the book into nicely narrated audio files.

On my M2 MacBook Pro, it takes about 2 hours to convert The Selfish Gene by Richard Dawkins to mp3. The book is about 100,000 words (or 600,000 characters), so that works out to a rate of about 80 characters per second.
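
The figures above can be checked with a little arithmetic. This is a minimal sketch, assuming the synthesis rate stays roughly constant for the whole book; the function names are made up for illustration and are not part of Audiblez.

```python
def estimate_conversion_seconds(total_chars: int, chars_per_second: float) -> float:
    """Estimated synthesis time in seconds at a constant rate."""
    if chars_per_second <= 0:
        raise ValueError("chars_per_second must be positive")
    return total_chars / chars_per_second


def format_hours_minutes(seconds: float) -> str:
    """Render a duration as hours and minutes, rounded to the nearest minute."""
    total_minutes = round(seconds / 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes:02d}m"


if __name__ == "__main__":
    # 600,000 characters at 80 characters per second
    seconds = estimate_conversion_seconds(600_000, 80)
    print(format_hours_minutes(seconds))  # 7,500 seconds -> 2h 05m
```

The same function gives a rough estimate for any other book once you know its character count.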

How to install and run

If you have Python 3 on your computer, you can install it with pip. Be aware that it won't work with Python 3.13.

pip install audiblez

Then you also need to download a couple of additional files, about 360MB in total, into the folder you will run audiblez from:

wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx
wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.json
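
Before running a long conversion, it may be worth checking both prerequisites. The sketch below assumes only what this post states: Python 3 below 3.13, and the two model files present in the working folder. The helper names are hypothetical.

```python
import sys
from pathlib import Path

# File names taken from the download commands in this post
REQUIRED_FILES = ("kokoro-v0_19.onnx", "voices.json")


def missing_model_files(folder="."):
    """Return the required model files that are not present in folder."""
    return [name for name in REQUIRED_FILES if not (Path(folder) / name).exists()]


def python_version_supported(version_info=None):
    """True for Python 3 releases below 3.13, as stated in the post."""
    info = sys.version_info if version_info is None else version_info
    major, minor = info[0], info[1]
    return major == 3 and minor < 13


if __name__ == "__main__":
    if not python_version_supported():
        print("This Python version is not expected to work; try 3.12 or older.")
    for name in missing_model_files():
        print(f"Missing model file: {name}")
```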

Then, to convert an epub file into an audiobook, just run:

audiblez book.epub -l en-gb -v af_sky

It will first create a series of files named book_chapter_1.wav, book_chapter_2.wav, etc. in the same directory, and at the end it will produce a book.m4b file containing the whole book, which you can listen to with VLC or any audiobook player. The .m4b file is only produced if you have ffmpeg installed on your machine.
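
To make the naming scheme concrete, here is a small sketch that derives the expected output names the same way the script further down does, by replacing the .epub suffix of the file name. The function name is hypothetical. Note that the script uses only the file name, not the full path, so the outputs appear to land in the directory you run the command from.

```python
from pathlib import Path


def expected_outputs(epub_path: str, n_chapters: int):
    """Return (list of chapter .wav names, final .m4b name) for an epub."""
    name = Path(epub_path).name  # directory part is dropped, as in the script
    wav_files = [
        name.replace(".epub", f"_chapter_{i}.wav")
        for i in range(1, n_chapters + 1)
    ]
    m4b_file = name.replace(".epub", ".m4b")
    return wav_files, m4b_file


if __name__ == "__main__":
    wavs, m4b = expected_outputs("library/book.epub", 3)
    print(wavs)
    print(m4b)
```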

Supported Languages

Use the -l option to specify the language. Available language codes are: 🇺🇸 en-us, 🇬🇧 en-gb, 🇫🇷 fr-fr, 🇯🇵 ja, 🇰🇷 ko and 🇨🇳 cmn.

Supported Voices

Use the -v option to specify the voice. Available voices are af, af_bella, af_nicole, af_sarah, af_sky, am_adam, am_michael, bf_emma, bf_isabella, bm_george, bm_lewis. You can try them here: https://huggingface.co/spaces/hexgrad/Kokoro-TTS
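
If you want to script conversions, for example over a folder of books, a thin wrapper can validate the options before launching a run that takes hours. This is only a sketch: the voice list is copied from this post, the language codes follow the script's --lang help text, and build_command is a hypothetical helper, not part of Audiblez.

```python
# Language codes as listed in the script's --lang help text
LANGUAGES = ("en-us", "en-gb", "fr-fr", "ja", "ko", "cmn")

# Voices as listed in this post
VOICES = (
    "af", "af_bella", "af_nicole", "af_sarah", "af_sky",
    "am_adam", "am_michael",
    "bf_emma", "bf_isabella", "bm_george", "bm_lewis",
)


def build_command(epub_path, lang="en-gb", voice="af_sky"):
    """Return the audiblez command line as an argument list, or raise ValueError."""
    if lang not in LANGUAGES:
        raise ValueError(f"Unknown language code: {lang}")
    if voice not in VOICES:
        raise ValueError(f"Unknown voice: {voice}")
    return ["audiblez", epub_path, "-l", lang, "-v", voice]


if __name__ == "__main__":
    # Pass the list to subprocess.run(..., check=True) to actually start a conversion
    print(" ".join(build_command("book.epub", "en-gb", "af_sky")))
```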

Chapter Detection

Chapter detection is a bit janky, but it manages to find the core chapters in most of the .epub files I tried, skipping the cover, index, appendix etc.
If you find it doesn't include the chapter you are interested in, try playing with the is_chapter function in the code. It often skips the preface or intro, and I'm not sure if that's a bug or a feature.
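
As an illustration of the kind of tweak meant here, the sketch below restates the file-name heuristic on plain strings and adds an optional switch for front matter. The front-matter pattern, the include_front_matter flag and the sample file names are assumptions made up for this example; real .epub files name their documents in many different ways.

```python
import re

# Same idea as is_chapter in the script: "part12", "ch3" or "chapter" in the name
CHAPTER_PATTERN = re.compile(r"part\d{1,3}|ch\d{1,3}|chapter")

# Hypothetical extension: names that often hold a preface or introduction
FRONT_MATTER_PATTERN = re.compile(r"preface|intro|foreword|prologue")


def is_chapter_name(name: str, include_front_matter: bool = False) -> bool:
    """Decide from a document's file name whether it should be narrated."""
    name = name.lower()
    if CHAPTER_PATTERN.search(name):
        return True
    return include_front_matter and bool(FRONT_MATTER_PATTERN.search(name))


if __name__ == "__main__":
    samples = ["cover.xhtml", "preface.xhtml", "OEBPS/ch01.xhtml",
               "Text/Chapter_3.html", "part2.html", "index.xhtml"]
    for sample in samples:
        print(sample.ljust(24), is_chapter_name(sample),
              is_chapter_name(sample, include_front_matter=True))
```

With the flag off this matches what the script does today, so preface.xhtml is skipped; with it on, the preface is picked up while the cover and index are still ignored.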

Source

See the Audiblez project on GitHub.

There are still some rough edges, but it works well enough for me. Future improvements could include:

Better chapter detection, or allowing users to include/exclude chapters.
Adding chapter navigation to the m4b file (that looks hard, because ffmpeg doesn't do it).
Adding narration for images using some image-to-text model.
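
On the chapter-navigation item: ffmpeg does accept chapter markers through a metadata text file in its FFMETADATA1 format, so one possible route is to generate that file from the chapter durations and feed it in as a second input. The sketch below only builds the file contents. It has not been tried with Audiblez, and the chapter titles and durations are placeholder values.

```python
def escape_metadata(value: str) -> str:
    """Backslash-escape the characters that are special in FFMETADATA files."""
    for ch in ("\\", "=", ";", "#"):
        value = value.replace(ch, "\\" + ch)
    return value.replace("\n", "\\\n")


def build_ffmetadata(chapters) -> str:
    """Build FFMETADATA1 text from a list of (title, duration_in_seconds)."""
    lines = [";FFMETADATA1"]
    start_ms = 0
    for title, duration_seconds in chapters:
        end_ms = start_ms + int(round(duration_seconds * 1000))
        lines += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",  # START and END are in milliseconds
            f"START={start_ms}",
            f"END={end_ms}",
            f"title={escape_metadata(title)}",
        ]
        start_ms = end_ms  # the next chapter begins where this one ends
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder durations; in practice they would come from the chapter .wav files
    print(build_ffmetadata([("Chapter 1", 1800.0), ("Chapter 2", 2400.5)]))
```

The resulting text could then be saved to a file and passed to ffmpeg as an extra input together with -map_metadata 1 and -map_chapters 1; whether audiobook players pick up the markers from the .m4b would still need testing.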

The code is short enough to be included here:

#!/usr/bin/env python3
# audiblez - A program to convert e-books into audiobooks using
# Kokoro-82M model for high-quality text-to-speech synthesis.
# by Claudio Santini 2025 - https://claudio.uk

import argparse
import sys
import time
import shutil
import subprocess
import soundfile as sf
import ebooklib
import warnings
import re
from pathlib import Path
from string import Formatter
from bs4 import BeautifulSoup
from kokoro_onnx import Kokoro
from ebooklib import epub
from pydub import AudioSegment


def main(kokoro, file_path, lang, voice):
    filename = Path(file_path).name
    with warnings.catch_warnings():
        # Suppress warnings raised while loading the EPUB (this call only)
        warnings.simplefilter('ignore')
        book = epub.read_epub(file_path)
    title = book.get_metadata('DC', 'title')[0][0]
    creator = book.get_metadata('DC', 'creator')[0][0]
    intro = f'{title} by {creator}'
    print(intro)
    chapters = find_chapters(book)
    print('Found chapters:', [c.get_name() for c in chapters])
    texts = extract_texts(chapters)
    has_ffmpeg = shutil.which('ffmpeg') is not None
    if not has_ffmpeg:
        print('\033[91m' + 'ffmpeg not found. Please install ffmpeg to create mp3 and m4b audiobook files.' + '\033[0m')
    total_chars = sum([len(t) for t in texts])
    print('Started at:', time.strftime('%H:%M:%S'))
    print(f'Total characters: {total_chars:,}')
    # split() with no argument also splits on newlines and collapses repeated spaces
    print('Total words:', len(' '.join(texts).split()))

    chapter_wav_files = []
    for i, text in enumerate(texts, start=1):
        chapter_filename = filename.replace('.epub', f'_chapter_{i}.wav')
        chapter_wav_files.append(chapter_filename)
        if Path(chapter_filename).exists():
            # Lets an interrupted run resume without redoing finished chapters
            print(f'File for chapter {i} already exists. Skipping')
            continue
        print(f'Reading chapter {i} ({len(text):,} characters)...')
        if i == 1:
            # Narrate the title and author before the first chapter
            text = intro + '.\n\n' + text
        start_time = time.time()
        samples, sample_rate = kokoro.create(text, voice=voice, speed=1.0, lang=lang)
        sf.write(chapter_filename, samples, sample_rate)
        end_time = time.time()
        delta_seconds = end_time - start_time
        chars_per_sec = len(text) / delta_seconds
        # texts is 0-indexed, so texts[i:] holds the chapters after the one just finished
        remaining_chars = sum(len(t) for t in texts[i:])
        remaining_time = remaining_chars / chars_per_sec
        print(f'Estimated time remaining: {strfdelta(remaining_time)}')
        print('Chapter written to', chapter_filename)
        print(f'Chapter {i} read in {delta_seconds:.2f} seconds ({chars_per_sec:.0f} characters per second)')
        progress = int((total_chars - remaining_chars) / total_chars * 100)
        print('Progress:', f'{progress}%')
    if has_ffmpeg:
        create_m4b(chapter_wav_files, filename)


def extract_texts(chapters):
    texts = []
    for chapter in chapters:
        xml = chapter.get_body_content()
        soup = BeautifulSoup(xml, features='lxml')
        chapter_text = ''
        html_content_tags = ['title', 'p', 'h1', 'h2', 'h3', 'h4']
        for child in soup.find_all(html_content_tags):
            inner_text = child.text.strip() if child.text else ""
            if inner_text:
                chapter_text += inner_text + '\n'
        texts.append(chapter_text)
    return texts


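def is_chapter(c):
    # Heuristic: a document counts as a chapter if its file name contains
    # something like "part12" or "ch3", or the word "chapter".
    name = c.get_name().lower()
    if re.search(r"part\d{1,3}", name):
        return True
    if re.search(r"ch\d{1,3}", name):
        return True
    return 'chapter' in name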


def find_chapters(book, verbose=True):
    chapters = [c for c in book.get_items() if c.get_type() == ebooklib.ITEM_DOCUMENT and is_chapter(c)]
    if verbose:
        for item in book.get_items():
            if item.get_type() == ebooklib.ITEM_DOCUMENT:
                # print(f"'{item.get_name()}'" + ', #' + str(len(item.get_body_content())))
                print(f'{item.get_name()}'.ljust(60), str(len(item.get_body_content())).ljust(15), 'X' if item in chapters else '-')
    if len(chapters) == 0:
        print('Not easy to find the chapters, defaulting to all available documents.')
        chapters = [c for c in book.get_items() if c.get_type() == ebooklib.ITEM_DOCUMENT]
    return chapters


def strfdelta(tdelta, fmt='{D:02}d {H:02}h {M:02}m {S:02}s'):
    remainder = int(tdelta)
    f = Formatter()
    desired_fields = [field_tuple[1] for field_tuple in f.parse(fmt)]
    possible_fields = ('W', 'D', 'H', 'M', 'S')
    constants = {'W': 604800, 'D': 86400, 'H': 3600, 'M': 60, 'S': 1}
    values = {}
    for field in possible_fields:
        if field in desired_fields and field in constants:
            values[field], remainder = divmod(remainder, constants[field])
    return f.format(fmt, **values)


def create_m4b(chapter_files, filename):
    tmp_filename = filename.replace('.epub', '.tmp.m4a')
    if not Path(tmp_filename).exists():
        # Concatenate all chapter recordings into one AAC-encoded file
        combined_audio = AudioSegment.empty()
        for wav_file in chapter_files:
            audio = AudioSegment.from_wav(wav_file)
            combined_audio += audio
        print('Converting to Mp4...')
        combined_audio.export(tmp_filename, format="mp4", codec="aac", bitrate="64k")
    final_filename = filename.replace('.epub', '.m4b')
    print('Creating M4B file...')
    # Copy the audio stream into the final .m4b file without re-encoding
    proc = subprocess.run(['ffmpeg', '-i', tmp_filename, '-c', 'copy', '-f', 'mp4', final_filename])
    Path(tmp_filename).unlink()
    if proc.returncode == 0:
        print(f'{final_filename} created. Enjoy your audiobook.')
        print('Feel free to delete the intermediary .wav chapter files, the .m4b is all you need.')


def cli_main():
    if not Path('kokoro-v0_19.onnx').exists() or not Path('voices.json').exists():
        print('Error: kokoro-v0_19.onnx and voices.json must be in the current directory. Please download them with:')
        print('wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx')
        print('wget https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.json')
        sys.exit(1)
    kokoro = Kokoro('kokoro-v0_19.onnx', 'voices.json')
    voices = list(kokoro.get_voices())
    voices_str = ', '.join(voices)
    epilog = 'example:\n' + \
             '  audiblez book.epub -l en-us -v af_sky'
    default_voice = 'af_sky' if 'af_sky' in voices else voices[0]
    parser = argparse.ArgumentParser(epilog=epilog, formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument('epub_file_path', help='Path to the epub file')
    parser.add_argument('-l', '--lang', default='en-gb', help='Language code: en-gb, en-us, fr-fr, ja, ko, cmn')
    parser.add_argument('-v', '--voice', default=default_voice, help=f'Choose narrating voice: {voices_str}')
    if len(sys.argv) == 1:
        parser.print_help(sys.stderr)
        sys.exit(1)
    args = parser.parse_args()
    main(kokoro, args.epub_file_path, args.lang, args.voice)


if __name__ == '__main__':
    cli_main()
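
To see what the extract_texts step does without installing anything, the idea can be approximated with the standard library. This is a swapped-in technique, not the code Audiblez runs: it uses html.parser instead of BeautifulSoup, assumes well-formed markup with every tag closed, and the class and function names are made up for illustration.

```python
from html.parser import HTMLParser

# Same tags that extract_texts collects
CONTENT_TAGS = {"title", "p", "h1", "h2", "h3", "h4"}


class ContentTextParser(HTMLParser):
    """Collect the text found inside headings and paragraphs, one line each."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0    # how many content tags are currently open
        self.buffer = []  # text pieces of the element being read
        self.lines = []   # finished, non-empty lines

    def handle_starttag(self, tag, attrs):
        if tag in CONTENT_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in CONTENT_TAGS and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                text = "".join(self.buffer).strip()
                if text:
                    self.lines.append(text)
                self.buffer = []

    def handle_data(self, data):
        # Text outside content tags (for example inside a bare <div>) is ignored
        if self.depth > 0:
            self.buffer.append(data)


def extract_text(html: str) -> str:
    """Return the narratable text of one chapter, one element per line."""
    parser = ContentTextParser()
    parser.feed(html)
    parser.close()
    return "".join(line + "\n" for line in parser.lines)


if __name__ == "__main__":
    sample = "<body><h1>Chapter 1</h1><div>skipped</div><p>Hello <em>world</em>.</p></body>"
    print(extract_text(sample))
```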