Balacoon TTS self-hosted service
Balacoon offers a Docker image for Text-to-Speech endpoints, designed to run on NVIDIA GPU machines. The pre-built image enables fast, low-latency synthesis and batches concurrent synthesis requests to make the most of the hardware.
Getting balacoon/tts_server image
Pull the image from Docker Hub. Refer to the NVIDIA documentation for GPU and driver requirements.
docker pull balacoon/tts_server:0.2
Starting the endpoint
As with synthesis using the balacoon_tts package, you will need a single addon file with the models and resources required for synthesis. Look for addons marked with the _gpu suffix on the HuggingFace Hub.
# get the addon compiled for GPU
wget https://huggingface.co/balacoon/tts/resolve/main/en_us_cmartic_jets_gpu.addon
Start a container from the previously pulled image:
docker run --gpus all -it --rm -e CUDA_VISIBLE_DEVICES=0 --network host -v $PWD:/workspace \
balacoon/tts_server:0.2 balacoon_tts_server 0.0.0.0 3333 16 16 /workspace/en_us_cmartic_jets_gpu.addon
Here balacoon_tts_server is a precompiled binary that takes the following positional arguments:
- ip-address, address the server binds to; the example uses 0.0.0.0 (the non-routable meta-address) to listen on all interfaces.
- port, port on which the service is accessible.
- threads, number of threads that process input requests in parallel.
- batch-size, maximum batch size used when batching input requests.
- addon-path, location of the addon with models and resources.
Playing with the number of threads and the batch size lets you balance latency against hardware utilization. However, too high a batch size can cause out-of-memory errors, so tune it to the resources available.
WARNING: the first few requests will take unusually long due to a so-called “warm-up”, which happens under the hood for each new batch size observed.
Sending synthesis request to the endpoint
The TTS endpoint you started is a WebSocket server: it receives a JSON request containing the text and sends back a raw waveform. Here is an example of how to send a request using Python:
import json
import asyncio

import websockets


async def run_request(text: str, out_path: str):
    """
    Sends text for synthesis and saves produced audio into a file.
    """
    async with websockets.connect("ws://localhost:3333") as websocket:
        # request is a JSON object with the text to synthesize and the speaker name
        request = json.dumps({"text": text, "speaker": "slt"})
        await websocket.send(request)
        with open(out_path, "wb") as fp:
            # read audio chunks until the server closes the connection
            while True:
                try:
                    data = await websocket.recv()
                except websockets.exceptions.ConnectionClosed:
                    break
                if data is None:
                    break
                fp.write(data)


# it will write raw waveform to a file
# to convert it to wav, you can use sox:
# sox -r 24000 -c 1 -b 16 -e signed-integer output.raw output.wav
asyncio.run(run_request('hello world', 'output.raw'))
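If sox is not available, the raw output can also be wrapped into a WAV container with Python's standard wave module. Below is a minimal sketch, assuming the audio format matches the sox command above (24 kHz, 16-bit signed PCM, mono); the raw_to_wav helper and the constants are illustrative, not part of the Balacoon API:

import wave

# assumed format, mirroring the sox command above; adjust if your addon differs
SAMPLE_RATE = 24000  # Hz
SAMPLE_WIDTH = 2     # bytes per sample, i.e. 16-bit signed PCM
CHANNELS = 1         # mono


def raw_to_wav(raw_path: str, wav_path: str):
    """Wraps raw PCM produced by the endpoint into a WAV container."""
    with open(raw_path, "rb") as fp:
        pcm = fp.read()
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(pcm)


raw_to_wav("output.raw", "output.wav")

Since the server batches concurrent requests, you can also launch several run_request calls at once (for example with asyncio.gather) to make better use of the GPU.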