To build a pronunciation addon:
get the repo
git clone git@github.com:balacoon/learn_to_pronounce.git
build the Docker image that manages all the dependencies
# if "--build-fe" is specified, balacoon_frontend
# is built from sources. You need special access for it,
# which you likely don't have.
bash docker/build.sh [--build-fe]
get pronunciation resources. Adjust them if needed, but don’t forget to share your changes as a contribution. To promote multilinguality, Balacoon uses a unified phoneme set. You can find more information on the decisions made in the post. If you want to build pronunciation generation for a new lexicon, you need to map its phonemes into the Balacoon unified phoneme set. Check the info on the mapping of CMUDict as an example.
# resources are stored as submodules, pick the one you need
# from the resources dir
git submodule update --init resources/en_us_pronunciation/
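The mapping step for a new lexicon can be sketched roughly as follows. This is a minimal illustration, not the code used by learn_to_pronounce: the function names are hypothetical, the target symbols in the table are placeholders rather than the actual Balacoon unified phoneme set, and dropping the stress digits is an assumption made only to keep the example short. The input line follows the usual CMUDict layout of a word followed by space-separated phonemes.

```python
# Hypothetical mapping table: the target symbols are placeholders,
# NOT the actual Balacoon unified phoneme set.
ARPABET_TO_UNIFIED = {
    "HH": "h",
    "AH": "ʌ",
    "L": "l",
    "OW": "oʊ",
}


def map_pronunciation(phonemes, table):
    """Map source phonemes to the target set, failing loudly on gaps."""
    mapped = []
    for phoneme in phonemes:
        # Assumption: stress digits (0, 1, 2) are dropped before lookup.
        base = phoneme.rstrip("012")
        if base not in table:
            raise KeyError(f"no mapping for phoneme {phoneme!r}")
        mapped.append(table[base])
    return mapped


def map_lexicon_line(line, table):
    """Convert one 'WORD PH1 PH2 ...' line into (word, mapped phonemes)."""
    word, *phonemes = line.split()
    return word.lower(), map_pronunciation(phonemes, table)


if __name__ == "__main__":
    print(map_lexicon_line("HELLO  HH AH0 L OW1", ARPABET_TO_UNIFIED))
```

Raising on an unmapped phoneme, instead of silently skipping it, makes gaps in the mapping table visible before any training is started.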
launch docker and execute addon creation (includes lexicon packing and training of FST-based pronunciation generation). The training takes some time to run. At the end, evaluation on withheld words is executed (if test_words are specified in the resources directory). Accuracy of pronunciation generation heavily depends on the language.
# the script is a really simple shortcut to start the container. Adjust it
# if needed
bash docker/run.sh
# check that everything works on a toy lexicon.
learn_to_pronounce --locale en_us --work-dir toy_work_dir \
    --resources resources/en_us_pronunciation/toy/
# if everything finishes without errors, it is time to build the complete addon.
# check the arguments of learn_to_pronounce to learn more about usage.
learn_to_pronounce --locale en_us --out en_us_pronunciation.addon \
    --resources resources/en_us_pronunciation/cmudict
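To make the evaluation on withheld words more concrete, here is a small sketch of an exact-match word accuracy. Which metric learn_to_pronounce actually reports is not stated here, so treat both the metric and the hypothetical `word_accuracy` helper as assumptions: a word counts as correct only if every predicted phoneme matches the reference.

```python
def word_accuracy(reference, predicted):
    """Fraction of reference words whose predicted phonemes match exactly.

    Both arguments map a word to its list of phonemes.
    """
    if not reference:
        raise ValueError("reference lexicon is empty")
    correct = sum(
        1 for word, phonemes in reference.items()
        if predicted.get(word) == phonemes
    )
    return correct / len(reference)


if __name__ == "__main__":
    # Toy data with made-up phoneme symbols.
    reference = {"cat": ["k", "ae", "t"], "dog": ["d", "ao", "g"]}
    predicted = {"cat": ["k", "ae", "t"], "dog": ["d", "aa", "g"]}
    print(word_accuracy(reference, predicted))
```

Exact match is a strict criterion: a single wrong phoneme makes the whole word count as an error, so languages with irregular spelling tend to score lower under it.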
learn_to_pronounce contains interactive demos that showcase how to use the obtained artifacts.
# generating pronunciation with the trained fst:
demo_fst --fst work_dir/pronunciation.fst
# using the whole addon: looks up the word in the lexicon, if not found
# generates pronunciation with the FST-based model.
# additionally, can spell words letter-by-letter
demo_pronounce --addon en_us_pronunciation.addon [--spelling]
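The behaviour described for the whole addon (lexicon lookup first, generation as a fallback, optional letter-by-letter spelling) can be sketched as below. This is only an illustration of the control flow, not the addon's API: `pronounce`, `spell`, the `generate` callable and the per-letter table are all hypothetical stand-ins, and lowercasing the word before lookup is an assumption.

```python
def pronounce(word, lexicon, generate):
    """Return the lexicon entry if present, else fall back to the model."""
    key = word.lower()  # assumption: lexicon keys are lowercase
    if key in lexicon:
        return lexicon[key]
    # Word is not in the lexicon: ask the (stand-in) generation model.
    return generate(key)


def spell(word, letter_pronunciations):
    """Concatenate per-letter pronunciations to spell a word out."""
    phonemes = []
    for letter in word.lower():
        phonemes.extend(letter_pronunciations[letter])
    return phonemes


if __name__ == "__main__":
    # Toy data with made-up phoneme symbols.
    lexicon = {"cat": ["k", "ae", "t"]}
    fake_model = lambda w: list(w)  # stand-in for FST-based generation
    print(pronounce("cat", lexicon, fake_model))
    print(pronounce("dog", lexicon, fake_model))
    print(spell("ab", {"a": ["ey"], "b": ["b", "iy"]}))
```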