learn_to_pronounce.resources.AbstractProvider

class learn_to_pronounce.resources.AbstractProvider(resources_dir: str)

Defines what a provider for a resources directory must implement so that the directory can be used by the pronunciation learning recipe.

__init__(resources_dir: str)
Parameters
resources_dir: str

Directory with pronunciation resources (lexicon, phonemes, graphemes, etc.).

abstract get_graphemes() -> List[str]

Getter for the set of graphemes (letters) of the given pronunciation resource.

Returns
graphemes: List[str]

Complete set of letters for the pronunciation resource. It can be derived from the lexicon.
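As a rough illustration of deriving the grapheme set from a lexicon, the sketch below collects every distinct character that occurs in the lexicon's words. It assumes the lexicon is a plain dict mapping each word to its pronunciations. That dict is a hypothetical stand-in and not the actual PronunciationDictionary interface, and graphemes_from_lexicon is an illustrative helper name.

```python
from typing import Dict, List

# Hypothetical stand-in for a parsed lexicon: word -> list of pronunciations,
# each pronunciation being a list of phoneme strings. The real
# PronunciationDictionary class may look different.
Lexicon = Dict[str, List[List[str]]]


def graphemes_from_lexicon(lexicon: Lexicon) -> List[str]:
    """Collect every distinct character used in the lexicon's words."""
    letters = set()
    for word in lexicon:
        letters.update(word)  # adds each character of the word
    return sorted(letters)


lexicon: Lexicon = {"cat": [["k", "ae", "t"]], "act": [["ae", "k", "t"]]}
graphemes = graphemes_from_lexicon(lexicon)
```

Sorting is only there to make the result deterministic; an implementation may prefer another order.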

abstract get_lexicon(words: Optional[List[str]] = None) -> PronunciationDictionary

Getter for the lexicon, a dictionary in which the pronunciation of a word can be looked up.

Parameters
words: Optional[List[str]] = None

If provided, only the words in this list are kept and all other lexicon entries are filtered out. This is useful for reading only the part of the lexicon needed for model training.

Returns
pd: PronunciationDictionary

Parsed lexicon as a PronunciationDictionary object (from pronunciation_generation).
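The filtering behaviour of the words parameter might look roughly like the following sketch. It uses a plain dict as a hypothetical stand-in for PronunciationDictionary, and filter_lexicon is an illustrative name rather than part of the library.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for PronunciationDictionary: word -> pronunciations.
Lexicon = Dict[str, List[List[str]]]


def filter_lexicon(lexicon: Lexicon, words: Optional[List[str]] = None) -> Lexicon:
    """Keep only the requested words; with words=None return the full lexicon."""
    if words is None:
        return dict(lexicon)
    wanted = set(words)  # set membership keeps the filter linear in lexicon size
    return {word: prons for word, prons in lexicon.items() if word in wanted}


full: Lexicon = {
    "cat": [["k", "ae", "t"]],
    "act": [["ae", "k", "t"]],
    "tack": [["t", "ae", "k"]],
}
training_only = filter_lexicon(full, ["cat", "tack", "unknown"])
```

In this sketch, requested words that are missing from the lexicon are silently ignored; whether the real method does the same is not stated here.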

abstract get_phonemes() -> List[str]

Getter for the set of phonemes of the given pronunciation resource.

Returns
phonemes: List[str]

Complete set of phonemes for the pronunciation resource. If it is not among the resources, it can be derived from the lexicon.
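When no phoneme list ships with the resources, deriving one from the lexicon could be sketched as below. The dict-based lexicon is a hypothetical stand-in for PronunciationDictionary, phonemes_from_lexicon is an illustrative helper name, and the phoneme symbols are made up for the example.

```python
from typing import Dict, List

# Hypothetical stand-in for a parsed lexicon: word -> list of pronunciations.
Lexicon = Dict[str, List[List[str]]]


def phonemes_from_lexicon(lexicon: Lexicon) -> List[str]:
    """Collect every distinct phoneme that occurs in any pronunciation."""
    phonemes = set()
    for pronunciations in lexicon.values():
        for pronunciation in pronunciations:
            phonemes.update(pronunciation)
    return sorted(phonemes)


# "read" has two pronunciation variants, and both contribute phonemes.
lexicon: Lexicon = {
    "read": [["r", "iy", "d"], ["r", "eh", "d"]],
    "red": [["r", "eh", "d"]],
}
phonemes = phonemes_from_lexicon(lexicon)
```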

abstract get_spelling_lexicon() -> PronunciationDictionary

Getter for the spelling lexicon, a dictionary for words that are spelled letter by letter rather than pronounced. A spelling lexicon is usually very simple and may contain nothing more than the pronunciations of individual letters.

Returns
sp: PronunciationDictionary

Parsed spelling lexicon, in the same form as the result of get_lexicon().
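To make the idea of a letter-level spelling lexicon concrete, the sketch below spells a word by concatenating per-letter pronunciations. The mapping, the phoneme symbols, and the spell_out helper are all hypothetical; how the recipe actually applies the spelling lexicon is not described here.

```python
from typing import Dict, List

# Hypothetical spelling lexicon with one pronunciation per letter. The phoneme
# symbols are illustrative and not taken from any particular phoneme set.
SPELLING: Dict[str, List[str]] = {
    "a": ["ey"],
    "b": ["b", "iy"],
    "c": ["s", "iy"],
}


def spell_out(word: str, spelling: Dict[str, List[str]]) -> List[str]:
    """Pronounce a word letter by letter using per-letter pronunciations."""
    phonemes: List[str] = []
    for letter in word.lower():
        # Raises KeyError if the spelling lexicon lacks this letter.
        phonemes.extend(spelling[letter])
    return phonemes


spelled = spell_out("ABC", SPELLING)
```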

abstract get_test_words() -> Optional[List[str]]

Getter for the list of words from the lexicon (get_lexicon()) that should be used to evaluate pronunciation generation. If no such list is specified in the resources directory, no evaluation is carried out.

Returns
words: Optional[List[str]]

List of words to be used in evaluation, or None.

abstract get_train_words() -> List[str]

Getter for the list of words from the lexicon (get_lexicon()) that should be used to train pronunciation generation. If the list is not explicitly specified in the resources directory, all words from the lexicon should be used.

Returns
words: List[str]

List of words to be used in training.
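The different fallbacks of the two word-list getters (all lexicon words for training, None for evaluation) can be sketched as follows. The file names train_words.txt and test_words.txt and the helper functions are assumptions made for this example; the actual layout of a resources directory is up to the concrete provider.

```python
import os
import tempfile
from typing import Iterable, List, Optional


def read_word_list(path: str) -> Optional[List[str]]:
    """Return the non-empty lines of a word list file, or None if it is absent."""
    if not os.path.isfile(path):
        return None
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def load_train_words(resources_dir: str, lexicon_words: Iterable[str]) -> List[str]:
    """Use the explicit training list if present, else every lexicon word."""
    # "train_words.txt" is a hypothetical file name.
    listed = read_word_list(os.path.join(resources_dir, "train_words.txt"))
    return listed if listed is not None else sorted(lexicon_words)


def load_test_words(resources_dir: str) -> Optional[List[str]]:
    """Return the evaluation list, or None to signal that evaluation is skipped."""
    # "test_words.txt" is a hypothetical file name.
    return read_word_list(os.path.join(resources_dir, "test_words.txt"))


with tempfile.TemporaryDirectory() as resources_dir:
    # No list files yet: training falls back to the lexicon, evaluation is off.
    fallback = load_train_words(resources_dir, ["cat", "act"])
    no_eval = load_test_words(resources_dir)

    # An explicit training list takes precedence over the lexicon words.
    train_path = os.path.join(resources_dir, "train_words.txt")
    with open(train_path, "w", encoding="utf-8") as handle:
        handle.write("cat\n")
    explicit = load_train_words(resources_dir, ["cat", "act"])
```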