learn_to_pronounce.resources.AbstractProvider

class learn_to_pronounce.resources.AbstractProvider(resources_dir: str)

Shows what should be implemented for a resources directory so that it can be used by the pronunciation learning recipe.

__init__(resources_dir: str)

Parameters

resources_dir: str: Directory with pronunciation resources (lexicon, phonemes, graphemes, etc.).

abstract get_graphemes() → List[str]

Getter for the set of graphemes (letters) of the given pronunciation resource.

Returns

graphemes: List[str]: Complete set of letters for the pronunciation resource. Can be derived from the lexicon.
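A minimal sketch of deriving the grapheme set from a lexicon. This page does not show the PronunciationDictionary interface, so the example uses a plain dict (word to phoneme list) as a stand-in. The `Lexicon` alias and the `derive_graphemes` helper are hypothetical names, not part of learn_to_pronounce.

```python
from typing import Dict, List

# Hypothetical stand-in for a parsed lexicon: word -> list of phonemes.
# The real PronunciationDictionary type may expose a different interface.
Lexicon = Dict[str, List[str]]


def derive_graphemes(lexicon: Lexicon) -> List[str]:
    """Collect every distinct character used in the lexicon's words."""
    letters = set()
    for word in lexicon:
        letters.update(word)
    # Sort so the result is deterministic across runs.
    return sorted(letters)


toy_lexicon: Lexicon = {
    "cat": ["k", "ae", "t"],
    "act": ["ae", "k", "t"],
    "tab": ["t", "ae", "b"],
}
graphemes = derive_graphemes(toy_lexicon)
print(graphemes)  # ['a', 'b', 'c', 't']
```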

abstract get_lexicon(words: Optional[List[str]] = None) → PronunciationDictionary

Getter for the lexicon: a dictionary in which the pronunciation of a word can be looked up.

Parameters

words: Optional[List[str]] = None: If provided, all other words are filtered out of the lexicon, keeping only those in the list. Useful for reading only the part of the lexicon needed for model training.

Returns

pd: PronunciationDictionary: Parsed lexicon as a PronunciationDictionary object (from pronunciation_generation).
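The filtering behaviour of the words parameter can be sketched as follows. The lexicon is again a plain dict standing in for PronunciationDictionary, and `filter_lexicon` is a hypothetical helper; a real lexicon may also hold several pronunciations per word, which this sketch ignores.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a parsed lexicon: word -> list of phonemes.
Lexicon = Dict[str, List[str]]


def filter_lexicon(lexicon: Lexicon, words: Optional[List[str]] = None) -> Lexicon:
    """Return the whole lexicon, or only the entries for the given words."""
    if words is None:
        # No filter requested: return a shallow copy of everything.
        return dict(lexicon)
    wanted = set(words)  # set lookup keeps filtering linear in lexicon size
    return {word: pron for word, pron in lexicon.items() if word in wanted}


full_lexicon: Lexicon = {
    "cat": ["k", "ae", "t"],
    "dog": ["d", "ao", "g"],
    "tab": ["t", "ae", "b"],
}
train_lexicon = filter_lexicon(full_lexicon, ["cat", "tab", "unknown"])
print(sorted(train_lexicon))  # ['cat', 'tab']
```

Words in the list that are missing from the lexicon are skipped silently here; whether a real provider should raise or warn instead is a design choice this page does not specify.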

abstract get_phonemes() → List[str]

Getter for the set of phonemes of the given pronunciation resource.

Returns

phonemes: List[str]: Complete set of phonemes for the pronunciation resource. If it is not among the resources, it can be derived from the lexicon.
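One way to implement the fallback described above is to read a phoneme list if the resources directory has one and otherwise derive it from the lexicon. This is only a sketch: the file name `phonemes.txt`, its one-phoneme-per-line layout, the dict-based lexicon, and the `load_phonemes` helper are all assumptions.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical stand-in for a parsed lexicon: word -> list of phonemes.
Lexicon = Dict[str, List[str]]


def load_phonemes(resources_dir: str, lexicon: Lexicon) -> List[str]:
    """Read the phoneme list from disk, or derive it from the lexicon."""
    phoneme_file = Path(resources_dir) / "phonemes.txt"  # assumed file name
    if phoneme_file.is_file():
        lines = phoneme_file.read_text(encoding="utf-8").splitlines()
        return [line.strip() for line in lines if line.strip()]
    # Fallback: every distinct phoneme that occurs in any pronunciation.
    phonemes = set()
    for pronunciation in lexicon.values():
        phonemes.update(pronunciation)
    return sorted(phonemes)


toy_lexicon: Lexicon = {"cat": ["k", "ae", "t"]}
with tempfile.TemporaryDirectory() as resources_dir:
    # No phoneme file yet, so the set is derived from the lexicon.
    derived = load_phonemes(resources_dir, toy_lexicon)
    # Once the file exists, its contents (and order) take precedence.
    Path(resources_dir, "phonemes.txt").write_text("k\nae\nt\n", encoding="utf-8")
    listed = load_phonemes(resources_dir, toy_lexicon)

print(derived)  # ['ae', 'k', 't']
print(listed)   # ['k', 'ae', 't']
```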

abstract get_spelling_lexicon() → PronunciationDictionary

Getter for the spelling lexicon: a dictionary of words that are spelled out letter by letter rather than pronounced. A spelling lexicon is usually very simple and may contain nothing more than the pronunciations of individual letters.

Returns

sp: PronunciationDictionary: Parsed spelling lexicon, similar to the result of get_lexicon().
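To illustrate what a minimal spelling lexicon might contain, the sketch below maps single letters to ARPAbet-style letter-name pronunciations and spells a word out by concatenating them. The dict layout and the `spell_out` helper are assumptions for illustration; how the recipe actually consumes the spelling lexicon is not described on this page.

```python
from typing import Dict, List

# Hypothetical stand-in for a spelling lexicon: letter -> phonemes of its name.
SpellingLexicon = Dict[str, List[str]]

letter_names: SpellingLexicon = {
    "b": ["B", "IY"],
    "t": ["T", "IY"],
    "v": ["V", "IY"],
}


def spell_out(word: str, spelling_lexicon: SpellingLexicon) -> List[str]:
    """Pronounce a word letter by letter using the spelling lexicon."""
    phonemes: List[str] = []
    for letter in word.lower():
        # Raises KeyError for letters missing from the spelling lexicon.
        phonemes.extend(spelling_lexicon[letter])
    return phonemes


spelled = spell_out("TV", letter_names)
print(spelled)  # ['T', 'IY', 'V', 'IY']
```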

abstract get_test_words() → Optional[List[str]]

Getter for the list of words from the lexicon (get_lexicon()) that should be used for evaluation of pronunciation generation. If the list is not specified in the resources directory, no evaluation will be carried out.

Returns

words: Optional[List[str]]: List of words to be used in evaluation, or None if no test list is specified in the resources directory.

abstract get_train_words() → List[str]

Getter for the list of words from the lexicon (get_lexicon()) that should be used in training of pronunciation generation. If the list is not explicitly specified in the resources directory, all the words from the lexicon should be used.

Returns

words: List[str]: List of words to be used in training.
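The different fallbacks of the two word-list getters (None for a missing test list, the full lexicon for a missing training list) can be sketched like this. The file names `train_words.txt` and `test_words.txt`, the one-word-per-line layout, and the helper names are assumptions, not part of learn_to_pronounce.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Optional


def read_word_list(path: Path) -> Optional[List[str]]:
    """Read one word per line, or return None if the file does not exist."""
    if not path.is_file():
        return None
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def load_test_words(resources_dir: str) -> Optional[List[str]]:
    # A missing test list means "no evaluation", signalled by None.
    return read_word_list(Path(resources_dir) / "test_words.txt")  # assumed name


def load_train_words(resources_dir: str, lexicon_words: Iterable[str]) -> List[str]:
    # A missing training list means "train on every word in the lexicon".
    listed = read_word_list(Path(resources_dir) / "train_words.txt")  # assumed name
    return listed if listed is not None else list(lexicon_words)


lexicon_words = ["cat", "dog", "tab"]
with tempfile.TemporaryDirectory() as resources_dir:
    # Empty resources directory: both fallbacks apply.
    default_test = load_test_words(resources_dir)
    default_train = load_train_words(resources_dir, lexicon_words)
    # Explicit lists override the fallbacks.
    Path(resources_dir, "test_words.txt").write_text("dog\n", encoding="utf-8")
    Path(resources_dir, "train_words.txt").write_text("cat\ntab\n", encoding="utf-8")
    explicit_test = load_test_words(resources_dir)
    explicit_train = load_train_words(resources_dir, lexicon_words)

print(default_test, default_train)    # None ['cat', 'dog', 'tab']
print(explicit_test, explicit_train)  # ['dog'] ['cat', 'tab']
```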