learn_to_pronounce.resources.DefaultProvider

class learn_to_pronounce.resources.DefaultProvider(resources_dir, encoding='utf-8')

Default implementation of the resources provider. If the resource directory does not contain its own resources provider, the default provider is used. It expects the resource files to have specific names and to follow a specific format. If the resources use a custom format, a custom resource provider should be implemented.

__init__(resources_dir, encoding='utf-8')

Parameters

resources_dir: str: directory with pronunciation resources (lexicon, phonemes, graphemes, etc.)

LEXICON_FILE_NAME = 'lexicon': name of the file with the pronunciation dictionary

PHONEMES_FILE_NAME = 'phonemes': name of the file with the list of phonemes

TEST_WORDS = 'test_words': name of the file with words used to evaluate pronunciation generation

TRAIN_WORDS = 'train_words': name of the file with words used to train pronunciation generation
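Because the default provider relies on these fixed file names, it can help to check a resources directory before handing it over. The following is a minimal sketch, not part of learn_to_pronounce: the helper name missing_resources is hypothetical, and only the four file names documented above are checked. Since get_test_words() returns an Optional value, a missing test_words file may well be acceptable; that is an assumption, not something this page states.

```python
import os
import tempfile

# File names copied from the class constants documented above.
EXPECTED_FILES = {
    "LEXICON_FILE_NAME": "lexicon",
    "PHONEMES_FILE_NAME": "phonemes",
    "TEST_WORDS": "test_words",
    "TRAIN_WORDS": "train_words",
}


def missing_resources(resources_dir):
    """Return the sorted expected file names that are absent from resources_dir.

    Hypothetical helper for illustration; it does not call the library.
    """
    return sorted(
        name
        for name in EXPECTED_FILES.values()
        if not os.path.isfile(os.path.join(resources_dir, name))
    )


# Usage: a directory that only contains a lexicon file.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "lexicon"), "w", encoding="utf-8") as f:
        f.write("")
    print(missing_resources(demo_dir))
```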

get_graphemes() → List[str]: see AbstractProvider.get_graphemes()

get_lexicon(words: Optional[List[str]] = None) → PronunciationDictionary: see AbstractProvider.get_lexicon()

get_phonemes() → List[str]: see AbstractProvider.get_phonemes()

get_spelling_lexicon() → PronunciationDictionary: see AbstractProvider.get_spelling_lexicon()

get_test_words() → Optional[List[str]]: see AbstractProvider.get_test_words()

get_train_words() → List[str]: see AbstractProvider.get_train_words()

parse_lexicon(path: str, words: Optional[Iterable[str]] = None) → PronunciationDictionary

Helper function that parses a lexicon from a file. The expected format is:

where <tag> is optional and <pronunciation> is a sequence of phonemes separated by spaces.

Parameters

path: str: path of the lexicon file to parse
words: Iterable[str]: words to include in the returned PronunciationDictionary, or None to include all words.

Returns

pd: PronunciationDictionary: pronunciation dictionary object from pronunciation_generation
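The effect of the words argument can be illustrated with a small standalone sketch. It does not use the library: the function name parse_lexicon_sketch is hypothetical, it returns a plain dict rather than a PronunciationDictionary, and it assumes a simplified untagged line layout of a word followed by space-separated phonemes, which may differ from the format the real method expects.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def parse_lexicon_sketch(
    lines: Iterable[str],
    words: Optional[Iterable[str]] = None,
) -> Dict[str, List[List[str]]]:
    """Collect pronunciations per word, optionally restricted to `words`.

    Assumed (hypothetical) line layout: "<word> <phoneme> <phoneme> ...".
    """
    # None means "keep every word"; otherwise use a set for fast lookups.
    wanted = None if words is None else set(words)
    lexicon = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        word, *phonemes = line.split()
        if wanted is not None and word not in wanted:
            continue
        # A word may appear several times, once per pronunciation variant.
        lexicon[word].append(phonemes)
    return dict(lexicon)


sample = ["cat k ae t", "dog d ao g", "", "cat k aa t"]
print(parse_lexicon_sketch(sample))
print(parse_lexicon_sketch(sample, words=["dog"]))
```

Passing an open file object as lines works as well, since iterating over a text file yields one line at a time.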

static parse_lexicon_line(line: str) → Tuple[str, str, str]

Helper function that parses a single lexicon line.

Parameters

line: str: line read from the lexicon file within get_lexicon()

Returns

word: str: string representation of a word
tag: str: tag of the pronunciation, or an empty string if the pronunciation variant is not tagged
phonemes: str: phonemes of the pronunciation, separated by spaces
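The shape of the returned tuple can be shown with a standalone sketch. This is not the library's implementation: the function name parse_line_sketch and the regular expression are hypothetical, and the line layout "<word>(<tag>) <pronunciation>", with the parenthesised tag optional, is an assumption chosen only for illustration. The actual lexicon format may differ.

```python
import re
from typing import Tuple

# Assumed (hypothetical) layout: word, optional "(tag)", then phonemes.
_LINE_RE = re.compile(
    r"^(?P<word>[^\s(]+)(?:\((?P<tag>[^)]*)\))?\s+(?P<phonemes>.+)$"
)


def parse_line_sketch(line: str) -> Tuple[str, str, str]:
    """Split one lexicon line into (word, tag, phonemes)."""
    match = _LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"Unrecognised lexicon line: {line!r}")
    word = match.group("word")
    # An untagged variant yields an empty string, as documented above.
    tag = match.group("tag") or ""
    # Normalise runs of whitespace to single spaces between phonemes.
    phonemes = " ".join(match.group("phonemes").split())
    return word, tag, phonemes


print(parse_line_sketch("read(past) r eh d\n"))
print(parse_line_sketch("cat k  ae t"))
```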