learn_to_pronounce.resources.DefaultProvider

class learn_to_pronounce.resources.DefaultProvider(resources_dir, encoding='utf-8')

Default implementation of the resources provider. If the resource directory does not supply its own resources provider, the default provider is used. The default provider expects resource files to have specific names and to be formatted in a specific way. If the resources use a custom format, a custom resource provider should be implemented.

__init__(resources_dir, encoding='utf-8')
Parameters
resources_dir: str

Directory with pronunciation resources (lexicon, phonemes, graphemes, etc.)

LEXICON_FILE_NAME = 'lexicon'

name of the file with pronunciation dictionary

PHONEMES_FILE_NAME = 'phonemes'

name of the file with list of phonemes

TEST_WORDS = 'test_words'

name of the file with words for evaluation of pronunciation generation

TRAIN_WORDS = 'train_words'

name of the file with words to be used for training of pronunciation generation
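The file-name constants above suggest a resource directory laid out roughly as in the following sketch. Only the file names come from this page. The one-entry-per-line layout, the tab between word and pronunciation, and the sample phonemes are assumptions, and `build_resources_dir` is a hypothetical helper, not part of the library.

```python
import os
import tempfile

# File names taken from the DefaultProvider class constants documented above.
LEXICON_FILE_NAME = "lexicon"
PHONEMES_FILE_NAME = "phonemes"
TRAIN_WORDS = "train_words"
TEST_WORDS = "test_words"


def build_resources_dir(root):
    """Hypothetical helper: write a tiny resource directory into `root`."""
    # Assumption: every file holds one entry per line, and lexicon entries
    # separate the word from its pronunciation with a tab.
    files = {
        LEXICON_FILE_NAME: ["cat\tk ae t", "dog\td ao g"],
        PHONEMES_FILE_NAME: ["k", "ae", "t", "d", "ao", "g"],
        TRAIN_WORDS: ["cat"],
        TEST_WORDS: ["dog"],
    }
    for name, lines in files.items():
        with open(os.path.join(root, name), "w", encoding="utf-8") as f:
            f.write("\n".join(lines) + "\n")
    return root


resources_dir = build_resources_dir(tempfile.mkdtemp())
print(sorted(os.listdir(resources_dir)))

# With the package installed, the directory would then be handed to the provider:
# from learn_to_pronounce.resources import DefaultProvider
# provider = DefaultProvider(resources_dir, encoding="utf-8")
```

The provider construction is left commented out so the sketch runs without the package; the constructor arguments shown match the signature documented on this page.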

get_graphemes() → List[str]

AbstractProvider.get_graphemes()

get_lexicon(words: Optional[List[str]] = None) → PronunciationDictionary

AbstractProvider.get_lexicon()

get_phonemes() → List[str]

AbstractProvider.get_phonemes()

get_spelling_lexicon() → PronunciationDictionary

AbstractProvider.get_spelling_lexicon()

get_test_words() → Optional[List[str]]

AbstractProvider.get_test_words()

get_train_words() → List[str]

AbstractProvider.get_train_words()

parse_lexicon(path: str, words: Optional[Iterable[str]] = None) → PronunciationDictionary

Helper function that parses a lexicon from a file. The expected format of each line is:

<word> <tag> <pronunciation>

where <tag> is optional and <pronunciation> is a sequence of phonemes separated by spaces.

Parameters
path: str

path of the lexicon file to parse

words: Optional[Iterable[str]]

words to include in the returned PronunciationDictionary, or None to include all words.

Returns
pd: PronunciationDictionary

pronunciation dictionary object from pronunciation_generation
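The effect of the `words` argument can be illustrated with a small stand-in. This is a sketch only: `parse_lexicon_sketch` is a hypothetical function, a plain `dict` replaces `PronunciationDictionary` (whose interface is not described here), and tags are ignored by assuming every line is an untagged `<word> <pronunciation>` entry.

```python
import os
import tempfile
from typing import Dict, Iterable, List, Optional


def parse_lexicon_sketch(path: str,
                         words: Optional[Iterable[str]] = None,
                         encoding: str = "utf-8") -> Dict[str, List[str]]:
    """Hypothetical stand-in: map each word to its pronunciation variants."""
    wanted = None if words is None else set(words)
    lexicon: Dict[str, List[str]] = {}
    with open(path, encoding=encoding) as f:
        for line in f:
            # Assumption: untagged entries, first whitespace-delimited token is the word.
            parts = line.strip().split(None, 1)
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            word, pronunciation = parts
            if wanted is not None and word not in wanted:
                continue  # word was not requested
            # Normalise the phoneme sequence to single-space separation.
            lexicon.setdefault(word, []).append(" ".join(pronunciation.split()))
    return lexicon


# Write a small sample lexicon to a temporary file.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", delete=False) as f:
    f.write("cat\tk ae t\ndog\td ao g\nread\tr iy d\nread\tr eh d\n")
    lexicon_path = f.name

full = parse_lexicon_sketch(lexicon_path)                    # words=None keeps everything
subset = parse_lexicon_sketch(lexicon_path, words=["read"])  # keep only the listed words
os.remove(lexicon_path)

print(sorted(full))  # ['cat', 'dog', 'read']
print(subset)        # {'read': ['r iy d', 'r eh d']}
```

Converting `words` to a set once keeps the membership check cheap when the lexicon is large.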

static parse_lexicon_line(line: str) → Tuple[str, str, str]

Helper function that parses a single lexicon line.

Parameters
line: str

line read from lexicon file within get_lexicon()

Returns
word: str

string representation of a word

tag: str

tag of the pronunciation; an empty string if the pronunciation variant is not tagged

phonemes: str

phonemes representing the pronunciation, separated by spaces
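The shape of the returned tuple can be sketched as follows. This page does not say how an optional tag is told apart from the first phoneme, so the sketch assumes tab-delimited fields (`word<TAB>tag<TAB>phonemes` or `word<TAB>phonemes`); the real delimiter may differ, and `parse_lexicon_line_sketch` is a hypothetical name, not the library function.

```python
from typing import Tuple


def parse_lexicon_line_sketch(line: str) -> Tuple[str, str, str]:
    """Hypothetical stand-in returning (word, tag, phonemes) for one line."""
    # Assumption: fields are separated by tabs; phonemes by spaces.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) == 3:
        word, tag, phonemes = fields
    elif len(fields) == 2:
        word, phonemes = fields
        tag = ""  # untagged variant: empty string, as documented above
    else:
        raise ValueError(f"unexpected lexicon line: {line!r}")
    # Collapse repeated whitespace so phonemes are single-space separated.
    return word, tag, " ".join(phonemes.split())


tagged = parse_lexicon_line_sketch("read\tpast\tr eh d\n")
untagged = parse_lexicon_line_sketch("cat\tk ae t\n")
print(tagged)    # ('read', 'past', 'r eh d')
print(untagged)  # ('cat', '', 'k ae t')
```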