learn_to_pronounce.resources.DefaultProvider
- class learn_to_pronounce.resources.DefaultProvider(resources_dir, encoding='utf-8')
Default implementation of the resources provider. If the resource directory does not include its own resources provider, the default provider is used. It expects the resource files to have specific names and to be formatted in a specific way. If the resources use a custom format, a custom resources provider should be implemented.
- __init__(resources_dir, encoding='utf-8')
- Parameters
- resources_dir: str
directory with pronunciation resources (lexicon, phonemes, graphemes, etc.)
- LEXICON_FILE_NAME = 'lexicon'
name of the file with pronunciation dictionary
- PHONEMES_FILE_NAME = 'phonemes'
name of the file with list of phonemes
- TEST_WORDS = 'test_words'
name of the file with words for evaluation of pronunciation generation
- TRAIN_WORDS = 'train_words'
name of the file with words to be used for training of pronunciation generation
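The four file names above imply a fixed layout inside resources_dir. As a minimal sketch (not the library's own code), the expected paths could be resolved and checked as below. The helper names resource_paths and missing_resources are hypothetical, and the assumption that each file sits directly inside resources_dir with no extension is inferred from the attribute values only.

```python
import os

# File names copied from the documented class attributes.
RESOURCE_FILE_NAMES = {
    "lexicon": "lexicon",
    "phonemes": "phonemes",
    "test_words": "test_words",
    "train_words": "train_words",
}


def resource_paths(resources_dir):
    """Hypothetical helper: map each resource key to its expected path.

    ASSUMPTION: files live directly in resources_dir, without extensions.
    """
    return {
        key: os.path.join(resources_dir, name)
        for key, name in RESOURCE_FILE_NAMES.items()
    }


def missing_resources(resources_dir):
    """Hypothetical helper: list resource keys whose file does not exist."""
    return sorted(
        key
        for key, path in resource_paths(resources_dir).items()
        if not os.path.isfile(path)
    )
```

A check like this can be run before constructing the provider to report which expected files are absent from a directory.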
- parse_lexicon(path: str, words: Optional[Iterable[str]] = None) → PronunciationDictionary
Helper function that parses a lexicon from a file. The expected format of each line is:
<word> <tag> <pronunciation>
where <tag> is optional and <pronunciation> is a sequence of phonemes separated by spaces.
- Parameters
- path: str
path to parse lexicon from
- words: Optional[Iterable[str]]
words to include in the returned PronunciationDictionary, or None to include all words.
- Returns
- pd: PronunciationDictionary
pronunciation dictionary object from pronunciation_generation
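To illustrate the effect of the words argument, here is a minimal sketch of reading a lexicon file and keeping only the requested words. It is not the library's implementation: load_lexicon_entries is a hypothetical name, it returns a plain dict rather than a PronunciationDictionary (whose interface is not described here), and it leaves the optional tag and the phonemes together as one unparsed string.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def load_lexicon_entries(
    path: str,
    words: Optional[Iterable[str]] = None,
    encoding: str = "utf-8",
) -> Dict[str, List[str]]:
    """Hypothetical sketch: map each word to its raw pronunciation strings.

    ASSUMPTION: one entry per line, word first, fields separated by whitespace.
    """
    # None means "keep every word"; otherwise use a set for fast lookups.
    wanted = None if words is None else set(words)
    entries = defaultdict(list)
    with open(path, encoding=encoding) as lexicon_file:
        for line in lexicon_file:
            parts = line.split(maxsplit=1)
            if len(parts) < 2:
                continue  # skip blank lines and words without a pronunciation
            word, rest = parts[0], parts[1].strip()
            if wanted is None or word in wanted:
                # A word may appear on several lines (pronunciation variants).
                entries[word].append(rest)
    return dict(entries)
```

Collecting a list per word keeps multiple pronunciation variants of the same word instead of overwriting earlier ones.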
- static parse_lexicon_line(line: str) → Tuple[str, str, str]
Helper function that parses a single lexicon line
- Parameters
- line: str
line read from the lexicon file within get_lexicon()
- Returns
- word: str
string representation of a word
- tag: str
tag of the pronunciation, empty string if pronunciation variant is not tagged
- phonemes: str
phonemes representing the pronunciation, separated by spaces
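The returned triple can be illustrated with a small sketch. The documentation does not say how an optional tag is told apart from the first phoneme, so this example makes a loud assumption: a token right after the word is treated as a tag when it is not in a known phoneme inventory. The function split_lexicon_line and its phonemes parameter are hypothetical and are not part of DefaultProvider.

```python
from typing import Collection, Tuple


def split_lexicon_line(line: str, phonemes: Collection[str]) -> Tuple[str, str, str]:
    """Hypothetical sketch: split a lexicon line into (word, tag, phonemes)."""
    tokens = line.split()
    if len(tokens) < 2:
        raise ValueError(f"malformed lexicon line: {line!r}")
    word, rest = tokens[0], tokens[1:]
    tag = ""
    # ASSUMPTION: the token after the word is a tag if it is not a known
    # phoneme. The real parser may use a different rule.
    if len(rest) > 1 and rest[0] not in phonemes:
        tag, rest = rest[0], rest[1:]
    # Phonemes are returned as one space-separated string, as documented.
    return word, tag, " ".join(rest)
```

Under that assumption, an untagged line yields an empty string for the tag, matching the documented return value.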