learn_to_normalize.grammar_utils.GrammarLoader

class learn_to_normalize.grammar_utils.GrammarLoader(grammars_dir: str)

Loads normalization grammars from a directory with a specific structure

__init__(grammars_dir: str)

get_configs() -> Tuple[str, str, str]

Loads configurations required by text_normalization

Returns
configs: Tuple[str, str, str]

Loaded proto configurations as strings. The text_normalization package requires three configurations:

- tokenizer configuration: defines the name of the grammar and the main rule
- verbalizer configuration: defines the name of the grammar and the main rule
- verbalizer serialization specification: fields of tokenized semiotic classes

get_grammar(module_str: str, class_name: str) -> BaseFst

Loads a grammar from the grammars directory, given the module name and class name of the grammar

Returns
grammar: BaseFst

The grammar, loaded by name and initialized
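How get_grammar resolves the module name and class name is not described on this page. One common way to do this kind of lookup, shown here only as an illustrative sketch and not as the library's implementation, is to import a module file found under a base directory and instantiate the named class. The helper name load_class_from_dir, the `<module>.py` file layout, and the no-argument constructor are all assumptions made for the example.

```python
import importlib.util
from pathlib import Path


def load_class_from_dir(base_dir: str, module_str: str, class_name: str):
    """Import a module file under base_dir and return an instance of class_name.

    Hypothetical helper: the file layout and the no-argument constructor
    are assumptions, not documented behaviour of GrammarLoader.
    """
    # Turn a dotted module name such as "pkg.cardinal" into "pkg/cardinal.py".
    module_path = Path(base_dir).joinpath(*module_str.split(".")).with_suffix(".py")
    if not module_path.is_file():
        raise FileNotFoundError(f"no module file at {module_path}")

    # Build a module object from the file and execute its body.
    spec = importlib.util.spec_from_file_location(module_str, module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # Look the class up by name and fail clearly if it is missing.
    try:
        cls = getattr(module, class_name)
    except AttributeError as err:
        raise ImportError(f"{module_str} has no class {class_name!r}") from err
    return cls()
```

Raising ImportError for a missing class keeps "module not found" and "class not found" distinguishable for the caller, which helps when a grammar directory is only partially populated.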

get_tokenizer(work_dir: str) -> bytes

Exports the tokenizer/classifier, stores it on disk as a FAR (FST archive), and returns the serialized FAR

Parameters
work_dir: str

Directory in which to store the tokenizer FAR

Returns
res: bytes

serialized tokenizer

get_verbalizer(work_dir: str) -> bytes

Exports the verbalizer, stores it on disk as a FAR (FST archive), and returns the serialized FAR

Parameters
work_dir: str

Directory in which to store the verbalizer FAR

Returns
res: bytes

serialized verbalizer
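The three export methods are typically used together. The following is a minimal sketch of a helper that gathers their results, written against a structural type so that it depends only on the signatures documented above. The names LoaderLike and export_all are hypothetical, and the order in which the tuple from get_configs is unpacked is assumed to follow the order in which the configurations are listed in its description.

```python
from pathlib import Path
from typing import Dict, Protocol, Tuple, Union


class LoaderLike(Protocol):
    """Structural type covering only the methods documented on this page."""

    def get_configs(self) -> Tuple[str, str, str]: ...

    def get_tokenizer(self, work_dir: str) -> bytes: ...

    def get_verbalizer(self, work_dir: str) -> bytes: ...


def export_all(loader: LoaderLike, work_dir: str) -> Dict[str, Union[str, bytes]]:
    """Collect configs and serialized FARs from a loader (hypothetical helper)."""
    # Make sure the target directory exists before the loader writes FARs to it.
    Path(work_dir).mkdir(parents=True, exist_ok=True)

    # Assumed order: tokenizer config, verbalizer config, serialization spec.
    tokenizer_cfg, verbalizer_cfg, serialization_spec = loader.get_configs()

    return {
        "tokenizer_config": tokenizer_cfg,
        "verbalizer_config": verbalizer_cfg,
        "serialization_spec": serialization_spec,
        "tokenizer_far": loader.get_tokenizer(work_dir),
        "verbalizer_far": loader.get_verbalizer(work_dir),
    }
```

With the class documented here, the call would look like `export_all(GrammarLoader("path/to/grammars"), "path/to/work_dir")`, where both paths are placeholders.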