learn_to_normalize.evaluation.google_data.ParsedUtterance
- class learn_to_normalize.evaluation.google_data.ParsedUtterance[source]
A data structure that holds the unnormalized and normalized tokens parsed from a Google data file. The class also encodes how Google data conventions map to Balacoon text_normalization formats.
- add_token(tag: str, unnormalized: str, normalized: str)[source]
Adds the information from a single line of the data file to the utterance currently being parsed.
- get_normalized() → str[source]
Returns the normalized utterance as a single string by concatenating the previously accumulated normalized tokens. This is essentially the ground truth for text normalization.
- Returns
- norm: str
string with normalized utterance
- get_tokens_num()[source]
Returns the number of tokens that have been added to this utterance.
- Returns
- num: int
number of tokens added
- get_unnormalized() → str[source]
Returns the unnormalized utterance as a single string by concatenating the previously accumulated unnormalized tokens.
- Returns
- unnorm: str
string with unnormalized utterance
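The interface above can be illustrated with a small, self-contained sketch. It does not import the real class. `ParsedUtteranceSketch` is a hypothetical stand-in that only mirrors the documented method names. Several details are assumptions, not documented behavior: that tokens are joined with a single space, that data lines are tab-separated as tag, unnormalized token, normalized token, that `<self>` marks a token left unchanged, and that `sil` marks a token that is not spoken. The actual mapping to Balacoon formats may differ.

```python
from typing import List


class ParsedUtteranceSketch:
    """Hypothetical stand-in mirroring the documented ParsedUtterance interface."""

    def __init__(self) -> None:
        self._unnormalized: List[str] = []
        self._normalized: List[str] = []

    def add_token(self, tag: str, unnormalized: str, normalized: str) -> None:
        # The tag (e.g. PLAIN, MONEY, PUNCT) is accepted but ignored in this sketch.
        self._unnormalized.append(unnormalized)
        # Assumption: "<self>" means the token is kept as written.
        if normalized == "<self>":
            self._normalized.append(unnormalized)
        # Assumption: "sil" means the token is silent, so it is skipped.
        elif normalized != "sil":
            self._normalized.append(normalized)

    def get_tokens_num(self) -> int:
        # Counts every added token, including silent ones.
        return len(self._unnormalized)

    def get_unnormalized(self) -> str:
        return " ".join(self._unnormalized)

    def get_normalized(self) -> str:
        return " ".join(self._normalized)


# Illustrative data lines in the assumed tab-separated layout.
lines = [
    "PLAIN\tIt\t<self>",
    "PLAIN\tcosts\t<self>",
    "MONEY\t$5\tfive dollars",
    "PUNCT\t.\tsil",
]

utterance = ParsedUtteranceSketch()
for line in lines:
    tag, unnormalized, normalized = line.split("\t")
    utterance.add_token(tag, unnormalized, normalized)

print(utterance.get_tokens_num())    # 4
print(utterance.get_unnormalized())  # It costs $5 .
print(utterance.get_normalized())    # It costs five dollars
```

In an evaluation loop, the string from `get_unnormalized()` would be passed to the normalizer under test and its output compared against `get_normalized()`.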