class learn_to_normalize.evaluation.google_data.ParsedUtterance[source]

A data structure that holds the unnormalized and normalized tokens parsed from a Google data file. The class also encodes how Google data conventions map to Balacoon text_normalization formats.

add_token(tag: str, unnormalized: str, normalized: str)[source]

Once a line from the data file is read, adds its contents to the utterance currently being parsed.
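As a rough illustration of how lines from a data file could be fed into add_token, here is a minimal sketch. It assumes the tab-separated layout of the public Google text normalization data (semiotic class, unnormalized token, normalized token, with an `<eos>` line closing each utterance). `RecordingUtterance` and `parse_lines` are hypothetical stand-ins written for this example, not part of learn_to_normalize, and the sample lines are made up.

```python
from typing import Iterator, List, Tuple

EOS = "<eos>"  # assumed end-of-utterance marker in the data file


class RecordingUtterance:
    """Hypothetical stand-in that only records add_token() calls."""

    def __init__(self) -> None:
        self.tokens: List[Tuple[str, str, str]] = []

    def add_token(self, tag: str, unnormalized: str, normalized: str) -> None:
        self.tokens.append((tag, unnormalized, normalized))


def parse_lines(lines: List[str]) -> Iterator[RecordingUtterance]:
    """Group tab-separated lines into utterances, one add_token() call per line."""
    current = RecordingUtterance()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == EOS:
            # An <eos> line closes the utterance being built.
            yield current
            current = RecordingUtterance()
            continue
        if len(fields) != 3:
            continue  # this sketch silently skips malformed lines
        tag, unnormalized, normalized = fields
        current.add_token(tag, unnormalized, normalized)
    if current.tokens:
        yield current  # flush a trailing utterance that lacks an <eos> line


# Made-up sample in the assumed format.
sample = [
    "PLAIN\tIt\t<self>",
    "PLAIN\tcosts\t<self>",
    "MONEY\t$5\tfive dollars",
    "PUNCT\t.\tsil",
    "<eos>\t<eos>",
]
utterances = list(parse_lines(sample))
```

In this sketch each utterance ends up with one recorded (tag, unnormalized, normalized) triple per data line; the real class may store or transform tokens differently.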

get_normalized() → str[source]

Getter that returns the normalized utterance as a single string by concatenating the previously accumulated normalized tokens. The result is essentially the ground truth for text normalization.

norm: str

string with normalized utterance


Getter that returns the number of tokens added to this utterance.

num: int

number of tokens added

get_unnormalized() → str[source]

Getter that returns the unnormalized utterance as a single string by concatenating the previously accumulated unnormalized tokens.

unnorm: str

string with unnormalized utterance

has_semiotic_class(tag: str) → bool[source]

Checks whether this utterance contains a particular semiotic class.

tag: str

semiotic class to look for

flag: bool

True if this utterance has the requested semiotic class

is_empty() → bool[source]

Checks whether any tokens were added to the utterance.

flag: bool

True if no tokens were added to this utterance
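To make the documented interface concrete, the following is a minimal sketch of a class with the same method names. `UtteranceSketch` is a hypothetical stand-in, not the library implementation. In particular, it assumes that tokens are joined with single spaces, that a normalized value of `<self>` means the token is read as written, and that `sil` marks a token that is not spoken; the actual mapping to Balacoon formats may differ.

```python
from typing import List, Tuple


class UtteranceSketch:
    """Hypothetical stand-in mirroring the documented method names."""

    def __init__(self) -> None:
        self._tokens: List[Tuple[str, str, str]] = []

    def add_token(self, tag: str, unnormalized: str, normalized: str) -> None:
        self._tokens.append((tag, unnormalized, normalized))

    def get_unnormalized(self) -> str:
        # Assumption: tokens are joined with single spaces.
        return " ".join(unnorm for _, unnorm, _ in self._tokens)

    def get_normalized(self) -> str:
        words = []
        for _, unnorm, norm in self._tokens:
            if norm == "<self>":  # assumption: token is read as written
                words.append(unnorm)
            elif norm != "sil":  # assumption: "sil" tokens are not spoken
                words.append(norm)
        return " ".join(words)

    def has_semiotic_class(self, tag: str) -> bool:
        return any(t == tag for t, _, _ in self._tokens)

    def is_empty(self) -> bool:
        return not self._tokens


utt = UtteranceSketch()
was_empty = utt.is_empty()  # no tokens added yet

# Made-up tokens in the assumed (tag, unnormalized, normalized) layout.
rows = [
    ("PLAIN", "Born", "<self>"),
    ("DATE", "1999", "nineteen ninety nine"),
    ("PUNCT", ".", "sil"),
]
for row in rows:
    utt.add_token(*row)

print(utt.get_unnormalized())
print(utt.get_normalized())
print(utt.has_semiotic_class("DATE"), utt.is_empty())
```

A typical evaluation loop would pass the unnormalized string to the normalizer under test and compare its output with the normalized string, using has_semiotic_class to restrict the comparison to utterances containing a given class.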