learn_to_normalize.evaluation.google_data.ParsedUtterance

class learn_to_normalize.evaluation.google_data.ParsedUtterance

A data structure that holds the unnormalized and normalized tokens parsed from a Google data file. The class also encodes how Google data conventions map to Balacoon text_normalization formats.

__init__()

add_token(tag: str, unnormalized: str, normalized: str)

Adds the contents of one line read from the data file to the utterance currently being parsed.
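As an illustration of how lines from a data file might be fed to add_token, here is a minimal sketch. It assumes the tab-separated layout commonly described for the Google text normalization data: a semiotic class tag, the written token, and the spoken form, with `<self>` meaning "unchanged", `sil` meaning "silence", and an `<eos>` row closing each utterance. `SketchUtterance` and `parse_lines` are hypothetical stand-ins written for this example, not the library's code, and joining tokens with single spaces is likewise an assumption.

```python
from typing import Iterable, List, Tuple


class SketchUtterance:
    """Hypothetical stand-in mirroring the documented ParsedUtterance interface."""

    def __init__(self) -> None:
        self._tokens: List[Tuple[str, str, str]] = []

    def add_token(self, tag: str, unnormalized: str, normalized: str) -> None:
        # Assumption: "<self>" marks a token whose spoken form equals its written form.
        if normalized == "<self>":
            normalized = unnormalized
        # Assumption: "sil" marks silence (e.g. punctuation) with no spoken form.
        elif normalized == "sil":
            normalized = ""
        self._tokens.append((tag, unnormalized, normalized))

    def get_unnormalized(self) -> str:
        return " ".join(unnorm for _, unnorm, _ in self._tokens)

    def get_normalized(self) -> str:
        # Tokens with an empty spoken form are skipped.
        return " ".join(norm for _, _, norm in self._tokens if norm)

    def get_tokens_num(self) -> int:
        return len(self._tokens)

    def is_empty(self) -> bool:
        return not self._tokens


def parse_lines(lines: Iterable[str]) -> List[SketchUtterance]:
    """Group tab-separated data lines into utterances, splitting on <eos> rows."""
    utterances: List[SketchUtterance] = []
    current = SketchUtterance()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "<eos>":
            # End of utterance: keep it only if it received tokens.
            if not current.is_empty():
                utterances.append(current)
            current = SketchUtterance()
            continue
        if len(fields) != 3:
            continue  # skip malformed rows
        current.add_token(*fields)
    if not current.is_empty():
        utterances.append(current)
    return utterances


# Made-up sample rows in the assumed format.
sample = [
    "PLAIN\tIt\t<self>\n",
    "PLAIN\tcosts\t<self>\n",
    "MONEY\t$5\tfive dollars\n",
    "PUNCT\t.\tsil\n",
    "<eos>\t<eos>\n",
]
utterances = parse_lines(sample)
print(utterances[0].get_unnormalized())
print(utterances[0].get_normalized())
```

The real class may join tokens or treat special markers differently; the sketch only shows the general shape of accumulating one token per data line.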

get_normalized() → str

Returns the normalized utterance as a single string by concatenating the normalized tokens accumulated so far. This string serves as the ground truth for text normalization.

Returns
norm: str

string with normalized utterance

get_tokens_num()

Returns the number of tokens that were added to this utterance.

Returns
num: int

number of tokens added

get_unnormalized() → str

Returns the unnormalized utterance as a single string by concatenating the unnormalized tokens accumulated so far.

Returns
unnorm: str

string with unnormalized utterance

has_semiotic_class(tag: str) → bool

Checks whether this utterance contains a particular semiotic class.

Parameters
tag: str

semiotic class to look for

Returns
flag: bool

True if this utterance has the requested semiotic class
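One plausible use of has_semiotic_class is selecting the utterances relevant to a per-class evaluation. The sketch below assumes the check is an exact, case-sensitive match against the tags previously passed to add_token; `TaggedUtterance` and `select_by_class` are hypothetical names introduced for this example and are not part of the library.

```python
from typing import List, Set


class TaggedUtterance:
    """Hypothetical stand-in that only tracks the tags passed to add_token."""

    def __init__(self) -> None:
        self._tags: Set[str] = set()

    def add_token(self, tag: str, unnormalized: str, normalized: str) -> None:
        # Only the tag matters for this sketch; the token text is ignored.
        self._tags.add(tag)

    def has_semiotic_class(self, tag: str) -> bool:
        # Assumption: exact, case-sensitive comparison of tags.
        return tag in self._tags


def select_by_class(utterances: List[TaggedUtterance], tag: str) -> List[TaggedUtterance]:
    """Keep only the utterances that contain the given semiotic class."""
    return [u for u in utterances if u.has_semiotic_class(tag)]


with_date = TaggedUtterance()
with_date.add_token("PLAIN", "in", "in")
with_date.add_token("DATE", "2006", "two thousand six")

plain_only = TaggedUtterance()
plain_only.add_token("PLAIN", "hello", "hello")

selected = select_by_class([with_date, plain_only], "DATE")
print(len(selected))
```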

is_empty() → bool

Checks whether any tokens were added to the utterance.

Returns
flag: bool

True if no tokens were added to this utterance