learn_to_normalize.evaluation.google_data.ParsedUtterance

class learn_to_normalize.evaluation.google_data.ParsedUtterance[source]

A data structure that holds the unnormalized and normalized tokens parsed from a Google data file. The class also encodes how Google data conventions map to Balacoon text_normalization formats.

__init__()[source]

add_token(tag: str, unnormalized: str, normalized: str)[source]

Adds the information read from one line of the data file to the utterance currently being parsed.
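
As a rough illustration of the call pattern, the sketch below splits one data line into the three arguments that `add_token` expects. `SketchUtterance` and `add_line` are hypothetical stand-ins, not the library's code. The tab-separated `tag<TAB>unnormalized<TAB>normalized` line layout and the space-joined getters are assumptions, and the tags and tokens are made up for the example.

```python
from typing import List, Tuple


class SketchUtterance:
    """Hypothetical stand-in that mirrors the documented interface."""

    def __init__(self) -> None:
        self._tokens: List[Tuple[str, str, str]] = []

    def add_token(self, tag: str, unnormalized: str, normalized: str) -> None:
        # Store the triple read from one line of the data file.
        self._tokens.append((tag, unnormalized, normalized))

    def get_tokens_num(self) -> int:
        return len(self._tokens)

    def get_unnormalized(self) -> str:
        # Assumption: tokens are joined with a single space.
        return " ".join(token[1] for token in self._tokens)

    def get_normalized(self) -> str:
        # Assumption: same joining rule as for unnormalized tokens.
        return " ".join(token[2] for token in self._tokens)


def add_line(utterance: SketchUtterance, line: str) -> None:
    """Split one tab-separated line into three fields and add it as a token."""
    tag, unnormalized, normalized = line.rstrip("\n").split("\t")
    utterance.add_token(tag, unnormalized, normalized)


utt = SketchUtterance()
for raw in ["PLAIN\tborn\tborn\n", "DATE\t1999\tnineteen ninety nine\n"]:
    add_line(utt, raw)

print(utt.get_tokens_num())
print(utt.get_unnormalized())
print(utt.get_normalized())
```

The real class may transform tokens while adding them (for example, to apply the Balacoon conventions mentioned above), so the stand-in only shows the order of calls.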

get_normalized() → str[source]

Returns the normalized utterance as a single string by concatenating the normalized tokens accumulated so far. This is essentially the ground truth for text normalization.

Returns

norm: str: string with normalized utterance

get_tokens_num()[source]

Returns the number of tokens that have been added to this utterance.

Returns

num: int: number of tokens added

get_unnormalized() → str[source]

Returns the unnormalized utterance as a single string by concatenating the unnormalized tokens accumulated so far.

Returns

unnorm: str: string with unnormalized utterance

has_semiotic_class(tag: str) → bool[source]

Checks whether this utterance has a particular semiotic class.

Parameters

tag: str: semiotic class to look for

Returns

flag: bool: True if this utterance has the requested semiotic class
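
One plausible use is selecting only the utterances that exercise a given semiotic class, for example to evaluate date handling separately. The sketch below relies on a hypothetical stand-in, `TaggedUtterance`, that remembers the tags passed to `add_token`, plus a hypothetical helper `filter_by_class`. How the real class matches tags is not specified here, so exact matching is an assumption.

```python
from typing import Iterable, List, Set


class TaggedUtterance:
    """Hypothetical stand-in; it only tracks the tags seen so far."""

    def __init__(self) -> None:
        self._tags: Set[str] = set()

    def add_token(self, tag: str, unnormalized: str, normalized: str) -> None:
        self._tags.add(tag)

    def has_semiotic_class(self, tag: str) -> bool:
        # Assumption: exact, case-sensitive match on the tag string.
        return tag in self._tags


def filter_by_class(
    utterances: Iterable[TaggedUtterance], tag: str
) -> List[TaggedUtterance]:
    """Keep only the utterances that report the requested semiotic class."""
    return [utt for utt in utterances if utt.has_semiotic_class(tag)]


first = TaggedUtterance()
first.add_token("PLAIN", "hello", "hello")

second = TaggedUtterance()
second.add_token("DATE", "1999", "nineteen ninety nine")

dated = filter_by_class([first, second], "DATE")
print(len(dated))
```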

is_empty() → bool[source]

Checks whether any tokens were added to the utterance.

Returns

flag: bool: True if no tokens were added to this utterance
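
A check like `is_empty` is useful when reading a file utterance by utterance, so that back-to-back end-of-utterance markers do not produce empty results. The following is a minimal sketch under stated assumptions: `BufferedUtterance` and `read_utterances` are hypothetical, and the tab-separated layout with an `<eos>` marker line is assumed rather than taken from this page.

```python
from typing import Iterable, List


class BufferedUtterance:
    """Hypothetical stand-in that only counts added tokens."""

    def __init__(self) -> None:
        self._unnormalized: List[str] = []

    def add_token(self, tag: str, unnormalized: str, normalized: str) -> None:
        self._unnormalized.append(unnormalized)

    def get_tokens_num(self) -> int:
        return len(self._unnormalized)

    def is_empty(self) -> bool:
        return len(self._unnormalized) == 0


def read_utterances(
    lines: Iterable[str], eos: str = "<eos>"
) -> List[BufferedUtterance]:
    """Group token lines into utterances, skipping empty ones."""
    done: List[BufferedUtterance] = []
    current = BufferedUtterance()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == eos:
            # Assumed end-of-utterance marker: flush only if tokens were added.
            if not current.is_empty():
                done.append(current)
            current = BufferedUtterance()
            continue
        tag, unnormalized, normalized = fields
        current.add_token(tag, unnormalized, normalized)
    # Flush a trailing utterance that was not closed by a marker.
    if not current.is_empty():
        done.append(current)
    return done


lines = [
    "PLAIN\thi\thi",
    "<eos>\t<eos>",
    "<eos>\t<eos>",
    "PLAIN\tbye\tbye",
    "PUNCT\t.\t.",
]
utterances = read_utterances(lines)
print([utt.get_tokens_num() for utt in utterances])
```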