en_us_normalization.production.classify.multi_token.AttachedTokensFst

class en_us_normalization.production.classify.multi_token.AttachedTokensFst(cardinal: Optional[CardinalFst] = None, abbreviation: Optional[AbbreviationFst] = None, word: Optional[WordFst] = None)

Attached tokens handles multi-token strings whose parts are separated by a dash or by no separator at all, for example “look33” or “AT&T-wireless”. This FST takes advantage of the fact that the boundary between some semiotic classes is fairly obvious.

Examples of input / output:

  • look33 -> tokens { name: “look” } tokens { cardinal { count: “33” } }

  • AT&T-wireless -> tokens { name: “AT and T” } tokens { name: “wireless” }
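
The boundary idea behind these examples can be illustrated with a small plain-Python sketch. This is not the FST implementation and does not use this package's API; `split_attached` and its regular expression are hypothetical stand-ins that only assume digit runs form a cardinal, everything else forms a name, and a dash is a separator to drop.

```python
import re

# Hypothetical stand-in for the classifier: a digit run is a "cardinal",
# any other run of non-dash characters is a "name". Dashes match neither
# alternative, so they are skipped as separators.
_PIECE = re.compile(r"(?P<cardinal>\d+)|(?P<name>[^\d-]+)")


def split_attached(text: str) -> list[tuple[str, str]]:
    """Return (semiotic_class, surface) pairs for an attached-token string."""
    pieces = []
    for match in _PIECE.finditer(text):
        # lastgroup is the name of the alternative that matched.
        pieces.append((match.lastgroup, match.group()))
    return pieces


print(split_attached("look33"))
print(split_attached("AT&T-wireless"))
```

Unlike the FST, this sketch only finds the boundaries: it leaves “AT&T” unverbalized rather than producing “AT and T”, which in the real classifier is the job of the abbreviation grammar.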

__init__(cardinal: Optional[CardinalFst] = None, abbreviation: Optional[AbbreviationFst] = None, word: Optional[WordFst] = None)

Constructor of the transducer that handles attached (merged) tokens.

Parameters
cardinal: CardinalFst

Existing CardinalFst instance to reuse (optional, defaults to None).

abbreviation: AbbreviationFst

Existing AbbreviationFst instance to reuse (optional, defaults to None).

word: WordFst

Existing WordFst instance to reuse (optional, defaults to None).