en_us_normalization.production.classify.multi_token.AttachedTokensFst

class en_us_normalization.production.classify.multi_token.AttachedTokensFst(cardinal: Optional[CardinalFst] = None, abbreviation: Optional[AbbreviationFst] = None, word: Optional[WordFst] = None)

AttachedTokensFst handles multi-token strings whose parts are either separated by a dash or not separated at all, for example “look33” or “AT&T-wireless”. This FST relies on the fact that the boundary between some semiotic classes is fairly obvious, such as the switch from letters to digits in “look33”.
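To make the segmentation idea concrete, here is a minimal pure-Python sketch. It is not the library's implementation (the class is an FST); the function name `split_attached` and the regex are hypothetical, and the only boundaries modeled are dashes and letter/digit transitions.

```python
import re
from typing import List

# Hypothetical illustration only: a zero-width pattern that matches every
# position where a letter is followed by a digit, or a digit by a letter.
_LETTER_DIGIT_BOUNDARY = re.compile(r"(?<=[A-Za-z])(?=\d)|(?<=\d)(?=[A-Za-z])")


def split_attached(text: str) -> List[str]:
    """Split on dashes first, then at letter/digit transitions."""
    parts: List[str] = []
    for chunk in text.split("-"):
        if not chunk:
            continue  # skip empty pieces produced by leading/trailing dashes
        # re.split accepts zero-width matches (Python 3.7+)
        parts.extend(p for p in _LETTER_DIGIT_BOUNDARY.split(chunk) if p)
    return parts


print(split_attached("look33"))         # ['look', '33']
print(split_attached("AT&T-wireless"))  # ['AT&T', 'wireless']
```

Each resulting piece would then be handed to the matching sub-classifier (cardinal, abbreviation, or word) instead of being handled as one opaque token.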

Examples of input/output:

look33 -> tokens { name: “look” } tokens { cardinal { count: “33” } }
AT&T-wireless -> tokens { name: “AT and T” } tokens { name: “wireless” }
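The output notation above can be reproduced with a small serializer. This is a hypothetical sketch: the `serialize` helper and its `(class, value)` input pairs are assumptions made for illustration, it uses straight quotes instead of the typographic quotes shown above, and it does not model verbalization such as “AT&T” becoming “AT and T”, which is the abbreviation FST's job.

```python
from typing import List, Tuple


def serialize(tokens: List[Tuple[str, str]]) -> str:
    """Render (class, value) pairs in the tokens { ... } notation used above."""
    out = []
    for cls, value in tokens:
        if cls == "cardinal":
            # cardinals nest their value under a count field
            out.append(f'tokens {{ cardinal {{ count: "{value}" }} }}')
        else:
            # everything else is emitted as a plain name token
            out.append(f'tokens {{ name: "{value}" }}')
    return " ".join(out)


print(serialize([("name", "look"), ("cardinal", "33")]))
```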

__init__(cardinal: Optional[CardinalFst] = None, abbreviation: Optional[AbbreviationFst] = None, word: Optional[WordFst] = None)

Construct the transducer that handles attached (merged) tokens.

Parameters

cardinal: CardinalFst, optional: cardinal FST instance to reuse
abbreviation: AbbreviationFst, optional: abbreviation FST instance to reuse
word: WordFst, optional: word FST instance to reuse