en_us_normalization.production.classify.multi_token.AttachedTokensFst
- class en_us_normalization.production.classify.multi_token.AttachedTokensFst(cardinal: Optional[CardinalFst] = None, abbreviation: Optional[AbbreviationFst] = None, word: Optional[WordFst] = None)
AttachedTokensFst handles multi-token strings whose parts are joined by a dash or by no separator at all, for example “look33” or “AT&T-wireless”. This FST takes advantage of the fact that the boundary between some semiotic classes is fairly obvious, as between the word and the number in “look33”.
Examples of input / output:
look33 -> tokens { name: "look" } tokens { cardinal { count: "33" } }
AT&T-wireless -> tokens { name: "AT and T" } tokens { name: "wireless" }
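To make the idea of an "obvious boundary" concrete, here is a minimal plain-Python sketch. It is not the library's implementation and does not use FSTs: `split_attached` is a hypothetical helper that only splits on dashes and on letter/digit boundaries, and it does not classify or verbalize the pieces (so "AT&T" is left as is rather than expanded to "AT and T").

```python
import re
from typing import List


def split_attached(text: str) -> List[str]:
    """Hypothetical helper (not part of en_us_normalization).

    Splits an attached token at dashes and at the boundary between
    digit runs and non-digit runs.
    """
    pieces: List[str] = []
    for part in text.split("-"):
        # Each match is either a run of digits or a run of non-digits,
        # so "look33" yields "look" and "33". Empty parts yield nothing.
        pieces.extend(re.findall(r"\d+|[^\d]+", part))
    return pieces


print(split_attached("look33"))
print(split_attached("AT&T-wireless"))
```

In the actual class, each piece would presumably be matched by one of the reusable FSTs passed to the constructor (cardinal, abbreviation, word) rather than by a regular expression.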
- __init__(cardinal: Optional[CardinalFst] = None, abbreviation: Optional[AbbreviationFst] = None, word: Optional[WordFst] = None)
Constructor of the transducer that handles attached (merged) tokens.
- Parameters
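- cardinal: CardinalFst
Existing cardinal FST to reuse (optional, defaults to None).
- abbreviation: AbbreviationFst
Existing abbreviation FST to reuse (optional, defaults to None).
- word: WordFst
Existing word FST to reuse (optional, defaults to None).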