en_us_normalization.production.classify.DecimalFst
- class en_us_normalization.production.classify.DecimalFst(cardinal: Optional[CardinalFst] = None)
Finite state transducer for classifying decimals, i.e. numbers with a fractional part. The FST accepts three forms:
both integer and fractional parts are present, e.g. “12.5006”
only the fractional part is present, e.g. “.35”
only the integer part is present, e.g. “12”. This form could be handled by the cardinal semiotic class, but it is kept in decimal as well, since a decimal can be part of a composite semiotic class, such as measure
The integer part of a decimal can be any cardinal, or a single “0” for cases such as “0.5”. The fractional part can be any sequence of digits after the dot.
Optionally, a decimal can be followed by a quantity, written either in full form (e.g. “12 thousands”) or in short form (e.g. “12k”). Supported quantities are stored in data/magnitudes.tsv.
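The accepted forms described above can be approximated with a plain regular expression. This is only an illustrative sketch and not the library's FST: the names `DECIMAL_RE`, `MAGNITUDES` and `classify_decimal` are hypothetical, plain digit runs stand in for the cardinal grammar, and the two-entry magnitude table is an assumed stand-in for the contents of data/magnitudes.tsv.

```python
import re
from typing import Dict, Optional

# Assumed stand-in for data/magnitudes.tsv; the real file is not shown here.
MAGNITUDES = {"k": "thousands", "thousands": "thousands"}

DECIMAL_RE = re.compile(
    r"""
    (?P<negative>-)?                               # optional minus sign
    (?:
        (?P<integer>\d+)(?:\.(?P<fraction>\d+))?   # "12.5006" or "12"
      | \.(?P<fraction_only>\d+)                   # ".35"
    )
    (?:\s?(?P<quantity>[A-Za-z]+))?                # "12 thousands" or "12k"
    """,
    re.VERBOSE,
)


def classify_decimal(text: str) -> Optional[Dict[str, str]]:
    """Return the decimal fields found in `text`, or None if it is not accepted."""
    match = DECIMAL_RE.fullmatch(text)
    if match is None:
        return None
    fields: Dict[str, str] = {}
    if match.group("negative"):
        fields["negative"] = "true"
    if match.group("integer"):
        fields["integer_part"] = match.group("integer")
    fraction = match.group("fraction") or match.group("fraction_only")
    if fraction:
        fields["fractional_part"] = fraction
    quantity = match.group("quantity")
    if quantity:
        if quantity not in MAGNITUDES:
            return None  # unknown quantity: reject the whole input
        fields["quantity"] = MAGNITUDES[quantity]
    return fields
```

For instance, `classify_decimal(".35")` yields only a `fractional_part` field, while `classify_decimal("12.")` is rejected because the dot must be followed by digits.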
Examples of decimals and their tagging:
-12.5006 -> decimal { negative: "true" integer_part: "12" fractional_part: "5006" }
13k -> decimal { integer_part: "13" quantity: "thousands" }
TODO: add handling of abbreviated quantities, e.g. .5B -> decimal { fractional_part: "5" quantity: "billion" }
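The tag layout in the examples above can be reproduced with a small helper. This is a minimal sketch, assuming the field order seen in the examples (negative, integer_part, fractional_part, quantity); `format_decimal_tag` is a hypothetical name and not part of the library.

```python
from typing import List, Optional


def format_decimal_tag(
    integer_part: Optional[str] = None,
    fractional_part: Optional[str] = None,
    quantity: Optional[str] = None,
    negative: bool = False,
) -> str:
    """Serialize decimal fields into the `decimal { ... }` layout shown above."""
    if integer_part is None and fractional_part is None:
        raise ValueError("a decimal needs an integer part, a fractional part, or both")
    fields: List[str] = []
    if negative:
        fields.append('negative: "true"')
    if integer_part is not None:
        fields.append(f'integer_part: "{integer_part}"')
    if fractional_part is not None:
        fields.append(f'fractional_part: "{fractional_part}"')
    if quantity is not None:
        fields.append(f'quantity: "{quantity}"')
    return "decimal { " + " ".join(fields) + " }"


# Reproduces the first documented example.
print(format_decimal_tag(integer_part="12", fractional_part="5006", negative=True))
```

Fields that are absent are omitted from the tag rather than emitted as empty strings, which matches how the documented examples leave out `negative` and `fractional_part` for "13k".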
- __init__(cardinal: Optional[CardinalFst] = None)
Constructor for the decimal FST.
- Parameters
- cardinal: Optional[CardinalFst]
A cardinal FST whose digit FST is reused. If not provided, it is initialized from scratch.