en_us_normalization.production.classify.RomanFst
- class en_us_normalization.production.classify.RomanFst(cardinal: Optional[CardinalFst] = None)
Finite state transducer for classifying Roman numerals (III, IV, etc.). To convert Roman numerals, mappings from data files are used:
roman/digit_teen.tsv - contains mapping for numbers from 1 to 49.
roman/ties.tsv - contains mapping for tens, i.e. 50, 60, …
roman/hunderds.tsv - contains mapping for hundreds, i.e. 100, 200, …
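The way the three mapping files above could combine can be sketched in plain Python. This is only an illustration, not the transducer's implementation: the dictionaries below are hypothetical stand-ins built in code (the real TSV contents are not shown here), and the assumed composition is "optional hundreds, then either a 1 to 49 numeral or a tens numeral with an optional unit".

```python
# Hypothetical stand-ins for the TSV mappings; the real file contents may differ.
UNITS = ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]
LOW_TENS = ["", "X", "XX", "XXX", "XL"]

# Analogue of roman/digit_teen.tsv: numbers 1..49.
DIGIT_TEEN = {LOW_TENS[n // 10] + UNITS[n % 10]: n for n in range(1, 50)}
# Analogue of roman/ties.tsv: 50, 60, ..., 90.
TIES = {"L": 50, "LX": 60, "LXX": 70, "LXXX": 80, "XC": 90}
# Analogue of roman/hunderds.tsv: 100, 200, ..., 900.
HUNDREDS = {"C": 100, "CC": 200, "CCC": 300, "CD": 400, "D": 500,
            "DC": 600, "DCC": 700, "DCCC": 800, "CM": 900}


def _longest_prefix(text, table):
    """Return (key, value) for the longest table key that starts `text`, or ("", 0)."""
    best = ""
    for key in table:
        if text.startswith(key) and len(key) > len(best):
            best = key
    return best, table.get(best, 0)


def roman_to_int(text):
    """Convert a Roman numeral below 1000 to an int; return None if it does not parse."""
    rest = text.upper()
    if not rest:
        return None
    key, total = _longest_prefix(rest, HUNDREDS)
    rest = rest[len(key):]
    if not rest:
        return total
    key, tens = _longest_prefix(rest, TIES)
    if key:
        total += tens
        rest = rest[len(key):]
        if not rest:
            return total
        # After 50..90 only a single unit (1..9) may follow.
        unit = DIGIT_TEEN.get(rest)
        return total + unit if unit is not None and unit < 10 else None
    # Otherwise the remainder must be a complete 1..49 numeral.
    low = DIGIT_TEEN.get(rest)
    return total + low if low is not None else None
```

With these assumed tables, `roman_to_int("XIX")` looks up the whole string in the 1 to 49 table, while `roman_to_int("LXVII")` splits into a tens part (`LX`) and a unit (`VII`).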
The Roman transducer reuses the cardinal transducer to accept digits. Depending on the context, specifically the preceding word, it should be possible to determine whether the Roman numeral is cardinal or ordinal.
roman/cardinal_prefixes.tsv - contains cardinal prefixes, such as “Chapter”
roman/ordinal_prefixes.tsv - contains ordinal prefixes, such as “George”
If a Roman numeral has no known prefix, i.e. it is standalone, it should be treated carefully. Typical pitfalls:
a Roman numeral can be confused with an abbreviation
a Roman numeral that consists of a single character, such as “I”, can be confused with an ordinary word
“XXX” can denote pornographic material, so it should have a larger weight
Examples of transducer input/output:
IV -> roman { cardinal { count: “4” } }
George I -> roman { prefix: “george” ordinal { order: “1” } }
CHAPTER XIX -> roman { prefix: “chapter” cardinal { count: “19” } }
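The prefix-based choice between cardinal and ordinal can be mimicked with a small plain-Python sketch. It is not how the FST is built: the prefix sets are hypothetical one-entry excerpts of the prefix files, `roman_value` and `classify` are illustrative names, and treating a numeral without a known prefix as cardinal is an assumption drawn from the standalone example above.

```python
# Hypothetical excerpts of roman/cardinal_prefixes.tsv and roman/ordinal_prefixes.tsv.
CARDINAL_PREFIXES = {"chapter"}
ORDINAL_PREFIXES = {"george"}

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_value(numeral):
    """Convert a Roman numeral using the usual subtractive rule (no validation)."""
    values = [ROMAN_VALUES[ch] for ch in numeral.upper()]
    total = 0
    for i, value in enumerate(values):
        # A smaller symbol before a larger one is subtracted (IV = 4, IX = 9).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def classify(text):
    """Return a tagged string in the style of the transducer examples."""
    words = text.split()
    number = roman_value(words[-1])
    prefix = words[-2].lower() if len(words) > 1 else None
    if prefix in ORDINAL_PREFIXES:
        return f'roman {{ prefix: "{prefix}" ordinal {{ order: "{number}" }} }}'
    if prefix in CARDINAL_PREFIXES:
        return f'roman {{ prefix: "{prefix}" cardinal {{ count: "{number}" }} }}'
    # Assumption: no known prefix means a standalone cardinal reading.
    return f'roman {{ cardinal {{ count: "{number}" }} }}'
```

For instance, `classify("George I")` takes the ordinal branch because "george" is in the ordinal set, while `classify("IV")` has no prefix and falls back to the cardinal reading.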
- __init__(cardinal: Optional[CardinalFst] = None)
Constructor for the Roman numeral transducer.
- Parameters
- cardinal: CardinalFst
Transducer for cardinal numbers to reuse. If not provided, one will be created from scratch.