en_us_normalization.production.classify.DateFst
- class en_us_normalization.production.classify.DateFst
Finite state transducer for classifying dates. Dates can be written in a lot of different ways:
conventional dates in the form 2012/12/12. Several separators are possible: “/”, “-”, “.”. The field order can be ambiguous, especially when the year is written with two digits. A few constraints help with identification: months are in the range 01-12 and days are in the range 1-31. In case of ambiguity, proceed with the MDY format, although this choice should be locale-dependent.
written dates in the form jan. 5, 2012. Here the order may vary, and a specific style should be used depending on the order (see configs/verbalizer_serialization_spec.ascii_proto). A few things to watch for in this format: 1) the day can be an ordinal, i.e. carry a suffix such as “th” or “st”; 2) the year can have an era attached, e.g. “960 BC”. Some fields are optional in this format: the year or the day may be missing.
stand-alone years - very tricky to detect, because context is required to understand that the number is a year. Fortunately, transducing a year as a cardinal is not a serious error. However, for modern years such as 1995 or 2012, it is better to tag them as years and verbalize them in blocks of two digits.
decades - when a whole decade is meant, which is marked with “s” at the end. The year has to end with “0”. Optionally, the first two digits of the year are omitted and replaced with an apostrophe. Possible examples: 1960s or ’60s. The era field is reused to mark decades, so that no separate field has to be introduced.
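The range checks described for conventional dates can be illustrated with a small plain-Python sketch. This is not the transducer's implementation (DateFst is a finite state transducer); the helper name guess_field_order and its default parameter are hypothetical, and the sketch only applies the rules stated above: a four-digit leading field is a year, months are 1-12, days are 1-31, and ambiguous input falls back to MDY.

```python
import re

# Hypothetical helper, not part of en_us_normalization.
# Three numeric fields joined by the same separator: "/", "-" or ".".
_NUMERIC_DATE = re.compile(r"^(\d{1,4})([/.-])(\d{1,2})\2(\d{1,4})$")


def guess_field_order(text, default="mdy"):
    """Return (order, year, month, day) for a numeric date, or None."""
    match = _NUMERIC_DATE.match(text)
    if match is None:
        return None
    first, _, second, third = match.groups()
    a, b, c = int(first), int(second), int(third)

    # A four-digit leading field can only be a year, so the order is YMD.
    if len(first) == 4:
        if 1 <= b <= 12 and 1 <= c <= 31:
            return ("ymd", a, b, c)
        return None

    # Otherwise collect every order that passes the range checks.
    candidates = {}
    if 1 <= a <= 12 and 1 <= b <= 31:
        candidates["mdy"] = (c, a, b)
    if 1 <= a <= 31 and 1 <= b <= 12:
        candidates["dmy"] = (c, b, a)
    if not candidates:
        return None

    # Ambiguous input falls back to the default order (MDY here).
    order = default if default in candidates else next(iter(candidates))
    return (order,) + candidates[order]


print(guess_field_order("25/12/2012"))  # only DMY passes the range checks
print(guess_field_order("05/06/2012"))  # ambiguous, falls back to MDY
```

Passing default="dmy" shows how a locale-dependent fallback could be plugged in. Two-digit years are returned unchanged, and the actual class may resolve such inputs differently.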
Examples of date normalization:
jan. 5, 2012 -> date { month: “january” day: “5” year: “2012” }
jan. 5 -> date { month: “january” day: “5” }
jan. 5th -> date { month: “january” day: “5” }
5 january 2012 -> date { day: “5” month: “january” year: “2012” style_spec_name: “dmy” }
5 january 960 B.C. -> date { day: “5” month: “january” year: “960” era: “BC” style_spec_name: “dmy” }
2012-01-05 -> date { year: “2012” month: “january” day: “5” style_spec_name: “dmy” }
2012.01.05 -> date { year: “2012” month: “january” day: “5” style_spec_name: “dmy” }
2012/01/05 -> date { year: “2012” month: “january” day: “5” style_spec_name: “dmy” }
12/01/05 -> date { year: “12” month: “january” day: “5” style_spec_name: “dmy” }
2012 -> date { year: “2012” }
1960s -> date { year: “1960” era: “s” }
’60s -> date { year: “60” era: “s” }
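The two decade examples can be reproduced with a short regular-expression sketch in plain Python. It is an illustration only, not the grammar used by DateFst: the name tag_decade and the pattern are hypothetical, the output uses straight quotes, and both the ASCII and the typographic apostrophe are assumed to be acceptable.

```python
import re

# Hypothetical pattern, not taken from en_us_normalization.
# Either a four-digit year ending in "0", or an apostrophe followed by
# a two-digit year ending in "0", in both cases followed by "s".
_DECADE = re.compile(r"^(?:(\d{3}0)|['’](\d0))s$")


def tag_decade(text):
    """Serialize a decade the way the examples above do, or return None."""
    match = _DECADE.match(text)
    if match is None:
        return None
    year = match.group(1) or match.group(2)
    # The era field is reused to carry the decade marker "s".
    return 'date { year: "%s" era: "s" }' % year


print(tag_decade("1960s"))  # date { year: "1960" era: "s" }
print(tag_decade("’60s"))   # date { year: "60" era: "s" }
print(tag_decade("1965s"))  # None, because the year does not end in "0"
```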