class en_us_normalization.production.classify.AddressFst[source]

Finite state transducer for classifying addresses. An address consists of multiple slots, most of which are optional. The slots are:

  1. house number - mandatory

  2. street - consists of a name, a type (road, street, square, etc.) and a pre- and/or post-directional (e.g. N for north) - mandatory

  3. suite - apartment or house number, consisting of a type and a number (e.g. Apt #23) - optional

  4. town - town name, possibly multi-word (e.g. San-Francisco) - optional

  5. state - state name, usually abbreviated (e.g. CA) - optional

  6. zip-code - 5-digit number with an optional dash-separated 4-digit extension (e.g. 45149-3214). Alternatively, a British-format postcode such as “SW1W 0NY”, which consists of an outcode and an incode separated by a space - optional
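
The two zip-code formats described above can be illustrated with plain regular expressions. This is only a sketch of the surface patterns, not the grammar AddressFst actually uses: the names US_ZIP, UK_POSTCODE and classify_zip are hypothetical, and the British pattern is a simplified assumption that does not cover every valid postcode.

```python
import re

# Hypothetical patterns for illustration; not taken from AddressFst.
# US: five digits, optionally followed by a dash and four more digits.
US_ZIP = re.compile(r"(\d{5})(?:-(\d{4}))?")
# British (simplified assumption): outcode, one space, incode.
UK_POSTCODE = re.compile(r"([A-Z]{1,2}\d[A-Z\d]?) (\d[A-Z]{2})")


def classify_zip(text):
    """Return (format, value) for a recognized zip code, else None."""
    match = US_ZIP.fullmatch(text)
    if match:
        # Drop the dash, mirroring the "451493214" output in the examples.
        return "us", match.group(1) + (match.group(2) or "")
    match = UK_POSTCODE.fullmatch(text)
    if match:
        return "uk", match.group(1) + " " + match.group(2)
    return None


print(classify_zip("45149-3214"))
print(classify_zip("SW1W 0NY"))
```

Using fullmatch rather than match keeps a string such as "45149-32" from being accepted as a bare 5-digit code with trailing text.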

Examples of addresses and their parsing:

  • 1599 Curabitur Rd. Bandera South Dakota 45149 -> address { house: “1599” street_name: “Curabitur” street_type: “road” town: “Bandera” state: “South Dakota” zip: “45149”}

  • 123 N Malanyuka St. SE, Apt #23 San-Francisco CA 45149-3214 -> address { house: “123” pre_directional: “north” street_name: “Malanyuka” street_type: “street” post_directional: “south east” suite_type: “apartment” suite_number: “23” town: “San Francisco” state: “california” zip: “451493214”}
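
For downstream inspection, the serialized form shown in the examples can be read back into a dictionary. The sketch below is not part of the library: parse_address and FIELD are hypothetical names, and it assumes the classifier output uses straight double quotes and contains no escaped quotes inside values.

```python
import re

# Hypothetical helper; assumes fields look like  key: "value"  with
# straight double quotes and no escaped quotes inside the value.
FIELD = re.compile(r'(\w+):\s*"([^"]*)"')


def parse_address(serialized):
    """Turn 'address { key: "value" ... }' into a dict of slot values."""
    body = serialized.strip()
    if not body.startswith("address"):
        raise ValueError("expected an 'address { ... }' string")
    # findall yields (key, value) pairs, which dict() accepts directly.
    return dict(FIELD.findall(body))


example = (
    'address { house: "1599" street_name: "Curabitur" street_type: "road" '
    'town: "Bandera" state: "South Dakota" zip: "45149"}'
)
fields = parse_address(example)
print(fields["street_name"], fields["zip"])
```

Optional slots that the classifier did not fill are simply absent from the dictionary, so callers may prefer fields.get("suite_type") over direct indexing.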