debug_standardize_address
Name
debug_standardize_address — Returns a json formatted text listing the parse tokens and standardizations
Synopsis
    
     text
     
      debug_standardize_address
     
     (
    
    text
    
     lextab
    
    , text
    
     gaztab
    
    , text
    
     rultab
    
    , text
    
     micro
    
    , text
    
     macro=NULL
    
    
     )
    
    ;
   
Description
This is a function for debugging address standardizer rules and lex/gaz mappings. It returns a json formatted text that includes the matching rules, mapping of tokens, and best standardized address stdaddr form of an input address utilizing lex table table name, gaz table , and rules table table names and an address.
   For single line addresses use just
   
    micro
   
  
   For two line address A
   
    micro
   
   consisting of standard first line of postal address e.g.
   
    house_num street
   
   , and a macro consisting of standard postal second line of an address e.g
   
    city, state postal_code country
   
   .
  
Elements returned in the json document are
- input_tokens
 - 
     
For each word in the input address, returns the position of the word, token categorization of the word, and the standard word it is mapped to. Note that for some input words, you might get back multiple records because some inputs can be categorized as more than one thing.
 - rules
 - 
     
The set of rules matching the input and the corresponding score for each. The first rule (highest scoring) is what is used for standardization
 - stdaddr
 - 
     
The standardized address elements stdaddr that would be returned when running standardize_address
 
Availability: 3.4.0
   
    
   
   This method needs address_standardizer extension.
  
Examples
Using address_standardizer_data_us extension
CREATE EXTENSION address_standardizer_data_us; -- only needs to be done once
Variant 1: Single line address and returning the input tokens
SELECT it->>'pos' AS position, it->>'word' AS word, it->>'stdword' AS standardized_word,
            it->>'token' AS token, it->>'token-code' AS token_code
    FROM jsonb(
            debug_standardize_address('us_lex',
                'us_gaz', 'us_rules', 'One Devonshire Place, PH 301, Boston, MA 02109')
                 ) AS s, jsonb_array_elements(s->'input_tokens') AS it;
  position | word | standardized_word | token | token_code ----------+------------+-------------------+--------+------------ 0 | ONE | 1 | NUMBER | 0 0 | ONE | 1 | WORD | 1 1 | DEVONSHIRE | DEVONSHIRE | WORD | 1 2 | PLACE | PLACE | TYPE | 2 3 | PH | PATH | TYPE | 2 3 | PH | PENTHOUSE | UNITT | 17 4 | 301 | 301 | NUMBER | 0 (7 rows)
Variant 2: Multi line address and returning first rule input mappings and score
SELECT (s->'rules'->0->>'score')::numeric AS score, it->>'pos' AS position,
        it->>'input-word' AS word, it->>'input-token' AS input_token, it->>'mapped-word' AS standardized_word,
            it->>'output-token' AS output_token
    FROM jsonb(
            debug_standardize_address('us_lex',
                'us_gaz', 'us_rules', 'One Devonshire Place, PH 301', 'Boston, MA 02109')
                 ) AS s, jsonb_array_elements(s->'rules'->0->'rule_tokens') AS it;
  score | position | word | input_token | standardized_word | output_token ----------+----------+------------+-------------+-------------------+-------------- 0.876250 | 0 | ONE | NUMBER | 1 | HOUSE 0.876250 | 1 | DEVONSHIRE | WORD | DEVONSHIRE | STREET 0.876250 | 2 | PLACE | TYPE | PLACE | SUFTYP 0.876250 | 3 | PH | UNITT | PENTHOUSE | UNITT 0.876250 | 4 | 301 | NUMBER | 301 | UNITT (5 rows)