Loading From an Archive

This command instructs pgloader to load data from one or more files contained in an archive. ZIP is currently the only supported archive format, and the archive can be downloaded from an HTTP URL.

Here’s an example:

LOAD ARCHIVE
   FROM /Users/dim/Downloads/GeoLiteCity-latest.zip
   INTO postgresql:///ip4r

   BEFORE LOAD
     DO $$ create extension if not exists ip4r; $$,
        $$ create schema if not exists geolite; $$,

     EXECUTE 'geolite.sql'

   LOAD CSV
        FROM FILENAME MATCHING ~/GeoLiteCity-Location.csv/
             WITH ENCODING iso-8859-1
             (
                locId,
                country,
                region     null if blanks,
                city       null if blanks,
                postalCode null if blanks,
                latitude,
                longitude,
                metroCode  null if blanks,
                areaCode   null if blanks
             )
        INTO postgresql:///ip4r?geolite.location
             (
                locid,country,region,city,postalCode,
                location point using (format nil "(~a,~a)" longitude latitude),
                metroCode,areaCode
             )
        WITH skip header = 2,
             fields optionally enclosed by '"',
             fields escaped by double-quote,
             fields terminated by ','

  AND LOAD CSV
        FROM FILENAME MATCHING ~/GeoLiteCity-Blocks.csv/
             WITH ENCODING iso-8859-1
             (
                startIpNum, endIpNum, locId
             )
        INTO postgresql:///ip4r?geolite.blocks
             (
                iprange ip4r using (ip-range startIpNum endIpNum),
                locId
             )
        WITH skip header = 2,
             fields optionally enclosed by '"',
             fields escaped by double-quote,
             fields terminated by ','

   FINALLY DO
     $$ create index blocks_ip4r_idx on geolite.blocks using gist(iprange); $$;

The archive command accepts the following clauses and options.

Archive Source Specification: FROM

Filename or HTTP URI to load the data from. When given an HTTP URL, pgloader downloads the linked file locally before processing it.

If the file is a zip file, the command line utility unzip is used to expand the archive into files in $TMPDIR, or in /tmp if $TMPDIR is unset or set to a non-existing directory.

The sub-commands described below are then run from the top-level directory where the archive has been expanded.
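As an illustration of the HTTP form of the FROM clause, here is a minimal sketch adapted from the example at the top of this page. The URL is a hypothetical placeholder, and the sketch assumes that the geolite schema, the geolite.blocks table and the ip4r extension already exist in the target database, since it carries no BEFORE LOAD clause.

```
-- Hypothetical URL: the archive is downloaded locally, then expanded
-- with unzip before the CSV sub-command runs.
-- Assumes geolite.blocks and the ip4r extension already exist.
LOAD ARCHIVE
   FROM http://example.com/downloads/GeoLiteCity-latest.zip
   INTO postgresql:///ip4r

   LOAD CSV
        FROM FILENAME MATCHING ~/GeoLiteCity-Blocks.csv/
             WITH ENCODING iso-8859-1
             (
                startIpNum, endIpNum, locId
             )
        INTO postgresql:///ip4r?geolite.blocks
             (
                iprange ip4r using (ip-range startIpNum endIpNum),
                locId
             )
        WITH skip header = 2,
             fields optionally enclosed by '"',
             fields escaped by double-quote,
             fields terminated by ',';
```

Apart from the source location, the command is the same as when loading from a local file.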

Archive Sub Commands

  • command [ AND command … ]

    A series of commands to run against the contents of the archive. At the moment only the CSV, FIXED and DBF commands are supported.

    Note that these commands support the FROM FILENAME MATCHING clause, which allows the pgloader command not to depend on the exact names of the archive directories.

    The same clause can also be applied to several files by using the spelling FROM ALL FILENAMES MATCHING and a regular expression.

    The whole matching clause must follow this rule:

    FROM [ ALL FILENAMES | [ FIRST ] FILENAME ] MATCHING
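To illustrate the ALL FILENAMES spelling, here is a minimal sketch. It assumes a hypothetical archive at /path/to/blocks.zip that holds several files named like Blocks-1.csv and Blocks-2.csv, all sharing the same layout, and it assumes that the target table already exists. The archive path, file names and regular expression are placeholders and do not come from the GeoLite archive used above.

```
-- Hypothetical archive and file names; each file whose name matches
-- the regular expression is expected to be loaded into the same
-- target table.
LOAD ARCHIVE
   FROM /path/to/blocks.zip
   INTO postgresql:///ip4r

   LOAD CSV
        FROM ALL FILENAMES MATCHING ~/Blocks-[0-9]+.csv/
             WITH ENCODING iso-8859-1
             (
                startIpNum, endIpNum, locId
             )
        INTO postgresql:///ip4r?geolite.blocks
             (
                iprange ip4r using (ip-range startIpNum endIpNum),
                locId
             )
        WITH skip header = 2,
             fields optionally enclosed by '"',
             fields escaped by double-quote,
             fields terminated by ',';
```

With FIRST FILENAME MATCHING, or FILENAME MATCHING alone, only one matching file would be loaded.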
    

Archive Final SQL Commands

  • FINALLY DO

    SQL queries to run once the data is loaded, such as CREATE INDEX.