tutorial/fixed
tutorial/fixed
Loading Fixed Width Data File with pgloader
Some data providers still use a format where each column is specified with a starting index position and a given length. Usually the columns are blank-padded when the data is shorter than the full reserved range.
The Command
To load data with pgloader you need to define in a command the operations in some details. Here’s our example for loading Fixed Width Data, using a file provided by the US census.
You can find more files from them at the Census 2000 Gazetteer Files.
Here’s our command:
LOAD ARCHIVE
FROM http://www.census.gov/geo/maps-data/data/docs/gazetteer/places2k.zip
INTO postgresql:///pgloader
BEFORE LOAD DO
$$ drop table if exists places; $$,
$$ create table places
(
usps char(2) not null,
fips char(2) not null,
fips_code char(5),
loc_name varchar(64)
);
$$
LOAD FIXED
FROM FILENAME MATCHING ~/places2k.txt/
WITH ENCODING latin1
(
usps from 0 for 2,
fips from 2 for 2,
fips_code from 4 for 5,
"LocationName" from 9 for 64 [trim right whitespace],
p from 73 for 9,
h from 82 for 9,
land from 91 for 14,
water from 105 for 14,
ldm from 119 for 14,
wtm from 131 for 14,
lat from 143 for 10,
long from 153 for 11
)
INTO postgresql:///pgloader?places
(
usps, fips, fips_code, "LocationName"
);
The Data
This command allows loading the following file content, where we are only showing the first couple of lines:
AL0100124Abbeville city 2987 1353 40301945 120383 15.560669 0.046480 31.566367 -85.251300
AL0100460Adamsville city 4965 2042 50779330 14126 19.606010 0.005454 33.590411 -86.949166
AL0100484Addison town 723 339 9101325 0 3.514041 0.000000 34.200042 -87.177851
AL0100676Akron town 521 239 1436797 0 0.554750 0.000000 32.876425 -87.740978
AL0100820Alabaster city 22619 8594 53023800 141711 20.472605 0.054715 33.231162 -86.823829
AL0100988Albertville city 17247 7090 67212867 258738 25.951034 0.099899 34.265362 -86.211261
AL0101132Alexander City city 15008 6855 100534344 433413 38.816529 0.167342 32.933157 -85.936008
Loading the data
Let’s start the pgloader command with our census-places.load command file:
$ pgloader census-places.load
... LOG Starting pgloader, log system is ready.
... LOG Parsing commands from file "/Users/dim/dev/pgloader/test/census-places.load"
... LOG Fetching 'http://www.census.gov/geo/maps-data/data/docs/gazetteer/places2k.zip'
... LOG Extracting files from archive '//private/var/folders/w7/9n8v8pw54t1gngfff0lj16040000gn/T/pgloader//places2k.zip'
table name read imported errors time
----------------- --------- --------- --------- --------------
download 0 0 0 1.494s
extract 0 0 0 1.013s
before load 2 2 0 0.013s
----------------- --------- --------- --------- --------------
places 25375 25375 0 0.499s
----------------- --------- --------- --------- --------------
Total import time 25375 25375 0 3.019s
We can see that pgloader did download the file from its HTTP URL location then unziped it before the loading itself.
Note that the output of the command has been edited to facilitate its browsing online.