ncdmp
ncdmp (options)* (field_mapping)+
The program ncdmp(1) converts an input dump file (.dmp) into an output dump file which can be used to initialize and populate a nanocube data structure.
These options control the format in which the output .dmp records are written:
- --encoding=type: Set the encoding of the output .dmp file. Possible encoding types are as follows:
    t  text
    b  binary
  Important: an input text file must end with a newline, otherwise the last line is not processed.
- --max=num: Convert and dump only the first num records from the input file to the output file.
- --cat: Copy all input fields as output fields.
- -v, --version: Show ncdmp version and exit.
A field mapping describes how to generate a field in the output dump file. These output fields are typed in a way compatible with a nanocube: their type names indicate whether they are dimension fields or variable fields. Each conversion is expected to include at least one time dimension field, one dimension that is not time, and one variable. Here is the syntax of the field mappings:
- dim-dmq=name,lat-field,lon-field,quadtree-levels
  This dimension is generated by reading the latitude and longitude fields from the input file (floating point numbers representing world coordinates in degrees), converting them into mercator coordinates, and then converting the mercator coordinates into a cell address in a grid of 2^quadtree-levels by 2^quadtree-levels cells. The output field is of type nc_dim_quadtree_X, where X = quadtree-levels.
- dim-tbin=name,time-field,tbin,num-bytes
  Maps the input time to an integer based on a reference timestamp and a bin length. For example, tbin might be "2010_1h", indicating that an input time in the first hour of 2010 is mapped to 0, one in the second hour of 2010 is mapped to 1, and so on. The output field is of type "nc_dim_time_X", where X = num-bytes. A metadata pair is added to the output metadata dictionary (e.g. "tbin" -> "2010_1h").
- dim-hour=name,time-field
  Generates one output field of type "nc_dim_cat_1" from an input field of type "time". The values are the hour of the day encoded in the time field: 0 to 23.
- dim-weekday=name,time-field
  Generates one output field of type "nc_dim_cat_1" from an input field of type "time". The values are the day of the week encoded in the input time field, assuming local time conversion. The output values are 1 (Mon) to 7 (Sun).
- dim-month=name,time-field
  Generates one output field of type "nc_dim_cat_1" from an input field of type "time". The values are the month of the year encoded in the input time field. The output values are 1 (Jan) to 12 (Dec).
- dim-cat=name,uintX-field
  The output will be a nc_dim_cat_Y, where Y = X/8. The valnames associated with the input field will be associated with the output field as well (same meaning for the values).
- var-uint=name,uintX-field
  The output will be a nc_var_uint_Y, where Y = X/8. The valnames associated with the input field will be associated with the output field as well (same meaning for the values).
- var-one=name,num-bytes
  The output will be a nc_var_uint_X, where X = num-bytes. The value will always be one.
- copy=name,field-name
  Copy the input field field-name to the output dump file under the name name.
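The dim-dmq conversion (degrees, then mercator coordinates, then a grid cell) can be sketched in Python as follows. This is only an illustration: it assumes the standard spherical mercator projection, clamps latitude near the poles, and numbers cells with x growing eastward and y growing northward. The function name is hypothetical, and the conventions ncdmp actually uses may differ.

```python
import math

def latlon_to_quadtree_cell(lat_deg, lon_deg, levels):
    """Map (lat, lon) in degrees to an (x, y) cell of a 2^levels by 2^levels grid.

    Assumption: spherical mercator, x growing eastward, y growing northward.
    """
    # Mercator is undefined at the poles, so clamp to the usual latitude limit.
    max_lat = 85.05112878
    lat_deg = max(-max_lat, min(max_lat, lat_deg))

    # Normalized mercator coordinates, both (approximately) in [-1, 1].
    mx = lon_deg / 180.0
    my = math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2)) / math.pi

    # Scale [-1, 1] to [0, 2^levels) and clamp both edges into the grid.
    n = 1 << levels
    x = max(0, min(n - 1, int((mx + 1.0) / 2.0 * n)))
    y = max(0, min(n - 1, int((my + 1.0) / 2.0 * n)))
    return x, y

# One of the sample coordinates from a taxi-like record, at 20 levels.
print(latlon_to_quadtree_cell(-50.0, 20.0, 20))
```

With quadtree-levels set to 20, as in the examples, each axis has 2^20 = 1048576 cells.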
Convert taxi trips with source and destination information into a nanocube compatible dump file. First a test with only 10 entries in text mode:
$ cat taxi.dmp | \
ncdmp --max=10 --encoding=t \
dim-dmq=src,src_lat,src_lon,20 \
dim-dmq=dst,dst_lat,dst_lon,20 \
dim-cat=tip,tip \
dim-hour=hour,pickup_time \
dim-tbin=time,pickup_time,2010_1h,2 \
var-uint=distance,distance,8 \
var-one=trips,6 \
> taxi_sample_for_nanocube.dmp
And here is how one would start a nanocube instance with all the data (note that ncdmp must encode in binary format when sending data to a nanocube):
$ cat taxi.dmp | ncdmp \
--encoding=b \
dim-dmq=src,src_lat,src_lon,20 \
dim-dmq=dst,dst_lat,dst_lon,20 \
dim-cat=tip,tip \
dim-hour=hour,pickup_time \
dim-tbin=time,pickup_time,2010_1h,2 \
var-uint=distance,distance,8 \
var-one=trips,6 | ncserve --port=29512
This is an example of a dump file:
name: taxi
encoding: text
field: src_lat float
field: src_lon float
field: dst_lat float
field: dst_lon float
field: tip uint8
valname: tip 0 tip_0_1
valname: tip 1 tip_1_2
valname: tip 2 tip_2_3
valname: tip 3 tip_3+
field: pickup_time time
field: distance uint64
-50.0 20.0 -50.1 20.1 0 2010-01-01T00:00+03:15 100
-30.0 30.0 -40.1 30.1 1 2012-12-12T12:24-03 200
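To make the layout of such a file concrete, here is a minimal Python sketch that splits a text-encoded dump into its header and its records. It assumes that header lines start with one of the keys seen in these examples (name, encoding, metadata, field, valname) followed by a colon and a space, and that every other non-empty line is a whitespace-separated record. The function parse_text_dump is hypothetical and not part of ncdmp.

```python
def parse_text_dump(text):
    """Return (header, records) for a text-encoded dump (assumed layout)."""
    header = {"name": None, "encoding": None, "metadata": {},
              "fields": [], "valnames": {}}
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, rest = line.partition(": ")
        if sep and key in ("name", "encoding"):
            header[key] = rest.strip()
        elif sep and key == "metadata":
            mkey, mval = rest.split(None, 1)
            header["metadata"][mkey] = mval
        elif sep and key == "field":
            fname, ftype = rest.split()
            header["fields"].append((fname, ftype))
        elif sep and key == "valname":
            # valname: <field> <numeric value> <label>
            fname, value, label = rest.split(None, 2)
            header["valnames"].setdefault(fname, {})[int(value)] = label
        else:
            # Anything that is not a known header line is treated as a record.
            records.append(line.split())
    return header, records

sample = """name: taxi
encoding: text
field: tip uint8
valname: tip 0 tip_0_1
field: pickup_time time
1 2012-12-12T12:24-03
"""
header, records = parse_text_dump(sample)
print(header["fields"], records)
```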
And here is an example of a dump file compatible with a nanocube:
name: taxi
encoding: text
metadata: tbin 2010_1h
metadata: src_origin degrees_mercator_quadtree_20
metadata: dst_origin degrees_mercator_quadtree_20
field: src nc_dim_quadtree_20
field: dst nc_dim_quadtree_20
field: tip nc_dim_cat_1
valname: tip 0 tip_0_1
valname: tip 1 tip_1_2
valname: tip 2 tip_2_3
valname: tip 3 tip_3+
field: time nc_dim_time_2
field: trips nc_var_uint_6
field: distance nc_var_uint_8
21231 123123 1231231 12312321 0 123 1 100
21231 123123 1231231 12312321 0 123 1 200
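The tbin metadata entry in the header above ties each value of the time field to a time interval. The following Python sketch shows one way such a mapping could work. It assumes that a spec such as "2010_1h" means "bins of one hour starting at 2010-01-01T00:00 UTC", that the unit suffixes are s, m, h, and d, and that a bin index must fit in num-bytes bytes. All three are assumptions, and the helper names are hypothetical. How ncdmp itself treats other specs or out-of-range times is not stated here.

```python
from datetime import datetime, timedelta, timezone

# Assumed unit suffixes; only "h" appears in the documentation.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}

def parse_tbin(spec):
    """Split a spec like '2010_1h' into (reference start, bin width)."""
    ref, _, width = spec.partition("_")
    start = datetime(int(ref), 1, 1, tzinfo=timezone.utc)
    count, unit = int(width[:-1]), width[-1]
    return start, timedelta(**{_UNITS[unit]: count})

def time_to_bin(timestamp, spec, num_bytes):
    """Return the bin index of a timezone-aware timestamp."""
    start, width = parse_tbin(spec)
    index = (timestamp - start) // width  # floor division of two timedeltas
    if not 0 <= index < 256 ** num_bytes:
        raise ValueError("time bin %d does not fit in %d bytes" % (index, num_bytes))
    return index

t = datetime(2010, 1, 1, 1, 5, tzinfo=timezone.utc)
print(time_to_bin(t, "2010_1h", 2))  # second hour of 2010
```

Under these assumptions, a 2-byte time dimension with one-hour bins covers 65536 hours, a little under seven and a half years from the reference timestamp.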
ncdmp is Copyright (C) 2013 AT&T Intellectual Property [email protected]
nanocube(2)