Lauro Lins edited this page Oct 23, 2013 · 2 revisions

ncdmp(1) -- converts a dump file into a nanocube compatible one

SYNOPSIS

ncdmp (options)* (field_mapping)+

DESCRIPTION

The program ncdmp(1) converts an input dump file (.dmp) into an output dump file which can be used to initialize and populate a nanocube data structure.

OPTIONS

These options control how the output .dmp records are written:

  • --encoding=type: Set the encoding of the output .dmp file. Possible encoding types are as follows:

    t text
    b binary

    Important: a text input file must end with a newline; otherwise the last line is not processed.

  • --max=num: Convert and dump only the first num records from the input file to the output file.

  • --cat: Copy all input fields as output fields.

  • -v, --version: Show ncdmp version and exit.
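
The note under --encoding says that a text input file must end with a newline. A minimal Python sketch for checking this before running ncdmp follows; the helper name ends_with_newline is hypothetical and is not part of ncdmp:

```python
import os


def ends_with_newline(path):
    """Return True if the file is non-empty and its last byte is a newline."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False  # an empty file has no final newline
        f.seek(-1, os.SEEK_END)  # step back to the last byte
        return f.read(1) == b"\n"
```

If the check fails, appending a single newline to the input should be enough for the last record to be processed.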

FIELD MAPPINGS

A field mapping describes how to generate a field in the output dump file. These output fields are typed in a way that is compatible with a nanocube, and their type names indicate whether they are dimension fields or variable fields. Each conversion is expected to define at least one time dimension, at least one dimension that is not time, and at least one variable. Here is the syntax of the field mappings:

  • dim-dmq=name,lat-field,lon-field,quadtree-levels
    This dimension is generated by reading latitude and longitude fields from the input file (floating point numbers representing world coordinates in degrees), converting those into mercator coordinates, and then converting the mercator coordinates into a grid cell address in a grid of 2^quadtree-levels-by-2^quadtree-levels. The output field is of type nc_dim_quadtree_X where X=quadtree-levels.

  • dim-tbin=name,time-field,tbin,num-bytes
    Maps the input time to an integer bin index based on a reference timestamp and a bin length. For example, tbin might be "2010_1h", indicating that an input time in the first hour of 2010 is mapped to 0, one in the second hour of 2010 is mapped to 1, and so on. The output field is of type "nc_dim_time_X" where X = num-bytes. A metadata pair is added to the output metadata dictionary (e.g. "tbin" -> "2010_1h").

  • dim-hour=name,time-field
    Generates one output field of type "nc_dim_cat_1" from an input field of type "time". The values are the hour of the day encoded in the time field: 0 to 23.

  • dim-weekday=name,time-field
    Generates one output field of type "nc_dim_cat_1" from an input field of type "time". The values are the day of the week encoded in the input time field, assuming local time conversion. The output values are 1 (Mon) to 7 (Sun).

  • dim-month=name,time-field
    Generates one output field of type "nc_dim_cat_1" from an input field of type "time". The values are the month of the year encoded in the input time field. The output values are 1 (Jan) to 12 (Dec).

  • dim-cat=name,uintX-field
    The output will be an nc_dim_cat_Y where Y = X/8, so a uint8 input field yields nc_dim_cat_1. The valnames associated with the input field will be associated with the output field as well (same meaning for the values).

  • var-uint=name,uintX-field
    The output will be an nc_var_uint_Y where Y = X/8, so a uint64 input field yields nc_var_uint_8. The valnames associated with the input field will be associated with the output field as well (same meaning for the values).

  • var-one=name,num-bytes
    The output will be an nc_var_uint_X where X = num-bytes. The value will always be one.

  • copy=name,field-name
    Copies the input field field-name to the output dump file, naming the output field name.
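
The arithmetic behind dim-dmq, dim-tbin, dim-hour, dim-weekday and dim-month can be sketched in Python. This is an illustration under stated assumptions, not ncdmp's implementation: the function names are hypothetical, the grid orientation follows the common Web Mercator tile convention (ncdmp may orient the grid differently), the tbin "2010_1h" is taken to mean a reference of 2010-01-01 00:00 with one-hour bins, and the range check against num-bytes is assumed rather than documented.

```python
import math
from datetime import datetime, timedelta, timezone


def quadtree_cell(lat, lon, levels):
    """Map degrees to an (x, y) cell in a 2^levels-by-2^levels Mercator grid.

    Assumption: Web Mercator tile convention (x grows eastward, y grows
    southward); the orientation used by ncdmp is not stated in this page.
    """
    n = 1 << levels
    # Mercator projection, normalized to the unit square
    mx = (lon + 180.0) / 360.0
    my = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0
    # Clamp so that edge values stay inside the grid
    x = min(max(int(mx * n), 0), n - 1)
    y = min(max(int(my * n), 0), n - 1)
    return x, y


def time_bin(t, reference, width, num_bytes):
    """Index of the bin containing t, counted from reference in steps of width."""
    index = (t - reference) // width  # floor division of two timedeltas
    # Assumed check: the index must be representable in num_bytes bytes
    if not 0 <= index < 1 << (8 * num_bytes):
        raise ValueError("bin index does not fit in num_bytes")
    return index


def calendar_dims(t):
    """Return (hour 0-23, weekday 1=Mon..7=Sun, month 1=Jan..12=Dec) for t."""
    return t.hour, t.isoweekday(), t.month


# Assumed reading of the tbin "2010_1h" (the time zone is a guess)
REFERENCE = datetime(2010, 1, 1, tzinfo=timezone.utc)
WIDTH = timedelta(hours=1)
```

With this reference and width, a pickup at 00:30 on 2010-01-01 falls in bin 0 and one at 01:15 falls in bin 1, which matches the description of dim-tbin. Note that calendar_dims uses whatever time zone the datetime carries, whereas dim-weekday is described as assuming local time conversion.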

EXAMPLES

Convert taxi trips with source and destination information into a nanocube compatible dump file. First a test with only 10 entries in text mode:

$ cat taxi.dmp | \
     ncdmp --max=10 --encoding=t \
        dim-dmq=src,src_lat,src_lon,20 \
        dim-dmq=dst,dst_lat,dst_lon,20 \
        dim-cat=tip,tip \
        dim-hour=hour,pickup_time \
        dim-tbin=time,pickup_time,2010_1h,2 \
        var-uint=distance,distance,8 \
        var-one=trips,6 \
        > taxi_sample_for_nanocube.dmp

And here is how one would start a nanocube instance with all the data (note that ncdmp must encode in binary format when sending data to a nanocube):

$ cat taxi.dmp | ncdmp \
        --encoding=b \
        dim-dmq=src,src_lat,src_lon,20 \
        dim-dmq=dst,dst_lat,dst_lon,20 \
        dim-cat=tip,tip \
        dim-hour=hour,pickup_time \
        dim-tbin=time,pickup_time,2010_1h,2 \
        var-uint=distance,distance,8 \
        var-one=trips,6 | ncserve --port=29512

DUMP FILES

This is an example of a dump file:

name: taxi
encoding: text
field:   src_lat float
field:   src_lon float
field:   dst_lat float
field:   dst_lon float
field:   tip uint8
valname: tip 0 tip_0_1
valname: tip 1 tip_1_2
valname: tip 2 tip_2_3
valname: tip 3 tip_3+
field:   pickup_time time
field:   distance uint64

-50.0 20.0 -50.1 20.1 0 2010-01-01T00:00+03:15 100
-30.0 30.0 -40.1 30.1 1 2012-12-12T12:24-03 200
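
The layout of a text-encoded dump file, as far as it can be inferred from the example above, is a header of "key: value" lines, a blank line, and then one whitespace-separated record per line. A minimal Python sketch of a reader for that layout follows. The function name parse_dump and the returned structure are hypothetical, and the binary encoding is not handled:

```python
def parse_dump(text):
    """Split a text-encoded dump into a header dict and a list of record rows."""
    header = {"fields": [], "valnames": {}, "metadata": {}}
    lines = text.splitlines()
    body_start = len(lines)
    for i, line in enumerate(lines):
        if not line.strip():
            body_start = i + 1  # the first blank line ends the header
            break
        key, _, rest = line.partition(":")
        parts = rest.split()
        if key == "field":
            # field: <name> <type>
            header["fields"].append((parts[0], parts[1]))
        elif key == "valname":
            # valname: <field> <value> <label>
            header["valnames"].setdefault(parts[0], {})[int(parts[1])] = parts[2]
        elif key == "metadata":
            # metadata: <key> <value>
            header["metadata"][parts[0]] = " ".join(parts[1:])
        else:
            # any other key, e.g. "name" or "encoding"
            header[key] = rest.strip()
    records = [ln.split() for ln in lines[body_start:] if ln.strip()]
    return header, records
```

Record values are returned as strings; converting them according to the declared field types is left out of this sketch.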

And here is an example of a dump file compatible with a nanocube:

name: taxi
encoding: text
metadata: tbin 2010_1h
metadata: src_origin degrees_mercator_quadtree_20
metadata: dst_origin degrees_mercator_quadtree_20
field:    src nc_dim_quadtree_20
field:    dst nc_dim_quadtree_20
field:    tip nc_dim_cat_1
valname:  tip 0 tip_0_1
valname:  tip 1 tip_1_2
valname:  tip 2 tip_2_3
valname:  tip 3 tip_3+
field:    time nc_dim_time_2
field:    trips nc_var_uint_6
field:    distance nc_var_uint_8

21231 123123 1231231 12312321 0 123 1 100
21231 123123 1231231 12312321 0 123 1 200
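
The expectation stated under FIELD MAPPINGS (at least one time dimension, one dimension that is not time, and one variable) can be checked against the field types of a header like the one above. A minimal Python sketch follows; the function name check_nanocube_fields is hypothetical, and the check relies only on the type-name prefixes that appear in this page:

```python
def check_nanocube_fields(field_types):
    """Return a list of problems; an empty list means the expectation is met."""
    has_time = any(t.startswith("nc_dim_time_") for t in field_types)
    has_other_dim = any(
        t.startswith("nc_dim_") and not t.startswith("nc_dim_time_")
        for t in field_types
    )
    has_var = any(t.startswith("nc_var_") for t in field_types)

    problems = []
    if not has_time:
        problems.append("no time dimension")
    if not has_other_dim:
        problems.append("no non-time dimension")
    if not has_var:
        problems.append("no variable")
    return problems
```

For the header above, the types nc_dim_quadtree_20, nc_dim_cat_1, nc_dim_time_2, nc_var_uint_6 and nc_var_uint_8 satisfy all three conditions.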

COPYRIGHT

ncdmp is Copyright (C) 2013 AT&T Intellectual Property

SEE ALSO

nanocube(2)