hdf/netcdf extractor for clowder #145

Closed
robkooper opened this issue Aug 15, 2016 · 57 comments

@robkooper
Member

Given the netCDF files we receive, we should have an extractor that reads the properties of each netCDF/HDF file and inserts them as metadata in Clowder.

Another user in the CyberGIS group has also raised this as an item of interest.

Also see https://opensource.ncsa.illinois.edu/jira/browse/CATS-628

@ghost ghost assigned yanliu-chn Aug 16, 2016
@robkooper robkooper assigned fwang29 and unassigned yanliu-chn Aug 17, 2016
@ghost ghost added the 1 - Ready label Aug 17, 2016
@yanliu-chn

Do you want the generic metadata for netCDF? Here is the output of gdalinfo on @czender Charles' hyperspectral netCDF output; see if this is what you want:

Driver: HDF5Image/HDF5 Dataset
Files: output/0596c17f-2e4c-4d43-9d77-cde8ffbde663.nc
Size is 1600, 468
Coordinate System is `'
Metadata:
  Conventions=CF-1.5
  created_by=ubuntu
  gantry_system_fixed_metadata_gantry_fixed_data_1=Todo
  gantry_system_fixed_metadata_gantry_fixed_data_2=Todo
  gantry_system_fixed_metadata_System_manufacturer=LemnaTec Corp.
  gantry_system_variable_metadata_Camnera_box_light_1_is_on=True
  gantry_system_variable_metadata_Camnera_box_light_2_is_on=True
  gantry_system_variable_metadata_Camnera_box_light_3_is_on=True
  gantry_system_variable_metadata_Camnera_box_light_4_is_on=True
  gantry_system_variable_metadata_Gantry_Speed_in_]_Direction=0
  gantry_system_variable_metadata_Position_in_]_Direction=0.97
  gantry_system_variable_metadata_Time=04/07/2016 16:15:45
  header_info_AOI_height=960
  header_info_AOI_left=480
  header_info_AOI_top=600
  header_info_AOI_width=1600
  header_info_Array_Pixel_Pitch=6.5
  header_info_AverageDispersion=0.63986398
  header_info_bands=955
  header_info_byte_order=0
  header_info_Col_binning=1
  header_info_data_type=12
  header_info_default_bands={140,234,500}
  header_info_description={[HEADWALL Hyperspec III]}
  header_info_file_type=ENVI Standard
  header_info_FrameIndex=frameIndex.txt
  header_info_header_offset=0
  header_info_HSIII_VERSION=E51215 vs64
  header_info_interleave=bil
  header_info_Lens_EFL=17
  header_info_Lens_folder=
  header_info_lines=468
  header_info_Nuc_folder=
  header_info_Pixel0=3.100546185
  header_info_POST_AOI_height=955
  header_info_POST_AOI_left=0
  header_info_POST_AOI_top=5
  header_info_POST_AOI_width=1600
  header_info_POST_Col_binning=1
  header_info_POST_Row_binning=1
  header_info_Row_binning=1
  header_info_samples=1600
  header_info_sensor_type=Unknown
  header_info_Serial_Number=SN-G4-384
  history=Wed Aug 17 23:14:41 2016: ncks -A /tmp/terraref_tmp_jsn.nc.pid3692.fl00.tmp /tmp/terraref_tmp_att.nc.pid3692.fl00.tmp
Wed Aug 17 23:14:01 2016: python /home/ubuntu/terraref-hyperspectral-input-sample/0596c17f-2e4c-4d43-9d77-cde8ffbde663_raw /tmp/terraref_tmp_jsn.nc.pid3692.fl00.tmp
  history_of_appended_files=Wed Aug 17 23:14:41 2016: Appended file /tmp/terraref_tmp_jsn.nc.pid3692.fl00.tmp had following "history" attribute:
Wed Aug 17 23:14:01 2016: python /home/ubuntu/terraref-hyperspectral-input-sample/0596c17f-2e4c-4d43-9d77-cde8ffbde663_raw /tmp/terraref_tmp_jsn.nc.pid3692.fl00.tmp
  NCO="4.6.1"
  Project=TERRAREF
  sensor_fixed_metadata_sensor_description=Todo
  sensor_fixed_metadata_sensor_manufacturer=Headwall Scientific
  sensor_fixed_metadata_sensor_product_name=VNIR
  sensor_fixed_metadata_sensor_purpose=Todo
  sensor_fixed_metadata_sensor_serial_number=Todo
  sensor_variable_metadata_constmirrorpos=0
  sensor_variable_metadata_createdatacube=0
  sensor_variable_metadata_exposure=45
  sensor_variable_metadata_frameperiod=50
  sensor_variable_metadata_speed=100
  sensor_variable_metadata_startpos=-70
  sensor_variable_metadata_stoppos=70
  sensor_variable_metadata_useexternaltrigger=0
  sensor_variable_metadata_userotatingmirror=0
  terraref_hostname=hyperspectral-ex-vm
  terraref_script=terraref.sh
  terraref_version=4.6.1
  title=None given (supply with --trr ttl="Title")
  user_given_metadata_and_so_on_and_so_on...=...
  user_given_metadata_experiment_info_1=...
  user_given_metadata_first_wheat_test_by_Markus_Radermacher=
  _NCProperties=version=1|netcdflibversion=4.4.1|hdf5libversion=1.8.17
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  468.0)
Upper Right ( 1600.0,    0.0)
Lower Right ( 1600.0,  468.0)
Center      (  800.0,  234.0)
Band 1 Block=1600x1 Type=UInt16, ColorInterp=Undefined
  Metadata:
    xps_img_long_name=Exposure counts
    xps_img_meaning=Counts on scale from 0 to 2^16-1 = 65535
    xps_img_units=1
    xps_img__Netcdf4Dimid=0
Band 2 Block=1600x1 Type=UInt16, ColorInterp=Undefined
  Metadata:
    xps_img_long_name=Exposure counts
    xps_img_meaning=Counts on scale from 0 to 2^16-1 = 65535
    xps_img_units=1
    xps_img__Netcdf4Dimid=0
Band 3 Block=1600x1 Type=UInt16, ColorInterp=Undefined
  Metadata:
    xps_img_long_name=Exposure counts
    xps_img_meaning=Counts on scale from 0 to 2^16-1 = 65535
    xps_img_units=1
    xps_img__Netcdf4Dimid=0
Band 4 Block=1600x1 Type=UInt16, ColorInterp=Undefined
  Metadata:
    xps_img_long_name=Exposure counts
    xps_img_meaning=Counts on scale from 0 to 2^16-1 = 65535
    xps_img_units=1
    xps_img__Netcdf4Dimid=0
Band 5 Block=1600x1 Type=UInt16, ColorInterp=Undefined
  Metadata:
    xps_img_long_name=Exposure counts
    xps_img_meaning=Counts on scale from 0 to 2^16-1 = 65535
    xps_img_units=1
    xps_img__Netcdf4Dimid=0
Band 6 Block=1600x1 Type=UInt16, ColorInterp=Undefined
  Metadata:
    xps_img_long_name=Exposure counts
    xps_img_meaning=Counts on scale from 0 to 2^16-1 = 65535
    xps_img_units=1
    xps_img__Netcdf4Dimid=0
...

@czender
Contributor

czender commented Aug 18, 2016

FYI gdalinfo above collapses (loses) the group structure of the netCDF4 metadata, and represents it as a flat namespace by joining path names together with underscores. For example,
gantry_system_fixed_metadata is a group.
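
To illustrate why the flat namespace loses information, here is a minimal sketch of underscore-joined flattening. The function `flatten_groups` and the nested dictionaries are hypothetical stand-ins, not gdalinfo's actual implementation; they only mimic the naming pattern visible in the output above.

```python
def flatten_groups(tree, prefix=""):
    """Flatten nested group dictionaries into underscore-joined keys."""
    flat = {}
    for name, value in tree.items():
        key = f"{prefix}_{name}" if prefix else name
        if isinstance(value, dict):
            # A nested dict stands in for a netCDF4 group.
            flat.update(flatten_groups(value, key))
        else:
            flat[key] = value
    return flat


# Hypothetical nested metadata, modelled on the group names in this thread.
nested = {
    "gantry_system_fixed_metadata": {"System_manufacturer": "LemnaTec Corp."},
    "sensor_fixed_metadata": {"sensor_product_name": "VNIR"},
}
flat = flatten_groups(nested)

# Two different group layouts collapse to the same flat key, so the
# original group boundary cannot be recovered from the flat name alone.
layout_a = {"gantry_system": {"fixed_metadata_System_manufacturer": "x"}}
layout_b = {"gantry_system_fixed_metadata": {"System_manufacturer": "x"}}
ambiguous = flatten_groups(layout_a) == flatten_groups(layout_b)
```

Because group and attribute names themselves contain underscores, the join is not reversible, which is the loss described above.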

@yanliu-chn

i see. @czender , is there a tool to extract structured metadata for netcdf?

@czender
Contributor

czender commented Aug 18, 2016

This extracts all metadata:
ncks --cdl -m -M /home/zender/a33641c2-8a1e-4a63-9d33-ab66717d6b8a.nc
This extracts only metadata pertinent to variable "y":
ncks --cdl -v y -m /home/zender/a33641c2-8a1e-4a63-9d33-ab66717d6b8a.nc
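
For an extractor that shells out to ncks, the two invocations above could be wrapped roughly as follows. This is only a sketch: the helper names `ncks_metadata_command` and `dump_metadata` are made up for illustration, and it assumes `ncks` is on the PATH.

```python
import subprocess


def ncks_metadata_command(path, variable=None):
    """Build the ncks argument list for a CDL metadata dump."""
    if variable is None:
        # All metadata: variable/group metadata (-m) plus global metadata (-M).
        return ["ncks", "--cdl", "-m", "-M", path]
    # Only the metadata pertinent to one variable.
    return ["ncks", "--cdl", "-v", variable, "-m", path]


def dump_metadata(path, variable=None):
    """Run ncks and return its CDL output as text (requires ncks installed)."""
    result = subprocess.run(
        ncks_metadata_command(path, variable),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ncks exits non-zero
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.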

@yanliu-chn

cool. thanks! @czender does ncks support json output format?

@czender
Contributor

czender commented Aug 18, 2016

No, ncks does not output json. It can output CDL and NcML.

@yanliu-chn

i see. Thanks!

@dlebauer
Member

FYI ncdump-json on github.

@robkooper
Member Author

Another option is to use https://github.com/hay/xml2json and do ncks -xml file | xml2json

@gsrohde

gsrohde commented Aug 19, 2016

If you do use that library, you should be aware that it throws away some of the XML information. For example, it converts the ordered sequence of elements <a>text</a><b>text</b> into the JavaScript object (hash) { "a": "text", "b": "text" }. If the order of elements in your XML documents is not significant, then of course this doesn't really matter. But there may be other cases where information is thrown away, cases not so readily discernible from the documentation. I'm not that familiar with the netcdf/hdf formats, so I can't really evaluate ncdump-json in this regard.
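
A small sketch of the kind of loss meant here, using a naive tag-keyed conversion written for this example. `naive_xml_to_dict` is hypothetical and is not the xml2json library's code; it only shows how keying by tag name can make two differently ordered documents indistinguishable.

```python
import xml.etree.ElementTree as ET


def naive_xml_to_dict(element):
    """Convert child elements to a dict keyed by tag name (lossy on purpose)."""
    result = {}
    for child in element:
        value = child.text if len(child) == 0 else naive_xml_to_dict(child)
        if child.tag in result:
            # Repeated tags are collected into a list under one key.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


# Same elements, different document order.
first = ET.fromstring("<r><a>1</a><b>2</b><a>3</a></r>")
second = ET.fromstring("<r><a>1</a><a>3</a><b>2</b></r>")

converted = naive_xml_to_dict(first)
# The position of <b> relative to the <a> elements is no longer recoverable.
same = converted == naive_xml_to_dict(second)
```

Whether this matters for netCDF metadata depends on whether element order carries meaning in the XML that ncks emits.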

@ghost ghost assigned Zodiase and yanliu-chn and unassigned Zodiase and fwang29 Sep 22, 2016
@ghost

ghost commented Sep 22, 2016

convert xml to JSON

@czender
Contributor

czender commented Sep 22, 2016

Typo above: the ncks commands to produce XML output are

ncks --xml in.nc # entire file
ncks --xml -m in.nc # variable and group metadata
ncks --xml -m -M in.nc # variable and group and global metadata

Hopefully you can pipe these to xml2json as Rob suggests...
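
If piping to a generic converter turns out to be too lossy, another route is to walk the XML directly and keep the group nesting. The sketch below assumes the output is NcML with `dimension`, `variable`, `attribute`, and `group` elements carrying `name`, `length`, `shape`, `type`, and `value` attributes; the sample document is hand-written for illustration rather than real ncks output, and the layout of the resulting dictionary is an arbitrary choice.

```python
import json
import xml.etree.ElementTree as ET

# Hand-written NcML-style sample; real ncks --xml output may differ in detail.
SAMPLE_NCML = """<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <dimension name="x" length="1600" />
  <variable name="x" shape="x" type="double">
    <attribute name="units" value="meter" />
  </variable>
  <group name="sensor_fixed_metadata">
    <attribute name="sensor_product_name" value="VNIR" />
  </group>
</netcdf>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def ncml_group_to_dict(element):
    """Recursively convert a <netcdf> or <group> element to a nested dict."""
    node = {"dimensions": {}, "variables": {}, "attrs": {}, "groups": {}}
    for child in element:
        kind = local_name(child.tag)
        name = child.get("name")
        if kind == "dimension":
            node["dimensions"][name] = int(child.get("length"))
        elif kind == "attribute":
            node["attrs"][name] = child.get("value")
        elif kind == "variable":
            node["variables"][name] = {
                "type": child.get("type"),
                "dims": child.get("shape", "").split(),
                "attrs": {
                    a.get("name"): a.get("value")
                    for a in child
                    if local_name(a.tag) == "attribute"
                },
            }
        elif kind == "group":
            node["groups"][name] = ncml_group_to_dict(child)
    return node


metadata = ncml_group_to_dict(ET.fromstring(SAMPLE_NCML))
as_json = json.dumps(metadata, indent=2)
```

Groups stay nested under `"groups"`, so the structure that gdalinfo flattens is preserved.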

@Zodiase
Contributor

Zodiase commented Sep 22, 2016

Do we want the entire output file or just some specific variables or groups?

Also, is there a metadata field name that should be used? The entire output JSON would be stuffed in there.

@czender
Contributor

czender commented Sep 22, 2016

The people requesting the feature should answer this :)

@yanliu-chn

@dlebauer ncks -m output:

netcdf 0596c17f-2e4c-4d43-9d77-cde8ffbde663 {
  dimensions:
    wavelength = 955 ;
    wvl_nvr = 1024 ;
    x = 1600 ;
    y = 468 ;
  variables:
    float flx_dwn(wavelength) ;
      flx_dwn:long_name = "Downwelling spectral irradiance" ;
      flx_dwn:standard_name = "surface_downwelling_radiative_flux_per_unit_wavelength_in_air" ;
      flx_dwn:units = "watt meter-2 meter-1" ;
    float flx_sns(wvl_nvr) ;
      flx_sns:long_name = "Flux sensitivity of each band (irradiance per count)" ;
      flx_sns:provenance = "EnvironmentalLogger calibration information from file S05673_08062015.IrradCal provided by TinoDornbusch and discussed here: https://github.com/terraref/reference-data/issues/30#issuecomment-217518434" ;
      flx_sns:units = "joule count-1" ;
    double frametime(y) ;
      frametime:units = "days since 1970-01-01 00:00:00" ;
      frametime:calender = "gregorian" ;
    float rfl_img(wavelength,y,x) ;
      rfl_img:long_name = "Reflectance of image" ;
      rfl_img:meaning = "Counts on scale from 0 to 2^16-1 = 65535" ;
      rfl_img:standard_name = "surface_albedo" ;
      rfl_img:units = "1" ;
    float rfl_wht(wavelength) ;
      rfl_wht:long_name = "Reflectance of white reference" ;
      rfl_wht:units = "1" ;
    double wavelength(wavelength) ;
      wavelength:long_name = "Hyperspectral Wavelength" ;
      wavelength:units = "meter" ;
      wavelength:standard_name = "radiation_wavelength" ;
    float wvl_dlt(wvl_nvr) ;
      wvl_dlt:long_name = "Bandwidth of environmental sensor" ;
      wvl_dlt:notes = "Bandwidth, also called dispersion, is between 0.455-0.495 nm across all channels. Values computed as differences between midpoints of adjacent band-centers." ;
      wvl_dlt:standard_name = "bandwidth" ;
      wvl_dlt:units = "meter" ;
    float wvl_nvr(wvl_nvr) ;
      wvl_nvr:long_name = "Wavelength of environmental sensor" ;
      wvl_nvr:provenance = "EnvironmentalLogger calibration information from file S05673_08062015.IrradCal provided by TinoDornbusch and discussed here: https://github.com/terraref/reference-data/issues/30#issuecomment-217518434" ;
      wvl_nvr:standard_name = "sensor_band_central_radiation_wavelength" ;
      wvl_nvr:units = "meter" ;
    double x(x) ;
      x:algorithm = "CSZ implemented these fake data to be replaced by real formula once available." ;
      x:long_name = "North-south offset from start position" ;
      x:units = "meter" ;
    ushort xps_drk(wavelength,x) ;
      xps_drk:long_name = "Exposure from dark reference sheet/panel" ;
      xps_drk:units = "Counts on scale from 0 to 2^16-1 = 65535" ;
    ushort xps_img(wavelength,y,x) ;
      xps_img:long_name = "Exposure counts" ;
      xps_img:meaning = "Counts on scale from 0 to 2^16-1 = 65535" ;
      xps_img:units = "1" ;
    ushort xps_wht(wavelength,x) ;
      xps_wht:long_name = "Exposure from white reference sheet/panel" ;
      xps_wht:units = "Counts on scale from 0 to 2^16-1 = 65535" ;
    double y(y) ;
      y:algorithm = "Based on https://github.com/terraref/computing-pipeline/issues/144. y is defined as 0.9853 mm per pixel. Exact number is 0.98526434004512529576754637665 mm." ;
      y:long_name = "East-west offset from start position" ;
      y:units = "meter" ;
  group: gantry_system_fixed_metadata {
  } // group /gantry_system_fixed_metadata
  group: gantry_system_variable_metadata {
    variables:
      double u ;
        u:long_name = "Gantry_Speed_in_X_Direction" ;
        u:units = "meter second-1" ;
      double v ;
        v:long_name = "Gantry_Speed_in_Y_Direction" ;
        v:units = "meter second-1" ;
      double w ;
        w:long_name = "Gantry_Speed_in_Z_Direction" ;
        w:units = "meter second-1" ;
      double x ;
        x:long_name = "Position_in_X_Direction" ;
        x:units = "meter" ;
      double y ;
        y:long_name = "Position_in_Y_Direction" ;
        y:units = "meter" ;
      double z ;
        z:long_name = "Position_in_Z_Direction" ;
        z:units = "meter" ;
  } // group /gantry_system_variable_metadata
  group: header_info {
    variables:
      double blue_band_index ;
      double green_band_index ;
      double red_band_index ;
  } // group /header_info
  group: sensor_fixed_metadata {
  } // group /sensor_fixed_metadata
  group: sensor_variable_metadata {
    variables:
      double constmirrorpos ;
        constmirrorpos:long_name = "constmirrorpos" ;
      double createdatacube ;
        createdatacube:long_name = "createdatacube" ;
      double exposure ;
        exposure:long_name = "exposure" ;
        exposure:red_band_index = 140l ;
        exposure:green_band_index = 234l ;
        exposure:blue_band_index = 500l ;
      double frameperiod ;
        frameperiod:long_name = "frameperiod" ;
      double speed ;
        speed:long_name = "speed" ;
      double startpos ;
        startpos:long_name = "startpos" ;
      double stoppos ;
        stoppos:long_name = "stoppos" ;
      double useexternaltrigger ;
        useexternaltrigger:long_name = "useexternaltrigger" ;
      double userotatingmirror ;
        userotatingmirror:long_name = "userotatingmirror" ;
  } // group /sensor_variable_metadata
  group: user_given_metadata {
  } // group /user_given_metadata
} // group /

@yanliu-chn

ncks -m -M output:

netcdf 0596c17f-2e4c-4d43-9d77-cde8ffbde663 {
  dimensions:
    wavelength = 955 ;
    wvl_nvr = 1024 ;
    x = 1600 ;
    y = 468 ;
  variables:
    float flx_dwn(wavelength) ;
      flx_dwn:long_name = "Downwelling spectral irradiance" ;
      flx_dwn:standard_name = "surface_downwelling_radiative_flux_per_unit_wavelength_in_air" ;
      flx_dwn:units = "watt meter-2 meter-1" ;
    float flx_sns(wvl_nvr) ;
      flx_sns:long_name = "Flux sensitivity of each band (irradiance per count)" ;
      flx_sns:provenance = "EnvironmentalLogger calibration information from file S05673_08062015.IrradCal provided by TinoDornbusch and discussed here: https://github.com/terraref/reference-data/issues/30#issuecomment-217518434" ;
      flx_sns:units = "joule count-1" ;
    double frametime(y) ;
      frametime:units = "days since 1970-01-01 00:00:00" ;
      frametime:calender = "gregorian" ;
    float rfl_img(wavelength,y,x) ;
      rfl_img:long_name = "Reflectance of image" ;
      rfl_img:meaning = "Counts on scale from 0 to 2^16-1 = 65535" ;
      rfl_img:standard_name = "surface_albedo" ;
      rfl_img:units = "1" ;
    float rfl_wht(wavelength) ;
      rfl_wht:long_name = "Reflectance of white reference" ;
      rfl_wht:units = "1" ;
    double wavelength(wavelength) ;
      wavelength:long_name = "Hyperspectral Wavelength" ;
      wavelength:units = "meter" ;
      wavelength:standard_name = "radiation_wavelength" ;
    float wvl_dlt(wvl_nvr) ;
      wvl_dlt:long_name = "Bandwidth of environmental sensor" ;
      wvl_dlt:notes = "Bandwidth, also called dispersion, is between 0.455-0.495 nm across all channels. Values computed as differences between midpoints of adjacent band-centers." ;
      wvl_dlt:standard_name = "bandwidth" ;
      wvl_dlt:units = "meter" ;
    float wvl_nvr(wvl_nvr) ;
      wvl_nvr:long_name = "Wavelength of environmental sensor" ;
      wvl_nvr:provenance = "EnvironmentalLogger calibration information from file S05673_08062015.IrradCal provided by TinoDornbusch and discussed here: https://github.com/terraref/reference-data/issues/30#issuecomment-217518434" ;
      wvl_nvr:standard_name = "sensor_band_central_radiation_wavelength" ;
      wvl_nvr:units = "meter" ;
    double x(x) ;
      x:algorithm = "CSZ implemented these fake data to be replaced by real formula once available." ;
      x:long_name = "North-south offset from start position" ;
      x:units = "meter" ;
    ushort xps_drk(wavelength,x) ;
      xps_drk:long_name = "Exposure from dark reference sheet/panel" ;
      xps_drk:units = "Counts on scale from 0 to 2^16-1 = 65535" ;
    ushort xps_img(wavelength,y,x) ;
      xps_img:long_name = "Exposure counts" ;
      xps_img:meaning = "Counts on scale from 0 to 2^16-1 = 65535" ;
      xps_img:units = "1" ;
    ushort xps_wht(wavelength,x) ;
      xps_wht:long_name = "Exposure from white reference sheet/panel" ;
      xps_wht:units = "Counts on scale from 0 to 2^16-1 = 65535" ;
    double y(y) ;
      y:algorithm = "Based on https://github.com/terraref/computing-pipeline/issues/144. y is defined as 0.9853 mm per pixel. Exact number is 0.98526434004512529576754637665 mm." ;
      y:long_name = "East-west offset from start position" ;
      y:units = "meter" ;
  // global attributes:
    :title = "None given (supply with --trr ttl=\"Title\")" ;
    :created_by = "yanliu" ;
    :Conventions = "CF-1.5" ;
    :Project = "TERRAREF" ;
    :terraref_script = "terraref.sh" ;
    :terraref_hostname = "cg-gpu01" ;
    :terraref_version = "4.6.0" ;
    :history = "Thu Sep  1 11:09:33 2016: ncap2 -A -S /gpfs/largeblockFS/projects/arpae/sw/computing-pipeline/scripts/hyperspectral/terraref.nco /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid44592.fl00.tmp /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid44592.fl00.tmp\n",
      "Thu Sep  1 11:09:31 2016: ncks -A /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid44592.fl00.tmp /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid44592.fl00.tmp\n",
      "Thu Sep 01 11:09:30 2016: python input/0596c17f-2e4c-4d43-9d77-cde8ffbde663_raw /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid44592.fl00.tmp" ;
    :NCO = "\"4.6.0\"" ;
    :history_of_appended_files = "Thu Sep  1 11:09:33 2016: Appended file /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid44592.fl00.tmp had following \"history\" attribute:\n",
      "Thu Sep  1 11:09:31 2016: ncks -A /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid44592.fl00.tmp /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid44592.fl00.tmp\n",
      "Thu Sep 01 11:09:30 2016: python input/0596c17f-2e4c-4d43-9d77-cde8ffbde663_raw /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid44592.fl00.tmp\n",
      "Thu Sep  1 11:09:31 2016: Appended file /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid44592.fl00.tmp had following \"history\" attribute:\n",
      "Thu Sep 01 11:09:30 2016: python input/0596c17f-2e4c-4d43-9d77-cde8ffbde663_raw /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid44592.fl00.tmp\n" ;
    :nco_openmp_thread_number = 1 ;
  group: gantry_system_fixed_metadata {
    // group attributes:
      :gantry_fixed_data_2 = "Todo" ;
      :gantry_fixed_data_1 = "Todo" ;
      :System_manufacturer = "LemnaTec Corp." ;
  } // group /gantry_system_fixed_metadata
  group: gantry_system_variable_metadata {
    variables:
      double u ;
        u:long_name = "Gantry_Speed_in_X_Direction" ;
        u:units = "meter second-1" ;
      double v ;
        v:long_name = "Gantry_Speed_in_Y_Direction" ;
        v:units = "meter second-1" ;
      double w ;
        w:long_name = "Gantry_Speed_in_Z_Direction" ;
        w:units = "meter second-1" ;
      double x ;
        x:long_name = "Position_in_X_Direction" ;
        x:units = "meter" ;
      double y ;
        y:long_name = "Position_in_Y_Direction" ;
        y:units = "meter" ;
      double z ;
        z:long_name = "Position_in_Z_Direction" ;
        z:units = "meter" ;
    // group attributes:
      :Camnera_box_light_4_is_on = "True" ;
      :Position_in_\]_Direction = "0.97" ;
      :Camnera_box_light_2_is_on = "True" ;
      :Camnera_box_light_1_is_on = "True" ;
      :Gantry_Speed_in_\]_Direction = "0" ;
      :Time = "04/07/2016 16:15:45" ;
      :Camnera_box_light_3_is_on = "True" ;
  } // group /gantry_system_variable_metadata
  group: header_info {
    variables:
      double blue_band_index ;
      double green_band_index ;
      double red_band_index ;
    // group attributes:
      :HSIII_VERSION = "E51215 vs64" ;
      :POST_AOI_left = "0" ;
      :Col_binning = "1" ;
      :AOI_width = "1600" ;
      :Row_binning = "1" ;
      :FrameIndex = "frameIndex.txt" ;
      :AOI_height = "960" ;
      :header_offset = "0" ;
      :Lens_EFL = "17" ;
      :Serial_Number = "SN-G4-384" ;
      :samples = "1600" ;
      :byte_order = "0" ;
      :Lens_folder = "" ;
      :description = "{[HEADWALL Hyperspec III]}" ;
      :default_bands = "{140,234,500}" ;
      :bands = "955" ;
      :POST_Row_binning = "1" ;
      :POST_AOI_width = "1600" ;
      :file_type = "ENVI Standard" ;
      :Nuc_folder = "" ;
      :data_type = "12" ;
      :AverageDispersion = "0.63986398" ;
      :POST_Col_binning = "1" ;
      :Array_Pixel_Pitch = "6.5" ;
      :sensor_type = "Unknown" ;
      :POST_AOI_height = "955" ;
      :lines = "468" ;
      :interleave = "bil" ;
      :AOI_top = "600" ;
      :Pixel0 = "3.100546185" ;
      :AOI_left = "480" ;
      :POST_AOI_top = "5" ;
  } // group /header_info
  group: sensor_fixed_metadata {
    // group attributes:
      :sensor_serial_number = "Todo" ;
      :sensor_purpose = "Todo" ;
      :sensor_product_name = "VNIR" ;
      :sensor_description = "Todo" ;
      :sensor_manufacturer = "Headwall Scientific" ;
  } // group /sensor_fixed_metadata
  group: sensor_variable_metadata {
    variables:
      double constmirrorpos ;
        constmirrorpos:long_name = "constmirrorpos" ;
      double createdatacube ;
        createdatacube:long_name = "createdatacube" ;
      double exposure ;
        exposure:long_name = "exposure" ;
        exposure:red_band_index = 140l ;
        exposure:green_band_index = 234l ;
        exposure:blue_band_index = 500l ;
      double frameperiod ;
        frameperiod:long_name = "frameperiod" ;
      double speed ;
        speed:long_name = "speed" ;
      double startpos ;
        startpos:long_name = "startpos" ;
      double stoppos ;
        stoppos:long_name = "stoppos" ;
      double useexternaltrigger ;
        useexternaltrigger:long_name = "useexternaltrigger" ;
      double userotatingmirror ;
        userotatingmirror:long_name = "userotatingmirror" ;
    // group attributes:
      :exposure = "45" ;
      :startpos = "-70" ;
      :frameperiod = "50" ;
      :userotatingmirror = "0" ;
      :speed = "100" ;
      :useexternaltrigger = "0" ;
      :constmirrorpos = "0" ;
      :createdatacube = "0" ;
      :stoppos = "70" ;
  } // group /sensor_variable_metadata
  group: user_given_metadata {
    // group attributes:
      :first_wheat_test_by_Markus_Radermacher = "" ;
      :experiment_info_1 = "..." ;
      :and_so_on_and_so_on... = "..." ;
  } // group /user_given_metadata
} // group /

@dlebauer
Member

Some of this content is a direct dump from the metadata file provided with the raw data, and that does not need to be duplicated. I think the key new parts are the dimensions, variables, and global attributes from ncks -m -M.
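
One way to act on this, sketched under the assumption that the metadata has already been parsed into a dictionary whose group content sits under a top-level "groups" key (as in the ncks --jsn output posted later in this thread). The function name and the sample dictionary are hypothetical, and dropping every group is only one reading of which parts count as duplicated.

```python
def keep_file_level_metadata(metadata):
    """Return a copy without the top-level "groups" entry.

    Dimensions, variables, and global attributes are kept; group content
    (assumed here to duplicate the raw-data metadata file) is dropped.
    """
    return {key: value for key, value in metadata.items() if key != "groups"}


# Hypothetical, heavily trimmed sample in the shape described above.
sample = {
    "dimensions": {"x": 1600, "y": 468},
    "x": {"dims": ["x"], "type": "double", "units": "meter"},
    "attrs": {"Project": "TERRAREF"},
    "groups": {
        "sensor_fixed_metadata": {"attrs": {"sensor_product_name": "VNIR"}}
    },
}
trimmed = keep_file_level_metadata(sample)
```

The input dictionary is left untouched, so the full metadata remains available if the groups are wanted later.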

@ghost ghost added 2 - Working <= 5 and removed 1 - Ready labels Sep 22, 2016
@ghost ghost added this to the September 2016 milestone Sep 22, 2016
@ghost ghost changed the title hdf/netcdf extractor for clowder hdf/netcdf (hyperspectral) extractor for clowder Sep 27, 2016
@yanliu-chn

I have deployed nco-4.6.2-beta01 on ROGER. The test works!!! Thank you, @czender !

Here is how to use it. Please see if the json output looks good.

module purge
module load gdal-stack-2.7.10 nco # i changed default nco version to 4.6.2-beta01
ncks --jsn -m -M 0596c17f-2e4c-4d43-9d77-cde8ffbde663.nc
{
  "dimensions": {
    "wavelength": 955,
    "wvl_nvr": 1024,
    "x": 1600,
    "y": 468
    },
    "flx_dwn": {
      "dims": ["wavelength"],
      "type": "float",
      "long_name": "Downwelling spectral irradiance",
      "standard_name": "surface_downwelling_radiative_flux_per_unit_wavelength_in_air",
      "units": "watt meter-2 meter-1"
    },
    "flx_sns": {
      "dims": ["wvl_nvr"],
      "type": "float",
      "long_name": "Flux sensitivity of each band (irradiance per count)",
      "provenance": "EnvironmentalLogger calibration information from file S05673_08062015.IrradCal provided by TinoDornbusch and discussed here: https://github.com/terraref/reference-data/issues/30#issuecomment-217518434",
      "units": "joule count-1"
    },
    "frametime": {
      "dims": ["y"],
      "type": "double",
      "units": "days since 1970-01-01 00:00:00",
      "calender": "gregorian"
    },
    "rfl_img": {
      "dims": ["wavelength","y","x"],
      "type": "float",
      "long_name": "Reflectance of image",
      "meaning": "Counts on scale from 0 to 2^16-1 = 65535",
      "standard_name": "surface_albedo",
      "units": "1"
    },
    "rfl_wht": {
      "dims": ["wavelength"],
      "type": "float",
      "long_name": "Reflectance of white reference",
      "units": "1"
    },
    "wavelength": {
      "dims": ["wavelength"],
      "type": "double",
      "long_name": "Hyperspectral Wavelength",
      "units": "meter",
      "standard_name": "radiation_wavelength"
    },
    "wvl_dlt": {
      "dims": ["wvl_nvr"],
      "type": "float",
      "long_name": "Bandwidth of environmental sensor",
      "notes": "Bandwidth, also called dispersion, is between 0.455-0.495 nm across all channels. Values computed as differences between midpoints of adjacent band-centers.",
      "standard_name": "bandwidth",
      "units": "meter"
    },
    "wvl_nvr": {
      "dims": ["wvl_nvr"],
      "type": "float",
      "long_name": "Wavelength of environmental sensor",
      "provenance": "EnvironmentalLogger calibration information from file S05673_08062015.IrradCal provided by TinoDornbusch and discussed here: https://github.com/terraref/reference-data/issues/30#issuecomment-217518434",
      "standard_name": "sensor_band_central_radiation_wavelength",
      "units": "meter"
    },
    "x": {
      "dims": ["x"],
      "type": "double",
      "algorithm": "CSZ implemented these fake data to be replaced by real formula once available.",
      "long_name": "North-south offset from start position",
      "units": "meter"
    },
    "xps_drk": {
      "dims": ["wavelength","x"],
      "type": "short",
      "long_name": "Exposure from dark reference sheet/panel",
      "units": "Counts on scale from 0 to 2^16-1 = 65535"
    },
    "xps_img": {
      "dims": ["wavelength","y","x"],
      "type": "short",
      "long_name": "Exposure counts",
      "meaning": "Counts on scale from 0 to 2^16-1 = 65535",
      "units": "1"
    },
    "xps_wht": {
      "dims": ["wavelength","x"],
      "type": "short",
      "long_name": "Exposure from white reference sheet/panel",
      "units": "Counts on scale from 0 to 2^16-1 = 65535"
    },
    "y": {
      "dims": ["y"],
      "type": "double",
      "algorithm": "Based on https://github.com/terraref/computing-pipeline/issues/144. y is defined as 0.9853 mm per pixel. Exact number is 0.98526434004512529576754637665 mm.",
      "long_name": "East-west offset from start position",
      "units": "meter"
    },
    "attrs": {
      "title": "None given (supply with --trr ttl=\"Title\")",
      "created_by": "yanliu",
      "Conventions": "CF-1.5",
      "Project": "TERRAREF",
      "terraref_script": "terraref.sh",
      "terraref_hostname": "cg-gpu01",
      "terraref_version": "4.6.0",
      "history": "Thu Sep  1 11:09:33 2016: ncap2 -A -S /gpfs/largeblockFS/projects/arpae/sw/computing-pipeline/scripts/hyperspectral/terraref.nco /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid44592.fl00.tmp /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid44592.fl00.tmp\nThu Sep  1 11:09:31 2016: ncks -A /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid44592.fl00.tmp /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid44592.fl00.tmp\nThu Sep 01 11:09:30 2016: python input/0596c17f-2e4c-4d43-9d77-cde8ffbde663_raw /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid44592.fl00.tmp",
      "NCO": "\"4.6.0\"",
      "history_of_appended_files": "Thu Sep  1 11:09:33 2016: Appended file /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid44592.fl00.tmp had following \"history\" attribute:\nThu Sep  1 11:09:31 2016: ncks -A /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid44592.fl00.tmp /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_att.nc.pid44592.fl00.tmp\nThu Sep 01 11:09:30 2016: python input/0596c17f-2e4c-4d43-9d77-cde8ffbde663_raw /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid44592.fl00.tmp\nThu Sep  1 11:09:31 2016: Appended file /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid44592.fl00.tmp had following \"history\" attribute:\nThu Sep 01 11:09:30 2016: python input/0596c17f-2e4c-4d43-9d77-cde8ffbde663_raw /gpfs_scratch/arpae/imaging_spectrometer/terraref_tmp_jsn.nc.pid44592.fl00.tmp\n",
      "nco_openmp_thread_number": 1
    },
    "groups": {
    "gantry_system_fixed_metadata": {
      "attrs": {
        "gantry_fixed_data_2": "Todo",
        "gantry_fixed_data_1": "Todo",
        "System_manufacturer": "LemnaTec Corp."
      }
      },
    "gantry_system_variable_metadata": {
      "u": {
        "type": "double",
        "long_name": "Gantry_Speed_in_X_Direction",
        "units": "meter second-1"
      },
      "v": {
        "type": "double",
        "long_name": "Gantry_Speed_in_Y_Direction",
        "units": "meter second-1"
      },
      "w": {
        "type": "double",
        "long_name": "Gantry_Speed_in_Z_Direction",
        "units": "meter second-1"
      },
      "x": {
        "type": "double",
        "long_name": "Position_in_X_Direction",
        "units": "meter"
      },
      "y": {
        "type": "double",
        "long_name": "Position_in_Y_Direction",
        "units": "meter"
      },
      "z": {
        "type": "double",
        "long_name": "Position_in_Z_Direction",
        "units": "meter"
      },
      "attrs": {
        "Camnera_box_light_4_is_on": "True",
        "Position_in_]_Direction": "0.97",
        "Camnera_box_light_2_is_on": "True",
        "Camnera_box_light_1_is_on": "True",
        "Gantry_Speed_in_]_Direction": "0",
        "Time": "04/07/2016 16:15:45",
        "Camnera_box_light_3_is_on": "True"
      }
      },
    "header_info": {
      "blue_band_index": {
        "type": "double"
      },
      "green_band_index": {
        "type": "double"
      },
      "red_band_index": {
        "type": "double"
      },
      "attrs": {
        "HSIII_VERSION": "E51215 vs64",
        "POST_AOI_left": "0",
        "Col_binning": "1",
        "AOI_width": "1600",
        "Row_binning": "1",
        "FrameIndex": "frameIndex.txt",
        "AOI_height": "960",
        "header_offset": "0",
        "Lens_EFL": "17",
        "Serial_Number": "SN-G4-384",
        "samples": "1600",
        "byte_order": "0",
        "Lens_folder": "",
        "description": "{[HEADWALL Hyperspec III]}",
        "default_bands": "{140,234,500}",
        "bands": "955",
        "POST_Row_binning": "1",
        "POST_AOI_width": "1600",
        "file_type": "ENVI Standard",
        "Nuc_folder": "",
        "data_type": "12",
        "AverageDispersion": "0.63986398",
        "POST_Col_binning": "1",
        "Array_Pixel_Pitch": "6.5",
        "sensor_type": "Unknown",
        "POST_AOI_height": "955",
        "lines": "468",
        "interleave": "bil",
        "AOI_top": "600",
        "Pixel0": "3.100546185",
        "AOI_left": "480",
        "POST_AOI_top": "5"
      }
      },
    "sensor_fixed_metadata": {
      "attrs": {
        "sensor_serial_number": "Todo",
        "sensor_purpose": "Todo",
        "sensor_product_name": "VNIR",
        "sensor_description": "Todo",
        "sensor_manufacturer": "Headwall Scientific"
      }
      },
    "sensor_variable_metadata": {
      "constmirrorpos": {
        "type": "double",
        "long_name": "constmirrorpos"
      },
      "createdatacube": {
        "type": "double",
        "long_name": "createdatacube"
      },
      "exposure": {
        "type": "double",
        "long_name": "exposure",
        "red_band_index": 140,
        "green_band_index": 234,
        "blue_band_index": 500
      },
      "frameperiod": {
        "type": "double",
        "long_name": "frameperiod"
      },
      "speed": {
        "type": "double",
        "long_name": "speed"
      },
      "startpos": {
        "type": "double",
        "long_name": "startpos"
      },
      "stoppos": {
        "type": "double",
        "long_name": "stoppos"
      },
      "useexternaltrigger": {
        "type": "double",
        "long_name": "useexternaltrigger"
      },
      "userotatingmirror": {
        "type": "double",
        "long_name": "userotatingmirror"
      },
      "attrs": {
        "exposure": "45",
        "startpos": "-70",
        "frameperiod": "50",
        "userotatingmirror": "0",
        "speed": "100",
        "useexternaltrigger": "0",
        "constmirrorpos": "0",
        "createdatacube": "0",
        "stoppos": "70"
      }
      },
    "user_given_metadata": {
      "attrs": {
        "first_wheat_test_by_Markus_Radermacher": "",
        "experiment_info_1": "...",
        "and_so_on_and_so_on...": "..."
      }
      }
    }

}

@yanliu-chn

@max-zilla To update the Dockerfile, you can replace the current NCO build section with the following:

RUN cd /srv/downloads && \
    wget -q https://github.com/nco/nco/archive/4.6.2-beta01.tar.gz -O nco-4.6.2-beta01.tar.gz && \
    tar xfz nco-4.6.2-beta01.tar.gz && \
    cd nco-4.6.2-beta01 && \
    ./configure NETCDF_ROOT=/srv/sw/netcdf-4.4.1 --prefix=/srv/sw/nco-4.6.2-beta01 --enable-ncap2 --enable-udunits2 && \
    make && make install
ENV PATH="/srv/sw/nco-4.6.2-beta01/bin:${PATH}"
ENV LD_LIBRARY_PATH="/srv/sw/nco-4.6.2-beta01/lib:${LD_LIBRARY_PATH}"

@robkooper
Member Author

maybe make a NCO_VERSION variable so we can easily update it.
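
A minimal sketch of what that could look like, assuming the rest of the build stays exactly as in the snippet above. `NCO_VERSION` is the only new name; the paths, tag, and configure flags are copied from that snippet and have not been re-verified here.

```dockerfile
# Single place to bump the NCO release; later instructions reference it.
ENV NCO_VERSION=4.6.2-beta01

# Download, build, and install NCO under a versioned prefix.
RUN cd /srv/downloads && \
    wget -q https://github.com/nco/nco/archive/${NCO_VERSION}.tar.gz -O nco-${NCO_VERSION}.tar.gz && \
    tar xfz nco-${NCO_VERSION}.tar.gz && \
    cd nco-${NCO_VERSION} && \
    ./configure NETCDF_ROOT=/srv/sw/netcdf-4.4.1 --prefix=/srv/sw/nco-${NCO_VERSION} --enable-ncap2 --enable-udunits2 && \
    make && make install

# Expose the versioned install on the search paths.
ENV PATH="/srv/sw/nco-${NCO_VERSION}/bin:${PATH}"
ENV LD_LIBRARY_PATH="/srv/sw/nco-${NCO_VERSION}/lib:${LD_LIBRARY_PATH}"
```

With this layout, moving to a newer NCO release should only require changing the first `ENV` line, provided the archive naming on GitHub stays the same.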

@yanliu-chn

good idea.

@ghost

ghost commented Nov 14, 2016

can this issue be closed?

@ghost

ghost commented Nov 30, 2016

@max-zilla and @jterstriep - can this issue be closed?

@max-zilla
Contributor

@rachelshekar not yet. I'll try to get this updated this week - JD and I are dealing with some pipeline things today.

@ghost ghost modified the milestones: September 2016, December 2016 Nov 30, 2016
@max-zilla
Contributor

Just updated the NCO version in the Dockerfile and added a third output, in JSON format, to the netCDF extractor. Closing this issue.

@ghost ghost added cyberGIS and removed 1 - Ready labels Jan 3, 2017
@dlebauer
Member

Is the extractor inserting these metadata into clowder?

In this sample dataset https://terraref.ncsa.illinois.edu/clowder/datasets?space=57e42cd44f0cff4b58dd3eea there is only gantry metadata, but the information from the netcdf header is not available.

@max-zilla
Contributor

The hyperspectral netCDFs are crashing the VM when it tries to download them, since the VM only has 8GB memory.

This won't happen once the Roger filesystem is mounted, but I need to confirm with @jdmaloney that we are ready to do that again without filesystem errors.

@ghost ghost assigned max-zilla and unassigned Zodiase and yanliu-chn Jan 19, 2017
@max-zilla
Contributor

This is redeployed and running now.

@ghost ghost removed the help wanted label Apr 4, 2017