layout | title | author |
---|---|---|
page | ORSO - file formats - discussions on specifications | Jochen Stahn |
In the following you find a rather unsorted collection of feedback and ideas about the .ort specs and the related orsopy package.
last modified: 2023-05-05
For the optional columns of wavelength and angle of incidence, are lambda and alpha_i the expected/standardized names? Or should they be wavelength and incident_angle, as written in the header? Or is this discrepancy in the example required because column names must be different from values in the header?
(Jochen) See also below. This problem arises from the fact that we try to make it right in the header, and use the conventional terms in the column description.
The keywords in the header are taken from the physical quantity name, e.g. incident_angle, while in the (optional) column description there are two possible (and recommended) entries: name and physical_quantity. The name is used to create the 1-line header right above the data array, so a well-established symbol is the right choice there. The physical_quantity is used to avoid all ambiguities.
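The two roles can be sketched in a few lines of Python. This is only an illustration on plain dictionaries: the helper names `header_line` and `find_column` are hypothetical and not part of orsopy, and the exact layout of the 1-line header is an assumption.

```python
# Column descriptions as they might look after parsing the header.
# Only alpha_i carries a physical_quantity here; the entry is optional.
columns = [
    {"name": "Qz", "unit": "1/angstrom"},
    {"name": "R"},
    {"name": "alpha_i", "unit": "deg", "physical_quantity": "incident_angle"},
]


def header_line(columns):
    """Build the 1-line header above the data array from the symbols (name)."""
    parts = []
    for col in columns:
        unit = col.get("unit")
        parts.append(f"{col['name']} ({unit})" if unit else col["name"])
    return "# " + " ".join(parts)


def find_column(columns, physical_quantity):
    """Return the index of the column describing the quantity, or None."""
    for index, col in enumerate(columns):
        if col.get("physical_quantity") == physical_quantity:
            return index
    return None


print(header_line(columns))
print(find_column(columns, "incident_angle"))
```

The symbol is what a human reads above the data, while the lookup uses only the unambiguous physical_quantity.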
If there is no current standard name (understanding that this is optional information), this should be made clear, with a statement/explanation of whether or not a standard name is expected in the future.
(Jochen) I agree that we should define a set of key words here and recommend their use. Suggestions can be found below.
A third important quantity for which there could be a standard name in the future is photon energy (for synchrotron x-ray experiments).
(Jochen) I agree.
The documentation states Value can be a list, but it cannot. (ComplexValue can be a list though). This discrepancy should be corrected either by modifying the documentation or the implementation, otherwise people could attempt to write files with data which cannot be handled by orsopy.
(Jochen) Wrong in specs. I'll correct this.
Specs are not clear about what should happen if there is a header entry and a column with the same name:
- is this invalid (see first point)
- do they have to be consistent or
- does the column overwrite any header information?
E.g., if a column is supplied, would it be required that, if that is also in the header,
- it is a range with matching min/max?
- That it is a value with matching average?
- Must it be left out of the header entirely?
- Should it be overwritten if a column is found,
- or should there be a pointer to a column, for example an optional keyword ‘column’, where the value or range could then be used by software which cannot support point-by-point calculation (the column data)?
My vote is for 3 implemented in this last way, for the purpose that the header can still contain some human-readable information useful for experimental reproducibility even if the contents are overwritten by a column.
(Jochen) Here we have to differentiate between the data format rules and recommendations for software using this format.
The format allows for redundant and even for contradicting information. It is the responsibility of the programmer to write out a physically consistent data file.
On the other hand we should give some recommendations like the ones mentioned above.
Personally I also prefer option 3, but without any further restrictions for the header.
We discussed the pointer from header to column entries at an early stage and it was dropped at some point. With a prioritisation of column over header, the situation is clear to the software.
# measurement:
#   instrument_settings:
#     incident_angle:
#       min: <value>
#       max: <value>
#       details_at_column: alpha_i
#       unit: deg
Alternatively to details_at_column one can use the already existing comment to create a human-readable link.
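How software could apply the column-over-header priority together with such a pointer is sketched below. The key details_at_column is only the proposal discussed here and is not part of the specs; the function name `resolve_quantity` and the plain dict/list inputs are assumptions for illustration.

```python
def resolve_quantity(header_entry, column_names, data_rows):
    """Return per-point values if a column is referenced, else the header summary.

    header_entry: dict parsed from the header, e.g. the incident_angle block
    column_names: list of column symbols in file order
    data_rows:    list of rows, each a list of floats
    """
    pointer = header_entry.get("details_at_column")  # proposed key, not in the specs
    if pointer in column_names:
        index = column_names.index(pointer)
        # The column takes priority over the header content.
        return [row[index] for row in data_rows]
    if "magnitude" in header_entry:
        return header_entry["magnitude"]
    # Fall back to the range, e.g. for software without point-by-point support.
    return (header_entry["min"], header_entry["max"])


entry = {"min": 0.5, "max": 1.5, "unit": "deg", "details_at_column": "alpha_i"}
names = ["Qz", "R", "alpha_i"]
rows = [[0.01, 1.0, 0.5], [0.02, 0.8, 1.0], [0.03, 0.1, 1.5]]
print(resolve_quantity(entry, names, rows))
```

Software that cannot do point-by-point calculations can skip the pointer and use the range directly, which is the purpose of keeping the header entry.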
item 1: What is the ORSO recommendation for using redundant information?
Ability to have an error defined for a quantity in the header, either implemented similar to how quantities are allowed to have a range, or similar to how columns are allowed to be an error of another column.
The following syntax is now implemented in orsopy:
# measurement:
#   instrument_settings:
#     incident_angle:
#       magnitude: 2.1
#       unit: deg
#       error:
#         magnitude: 0.01
#         error_type: resolution
#         distribution: gaussian
#         value_is: sigma
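A reader of such an entry typically wants the error as a standard deviation. The sketch below works on a plain dict with the keys shown above; it does not use the orsopy classes. The function name `error_as_sigma` is hypothetical, and the alternative value FWHM for value_is is an assumption not stated in this discussion.

```python
import math


def error_as_sigma(error):
    """Return the error magnitude expressed as a Gaussian standard deviation."""
    if error["distribution"] != "gaussian":
        raise ValueError("only gaussian distributions are handled in this sketch")
    if error["value_is"] == "sigma":
        return error["magnitude"]
    if error["value_is"] == "FWHM":  # assumed alternative to sigma
        # For a Gaussian: FWHM = 2 * sqrt(2 * ln 2) * sigma
        return error["magnitude"] / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    raise ValueError(f"unknown value_is: {error['value_is']!r}")


error = {
    "magnitude": 0.01,
    "error_type": "resolution",
    "distribution": "gaussian",
    "value_is": "sigma",
}
print(error_as_sigma(error))
```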
(Jochen)
We have some confusion about what we use as key words. Since the German terms are different, I had problems figuring out the correct English definitions...
What can be measured or calculated is a physical quantity.
E.g. the incident angle
This has a dimension = dim(physical quantity) relating it to a set of base quantities like length, time, charge, temperature etc. The dimension is not a unit, nor can it be used to unambiguously describe a physical quantity (plane angle does not distinguish between scattering angle, incident angle, total reflection angle, ...).
dim( incident angle ) = plane angle
The physical quantity is often referred to by using a symbol.
One possible symbol for incident angle is $\alpha_i$ (or alpha_i in the orso header).
The value of a physical quantity is expressed as a numerical magnitude times a unit. Depending on the chosen unit, the numerical magnitude changes.
$\alpha_i = 2.3 \cdot \mathrm{deg}$
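The dependence of the magnitude on the unit can be illustrated with a small conversion helper. The table `TO_RAD` and the function `convert_angle` are hypothetical names for this sketch; only a few angle units are listed.

```python
import math

# Conversion factors to radian; only the units needed here are listed.
TO_RAD = {"deg": math.pi / 180.0, "mrad": 1.0e-3, "rad": 1.0}


def convert_angle(magnitude, unit, new_unit):
    """Express the same physical quantity with a different unit."""
    return magnitude * TO_RAD[unit] / TO_RAD[new_unit]


alpha_i_rad = convert_angle(2.3, "deg", "rad")
print(round(alpha_i_rad, 6))  # 0.040143
```

The physical quantity (the incident angle) is the same in both cases; only the pair of magnitude and unit differs.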
- For the column name we use the symbol (R, Qz, alpha_i, ...) rather than the physical quantity. But in the header above we use the latter as key words. Thus if the analysis software searches for example for information about the incident angle, it has to look in various places (this is intended) for different keys. A solution might be that the software searches for standardised physical_quantity entries in the column description which match the keys in the header.
- Where do we store e.g. the angles for stitched tof measurements? These are no longer used for processing, but may help future planning.
- x-ray data obtained with different attenuator settings.
In case this information is not provided in one of the optional columns or in the individual headers of multiple data sets, it cannot be used by the analysis software. Good choices for this information might be extra entries, e.g. in the incident_angle section:
incident_angle:
  min: 1.0
  max: 5.8
  individual_magnitudes: [1.0, 2.7, 5.8]
  unit: deg
item 7: Do we introduce individual_magnitudes as a new key within the class ValueRange?
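If such a key were introduced, a writer or reader could at least verify that the listed magnitudes agree with the range. The following is a sketch on a plain dict; individual_magnitudes is the proposed key from this discussion, and the function name is hypothetical.

```python
def check_individual_magnitudes(entry):
    """Return True if all individual magnitudes lie within [min, max]."""
    values = entry.get("individual_magnitudes", [])  # proposed key, optional
    return all(entry["min"] <= value <= entry["max"] for value in values)


incident_angle = {
    "min": 1.0,
    "max": 5.8,
    "individual_magnitudes": [1.0, 2.7, 5.8],
    "unit": "deg",
}
print(check_individual_magnitudes(incident_angle))
```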
- hierarchy for looking up information (e.g. column beats header content)
- avoid contradicting information (e.g. single incident angle in the header for angle-dispersive measurement)
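One possible consistency rule for the second point is sketched below: a header range should match the extremes of the corresponding column, and a single header value is only accepted for a constant column. This rule, the tolerance and the function name `header_matches_column` are assumptions for illustration, not an ORSO recommendation.

```python
def header_matches_column(entry, column_values, tolerance=1e-6):
    """Check a header value or range against the values of the matching column."""
    low, high = min(column_values), max(column_values)
    if "magnitude" in entry:
        # A single header value is only consistent with a constant column.
        return (abs(low - entry["magnitude"]) <= tolerance
                and abs(high - entry["magnitude"]) <= tolerance)
    return (abs(low - entry["min"]) <= tolerance
            and abs(high - entry["max"]) <= tolerance)


alpha_i_column = [0.1, 3.0, 6.0]
print(header_matches_column({"min": 0.1, "max": 6.0, "unit": "deg"}, alpha_i_column))
print(header_matches_column({"magnitude": 2.1, "unit": "deg"}, alpha_i_column))
```

The second call reproduces the contradiction named above: a single incident angle in the header of an angle-dispersive measurement.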
item 8: Which of the keys discussed below should be included in the specs to (better) incorporate lab x-ray data files?
When attempting to convert the ASCII output files of various commercial lab x-ray reflectometers (diffractometers), it became obvious that the present dictionary is missing several entries.
- It is not exactly clear where to put the brand, model and probably configuration information.
  experiment:
    title: ...
    instrument:
      type: x-ray lab source (neutron reflectometer, synchrotron diffractometer, ....)
      brand: Bruker
      model: Discovery
      hardware_indicator: 65519
- The wavelength is often defined via the anode material, the line(s) and probably the presence of a monochromator.
- The scan modes might be steps or continuous.
- The slit sizes are reported to enable resolution calculation.
- Often a long list of hardware settings is supplied, e.g. tube current, temperature, configuration, etc. These things do not really belong in a reduced data file, but we should at least recommend a place for these entries. In the example below I put it as a multi-line string in instrument_settings.details.
  measurement:
    instrument_settings:
      incident_angle:
        min: 0.1
        max: 6.0
        unit: deg
      wavelength:
        magnitude: 1.54184
        unit: angstrom
        anode: Cu
        lines:
          - name: K_alpha1
            magnitude: 1.5405980
            weight: 2/3
          - name: K_alpha2
            magnitude: 1.5444260
            weight: 1/3
      scan_type: continuous
      details: |
        "Configuration=Reflection-Transmission Spinner 3.0, Owner=user, Creation date=3/5/2021 8:12:09 AM"
        "Goniometer=Theta/Theta; Minimum step size 2Theta:0.0001; Minimum step size Omega:0.0001"
        "Sample stage=Reflection-transmission spinner 3.0; Minimum step size Phi:0.1"
- Most present day files report the incident angle, the counting time and probably the attenuation factor as columns. We should define standard keys for the corresponding column descriptions.
  - name: alpha_i
    unit: deg
    physical_quantity: incident_angle
  - name: alpha_f
    unit: deg
    physical_quantity: final_angle
  - name: two_theta
    unit: deg
    physical_quantity: scattering_angle
  - name: tme ?
    unit: s
    physical_quantity: counting_time
  - name: att ?
    physical_quantity: attenuation_factor
- The .ort specs clearly separate data origin and data reduction. For lab reflectometers it is often the same software for instrument control and reduction.
- Information about the facility, the owner and the sample is often missing.
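For the wavelength entry with anode and lines proposed above, software could derive a single magnitude as the weighted mean of the listed lines. The sketch assumes the weights are read from the file as strings such as 2/3 and that a weighted mean is the intended meaning of weight; the function name `mean_wavelength` is hypothetical.

```python
from fractions import Fraction

# Line entries as they might look after parsing; weights are kept as strings.
lines = [
    {"name": "K_alpha1", "magnitude": 1.5405980, "weight": "2/3"},
    {"name": "K_alpha2", "magnitude": 1.5444260, "weight": "1/3"},
]


def mean_wavelength(lines):
    """Weighted mean of the line wavelengths; weights are parsed as fractions."""
    weights = [float(Fraction(line["weight"])) for line in lines]
    total = sum(w * line["magnitude"] for w, line in zip(weights, lines))
    return total / sum(weights)


print(round(mean_wavelength(lines), 5))  # 1.54187
```

With the line values of the example this gives about 1.54187 angstrom, close to but not identical with the magnitude 1.54184 quoted there, so a reader should not assume that the stated magnitude is always recomputed from the lines.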
suggested by Artur, draft by Jochen
# columns:
...
#   - flag_is:
#       0: electric field off
#       1: electric field on, positive
#       2: electric field on, negative
or
# columns:
...
#   - flag_is:
#       0: ignored for fitting
#       1: used for fitting
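A reader could use such a flag column to select data points. The sketch below assumes the flag is stored as a number in the last column and that the flag_is mapping has been parsed into a dict; both the layout and the function name `select_rows` are assumptions, since flag_is is only a draft.

```python
# Meaning of the flag values, as in the second draft above.
flag_is = {0: "ignored for fitting", 1: "used for fitting"}


def select_rows(rows, flag_index, wanted_flag):
    """Keep only the rows whose flag column equals the wanted value."""
    return [row for row in rows if int(row[flag_index]) == wanted_flag]


rows = [
    [0.010, 0.98, 1],
    [0.015, 0.75, 0],
    [0.020, 0.41, 1],
]
fit_rows = select_rows(rows, flag_index=2, wanted_flag=1)
print(len(fit_rows), flag_is[1])
```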