
Tutorial for binning data from the SXP instrument at the European XFEL#


Preparation#


Import necessary libraries#

[1]:
%load_ext autoreload
%autoreload 2
from pathlib import Path
import os
import xarray as xr
import numpy as np

from sed import SedProcessor
from sed.dataset import dataset

%matplotlib widget
import matplotlib.pyplot as plt

Get data paths#


The paths are set up such that, if you are on Maxwell, the data there is used directly. Otherwise, the data is downloaded from Zenodo into the current directory.


Generally, if it is your own beamtime, you can both read the raw data and write to the processed directory. For the public data, however, you cannot write to the processed directory.

[2]:
beamtime_dir = "/gpfs/exfel/exp/SXP/202302/p004316/" # on Maxwell
if os.path.exists(beamtime_dir) and os.access(beamtime_dir, os.R_OK):
    path = beamtime_dir + "/raw/"
    buffer_path = "Au_Mica/processed/"
else:
    # data_path can be defined and used to store the data in a specific location
    dataset.get("Au_Mica")  # Put in a path to storage with at least 10 GByte of free space.
    path = dataset.dir
    buffer_path = path + "/processed/"

INFO - Not downloading Au_Mica data as it already exists at "/home/runner/work/sed/sed/docs/tutorial/datasets/Au_Mica".
Set 'use_existing' to False if you want to download to a new location.
INFO - Using existing data path for "Au_Mica": "/home/runner/work/sed/sed/docs/tutorial/datasets/Au_Mica"
INFO - Au_Mica data is already present.

Config setup#


Here we get the path to the config file and set up the relevant directories. This can also be done directly in the config file.

[3]:
# pick the default configuration file for SXP@XFEL
config_file = Path('../sed/config/sxp_example_config.yaml')
assert config_file.exists()
[4]:
# here we set up a dictionary that will be used to override the path configuration
config_override = {
    "core": {
        "paths": {
            "data_raw_dir": path,
            "data_parquet_dir": buffer_path,
        },
    },
}

cleanup previous config files#


In this notebook, we will show how calibration parameters can be generated. Therefore we want to clean the local directory of previously generated files.


WARNING: running the cell below will delete the “sed_config.yaml” file in the local directory. If it contains precious calibration parameters, DO NOT RUN THIS CELL.

[5]:
local_folder_config = Path('./sed_config.yaml')
if local_folder_config.exists():
    os.remove(local_folder_config)
    print(f'deleted local config file {local_folder_config}')
assert not local_folder_config.exists()

deleted local config file sed_config.yaml

Load Au/Mica data#


Now we load a couple of scans of the Au 4f core levels. The data will first be converted to parquet format, if this has not been done yet, and then loaded into the processor.

[6]:
sp = SedProcessor(
    runs=["0058", "0059", "0060", "0061"],
    config=config_override,
    system_config=config_file,
    collect_metadata=False,
    verbose=True,
)

System config loaded from: [/home/runner/work/sed/sed/docs/sed/config/sxp_example_config.yaml]
Default config loaded from: [/home/runner/work/sed/sed/sed/config/default.yaml]
Reading files: 0 new files of 65 total.
All files converted successfully!
Filling nan values...
loading complete in  0.13 s

Inspect the dataframe#


We first try to get an overview of the structure of the data. For this, we look at the loaded dataframe:

[7]:
sp.dataframe.head()
[7]:
   trainId     pulseId  electronId  timeStamp     dldPosX  dldPosY  dldTimeSteps  delayStage
0  1862196735       20           0  1.700983e+09     1745     3024          6066 -125.495093
1  1862196735       25           0  1.700983e+09      524     1708          4830 -125.495093
2  1862196735       37           0  1.700983e+09     3385     3403          9148 -125.495093
3  1862196735       41           0  1.700983e+09     2099     2309          6542 -125.495093
4  1862196735       47           0  1.700983e+09     1713     8501          8838 -125.495093

Train IDs in scans#


Next, let’s look at the trainIds contained in these runs.

[8]:
plt.figure()
ids = sp.dataframe.trainId.compute().values
plt.plot(ids)
plt.show()

Channel Histograms#


Let’s look at the single histograms of the main data channels.

[9]:
sp.view_event_histogram(dfpid=3)

PulseIds, ElectronIds#


To get a better understanding of the structure of the data, let’s look at the histograms of microbunches and electrons. We see that we have more hits at later microbunches, and only a few multi-hits.

[10]:
axes = ["pulseId", "electronId"]
bins = [101, 11]
ranges = [(-0.5, 800.5), (-0.5, 10.5)]
sp.view_event_histogram(dfpid=1, axes=axes, bins=bins, ranges=ranges)
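
As a quick quantitative check of the multi-hit statement above, one can compute the fraction of events with electronId > 0 directly from the dataframe. This is a hedged sketch, not part of the original workflow, and assumes that electronId enumerates the hits within a pulse starting at 0:

# Fraction of events that are second or later hits within a pulse
multi_hit_fraction = (sp.dataframe["electronId"] > 0).mean().compute()
print(f"Fraction of multi-hit events: {multi_hit_fraction:.3f}")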

We can also inspect the counts per train as a function of the trainId and the pulseId, which gives us a good idea about the evolution of the count rate over the run(s).

[11]:
plt.figure()
axes = ["trainId", "pulseId"]
bins = [100, 100]
ranges = [(ids.min()+1, ids.max()), (0, 800)]
res = sp.compute(bins=bins, axes=axes, ranges=ranges)
res.plot()

[11]:

<matplotlib.collections.QuadMesh at 0x7f4a01caddf0>

Spectrum vs. MicrobunchId#


Let’s check the TOF spectrum as a function of microbunch ID, to understand whether the increasing hit probability has any influence on the spectrum.

[12]:
axes = ["dldTimeSteps", "pulseId"]
bins = [200, 800]
ranges = [(8000, 28000), (0, 800)]
res = sp.compute(bins=bins, axes=axes, ranges=ranges)
plt.figure()
res.plot(robust=True)

[12]:

<matplotlib.collections.QuadMesh at 0x7f4a01c4ab80>

We see that the background below the Au 4f core levels changes slightly with microbunch ID. The origin of this is not quite clear yet.

[13]:
plt.figure()
# compare spectra from early and late microbunches, each normalized to its mean
early = res.loc[{"pulseId": slice(0, 50)}].sum(axis=1)
late = res.loc[{"pulseId": slice(700, 750)}].sum(axis=1)
(early / early.mean()).plot()
(late / late.mean()).plot()
plt.legend(("mbID=0-50", "mbID=700-750"))

[13]:

<matplotlib.legend.Legend at 0x7f4a01ae6be0>

Energy Calibration#


We now load a bias series, in which the sample bias was varied, effectively shifting the energy spectra. This allows us to calibrate the conversion between the digital values of the DLD and the energy.


time-of-flight spectrum#


To compare with what we see on the measurement computer, we might want to plot the time-of-flight spectrum. This is done here.

[14]:
sp.append_tof_ns_axis()

Adding time-of-flight column in nanoseconds to dataframe:
Dask DataFrame Structure:
               trainId pulseId electronId timeStamp dldPosX dldPosY dldTimeSteps delayStage  dldTime
npartitions=65
                uint64   int64      int64   float64  uint16  uint16       uint64    float64  float64
                   ...     ...        ...       ...     ...     ...          ...        ...      ...
...                ...     ...        ...       ...     ...     ...          ...        ...      ...
                   ...     ...        ...       ...     ...     ...          ...        ...      ...
                   ...     ...        ...       ...     ...     ...          ...        ...      ...
Dask Name: assign, 12 graph layers
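
Under the hood, this conversion is essentially a linear scaling of the digital time-step axis by the TDC bin width. The following sketch illustrates the idea only; the bin width below is a placeholder, since sed reads the actual value from the configuration file:

# Illustration only: scale the step axis by a (placeholder) bin width in ns.
tof_binwidth_ns = 0.02  # placeholder value, NOT the calibrated instrument value
dld_time_ns = sp.dataframe["dldTimeSteps"] * tof_binwidth_ns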

Now, to determine proper binning ranges, let’s take another look at the event histograms:

[15]:
axes = ["dldTime"]
bins = [150]
ranges = [(-0.5, 150.5)]
sp.view_event_histogram(dfpid=1, axes=axes, bins=bins, ranges=ranges)

Load energy calibration files#


We now sequentially load and bin a range of runs that were recorded with different sample bias values, and afterwards combine them into an xarray.

[16]:
runs = ["0074", "0073", "0072", "0071", "0070", "0064", "0065", "0066", "0067", "0068", "0069"]
biases = np.arange(962, 951, -1)
data = []
for run in runs:
    sp.load(runs=[run])
    axes = ["dldTimeSteps"]
    bins = [2000]
    ranges = [(1000, 25000)]
    res = sp.compute(bins=bins, axes=axes, ranges=ranges)
    data.append(res)

biasSeries = xr.concat(data, dim=xr.DataArray(biases, dims="sampleBias", name="sampleBias"))

Reading files: 0 new files of 4 total.
All files converted successfully!
Filling nan values...
loading complete in  0.02 s

[... analogous loading output repeated for each of the 11 runs ...]

Load bias series#


Now we load the bias series xarray into the processor for calibration.

[17]:
sp.load_bias_series(binned_data=biasSeries)

find calibration parameters#


We will now fit the TOF-to-energy relation. This is done by finding the maxima of a peak in the TOF spectrum for each bias value, and then fitting the square-root relation to obtain the calibration parameters.
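
The model behind this fit is the standard time-of-flight relation, in which the flight time scales with the inverse square root of the kinetic energy. Below is a minimal sketch of this functional form, with d the drift length, t0 a time offset, and E0 an energy offset; the exact sign conventions and unit handling inside sed may differ:

import numpy as np

M_E = 9.1093837015e-31      # electron mass (kg)
E_CHARGE = 1.602176634e-19  # elementary charge (C)

def tof_to_energy(t, d, t0, E0):
    """Kinetic energy (eV) for flight time t (s): E = m_e/2 * (d/(t-t0))**2 + E0."""
    return 0.5 * M_E * (d / (t - t0)) ** 2 / E_CHARGE + E0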

[18]:
ranges=(6380, 6700)
ref_id=6
sp.find_bias_peaks(ranges=ranges, ref_id=ref_id, apply=True)
[19]:
sp.calibrate_energy_axis(
    ref_id=5,
    ref_energy=-33,
    method="lmfit",
    energy_scale='kinetic',
    d={'value':1.1,'min': .2, 'max':5.0, 'vary':False},
    t0={'value':-1E-8, 'min': -1E-6, 'max': 1e-4, 'vary':True},
    E0={'value': 0., 'min': -1500, 'max': 1500, 'vary': True},
)

[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 25
    # data points      = 11
    # variables        = 2
    chi-square         = 0.02957200
    reduced chi-square = 0.00328578
    Akaike info crit   = -61.1070499
    Bayesian info crit = -60.3112593
[[Variables]]
    d:   1.1 (fixed)
    t0: -9.9902e-08 +/- 2.6372e-10 (0.26%) (init = -1e-08)
    E0: -1120.58964 +/- 0.59620132 (0.05%) (init = 0)
[[Correlations]] (unreported correlations are < 0.100)
    C(t0, E0) = -0.9996
Quality of Calibration:

E/TOF relationship:

Save calibration#


Now we save the calibration parameters into a local configuration file that will be loaded in the next step.

[20]:
sp.save_energy_calibration()

Saved energy calibration parameters to "sed_config.yaml".
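
If you want to verify what was written, you can inspect the file. A minimal sketch, assuming PyYAML is available; the exact structure of sed_config.yaml depends on the sed version:

import yaml

# Print the saved calibration parameters
with open("sed_config.yaml") as f:
    print(yaml.safe_load(f))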

Bin data with energy axis#


Now that we have the calibration parameters, we can generate the energy axis for our dataset. We need to load the data again, and apply the calibration.

[21]:
sp.load(runs=np.arange(58, 62))
sp.add_jitter()
sp.filter_column("pulseId", max_value=756)
sp.append_energy_axis()

Reading files: 0 new files of 65 total.
All files converted successfully!
Filling nan values...
loading complete in  0.06 s
Adding energy column to dataframe:
Using energy calibration parameters generated on 11/11/2024, 19:21:18
Dask DataFrame Structure:
               trainId pulseId electronId timeStamp  dldPosX  dldPosY dldTimeSteps delayStage   energy
npartitions=65
                uint64   int64      int64   float64  float64  float64      float64    float64  float64
                   ...     ...        ...       ...      ...      ...          ...        ...      ...
...                ...     ...        ...       ...      ...      ...          ...        ...      ...
                   ...     ...        ...       ...      ...      ...          ...        ...      ...
                   ...     ...        ...       ...      ...      ...          ...        ...      ...
Dask Name: assign, 22 graph layers

Now, we can bin as a function of energy and delay stage position.

[22]:
axes = ['energy', "delayStage"]
bins = [200, 100]
ranges = [[-37,-31], [-135, -115]]
res = sp.compute(bins=bins, axes=axes, ranges=ranges, normalize_to_acquisition_time="delayStage")

Calculate normalization histogram for axis 'delayStage'...
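
The normalize_to_acquisition_time option divides the counts in each delay bin by the acquisition time spent at that delay stage position, making intensities comparable across bins. Conceptually, this amounts to the following numpy sketch (with made-up numbers, not sed's internal implementation):

import numpy as np

counts = np.array([120.0, 80.0, 200.0])      # binned electron counts (fake)
seconds_per_bin = np.array([1.0, 0.5, 2.0])  # acquisition time per bin (fake)
rate = counts / seconds_per_bin              # normalized intensity per bin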
[23]:
fig, axs = plt.subplots(1, 1, figsize=(4, 3), constrained_layout=True)
res.plot(ax=axs)

[23]:

<matplotlib.collections.QuadMesh at 0x7f4a00062d60>

Correct delay stage offset#


We can also offset the zero delay of the delay stage.
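
Internally, this amounts to adding a constant to the delayStage column, applied lazily to the Dask dataframe. A sketch of the equivalent arithmetic (illustration only):

# Shift the delay axis so that pump-probe temporal overlap sits near zero
shifted_delay = sp.dataframe["delayStage"] + 126.9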

[24]:
sp.add_delay_offset(constant=126.9)

Adding delay offset to dataframe:
Delay offset parameters:
   Constant: 126.9
Dask DataFrame Structure:
               trainId pulseId electronId timeStamp  dldPosX  dldPosY dldTimeSteps delayStage   energy
npartitions=65
                uint64   int64      int64   float64  float64  float64      float64    float64  float64
                   ...     ...        ...       ...      ...      ...          ...        ...      ...
...                ...     ...        ...       ...      ...      ...          ...        ...      ...
                   ...     ...        ...       ...      ...      ...          ...        ...      ...
                   ...     ...        ...       ...      ...      ...          ...        ...      ...
Dask Name: assign, 24 graph layers
[25]:
axes = ['energy', "delayStage"]
bins = [200, 100]
ranges = [[-37,-31], [-8, 8]]
res = sp.compute(bins=bins, axes=axes, ranges=ranges, normalize_to_acquisition_time="delayStage")

Calculate normalization histogram for axis 'delayStage'...
[26]:
res_sub = res - res.loc[{"delayStage": slice(-8, -1)}].mean(axis=1)
fig, axs = plt.subplots(3, 1, figsize=(4, 8), constrained_layout=True)
res.plot(ax=axs[0])
res_sub.plot(ax=axs[1])
res_sub.loc[{"energy":slice(-32.5,-32)}].sum(axis=0).plot(ax=axs[2])

[26]:

[<matplotlib.lines.Line2D at 0x7f49efe7bc70>]