Additional Data Formats? #156

ax3l · 2023-05-30T23:22:36Z

Thank you for the JOSS submission in openjournals/joss-reviews#5375 .

I really like the support of the IAEA data loaders.

Based on the extended abstract and linked motivating discussion in it, I was wondering:
I am personally curious if, for phase space data, the openPMD standard [1] [2] (disclaimer: I lead this effort) could be helpful as an additional input loader source? We have by now a relatively large selection of accelerator codes supporting openPMD as their output and also try to use it more in experimental laser-plasma accelerator work.

The paper summarizes so far:

[...] extensible library enabling import/analysis/export of PhaseSpace data of arbitrary format.

If one were to implement another loader, how much work would be needed?
I am looking at
https://bwheelz36.github.io/ParticlePhaseSpace/new_data_loader.html

and am further curious about data sizes: #158

Update: I found https://bwheelz36.github.io/ParticlePhaseSpace/code_docs.html#ParticlePhaseSpace.DataLoaders.Load_PandasData which might be pretty easy to couple to openPMD with https://github.com/openPMD/openPMD-api/blob/0.15.1/examples/11_particle_dataframe.py (example data sets here). (Our Pandas reader supports chunked processing - let's continue the discussion on lazy loading/streaming/out-of-core processing in #158)

Note that the linked reference Kuschel, S. (2022). Postpic. https://github.com/skuschel/postpic implemented openPMD early on.
Minor correction: I think it should read (2014) as of the first release for this reference.

[1] https://github.com/openPMD
[2] https://www.openPMD.org

bwheelz36 · 2023-05-31T00:24:17Z

Hey @ax3l - I have to admit that I was embarrassingly not actually aware of openPMD! It looks great.

It is a fairly minimal amount of work to add new Loaders/Exporters (depending of course on how complex the data source is). I would be happy to take a look at loading openPMD data. I don't suppose you already have some files handy I could test on?
Also, I notice that openPMD supports multiple data formats. It might be quite some work to write a DataLoader that handled several formats, but as a proof of principle would it be acceptable to just demonstrate on one format?

ax3l · 2023-05-31T00:31:35Z

Hi @bwheelz36 , sorry for the edit in my original message.

Via an edit, I added a few example files and what is probably a four-liner to load the data :)

import openpmd_api as io

s = io.Series("../samples/git-sample/data%T.h5", io.Access.read_only)
electrons = s.iterations[400].particles["electrons"]  # 400 or another "step" in the data series

df = electrons.to_df()  # careful: all SI at this point
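
Because to_df() returns values in SI units, a loader that expects other units has to convert them first. Below is a minimal sketch of such a conversion, assuming target units of mm for position and MeV/c for momentum. Those targets are an assumption for illustration, so check what the receiving loader actually expects. The function names are hypothetical and are not part of openPMD-api or ParticlePhaseSpace.

```python
# Physical constants (exact by definition in the 2019 SI).
SPEED_OF_LIGHT = 299792458.0           # m/s
ELEMENTARY_CHARGE = 1.602176634e-19    # C, numerically also joules per eV


def position_m_to_mm(x_m):
    """Convert a position from metres to millimetres."""
    return x_m * 1.0e3


def momentum_si_to_mev_c(p_si):
    """Convert a momentum from kg*m/s to MeV/c.

    p * c is an energy in joules; dividing by the elementary charge
    gives eV, and a further factor of 1e6 gives MeV.
    """
    return p_si * SPEED_OF_LIGHT / (ELEMENTARY_CHARGE * 1.0e6)


# Example with made-up values (not taken from any dataset).
print(position_m_to_mm(2.5e-6))
print(momentum_si_to_mev_c(5.344e-22))
```

The same two functions can be applied column-wise to the data frame, since they only use multiplication and division.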

After finishing the docs, I would also be excited to attempt an exporter 🤩

(Please do not treat my implementation questions as required for the JOSS review to pass. I am just truly curious, and my other comments on the manuscript are more important to address :) )

bwheelz36 · 2023-06-01T23:44:01Z

Hi @ax3l

That's all good - given there is a defined open dataset format, it absolutely makes sense that this package should support it.

Having said that - I'm a bit confused tbh. I'm trying to run the first read example from the openpmd-api site with the following code:

import openpmd_api as io
series = io.Series( "data%T.h5", io.Access.read_only)

I pointed this code to each of the three examples 'example-2d', 'example-3d', 'example-thetaMode' - (it is actually not that clear from the example that this is what you are supposed to do?). In each case the data loads, but there is no information in the 'iterations' attribute?

ax3l · 2023-06-05T15:59:34Z

Hi @bwheelz36,

Thanks for trying the example datasets!
The iterations concept is explained here:
https://openpmd-api.readthedocs.io/en/latest/usage/concepts.html

there is no information in the 'iterations' attribute?

Please let me know if you have more questions on this in case I missed the point of the question :)

Once you open a data Series, you can loop over the available iterations in it, read the data in each iteration, etc.

fields: https://openpmd-api.readthedocs.io/en/latest/usage/firstread.html or (for our case here)
particles: https://github.com/openPMD/openPMD-api/blob/0.15.1/examples/2_read_serial.cpp#L48-L67
- or as data frame (preferred here): https://github.com/openPMD/openPMD-api/blob/0.15.1/examples/11_particle_dataframe.py#L30-L34

bwheelz36 · 2023-06-06T03:44:58Z

Hi @ax3l

Ok, here's an end to end example of what I tried. Maybe I'm doing something extremely stupid...

in a terminal:

# inside a fresh virtual environment
git clone https://github.com/openPMD/openPMD-example-datasets.git
cd openPMD-example-datasets
tar -zxvf example-2d.tar.gz
tar -zxvf example-3d.tar.gz
tar -zxvf example-thetaMode.tar.gz

pip install openpmd-api
python  # enter python session

inside python:

import openpmd_api as io

data_loc = "example-2d/hdf5/data%T.h5"
s = io.Series(data_loc, io.Access.read_only)

Here's the explorer view of s; it appears to simply have nothing in it?

ax3l · 2023-06-25T21:15:31Z

Oh that is wild, thanks for reporting!
We check against most of those files in CI, but maybe something slipped in that we did not cover :-o

I will double check this after my conferences and summer break.

franzpoeschel · 2023-07-07T08:34:19Z

For this, see my comment here:

The string representations of many classes are counterintuitive and have led to confusion, e.g. series.iterations printed will look as if it is empty

I guess that this issue is proved again..
The data is there, it just does not look like it:

>>> import openpmd_api as io
>>> s = io.Series("data%T.h5", io.Access.read_only)
>>> s.iterations
<openPMD.Attributable with '0' attributes>
>>> [index for index in s.iterations]
[255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, 400]

franzpoeschel · 2023-07-14T09:44:40Z

Fixed in openPMD/openPMD-api#1476

ax3l · 2023-07-17T22:42:56Z

Thank you for updating the representation strings, @franzpoeschel! This will be shipped with the next patch release, 0.15.2.

@bwheelz36 for your example above, all looks good and you can keep exploring what is inside the data series s like this:

for k_i, i in s.iterations.items():
    print("Iteration: {0}".format(k_i))

    for k_p, p in i.particles.items():
        print("  Particle species '{0}':".format(k_p))

Inside the particle species p are records, each a key-value pair of a string and a record whose components can be accessed like a NumPy array, e.g., u_x = p["momentum"]["x"][()]. Note that u_x is only filled with actual data once s.flush() has been called.

Even easier is the access as a data frame, as in the 11_particle_dataframe.py example:

for i in s.iterations:
    for p in i.particles:
        df = p.to_df()
        print(df)

ax3l · 2023-08-16T04:14:15Z

@bwheelz36 did this help? :)

bwheelz36 · 2023-08-16T06:14:31Z

Hi @ax3l - the first loop you posted above helps, yes - it is clear there is some data there! In that example, doing p.to_df() gives a dataframe which would facilitate a close to one-to-one read-in to ParticlePhaseSpace.

the second loop crashes with AttributeError: 'int' object has no attribute 'particles'. I added a line if hasattr(i, 'particles'): however this was never entered...

Can I make sure I understand the intent behind iterations - each iteration would represent for instance a time interval?

franzpoeschel · 2023-08-16T08:21:37Z

the second loop crashes with AttributeError: 'int' object has no attribute 'particles'. I added a line if hasattr(i, 'particles'): however this was never entered...

I think that there is a slight bug in the second loop, try this one:

for it_index, it in s.iterations.items():
    for p_name, p in it.particles.items():
        df = p.to_df()
        print(df)
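
The AttributeError reported above is what one would expect if the openPMD containers iterate like Python dictionaries, yielding keys rather than values. The list comprehension over s.iterations earlier in the thread did return integer indices. Below is a minimal sketch of that behavior, using plain dicts as hypothetical stand-ins for s.iterations and the particles container rather than real openPMD objects.

```python
# Plain dicts standing in for openPMD containers (hypothetical stand-ins,
# not real openPMD objects): iteration index -> {species name -> payload}.
iterations = {
    255: {"electrons": "electron data"},
    260: {"electrons": "electron data", "ions": "ion data"},
}

# Iterating a dict yields its keys only, here the integer iteration indices.
keys = [k for k in iterations]

# .items() yields (key, value) pairs, so the value can be used directly.
species_seen = []
for it_index, it in iterations.items():
    for species_name, species in it.items():
        species_seen.append((it_index, species_name))

print(keys)
print(species_seen)
```

With the real API, the same pattern means calling .items() on both the iterations and the particles container, or indexing by key as in s.iterations[400].particles["electrons"].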

This was referenced May 30, 2023

Extended References on Toolkits & Standards #157

Closed

Data Sizes #158

Closed

bwheelz36 mentioned this issue Jul 5, 2023

[REVIEW]: ParticlePhaseSpace: A python package for streamlined import, analysis, and export of particle phase space data openjournals/joss-reviews#5375

Closed

franzpoeschel mentioned this issue Jul 14, 2023

Update __repr__ method of major objects in openPMD hierarchy openPMD/openPMD-api#1476

Merged
