Additional Data Formats? #156
Hey @ax3l - I have to admit that, embarrassingly, I was not actually aware of openPMD! It looks great. It is a fairly minimal amount of work to add new Loaders/Exporters (depending of course on how complex the data source is). I would be happy to take a look at loading openPMD data. I don't suppose you already have some files handy I could test on?
Hi @bwheelz36, sorry for the edit in my original message. I added a few example files and a (probably) four-liner to load data via an edit :)
import openpmd_api as io
s = io.Series("../samples/git-sample/data%T.h5", io.Access.read_only)
electrons = s.iterations[400].particles["electrons"]  # 400 or another "step" in the data series
df = electrons.to_df()  # careful: all SI at this point
After finishing the docs, I would also be excited to attempt an exporter 🤩
(Please do not feel that my implementation questions are required for the JOSS review to pass. I am just truly curious, and the other comments in between for the manuscript are more important to address :) )
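Since `to_df()` returns everything in SI, a loader that works in other units needs a conversion step. Here is a minimal sketch, assuming the target units are mm for positions and MeV/c for momenta (that choice is an assumption, not something stated in this thread). The function and constant names are hypothetical helpers, not part of openPMD-api or ParticlePhaseSpace.

```python
# Exact SI values of the physical constants (2019 SI redefinition)
SPEED_OF_LIGHT = 299792458.0          # m/s
ELEMENTARY_CHARGE = 1.602176634e-19   # C, numerically the joules per eV


def metres_to_mm(x_si):
    """Convert a position from metres to millimetres."""
    return x_si * 1.0e3


def momentum_si_to_mev_c(p_si):
    """Convert a momentum from kg*m/s to MeV/c."""
    # p * c is an energy in joules; dividing by e gives eV, by 1e6 gives MeV
    return p_si * SPEED_OF_LIGHT / (ELEMENTARY_CHARGE * 1.0e6)
```

Both helpers use only arithmetic, so they should also work element-wise on a data frame column, e.g. a column holding the x momentum (the exact column names depend on what `to_df()` produces).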
Hi @ax3l That's all good - given there is a defined open dataset format, it absolutely makes sense that this package should support it. Having said that - I'm a bit confused tbh. I'm trying to run the first read example from the openpmd-api site with the following code:
import openpmd_api as io
series = io.Series("data%T.h5", io.Access.read_only)
I pointed this code to each of the three examples
Hi @bwheelz36, Thanks for trying the example datasets!
Please let me know if you have more questions on this in case I missed the point of the question :) Once you open a data
Hi @ax3l Ok, here's an end to end example of what I tried. Maybe I'm doing something extremely stupid...
In a terminal:
# inside a fresh virtual environment
git clone https://github.com/openPMD/openPMD-example-datasets.git
cd openPMD-example-datasets
tar -zxvf example-2d.tar.gz
tar -zxvf example-3d.tar.gz
tar -zxvf example-thetaMode.tar.gz
pip install openpmd-api
python # enter python session
Inside python:
import openpmd_api as io
data_loc = "example-2d/hdf5/data%T.h5"
s = io.Series(data_loc, io.Access.read_only)
Here's the explorer view of s; it appears to simply have nothing in it?
Oh that is wild, thanks for reporting! I will double check this after my conferences and summer break.
For this, see my comment here:
I guess that this issue is proved again..
Fixed in openPMD/openPMD-api#1476
Thank you for updating the representation strings, @franzpoeschel! This will be shipped with the next patch release. @bwheelz36, for your example above, all looks good and you can keep exploring what is inside the data series:
for k_i, i in s.iterations.items():
    print("Iteration: {0}".format(k_i))
    for k_p, p in i.particles.items():
        print(" Particle species '{0}':".format(k_p))
inside the particle species
Even easier is the access as a data frame, as in the 11_particle_dataframe.py example:
for i in s.iterations:
    for p in i.particles:
        df = p.to_df()
        print(df)
@bwheelz36 did this help? :)
Hi @ax3l - the first loop you posted above helps, yes - it is clear there is some data there! In that example, doing the second loop crashes with
Can I make sure I understand the intent behind iterations - each iteration would represent, for instance, a time interval?
I think that there is a slight bug in the second loop, try this one:
for it_index, it in s.iterations.items():
    for p in it.particles:
        df = p.to_df()
        print(df)
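A variant of the loops above that uses `.items()` at both levels, so that the inner loop variable is a species object rather than its name. This is only a sketch: it assumes that iterating a container directly yields its keys (dict-like behaviour), and that each iteration exposes `time` and `time_unit_SI`, so the physical time of a step can be printed alongside its index. The helper name `walk_series` is hypothetical and not part of openPMD-api.

```python
def walk_series(series):
    """Yield (iteration index, time in seconds, species name, species object)."""
    for it_index, it in series.iterations.items():
        # assumption: time is stored in code units, time_unit_SI converts to seconds
        t_si = it.time * it.time_unit_SI
        for name, species in it.particles.items():
            yield it_index, t_si, name, species


# Usage with an opened series `s` (requires openpmd-api):
# for it_index, t_si, name, species in walk_series(s):
#     print("Iteration {0} at t = {1} s, species '{2}'".format(it_index, t_si, name))
#     print(species.to_df())
```

Keeping the traversal in a generator separates walking the series from whatever is done with each species, such as calling `to_df()` and handing the result to a loader.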
Thank you for the JOSS submission in openjournals/joss-reviews#5375 .
I really like the support of the IAEA data loaders.
Based on the extended abstract and linked motivating discussion in it, I was wondering:
I am personally curious if, for phase space data, the openPMD standard [1] [2] (disclaimer: I lead this effort) could be helpful as an additional input loader source? We have by now a relatively large selection of accelerator codes supporting openPMD as their output and also try to use it more in experimental laser-plasma accelerator work.
The paper summarizes so far:
If one were to implement another loader, how much work would be needed?
I am looking at
https://bwheelz36.github.io/ParticlePhaseSpace/new_data_loader.html
and am further curious about data sizes: #158
Update: I found https://bwheelz36.github.io/ParticlePhaseSpace/code_docs.html#ParticlePhaseSpace.DataLoaders.Load_PandasData which might be pretty easy to couple to openPMD with https://github.com/openPMD/openPMD-api/blob/0.15.1/examples/11_particle_dataframe.py (example data sets here). (Our Pandas reader supports chunked processing - let's continue the discussion on lazy loading/streaming/out-of-core processing in #158)
Note that the linked reference
Kuschel, S. (2022). Postpic. https://github.com/skuschel/postpic
implemented openPMD early on.
Minor correction: I think it should read (2014), as of the first release, for this reference.
[1] https://github.com/openPMD
[2] https://www.openPMD.org