Improve the design of branch filtering when concatenating #1388

acampove · 2025-02-23T11:29:09Z

I am using uproot version 5.5.2. With the snippet below:

import numpy as np
import uproot


def _make_file(fname: str):
    # Write a small TTree with four "a_*" and four "b_*" branches.
    n_entries = 10
    branch1_data = np.random.rand(n_entries)
    branch2_data = np.random.rand(n_entries)

    with uproot.recreate(fname) as f:
        f["tree"] = {
            "a_1": branch1_data,
            "a_2": branch2_data,
            "a_3": branch1_data,
            "a_4": branch2_data,
            "b_1": branch1_data,
            "b_2": branch2_data,
            "b_3": branch1_data,
            "b_4": branch2_data,
        }


def main():
    _make_file('file_1.root')
    _make_file('file_2.root')

    # Both expressions and filter_name are passed in the same call.
    df = uproot.concatenate(
        {'file_1.root': 'tree', 'file_2.root': 'tree'},
        expressions={'a_1', 'a_2'},
        filter_name='b*',
        library='pd',
    )
    print(df)


if __name__ == "__main__":
    main()

I get only columns a_1 and a_2.

From the user's point of view, I want both the b columns and the a columns. Is it possible to change uproot's behavior so that expressions and filter_name give an inclusive selection rather than an exclusive one?

pfackeldey · 2025-02-27T19:34:35Z

Hi @acampove,
expressions is meant for transforming your data on the fly into a specific array/column. If you don't need that, you probably don't need to provide it.
Omitting expressions yields the expected dataframe:

df = uproot.concatenate({'file_1.root': 'tree', 'file_2.root' : 'tree'}, filter_name='b*', library='pd')
print(df)
#         b_1       b_2       b_3       b_4
# 0   0.096449  0.953901  0.096449  0.953901
# 1   0.637242  0.259867  0.637242  0.259867
# 2   0.515761  0.313249  0.515761  0.313249
# 3   0.740748  0.940448  0.740748  0.940448
# 4   0.869072  0.624719  0.869072  0.624719
# 5   0.041706  0.761446  0.041706  0.761446
# 6   0.883716  0.163284  0.883716  0.163284
# 7   0.156949  0.922057  0.156949  0.922057
# 8   0.651333  0.548299  0.651333  0.548299
# 9   0.622364  0.334150  0.622364  0.334150
# 10  0.937513  0.810083  0.937513  0.810083
# 11  0.055701  0.186211  0.055701  0.186211
# 12  0.611302  0.091394  0.611302  0.091394
# 13  0.862566  0.001212  0.862566  0.001212
# 14  0.710977  0.217308  0.710977  0.217308
# 15  0.250999  0.273506  0.250999  0.273506
# 16  0.286835  0.993268  0.286835  0.993268
# 17  0.990380  0.014993  0.990380  0.014993
# 18  0.256301  0.610082  0.256301  0.610082
# 19  0.690280  0.854935  0.690280  0.854935

Here is an example of what expressions can be used for:

df = uproot.concatenate({'file_1.root': 'tree', 'file_2.root' : 'tree'}, expressions="sqrt(b_1**2 + a_1**2)", library='pd')
print(df)
#     sqrt(b_1**2 + a_1**2)
# 0                0.136400
# 1                0.901196
# 2                0.729396
# 3                1.047576
# ...

Best, Peter
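
As a sanity check on the expression output above: in the reproducer, a_1 and b_1 are written from the same array, so sqrt(b_1**2 + a_1**2) reduces to sqrt(2) * b_1. The sketch below uses only the standard library and the first two b_1 values copied from the earlier printout. It assumes both printouts came from the same files, which the numbers suggest but the thread does not state.

```python
import math

# First two b_1 values copied from the filter_name='b*' printout (rounded).
b_1 = [0.096449, 0.637242]
# In the reproducer, a_1 and b_1 are filled from the same array.
a_1 = list(b_1)

# Element-wise version of the expression string "sqrt(b_1**2 + a_1**2)".
combined = [math.sqrt(b**2 + a**2) for a, b in zip(a_1, b_1)]
for value in combined:
    print(f"{value:.6f}")
```

The results agree with the first two rows of the expression printout up to the rounding of the copied inputs.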

pfackeldey · 2025-02-27T19:42:40Z

Oh, sorry, I just realized you wanted all the b* branches but also a_1 and a_2. You can get this filter logic with a regex:

df = uproot.concatenate({'file_1.root': 'tree', 'file_2.root' : 'tree'}, filter_name="/(b.+)|(a_[1,2])/i", library='pd')
print(df)
#          a_1       a_2       b_1       b_2       b_3       b_4
# 0   0.404026  0.505566  0.404026  0.505566  0.404026  0.505566
# 1   0.806364  0.069890  0.806364  0.069890  0.806364  0.069890
# 2   0.966566  0.872194  0.966566  0.872194  0.966566  0.872194
# 3   0.226920  0.983254  0.226920  0.983254  0.226920  0.983254
# ...
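
To see which branch names that pattern selects, here is a standard-library sketch. It assumes that uproot strips the enclosing slashes, reads the trailing i as the case-insensitive flag, and matches from the start of each name. select_by_regex is a hypothetical helper for illustration, not part of uproot.

```python
import re

BRANCHES = ["a_1", "a_2", "a_3", "a_4", "b_1", "b_2", "b_3", "b_4"]


def select_by_regex(names, pattern, flags=0):
    """Keep the names that the pattern matches from their first character."""
    matcher = re.compile(pattern, flags)
    return [name for name in names if matcher.match(name) is not None]


# Pattern from the comment above, without the enclosing slashes.
selected = select_by_regex(BRANCHES, r"(b.+)|(a_[1,2])", re.IGNORECASE)
print(selected)  # ['a_1', 'a_2', 'b_1', 'b_2', 'b_3', 'b_4']

# [1,2] is a character class containing '1', ',' and '2', so it also
# admits a literal comma; [12] is the tighter spelling.
loose = select_by_regex(["a_,", "a_1"], r"a_[1,2]")
tight = select_by_regex(["a_,", "a_1"], r"a_[12]")
print(loose)  # ['a_,', 'a_1']
print(tight)  # ['a_1']
```

For the branch names in this thread both spellings select the same columns, so the difference only matters if a name could contain a comma.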

NJManganelli · 2025-02-27T23:58:18Z

I think a list of filters, e.g. ["b*", "a_1", ...], should also work.
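
A sketch of that suggestion, assuming a name is kept when any filter in the list matches it and that glob strings behave like fnmatch patterns. matches_any is a hypothetical stand-in written with the standard library so the selection logic can be checked without ROOT files; it is not uproot's implementation.

```python
from fnmatch import fnmatchcase

BRANCHES = ["a_1", "a_2", "a_3", "a_4", "b_1", "b_2", "b_3", "b_4"]

# One glob for every b branch, plus the two a branches by exact name.
NAME_FILTERS = ["b*", "a_1", "a_2"]


def matches_any(name, filters):
    """Return True if the name matches at least one glob-style filter."""
    return any(fnmatchcase(name, pattern) for pattern in filters)


selected = [name for name in BRANCHES if matches_any(name, NAME_FILTERS)]
print(selected)  # ['a_1', 'a_2', 'b_1', 'b_2', 'b_3', 'b_4']
```

With uproot itself, the corresponding call would presumably pass filter_name=["b*", "a_1", "a_2"] to uproot.concatenate and leave expressions unset.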

acampove added the "feature" label (New feature or request) · Feb 23, 2025

pfackeldey self-assigned this · Feb 27, 2025
