WIP Rework mne bids path match #1355

Open: waldie11 wants to merge 16 commits into main from rework_mne_bids_path_match

waldie11

PR Description

The current workflow of calling pathlib.Path.rglob and then filtering the results (e.g. keeping only paths that start with root/sub-*) performs poorly, especially on storage trees polluted with many unrelated files.

As a first step, calling glob.glob recursively with the filter built into the search pattern applies the equivalent filter immediately, so non-matching directories are never traversed.
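A minimal, standard-library-only sketch of the difference described above. The tree layout and the helper names (make_tree, rglob_then_filter, glob_with_pattern) are hypothetical illustrations, not code from this PR or from mne_bids. Both functions return the same paths, but the first walks derivatives/ before discarding it, while the second only descends into directories matching sub-*.

```python
import glob
import os
import tempfile
from pathlib import Path


def make_tree(root: Path) -> None:
    """Create two subjects plus an unrelated, cluttered derivatives folder."""
    for sub in ("sub-01", "sub-02"):
        eeg = root / sub / "eeg"
        eeg.mkdir(parents=True)
        (eeg / f"{sub}_task-rest_eeg.vhdr").touch()
    clutter = root / "derivatives" / "pipeline" / "sub-01" / "eeg"
    clutter.mkdir(parents=True)
    for i in range(50):
        (clutter / f"tmp_{i}.dat").touch()


def rglob_then_filter(root: Path) -> list[str]:
    # Walks the entire tree, then discards everything outside root/sub-*.
    prefix = str(root / "sub-")
    return sorted(str(p) for p in root.rglob("*") if str(p).startswith(prefix))


def glob_with_pattern(root: Path) -> list[str]:
    # The sub-* filter is part of the pattern, so glob only descends into
    # matching top-level directories. normpath drops the trailing separator
    # glob leaves on the sub-* directories themselves (matched by "**").
    pattern = os.path.join(glob.escape(str(root)), "sub-*", "**")
    return sorted(os.path.normpath(p) for p in glob.glob(pattern, recursive=True))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    make_tree(root)
    filtered_late = rglob_then_filter(root)
    filtered_early = glob_with_pattern(root)
```

One caveat for a real refactor: glob skips names starting with a dot by default, whereas Path.rglob does not, so the two are only equivalent on trees without hidden files.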

Building on this, I suggest adding an include_match option to get_entity_vals.

I'm open to suggestions, or to stripping this down.

Remark: I was unable to join the CI because Microsoft classifies my email address as a work email for some reason.

Merge checklist

Maintainer, please confirm the following before merging.
If applicable:

  • All comments are resolved
  • This is not your own PR
  • All CIs are happy
  • PR title starts with [MRG]
  • whats_new.rst is updated
  • New contributors have been added to CITATION.cff
  • PR description includes phrase "closes <#issue-number>"


welcome bot commented Dec 18, 2024

Hello! 👋 Thanks for opening your first pull request here!
Please read the contributor guide, and please follow the steps outlined in the "Instructions for first-time contributors" section therein. ❤️ We will try to get back to you soon. 🚴🏽‍♂️

pyproject.toml Outdated
@@ -52,6 +52,7 @@ doc = [
"numpydoc",
"openneuro-py",
"pandas",
"pathlib",
Member

it's part of the standard library and doesn't need to be listed here

Author

thx, this comes up though 🤔

/home/circleci/project/mne_bids/path.py:docstring of mne_bids.BIDSPath.root:1: WARNING: py:class reference target not found: pathlib._local.Path [ref.class]

Member

See #1358 (comment). #1353 includes a workaround for this but hasn't been merged yet.


codecov bot commented Dec 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.46%. Comparing base (ccfbfee) to head (970751c).

@@            Coverage Diff             @@
##             main    #1355      +/-   ##
==========================================
+ Coverage   97.43%   97.46%   +0.03%     
==========================================
  Files          40       40              
  Lines        8966     9009      +43     
==========================================
+ Hits         8736     8781      +45     
+ Misses        230      228       -2     


@sappelhoff
Member

thanks @waldie11! we have a bit of a backlog right now so please don't worry if we take a bit longer to get back to you.

@waldie11
Author

waldie11 commented Jan 2, 2025

@sappelhoff happy new year!
No worries: I had seen some of the API changes in MNE-Python breaking the test suite, so I already expected priorities to be elsewhere.

waldie11 force-pushed the rework_mne_bids_path_match branch from 4863705 to c71a41f on January 2, 2025 17:00
Member

sappelhoff left a comment


Hey @waldie11, could you please add an example for the feature you are introducing? Either modify an existing example, or add a numpydoc docstring example below the existing ones.

Could you please also resolve the conflicts AND follow the steps for first-time contributors here? --> #1354
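For reference, a numpydoc docstring example is an Examples section written as doctest prompts (>>>), which can also be executed to check it. A minimal sketch, using a hypothetical helper filter_dirs that is not part of mne_bids or this PR:

```python
import doctest
import fnmatch


def filter_dirs(names, pattern="sub-*"):
    """Keep only directory names matching a glob pattern.

    HYPOTHETICAL helper, used only to illustrate a numpydoc ``Examples``
    section; it is not part of mne_bids.

    Parameters
    ----------
    names : list of str
        Candidate directory names.
    pattern : str
        Glob pattern that names must match.

    Returns
    -------
    list of str
        The matching names, in their original order.

    Examples
    --------
    >>> filter_dirs(["sub-01", "derivatives", "sourcedata", "sub-02"])
    ['sub-01', 'sub-02']
    >>> filter_dirs(["sub-01", "derivatives"], pattern="deriv*")
    ['derivatives']
    """
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


# Run the docstring examples as doctests; failures are printed and counted.
runner = doctest.DocTestRunner(verbose=False)
for test in doctest.DocTestFinder().find(filter_dirs, globs={"filter_dirs": filter_dirs}):
    runner.run(test)
results = runner.summarize(verbose=False)
```

In practice, projects often run such examples through pytest's --doctest-modules option rather than driving doctest by hand.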

waldie11 force-pushed the rework_mne_bids_path_match branch from 102eb3d to 0480497 on February 3, 2025 13:09
waldie11 force-pushed the rework_mne_bids_path_match branch from 12c4f89 to def6ab2 on February 3, 2025 13:49
@waldie11
Author

waldie11 commented Feb 3, 2025

Hi @sappelhoff,

I think the real substance of this PR is the rework of _return_root_paths. That is not a public API function, so writing an example for it feels a bit odd.

get_entity_vals seems to be undergoing some changes in the upcoming 0.17 anyway. I extended its docstring to describe my feature contribution; do you think that is sufficient? The ignore_* parameters don't have an example either. The feature helps when filtering a bids_path whose root contains deep directory trees for sourcedata, derivatives, subjects, or who-knows-what, and one wants to select in advance which directories to search:

my_dataset/
  derivatives/
    downsampled/
      sub-01/
        micr/
          sub-01_sample-01_res-4x_TEM.png
          sub-01_sample-01_res-4x_TEM.json
  sub-01/
    micr/
      sub-01_sample-01_TEM.png
      sub-01_sample-01_TEM.json

(BIDS v1.8.0-1 p. 214)
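To illustrate how an include_match pattern differs from ignore_dirs on the layout above, here is a small sketch. find_entity_vals is a hypothetical stand-in written for illustration; it assumes include_match is a glob pattern relative to the root (as in include_match="sub-*/") and that ignore_dirs is applied as a post-filter. It is not the actual get_entity_vals implementation.

```python
import glob
import os
import tempfile
from pathlib import Path

EXAMPLE_FILES = [
    "derivatives/downsampled/sub-01/micr/sub-01_sample-01_res-4x_TEM.png",
    "derivatives/downsampled/sub-01/micr/sub-01_sample-01_res-4x_TEM.json",
    "sub-01/micr/sub-01_sample-01_TEM.png",
    "sub-01/micr/sub-01_sample-01_TEM.json",
]


def find_entity_vals(root, entity_key, include_match="", ignore_dirs=()):
    # HYPOTHETICAL stand-in for get_entity_vals(); not the mne_bids code.
    # include_match: glob pattern (relative to root) limiting where glob descends.
    # ignore_dirs: top-level directory names dropped *after* globbing.
    root = os.fspath(root)
    pattern = os.path.join(glob.escape(root), include_match, "**", f"*{entity_key}-*")
    values = set()
    for match in glob.glob(pattern, recursive=True):
        top_level = Path(os.path.relpath(match, root)).parts[0]
        if top_level in ignore_dirs:
            continue  # already visited by glob, only discarded here
        stem = Path(match).name.split(".", 1)[0]
        for part in stem.split("_"):
            if part.startswith(f"{entity_key}-"):
                values.add(part[len(entity_key) + 1 :])
    return sorted(values)


with tempfile.TemporaryDirectory() as tmp:
    for rel in EXAMPLE_FILES:
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    res_everywhere = find_entity_vals(tmp, "res")
    res_ignore = find_entity_vals(tmp, "res", ignore_dirs=("derivatives",))
    res_include = find_entity_vals(tmp, "res", include_match="sub-*/")
    sample_include = find_entity_vals(tmp, "sample", include_match="sub-*/")
```

In this sketch both options drop the res-4x value that only exists under derivatives/, but ignore_dirs discards those paths after glob has already walked them, whereas include_match keeps glob out of derivatives/ altogether.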

@waldie11
Author

waldie11 commented Feb 3, 2025

I don't know how deep you want to dive into this, but:

In test_path_benchmark I created a dummy BIDS-compliant tree. When I browse this artificial tree with get_entity_vals, using include_match="sub-*/" gives a performance gain of about one order of magnitude compared to ignore_dirs=["derivatives", "sourcedata"].

This benchmark does eat up some CI runtime, though. I tried to keep it lightweight.
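The order-of-magnitude timing above comes from the PR's own benchmark and is not reproduced here. As a rough, deterministic proxy, the sketch below counts how many directories os.walk enters on a hypothetical polluted tree, once walking everything (as rglob does before filtering) and once pruning the top level to sub-*. The tree generator and helper names are assumptions, not the layout or code used in test_path_benchmark.

```python
import fnmatch
import os
import tempfile
from pathlib import Path


def make_polluted_tree(root, n_subjects=3, n_junk=200):
    # HYPOTHETICAL layout: a few subjects plus a large derivatives/ folder.
    for i in range(1, n_subjects + 1):
        eeg = Path(root, f"sub-{i:02d}", "eeg")
        eeg.mkdir(parents=True)
        (eeg / f"sub-{i:02d}_task-rest_eeg.vhdr").touch()
    for j in range(n_junk):
        Path(root, "derivatives", f"pipeline{j}", "sub-01", "eeg").mkdir(parents=True)


def count_visited_dirs(root, top_level_match=None):
    """Count directories os.walk enters, optionally pruning the top level."""
    visited = 0
    for _dirpath, dirnames, _filenames in os.walk(root):
        if visited == 0 and top_level_match is not None:
            # Editing dirnames in place stops os.walk from descending elsewhere.
            dirnames[:] = fnmatch.filter(dirnames, top_level_match)
        visited += 1
    return visited


with tempfile.TemporaryDirectory() as tmp:
    make_polluted_tree(tmp)
    walk_everything = count_visited_dirs(tmp)  # full walk, filter afterwards
    walk_pruned = count_visited_dirs(tmp, "sub-*")  # filter during the walk
```

Counting directories instead of timing keeps the comparison independent of disk caches and machine load, while still showing that the traversal cost of the filter-afterwards approach grows with the size of the unrelated subtrees.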

waldie11 force-pushed the rework_mne_bids_path_match branch from 66b4e1c to 824649e on February 3, 2025 21:16
4 participants