
⚡️ Make memory estimates data-dependent #1480

Open · shnizzedy opened this issue Apr 1, 2021 · 3 comments
Labels: 0 - Low Priority, improvement, performance, potential idea

@shnizzedy (Member):

Related problem

The up-front 'insufficient resources' error is an attempt to reduce out-of-memory crashes during runs by letting users adjust allocations before starting, but it needs tuning: the memory estimates are currently hard-coded and data-independent, based on common data sizes, so they will be too big for small runs and too small for big runs until we make them data-dependent.

These nodes have estimates > 4 GB:

node                                         estimate (GB)
ALFF bandpass filter                         13
compcor DetrendPC                            12.5
apply ANTs warp                              10
cosine filter                                8
FLIRT resample                               8
functional to standard composite transform   8
mask to EPI                                  8
reho_map                                     6
mean filter                                  5
VMHC apply_transform                         5
spatial regression                           4.5

Originally posted by @shnizzedy in #1479 (comment)

Related #1166, #1301, #1404, #1453

Proposed feature

Starting with the memory-hungriest nodes (those in the table above),

  1. determine how much memory per data unit (e.g. memory per TR at resolution)
  2. make that data unit available to the functions calling those nodes
  3. apply that memory estimate when calling the node, in place of the defaults above (see the sketch after this list)
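
A minimal sketch of what those three steps could look like in code, assuming nibabel is available; the helper name and the coefficients are hypothetical placeholders that would be fit from profiling runs:

```python
import nibabel as nib

def data_dependent_mem_gb(in_file, gb_per_tr, constant_gb):
    """Steps 1-2: read the data unit (timepoints) from the node's input file."""
    shape = nib.load(in_file).shape  # header read only; image data stays on disk
    timepoints = shape[3] if len(shape) > 3 else 1
    return constant_gb + gb_per_tr * timepoints

# Step 3: apply the estimate when building the node, replacing a flat default:
# node = pe.Node(interface, name='alff_bandpass',
#                mem_gb=data_dependent_mem_gb(bold_path,
#                                             gb_per_tr=0.042,   # placeholder
#                                             constant_gb=0.5))  # placeholder
```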

Additional context

Some of these estimates could potentially get complicated as the pipeline progresses through transforms and resamplings.
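
As a rough illustration of why: resampling onto a higher-resolution grid multiplies the per-timepoint voxel count, so a per-TR coefficient fit in native space stops applying after the transform. Using the BOLD and T1 dimensions from the apply_warp example in the comments below:

```python
# Voxel counts per timepoint before and after warping to the T1 grid
# (dimensions taken from the apply_warp example in the comments below):
bold_voxels = 90 * 104 * 72     # 673,920 voxels in the native BOLD grid
t1_voxels = 256 * 320 * 320     # 26,214,400 voxels in the T1 grid
print(t1_voxels / bold_voxels)  # ≈ 38.9x more voxels after resampling
```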

shnizzedy added the improvement and potential idea labels on Apr 1, 2021
@shnizzedy (Member, Author):

MM suggested running [1200, 600, 300, 150, 75] TRs × a subject from each of [HCP, HBN, HNU, NYU test/retest] to generate the initial measurements from which to fit the estimation formula (a truncation sketch follows).
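
A minimal sketch of producing those truncated test inputs, assuming nibabel; the file names are illustrative:

```python
import nibabel as nib

def truncate_bold(in_path, n_trs, out_path):
    """Write a copy of a 4D BOLD image keeping only the first n_trs timepoints."""
    img = nib.load(in_path)
    img.slicer[..., :n_trs].to_filename(out_path)  # lazy slice along time

for n_trs in (1200, 600, 300, 150, 75):
    truncate_bold('sub-01_task-rest_bold.nii.gz', n_trs,
                  f'sub-01_task-rest_desc-{n_trs}TRs_bold.nii.gz')
```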

shnizzedy self-assigned this on Apr 21, 2021
@shnizzedy (Member, Author):

The issue I'm currently working on is making memory estimates data-dependent.

Before 1.8, we were mostly relying on the default per-node memory estimate of 0.2 GB; in 1.8 we raised the default to 2.0 GB and set some higher estimates based on observed memory usage. These estimates are still just flat estimates that don't allocate well for varying data shapes.

For example, a BOLD image with dimensions 90 × 104 × 72 × 300 uses about 12.5 GB for apply_warp_warp_ts_to_T1template (warping to a 256 × 320 × 320 T1 image), and one with dimensions 90 × 104 × 72 × 1200 uses about 50 GB for the same node, i.e. roughly (50 − 12.5) / (1200 − 300) ≈ 0.042 GB per timepoint. Our current estimate for that node is 10 GB. An overrun of 2.5 GB isn't great, and one of 40 GB is rough, but with a powerful enough system and soft limits, the run will still complete.

The issue is really a problem when multiple nodes that don't allocate enough memory try to run at the same time.

The particular pipeline configuration that raised the issue here has a bunch of forks (and the data have several functional images per subject, like

 ├── anat
 │   └── sub-105923_ses-retest_T1w.nii.gz
 └── func
     ├── sub-105923_ses-retest_task-restLRbeltrvt_run-1_bold.json
     ├── sub-105923_ses-retest_task-restLRbeltrvt_run-1_bold.nii.gz
     ├── sub-105923_ses-retest_task-restLRbeltrvt_run-2_bold.json
     ├── sub-105923_ses-retest_task-restLRbeltrvt_run-2_bold.nii.gz
     ├── sub-105923_ses-retest_task-restLRpredrvt_run-1_bold.json
     ├── sub-105923_ses-retest_task-restLRpredrvt_run-1_bold.nii.gz
     ├── sub-105923_ses-retest_task-restLRpredrvt_run-2_bold.json
     ├── sub-105923_ses-retest_task-restLRpredrvt_run-2_bold.nii.gz
     ├── sub-105923_ses-retest_task-restLR_run-1_bold.json
     ├── sub-105923_ses-retest_task-restLR_run-1_bold.nii.gz
     ├── sub-105923_ses-retest_task-restLR_run-2_bold.json
     └── sub-105923_ses-retest_task-restLR_run-2_bold.nii.gz

).

I ran the subject above single-threaded with a single functional image three times (in full (1200 timepoints), truncated to 600 timepoints, and truncated to 300 timepoints) to collect the callback.logs and adjust the hungriest nodes' estimates based on the number of timepoints, enough to get this particular set of runs going. (A sketch of parsing those logs follows the list below.) As a follow-up I plan to

  • run the same configuration with images with other spatial resolutions
  • run other configurations

to better tune the memory estimation formulas.
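
A minimal sketch of mining those callback.logs for overruns, assuming nipype's log_nodes_cb format (one JSON object per line, including estimated_memory_gb and runtime_memory_gb fields when resource profiling is enabled):

```python
import json

def memory_overruns(log_path):
    """Return (node, estimated, used) for nodes that exceeded their estimate."""
    overruns = []
    with open(log_path) as log:
        for line in log:
            entry = json.loads(line)
            used = entry.get('runtime_memory_gb')
            estimated = entry.get('estimated_memory_gb')
            if used is not None and estimated is not None and used > estimated:
                overruns.append((entry['name'], estimated, used))
    # hungriest overruns first
    return sorted(overruns, key=lambda e: e[2] - e[1], reverse=True)
```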

In the back of my mind, I have a lingering concern that there's something screwy with the way nipype is allocating/monitoring threads beyond how the log is reporting the number of threads. But I know the memory overrun issue is real, so I'm starting there.

― me, in an email

@shnizzedy (Member, Author):

Per @anibalsolon's suggestion, I'm trying to override Node's mem_gb @property to dynamically consider input files: 0626a66

For the first iteration, I'm only considering the time dimension as variable. Once that works, I'll refactor to include the x, y, and z dimensions and see if I can figure out a way to simplify the parameterization. Currently,

mem_gb : int or float
    Estimate (in GB) of constant memory to allocate for this node.
mem_x : tuple
    (int or float, str)
    Multiplier for memory allocation such that `mem_x[0]` times
    the number of timepoints in the file at `mem_x[1]` plus `mem_gb`
    equals the total memory allocation for the node.
    (TEMPORARY: will replace number of timepoints with
    spatial dimensions times timepoints)

like

cal_DVARS = pe.Node(dvars_ImageTo1D,
                    name='cal_DVARS',
                    mem_gb=0.4,
                    mem_x=(0.0033, lambda **kwargs: kwargs['in_file']))

for a memory estimate of 0.4 + 0.0033 * {number of timepoints in 'in_file'}.
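
For reference, a minimal sketch of what such a property override could look like (hypothetical class name; the actual change is in 0626a66), assuming nipype's Node keeps its constant estimate in self._mem_gb and the scheduler reads the mem_gb property at runtime:

```python
import nibabel as nib
from nipype.pipeline import engine as pe

class DataDependentNode(pe.Node):
    """Node whose memory estimate scales with an input file's time dimension."""

    def __init__(self, interface, name, mem_gb=2.0, mem_x=None, **kwargs):
        super().__init__(interface, name=name, mem_gb=mem_gb, **kwargs)
        self._mem_x = mem_x  # (multiplier, callable selecting the input file)

    @property
    def mem_gb(self):
        if self._mem_x is not None:
            multiplier, get_file = self._mem_x
            in_file = get_file(**self.inputs.get())  # mirrors the lambda above
            if isinstance(in_file, str):             # skip unset/Undefined inputs
                shape = nib.load(in_file).shape
                timepoints = shape[3] if len(shape) > 3 else 1
                return self._mem_gb + multiplier * timepoints
        return self._mem_gb
```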

It would of course be better to

  • just pass 'in_file' instead of the whole lambda **kwargs: kwargs['in_file'] every time
  • be dynamic to x, y, z and t instead of just t

For now, I just plugged in a couple of these and kicked off a couple runs to test the proof of concept.
