Proposal: Add Callback Support for Progress Tracking in frelon_peaksearch.process #383

Closed
abmajith opened this issue Jan 21, 2025 · 6 comments · Fixed by #386

@abmajith (Contributor) commented Jan 21, 2025

Hi @jonwright, @jadball and @loichuder,

I was working on a task for the Segmenter in the Ewoksapp, and I found it would be very useful to have a callback mechanism in the frelon_peaksearch module, specifically in the function:

def process(ds, worker_args, ncpu=None):

The main motivation is to provide a better user interface for tracking the progress of parallel computation. By allowing tqdm or similar libraries to be used with a custom callback (e.g., integrating progress with a task instance), users can monitor progress in a meaningful way.

Proposed Change

I propose adding two optional arguments to the process function:

tqdm_class: A custom progress bar class (e.g., a subclass of tqdm).
task_instance: A reference to the task instance for linking the progress bar with the calling context.

The updated function signature would look like this:

def process(ds, worker_args, ncpu=None, tqdm_class=None, task_instance=None, **kwargs):

With these arguments, the logic for handling progress bars would be as follows:

if tqdm_class and task_instance:
    all_peaks = process_map(
        pps,
        args,
        chunksize=1,
        max_workers=nthreads,
        tqdm_class=tqdm_class,
        task_instance=task_instance,
    )
else:
    all_peaks = process_map(pps, args, chunksize=1, max_workers=nthreads)

If both tqdm_class and task_instance are provided, the progress bar integrates seamlessly into the calling context. Otherwise, the function defaults to the current behavior.
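
For context (and as far as I understand tqdm.contrib.concurrent.process_map), keyword arguments that process_map does not consume itself, such as task_instance above, are forwarded to the constructor of the class given via tqdm_class. A minimal, self-contained illustration of that forwarding, using a throwaway subclass (EchoTqdm is just a placeholder name, not part of the proposal):

from tqdm import tqdm
from tqdm.contrib.concurrent import process_map

class EchoTqdm(tqdm):
    # Throwaway subclass used only to show that extra kwargs arrive here
    def __init__(self, *args, task_instance=None, **kwargs):
        super().__init__(*args, **kwargs)
        print("received task_instance:", task_instance)

def square(x):
    return x * x

if __name__ == "__main__":
    # task_instance is not a process_map argument; it ends up in EchoTqdm.__init__
    process_map(square, range(10), chunksize=1, max_workers=2,
                tqdm_class=EchoTqdm, task_instance="some-task")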
Testing and Implementation

I tested this change in my fork of the ImageD11 library, which you can find here:
https://github.com/abmajith/ImageD11/blob/dauteam/ImageD11/frelon_peaksearch.py

Looking forward to your thoughts!

PS:
This custom tqdm class looks like this:

from tqdm import tqdm

class CustomTqdm(tqdm):
    def __init__(self, *args, task_instance=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.task_instance = task_instance
        self.finished_count = 0

    def update(self, n=1):
        super().update(n)
        self.finished_count += n
        # Optionally propagate progress to the task_instance (guard against
        # an unknown total to avoid dividing by None or zero)
        if self.task_instance is not None and self.total:
            self.task_instance.progress = 100.0 * (self.finished_count / self.total)

This is the same as the default tqdm class used by process_map; the only additions are the extra task_instance attribute (an Ewoks class instance) and the finished_count integer.
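
A hypothetical usage sketch, assuming the proposed process signature above; ds, worker_args and task are placeholders from the calling Ewoks context, not new API:

# Hypothetical call from an Ewoks task: the return value is unchanged from
# the current process(), only the progress reporting differs
result = process(
    ds,
    worker_args,
    ncpu=8,
    tqdm_class=CustomTqdm,   # the subclass defined above
    task_instance=task,      # Ewoks task whose .progress attribute gets updated
)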

@loichuder (Contributor) commented

Thanks for the edits @jadball 😄


I think adding a generic arg process_map_kwargs would be more flexible and less ewoks/tqdm-centric:

The process signature would become

def process(ds, worker_args, ncpu=None, **process_map_kwargs):

and the call to process_map would become

# No additional if needed
all_peaks = process_map(pps, args, chunksize=1, max_workers=nthreads, **process_map_kwargs)
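
With that signature, anything process_map (and thus tqdm) understands can be forwarded without process knowing about Ewoks or tqdm subclasses; a caller-side sketch reusing the hypothetical CustomTqdm and task placeholders from above:

# Plain progress description, forwarded to the default tqdm class
result = process(ds, worker_args, ncpu=8, desc="segmenting frames")

# Ewoks-aware progress tracking, kept entirely on the caller's side
result = process(ds, worker_args, ncpu=8,
                 tqdm_class=CustomTqdm, task_instance=task)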

@jonwright (Member) commented

The edit looks fine, but it misses the point.

For ewoks processing: users will ask you to run the segmentation of each frame live during acquisition and plot the 2D frame by frame results. Then do the 3D merge as soon as the scan ends.

This is the same problem as the one behind issue #252. For every non-trivial computation, we want to run it in parallel on one (or more) compute nodes. The frelon_peaksearch is the only place in ImageD11 where tqdm process_map is used.

A better refactoring in the long term (not for the upcoming release) would be to implement #252 across the whole codebase and try to take into account this need for that work.

@abmajith (Contributor, Author) commented

Hi @jonwright, @loichuder, and @jadball,

In the context of live acquisition and frame-by-frame segmenting (as opposed to the batch processing used in process_map), we have implemented similar live data acquisition and processing methods in other Ewoks app projects. I will also discuss this with Wout, Loic, and other members of DAU for further alignment.

To make the 3DXRD technique more suitable for such workflows, we propose treating live acquisition and processing as a separate Ewoks task in the future. Since this requires further investigation of the ImageD11 library, specifically the file structures and the stateful master file (.h5), we will return with a detailed and modular proposal.

For now, our immediate request is to make the process function in frelon_peaksearch.py more modular by allowing additional flexibility, such as callbacks for progress tracking. Future updates would call the segmenting functions directly rather than the process function, ensuring adaptability and modularity in the workflow.

@loichuder (Contributor) commented

I agree that pursuing #252 is the way to go, but it is a great undertaking which will probably need a lot of time and expertise.

In my opinion, we could add the process_map_kwargs described in #383 (comment) as a quick workaround for now.

I think it is light and generic enough not to get in the way of future refactoring, and it would already be useful for tracking the progress of process_map when running it offline.

@loichuder (Contributor) commented

@jonwright I'll take your 👍 for an agreement. Thanks 😄

@loichuder (Contributor) commented Jan 22, 2025

I closed the issue since it was specifically about progress tracking in frelon_peaksearch.process, but as discussed, work should continue towards #252.
