Releases · LLNL/merlin

28 Oct 21:33

bgunnar5

1.12.2

edbfe9b

Version 1.12.2 Latest

[1.12.2]

Added

Conflict handler option to the dict_deep_merge function in utils.py
Ability to add module-specific pytest fixtures
Added fixtures specifically for testing status functionality
Added tests for reading and writing status files, and status conflict handling
Added tests for the dict_deep_merge function
Pytest-mock as a dependency for the test suite (necessary for using mocks and fixtures in the same test)
New GitHub Action test to make sure the target branch has been merged into the source branch first, so we know the histories are OK
Check in the status commands to make sure we're not pulling statuses from nested workspaces
Added setuptools as a requirement for python 3.12 to recognize the pkg_resources library
Patch to celery results backend to stop ChordErrors being raised and breaking workflows when a single task fails
New step return code $(MERLIN_RAISE_ERROR) to force an error to be raised by a task (mainly for testing)
- Added description of this to docs
New test to ensure a single failed task won't break a workflow
Several new unit tests for the following subdirectories:
- merlin/common/
- merlin/config/
- merlin/examples/
- merlin/server/
Context managers for the conftest.py file to ensure safe spin up and shutdown of fixtures
- RedisServerManager: context to help with starting/stopping a redis server for tests
- CeleryWorkersManager: context to help with starting/stopping workers for tests
Ability to copy and print the Config object from merlin/config/__init__.py
Equality method to the ContainerFormatConfig and ContainerConfig objects from merlin/server/server_util.py
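
As a rough illustration of the conflict handler option for dict_deep_merge listed above, the sketch below shows one way a deep merge can defer to a caller-supplied handler. The names deep_merge and prefer_second, the handler signature, and the sample status values are all assumptions made for this example; Merlin's actual function in utils.py may be structured differently.

```python
from typing import Any, Callable, Optional

# Hypothetical handler signature: (value_in_a, value_in_b, key) -> merged value.
ConflictHandler = Callable[[Any, Any, str], Any]


def deep_merge(dict_a: dict, dict_b: dict, conflict_handler: Optional[ConflictHandler] = None) -> dict:
    """Recursively merge dict_b into dict_a (in place) and return dict_a."""
    for key, b_val in dict_b.items():
        if key not in dict_a:
            # No conflict: the key only exists in dict_b.
            dict_a[key] = b_val
        elif isinstance(dict_a[key], dict) and isinstance(b_val, dict):
            # Both sides are dicts, so merge one level deeper.
            deep_merge(dict_a[key], b_val, conflict_handler)
        elif dict_a[key] != b_val:
            # Two different values that cannot be merged: defer to the handler if given.
            if conflict_handler is None:
                raise ValueError(f"Conflicting values for key {key!r}")
            dict_a[key] = conflict_handler(dict_a[key], b_val, key)
    return dict_a


def prefer_second(a_val: Any, b_val: Any, key: str) -> Any:
    """Example handler (an assumption): always keep the value from the second dict."""
    return b_val


merged = deep_merge(
    {"step_1": {"status": "RUNNING", "restarts": 0}},
    {"step_1": {"status": "FINISHED", "return_code": "MERLIN_SUCCESS"}},
    conflict_handler=prefer_second,
)
```

Passing the handler as an argument keeps the merge logic generic, so the caller decides how two conflicting status entries should be reconciled.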

Changed

merlin info is cleaner and gives python package info
merlin version now prints with every banner message
Applying filters for merlin detailed-status will now log debug statements instead of warnings
Modified the unit tests for the merlin status command to use pytest rather than unittest
Added fixtures for merlin status tests that copy the workspace to a temporary directory so you can see exactly what's run in a test
Batch block and workers now allow for variables to be used in node settings
Task id is now the path to the directory
Split the start_server and config_server functions of merlin/server/server_commands.py into multiple functions to make testing easier
Split the create_server_config function of merlin/server/server_config.py into two functions to make testing easier
Combined set_snapshot_seconds and set_snapshot_changes methods of RedisConfig into one method set_snapshot

Fixed

Bugfix for output of merlin example openfoam_wf_singularity
A bug with the CHANGELOG detection test when the target branch isn't in the ci runner history
Link to Merlin banner in readme
Issue with escape sequences in ascii art (caught by python 3.12)
Bug where Flux wasn't identifying total number of nodes on an allocation
- Not supporting Flux versions below 0.17.0
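
For context on the escape-sequence fix listed above: starting with Python 3.12, an unrecognized escape such as "\/" in a normal string literal produces a SyntaxWarning. A common remedy is a raw string, sketched below. The banner art here is made up for illustration and is not Merlin's actual banner, and the changelog does not say which remedy Merlin used.

```python
# A raw string (r"...") keeps every backslash literal, so ASCII art that
# contains sequences like "\/" is not parsed as an (invalid) escape.
BANNER = r"""
  __  __
 |  \/  |
 | |\/| |
 |_|  |_|
"""

# The non-raw equivalent needs each backslash doubled.
ESCAPED_LINE = " |  \\/  |"

print(BANNER)
```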

@bgunnar5 @lucpeterson

Contributors

lucpeterson and bgunnar5

12 Jun 20:46

bgunnar5

1.12.2b1

9e27798

Version 1.12.2b1 Pre-release

[1.12.2b1]

Added

Conflict handler option to the dict_deep_merge function in utils.py
Ability to add module-specific pytest fixtures
Added fixtures specifically for testing status functionality
Added tests for reading and writing status files, and status conflict handling
Added tests for the dict_deep_merge function
Pytest-mock as a dependency for the test suite (necessary for using mocks and fixtures in the same test)
New GitHub Action test to make sure the target branch has been merged into the source branch first, so we know the histories are OK
Check in the status commands to make sure we're not pulling statuses from nested workspaces
Added setuptools as a requirement for python 3.12 to recognize the pkg_resources library
Patch to celery results backend to stop ChordErrors being raised and breaking workflows when a single task fails
New step return code $(MERLIN_RAISE_ERROR) to force an error to be raised by a task (mainly for testing)
- Added description of this to docs
New test to ensure a single failed task won't break a workflow

Changed

merlin info is cleaner and gives python package info
merlin version now prints with every banner message
Applying filters for merlin detailed-status will now log debug statements instead of warnings
Modified the unit tests for the merlin status command to use pytest rather than unittest
Added fixtures for merlin status tests that copy the workspace to a temporary directory so you can see exactly what's run in a test
Batch block and workers now allow for variables to be used in node settings
Task id is now the path to the directory

Fixed

Bugfix for output of merlin example openfoam_wf_singularity
A bug with the CHANGELOG detection test when the target branch isn't in the ci runner history
Link to Merlin banner in readme
Issue with escape sequences in ascii art (caught by python 3.12)
Bug where Flux wasn't identifying total number of nodes on an allocation
- Not supporting Flux versions below 0.17.0

@lucpeterson @bgunnar5

Contributors

lucpeterson and bgunnar5

02 May 21:50

bgunnar5

1.12.1

b4321d0

Version 1.12.1

[1.12.1]

Added

New Priority.RETRY value for the Celery task priorities. This will be the new highest priority.
Support for the status command to handle multiple workers on the same step
Documentation on how to run cross-node workflows with a containerized server (merlin server)

Changed

Modified some tests in test_status.py and test_detailed_status.py to accommodate bugfixes for the status commands

Fixed

Bugfixes for the status commands:
- Fixed "DRY RUN" naming convention so that it outputs in the progress bar properly
- Fixed issue where a step that was run with one sample would delete the status file upon condensing
- Fixed issue where multiple workers processing the same step would break the status file and cause the workflow to crash
- Added a catch for the JSONDecodeError that would potentially crash a run
- Added a FileLock to the status write in _update_status_file() of MerlinStepRecord to avoid potential race conditions (potentially related to JSONDecodeError above)
- Added in export MANPAGER="less -r" call behind the scenes for detailed-status to fix ASCII error
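
The JSONDecodeError catch mentioned in the status bugfixes above can be pictured roughly as follows. This is only a sketch: read_status, the file names, and the choice to return an empty dict are assumptions for illustration, and the file locking that the changelog also describes is not shown here.

```python
import json
import tempfile
from pathlib import Path


def read_status(status_path: Path) -> dict:
    """Return the parsed status file, or an empty dict if it is not valid JSON."""
    try:
        with open(status_path, "r", encoding="utf-8") as status_file:
            return json.load(status_file)
    except json.JSONDecodeError:
        # Assumption: a partially written file is treated as "no status yet"
        # instead of letting the exception crash the run.
        return {}


with tempfile.TemporaryDirectory() as tmp_dir:
    good = Path(tmp_dir) / "good_status.json"
    good.write_text('{"step_1": {"status": "FINISHED"}}', encoding="utf-8")

    # Simulate a file that another writer had only partially written.
    truncated = Path(tmp_dir) / "truncated_status.json"
    truncated.write_text('{"step_1": {"status": "FINI', encoding="utf-8")

    good_result = read_status(good)
    truncated_result = read_status(truncated)
```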

@bgunnar5 @lucpeterson @koning @ryannova

Contributors

koning, lucpeterson, and 2 other contributors

15 Feb 22:46

bgunnar5

1.12.0

60e46d7

Version 1.12.0

[1.12.0]

Added

A new command merlin queue-info that will print the status of your celery queues
- By default this will only pull information from active queues
- There are options to look for specific queues (--specific-queues), queues defined in certain spec files (--spec; this is the same functionality as the merlin status command prior to this update), and queues attached to certain steps (--steps)
- Queue info can be dumped to outfiles with --dump
A new command merlin detailed-status that displays task-by-task status information about your study
- This has options to filter by return code, task queues, task statuses, and workers
- You can set a limit on the number of tasks to display
- There are 3 options to modify the output display
Docs for all of the monitoring commands
New file merlin/study/status.py dedicated to work relating to the status command
- Contains the Status and DetailedStatus classes
New file merlin/study/status_renderers.py dedicated to formatting the output for the detailed-status command
New file merlin/common/dumper.py containing a Dumper object to help dump output to outfiles
Study name and parameter info now stored in the DAG and MerlinStep objects
Added functions to merlin/display.py that help display status information:
- display_task_by_task_status handles the display for the merlin detailed-status command
- display_status_summary handles the display for the merlin status command
- display_progress_bar generates and displays a progress bar
Added new methods to the MerlinSpec class:
- get_worker_step_map()
- get_queue_step_relationship()
- get_tasks_per_step()
- get_step_param_map()
Added methods to the MerlinStepRecord class to mark status changes for tasks as they run (follows Maestro's StepRecord format mostly)
Added methods to the Step class:
- establish_params()
- name_no_params()
Added a property parameter_labels to the MerlinStudy class
Added two new utility functions:
- dict_deep_merge() that deep merges two dicts into one
- ws_time_to_dt() that converts a workspace timestring (YYYYMMDD-HHMMSS) to a datetime object
A new celery task condense_status_files to be called when sets of samples finish
Added a celery config setting worker_cancel_long_running_tasks_on_connection_loss since this functionality is about to change in the next version of celery
Tests for the Status and DetailedStatus classes
- this required adding a decent amount of test files to help with the tests; these can be found under the tests/unit/study/status_test_files directory
Pytest fixtures in the conftest.py file of the integration test suite
- NOTE: an export command export LC_ALL='C' had to be added to fix a bug in the WEAVE CI. This can be removed when we resolve this issue for the merlin server command
Tests for the celeryadapter.py module
New CeleryTestWorkersManager context to help with starting/stopping workers for tests
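
The workspace timestring conversion described for ws_time_to_dt() above (YYYYMMDD-HHMMSS to a datetime object) can be sketched with the standard library. The function name workspace_time_to_datetime is a stand-in chosen for this example, and the body is an assumption about how such a conversion might be written, not a copy of Merlin's code.

```python
from datetime import datetime


def workspace_time_to_datetime(ws_time: str) -> datetime:
    """Parse a timestring of the form YYYYMMDD-HHMMSS into a datetime."""
    return datetime.strptime(ws_time, "%Y%m%d-%H%M%S")


parsed = workspace_time_to_datetime("20240215-224600")
print(parsed)  # 2024-02-15 22:46:00
```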

Changed

Reformatted the entire merlin status command
- Now accepts both spec files and workspace directories as arguments
- Removed the --steps flag
- Replaced the --csv flag with the --dump flag
- New functionality:
  - Shows step-by-step progress bar for tasks
  - Displays a summary of task statuses below the progress bar
Split the add_chains_to_chord function in merlin/common/tasks.py into two functions:
- get_1d_chain which converts a 2D list of chains into a 1D list
- launch_chain which launches the 1D chain
Pulled the needs_merlin_expansion() method out of the Step class and made it a function instead
Removed tabulate_info function; replaced with tabulate from the tabulate library
Moved verify_filepath and verify_dirpath from merlin/main.py to merlin/utils.py
The entire documentation has been ported to MkDocs and re-organized
- Dark Mode
- New "Getting Started" example for a simple setup tutorial
- More detail on configuration instructions
- There's now a full page on installation instructions
- More detail on explaining the spec file
- More detail with the CLI page
- New "Running Studies" page to explain different ways to run studies, restart them, and accomplish command line substitution
- New "Interpreting Output" page to help users understand how the output workspace is generated in more detail
- New "Examples" page has been added
- Updated "FAQ" page to include more links to helpful locations throughout the documentation
- Set up a place to store API docs
- New "Contact" page with info on reaching Merlin devs
The Merlin tutorial defaults to using Singularity rather than Docker for the OpenFoam example. Minor tutorial fixes have also been applied.
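
To illustrate the get_1d_chain change listed above (converting a 2D list of chains into a 1D list), here is a minimal flattening sketch. The name flatten_chains and the string placeholders are assumptions; the real function operates on Celery task chains and may order or group its elements differently.

```python
from itertools import chain
from typing import Any, List


def flatten_chains(chains_2d: List[List[Any]]) -> List[Any]:
    """Flatten a 2D list into a 1D list, preserving row-by-row order."""
    return list(chain.from_iterable(chains_2d))


# Strings stand in for task chains in this example.
flat = flatten_chains([["chain_a", "chain_b"], ["chain_c"], []])
```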

Fixed

The merlin status command so that it's consistent in its output whether using redis or rabbitmq as the broker
The merlin monitor command will now keep an allocation up if the queues are empty and workers are still processing tasks
Add the restart keyword to the specification docs
Cyclical imports and config imports that could easily cause ci issues

@bgunnar5 @koning @lucpeterson @xorJane

Contributors

koning, lucpeterson, and 2 other contributors

23 Oct 18:42

bgunnar5

1.11.1

e731420

Version 1.11.1

[1.11.1]

Fixed

Typo in batch.py that caused lsf launches to fail (ALL_SGPUS changed to ALL_GPUS)

@bgunnar5

Contributors

bgunnar5

09 Oct 20:58

bgunnar5

1.11.0

093c867

Version 1.11.0

[1.11.0]

Added

New reserved variable:
- VLAUNCHER: The same functionality as the LAUNCHER variable, but will substitute shell variables MERLIN_NODES, MERLIN_PROCS, MERLIN_CORES, and MERLIN_GPUS for nodes, procs, cores per task, and gpus

Changed

Hardcoded Sphinx v5.3.0 requirement is now removed so we can use latest Sphinx

Fixed

A bug where the filenames in iterative workflows kept appending .out, .partial, or .expanded to the filenames stored in the merlin_info/ subdirectory
A bug where a skewed sample hierarchy was created when a restart was necessary in the add_merlin_expanded_chain_to_chord task

@koning @bgunnar5

Contributors

koning and bgunnar5

18 Aug 23:55

bgunnar5

1.10.3

faf71ed

Version 1.10.3

[1.10.3]

Added

The *.conf regex for the recursive-include of the merlin server directory so that pip will add it to the wheel
A note to the docs for how to fix an issue where the merlin server start command hangs

Changed

Bump certifi from 2022.12.7 to 2023.7.22 in /docs
Bump pygments from 2.13.0 to 2.15.0 in /docs
Bump requests from 2.28.1 to 2.31.0 in /docs

07 Aug 16:55

bgunnar5

1.10.2

261e035

Version 1.10.2

[1.10.2]

Fixed

A bug where the .orig, .partial, and .expanded file names were using the study name rather than the original file name
A bug where the openfoam_wf_singularity example was not being found
Some build warnings in the docs (unknown targets, duplicate targets, title underlines too short, etc.)
A bug where, when the output path contained a variable that was overridden, the overridden value was not applied in the output_path
A bug where permission denied errors happened when checking for system scheduler

Added

Tests for ensuring $(MERLIN_SPEC_ORIGINAL_TEMPLATE), $(MERLIN_SPEC_ARCHIVED_COPY), and $(MERLIN_SPEC_EXECUTED_RUN) are stored correctly
A pdf download format for the docs
Tests for cli substitutions

Changed

The ProvenanceYAMLFileHasRegex condition for integration tests now saves the study name and spec file name as attributes instead of just the study name
- This led to minor changes in 3 tests ("local override feature demo", "local pgen feature demo", and "remote feature demo") with what we pass to this specific condition
Updated scikit-learn requirement for the openfoam_wf_singularity example
Uncommented Latex support in the docs configuration to get pdf builds working

04 May 21:54

bgunnar5

1.10.1

ab739c6

Version 1.10.1

[1.10.1]

Fixed

A bug where assigning a worker all steps also assigned steps to the default worker

Added

Tests to make sure the default worker is being assigned properly

Changed

Requirement name in examples/workflows/remote_feature_demo/requirements.txt and examples/workflows/feature_demo/requirements.txt from sklearn to scikit-learn since sklearn is now deprecated

@lucpeterson @bgunnar5

Contributors

lucpeterson and bgunnar5

13 Apr 15:24

bgunnar5

1.10.0

3acb30d

Version 1.10.0

[1.10.0]

Fixed

Pip wheel wasn't including .sh files for merlin examples
The learn.py script in the openfoam_wf* examples will now create the missing Energy v Lidspeed plot
Fixed the flags associated with the stop-workers command (--spec, --queues, --workers)
Fixed the --step flag for the run-workers command
Fixed most of the pylint errors that were showing up when you ran make check-style
- Some errors have been disabled rather than fixed. These include:
  - Any pylint errors in merlin_template.py since it's deprecated now
  - A "duplicate code" instance between a function in expansion.py and a method in study.py
    - The function is explicitly not creating a MerlinStudy object so the code must be duplicate here
  - Invalid-name (C0103): These errors typically relate to the names of short variables (e.g. naming files something like f or errors e)
  - Unused-argument (W0613): These have been disabled for celery-related functions since celery does use these arguments behind the scenes
  - Broad-exception (W0718): Pylint wants a more specific exception but sometimes it's ok to have a broad exception
  - Import-outside-toplevel (C0415): Sometimes it's necessary for us to import inside a function. Where this is the case, these errors are disabled
  - Too-many-statements (R0915): This is disabled for the setup_argparse function in main.py since that function needs to be long. It's disabled in tasks.py and celeryadapter.py too until we can get around to refactoring some code there
  - No-else-return (R1705): These are disabled in router.py until we refactor the file
  - Consider-using-with (R1732): Pylint wants us to consider using with for calls to subprocess.run or subprocess.Popen but it's not necessary
  - Too-many-arguments (R0913): These are disabled for functions that I believe need to have several arguments
    - Note: these could be fixed by using *args and **kwargs but it makes the code harder to follow so I'm opting to not do that
  - Too-many-local-variables (R0914): These are disabled for functions that have a lot of variables
    - It may be a good idea at some point to go through these and try to find ways to shorten the number of variables used or split the functions up
  - Too-many-branches (R0912): These are disabled for certain functions that require a good amount of branching
    - Might be able to fix this in the future if we split functions up more
  - Too-few-public-methods (R0903): These are disabled for classes we may add to in the future or "wrapper" classes
  - Attribute-defined-outside-init (W0201): These errors are only disabled in specification.py as they occur in class methods so init() won't be called
Fixed an issue where the walltime value in the batch block was being converted to an integer instead of remaining in HH:MM:SS format

Added

Now loads np.arrays of dtype='object', allowing mix-type sample npy
Added a singularity container openfoam_wf example
Added flux native worker launch support
Added PBS flux launch support
Added check_for_flux, check_for_slurm, check_for_lsf, and check_for_pbs utility functions
Tests for the stop-workers command
A function in run_tests.py to check that an integration test definition is formatted correctly
A new dev_workflow example multiple_workers.yaml that's used for testing the stop-workers command
Ability to start 2 subprocesses for a single test
Added the --distributed and --display-tests flags to run_tests.py
- --distributed: only run distributed tests
- --display-tests: displays a table of all existing tests and the id associated with each test
Added the --disable-logs flag to the run-workers command
Merlin will now assign default_worker to any step not associated with a worker
Added get_step_worker_map() as a method in specification.py
Added tabulate_info() function in display.py to help with table formatting
Added get_flux_alloc function for new flux version >= 0.48.x interface change
New flags to the query-workers command
- --queues: query workers based on the queues they're associated with
- --workers: query workers based on a regex of the names you're looking for
- --spec: query workers based on the workers defined in a spec file

Changed

Changed celery_regex to celery_slurm_regex in test_definitions.py
Reformatted how integration tests are defined and part of how they run
- Test values are now dictionaries rather than tuples
- Stopped using subprocess.Popen() and Popen.communicate() to run tests and now use subprocess.run() instead, for simplicity and to follow the currently recommended subprocess API (run() calls Popen() and communicate() under the hood so we don't have to handle that anymore)
Rewrote the README in the integration tests folder to explain the new integration test format
Reformatted start_celery_workers() in celeryadapter.py file. This involved:
- Modifying verify_args() to return the arguments it verifies/updates
- Changing launch_celery_worker() to launch the subprocess (no longer builds the celery command)
- Creating get_celery_cmd() to do what launch_celery_worker() used to do and build the celery command to run
- Creating _get_steps_to_start(), _create_kwargs(), and _get_workers_to_start() as helper functions to simplify logic in start_celery_workers()
Modified the merlinspec.json file:
- the minimum gpus per task is now 0 instead of 1
- variables defined in the env block of a spec file can now be arrays
Refactored batch.py:
- Merged 4 functions (check_for_slurm, check_for_lsf, check_for_flux, and check_for_pbs) into 1 function named check_for_scheduler
  - Modified get_batch_type to accommodate this change
- Added a function parse_batch_block to handle all the logic of reading in the batch block and storing it in one dict
- Added a function get_flux_launch to help decrease the amount of logic taking place in batch_worker_launch
- Modified batch_worker_launch to use the new parse_batch_block function
- Added a function construct_scheduler_legend to build a dict that keeps as much information as we need about each scheduler stored in one place
- Cleaned up the construct_worker_launch_command function to utilize the newly added functions and decrease the amount of repeated code
Changed get_flux_cmd for new flux version >=0.48.x interface
The query-workers command now prints a table as its output
- Each row of the Workers column has the name of an active worker
- Each row of the Queues column has a list of queues associated with the active worker
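
The move from subprocess.Popen() plus communicate() to subprocess.run() in the integration tests, noted above, can be compared side by side. This is a generic sketch rather than code from run_tests.py; the command being executed is a placeholder chosen so the example runs anywhere Python does.

```python
import subprocess
import sys

# Placeholder command: run a tiny Python program with the current interpreter.
cmd = [sys.executable, "-c", "print('hello from a test command')"]

# Older pattern: create the process, then collect output manually.
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
old_stdout, old_stderr = proc.communicate()
old_returncode = proc.returncode

# Newer pattern: run() starts the process, waits for it, and gathers output in one call.
result = subprocess.run(cmd, capture_output=True, text=True, check=False)

print(result.returncode, result.stdout.strip())
```

Both patterns yield the same return code and output; run() simply bundles them into a single CompletedProcess object.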

@koning @lucpeterson @ryannova @bgunnar5

Contributors

koning, lucpeterson, and 2 other contributors
