mca/base: add a new MCA variable type for include lists #12826

Open
wants to merge 1 commit into
base: main

hjelmn
Member

@hjelmn hjelmn commented Sep 24, 2024

It is not uncommon in Open MPI to register a string variable to allow specifying an include or exclude list (every framework registers one). Given this common pattern it is worthwhile to formalize an MCA include list variable type: MCA_BASE_VAR_TYPE_INCLUDE_LIST. Variables of this type use a new opal object (mca_base_var_include_list_t) that stores an argv-style array and an exclude-list flag. To register an include list variable the caller must first either OBJ_CONSTRUCT the object or allocate it with OBJ_NEW. Variables of this type are exposed as strings to MPI_T.
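For readers unfamiliar with the pattern, here is a minimal standalone sketch of what parsing such a string into an argv-style array plus an exclude flag might look like. It is not the code from this PR: `example_include_list_t`, `example_include_list_parse` and `example_include_list_free` are hypothetical names, and the only convention taken from the PR is that a leading `^` marks an exclude list (as in `^foo,bar,baz`).

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the object described above: a NULL-terminated
 * (argv-style) array of entries plus an exclude flag. */
typedef struct {
    char **items;    /* NULL-terminated array of list entries */
    bool is_exclude; /* true when the string started with '^' */
} example_include_list_t;

/* Release everything owned by the list and reset it to empty. */
static void example_include_list_free(example_include_list_t *list)
{
    if (NULL != list->items) {
        for (char **p = list->items; NULL != *p; ++p) {
            free(*p);
        }
        free(list->items);
    }
    list->items = NULL;
    list->is_exclude = false;
}

/* Parse "a,b,c" (include) or "^a,b,c" (exclude).
 * Returns 0 on success, -1 on bad input or allocation failure. */
static int example_include_list_parse(example_include_list_t *list, const char *str)
{
    list->items = NULL;
    list->is_exclude = false;
    if (NULL == str) {
        return -1;
    }
    if ('^' == str[0]) {
        list->is_exclude = true;
        ++str;
    }

    /* Upper bound on the number of entries: commas + 1. */
    size_t max_items = 1;
    for (const char *c = str; '\0' != *c; ++c) {
        if (',' == *c) {
            ++max_items;
        }
    }

    /* calloc keeps the array NULL-terminated at every step. */
    list->items = calloc(max_items + 1, sizeof(char *));
    if (NULL == list->items) {
        return -1;
    }

    size_t count = 0;
    const char *start = str;
    for (;;) {
        size_t len = strcspn(start, ",");
        if (len > 0) { /* skip empty entries such as "a,,b" */
            char *item = malloc(len + 1);
            if (NULL == item) {
                example_include_list_free(list);
                return -1;
            }
            memcpy(item, start, len);
            item[len] = '\0';
            list->items[count++] = item;
        }
        if ('\0' == start[len]) {
            break;
        }
        start += len + 1;
    }
    return 0;
}
```

Parsing `"^foo,bar,baz"` with this sketch sets `is_exclude` and yields the entries `foo`, `bar`, `baz` followed by a terminating `NULL`. Centralizing this kind of logic is the boilerplate the PR aims to remove from individual frameworks.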

@hjelmn hjelmn requested a review from jsquyres September 24, 2024 20:13
@hjelmn hjelmn force-pushed the make_include_lists_a_variable_type_for_mca branch 3 times, most recently from 4f203e3 to 2ed9438 Compare September 24, 2024 22:15
@rhc54
Contributor

rhc54 commented Sep 25, 2024

Hmmm...given the interconnection between the OMPI and PMIx/PRRTE codes, and that they both utilize the MCA architecture, I wonder if we will create confusion if we don't bring this across to the others? There is no connection between MPI_T and those other codes, so if that is the primary motivator here, then perhaps it is less of an issue? Just pondering - if we want to maintain the similarity, then it doesn't seem like it would take much to do the port.

@hjelmn
Member Author

hjelmn commented Sep 25, 2024

@rhc54 It just depends on whether this is useful. From an external perspective these are just strings, like today. Internally the parsing is handled inside the MCA base to avoid boilerplate parsing code. It may still be worth porting over to prrte, if that still uses MCA, given that it centralizes the include/exclude logic. We did the same with enums; we used to have a lot of code taking an int or string and converting it to a string or int.

@hjelmn
Member Author

hjelmn commented Oct 4, 2024

I am cleaning this one up a bit. Hold off on reviews. Adding support for regex include lists since I have a need for that.

It is not uncommon in Open MPI to register a string variable to allow specifying
an include or exclude list (every framework registers one). Given this common
pattern it is worthwhile to formalize an MCA include list variable type:
MCA_BASE_VAR_TYPE_INCLUDE_LIST. Variables of this type use a new opal object
(mca_base_var_include_list_t) that stores an argv-style array and an
exclude-list flag. To register an include list variable the caller must first
either OBJ_CONSTRUCT the object or allocate it with OBJ_NEW. Variables of this
type are exposed as strings to MPI_T.

Signed-off-by: Nathan Hjelm <[email protected]>
@hjelmn hjelmn force-pushed the make_include_lists_a_variable_type_for_mca branch from 2ed9438 to bb835ff Compare October 16, 2024 16:30
@hjelmn
Member Author

hjelmn commented Oct 16, 2024

Ok, ready to go.

@hjelmn hjelmn requested a review from bwbarrett January 5, 2025 01:15
@rhc54
Contributor

rhc54 commented Feb 19, 2025

Hey Nathan - can you provide an example of this being used somewhere in the code base? I'm trying to fully grok what you planned to do with it.

Member

@bosilca bosilca left a comment

It is rare to see a PR adding a new feature with such extensive testing, kudos for that. That being said, while I can picture a few places where such a capability would be nice to have (not critical, because we have lived without it for a long time), this PR seems like an overblown solution.

OBJ_CLASS_INSTANCE(opal_include_list_t, opal_object_t, include_list_contructor,
include_list_destructor);

static int include_list_match_regex(opal_include_list_t *include_list, const char *value,
Member

This function is doing the opposite of what one might expect from reading its name: it is not finding an item in a list that matches a regular expression, but instead checking whether an item matches one of the regular expressions in the list. At some point these two outcomes can be considered similar, but their complexity is drastically different.

Assuming there is a real need for the capability to match regular expressions stored in a list, this function needs to be documented. And maybe renamed include_regex_list_match?

Member Author

@hjelmn hjelmn Feb 26, 2025

I have a use case for this. With btl/uct we agreed on an include list of the supported memory domains. The problem is that the naming is not necessarily simple. Right now we have mlx5_0, but it will fail to match other HCAs in the system. The same is true for our hardware (somegooglenic[0-9]). I could just enumerate all the possibilities, but having it be a regex makes the include list cleaner and easier to maintain (I can just put somegooglenic.* in the list).
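To make the semantics under discussion concrete, here is a small standalone sketch of "does this value match any regular expression in the list", using POSIX `regcomp`/`regexec`. It is not the PR's `include_list_match_regex`; the name `example_regex_list_match`, the 256-byte pattern buffer, and the choice to anchor each pattern so it must match the whole value are all assumptions made for illustration.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Return true if `value` fully matches at least one of the extended regular
 * expressions in the NULL-terminated array `patterns`.
 * Assumption: whole-string matching, enforced by wrapping each pattern in
 * ^( ... )$. Patterns that are too long or fail to compile are skipped. */
static bool example_regex_list_match(const char *const *patterns, const char *value)
{
    for (size_t i = 0; NULL != patterns[i]; ++i) {
        char anchored[256];
        regex_t re;

        int n = snprintf(anchored, sizeof(anchored), "^(%s)$", patterns[i]);
        if (n < 0 || (size_t) n >= sizeof(anchored)) {
            continue; /* pattern does not fit in the buffer */
        }
        if (0 != regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB)) {
            continue; /* invalid pattern */
        }

        int rc = regexec(&re, value, 0, NULL, 0);
        regfree(&re);
        if (0 == rc) {
            return true;
        }
    }
    return false;
}
```

With the patterns `mlx5_[0-9]+` and `somegooglenic.*`, the values `mlx5_0` and `somegooglenic3` match while `ib0` does not. Note that each lookup compiles and runs every pattern until one matches, which is the cost difference raised in the review comment above; a real implementation would likely compile the patterns once when the variable is set.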

* ^foo,bar,baz (exclude)
*/
struct opal_include_list {
opal_serializable_t super;
Member

I question the need for the intermediary opal_serializable_t class.

Member Author

It is not strictly necessary, but it allows building additional variable types as needed. I can't think of anything off the top of my head that would benefit, but I figured it is a good thing to have.

Member Author

@hjelmn hjelmn Feb 26, 2025

The general idea is that these variables are strings to the user (command line, MPI_T, env vars, etc.) but may be something else inside Open MPI: argv list, struct, etc. The serializable takes the string and fills in the internal structure, and vice versa. All of this is handled in mca/var, so it works with MPI_T as well.
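As a rough illustration of that idea (not the actual `opal_serializable_t` definition, which is not shown in this thread), a serializable base could be a pair of function pointers that convert between the user-facing string and the internal representation. Every name below (`example_serializable_t`, `example_pair_t`, `example_var_set`, `example_var_get`) is hypothetical, and the `"x:y"` pair type is just a toy payload.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical base "class": anything that can be set from a string and
 * rendered back to one. */
typedef struct example_serializable {
    int (*from_string)(struct example_serializable *self, const char *value);
    char *(*to_string)(const struct example_serializable *self); /* caller frees */
} example_serializable_t;

/* Toy derived type: an (x, y) pair written as "x:y". */
typedef struct {
    example_serializable_t super; /* must be first so the casts below are valid */
    int x;
    int y;
} example_pair_t;

static int pair_from_string(example_serializable_t *self, const char *value)
{
    example_pair_t *pair = (example_pair_t *) self;
    int x, y;
    if (2 != sscanf(value, "%d:%d", &x, &y)) {
        return -1; /* leave the current value untouched on a parse error */
    }
    pair->x = x;
    pair->y = y;
    return 0;
}

static char *pair_to_string(const example_serializable_t *self)
{
    const example_pair_t *pair = (const example_pair_t *) self;
    char buf[64];
    snprintf(buf, sizeof(buf), "%d:%d", pair->x, pair->y);
    char *out = malloc(strlen(buf) + 1);
    if (NULL != out) {
        strcpy(out, buf);
    }
    return out;
}

static void example_pair_init(example_pair_t *pair)
{
    pair->super.from_string = pair_from_string;
    pair->super.to_string = pair_to_string;
    pair->x = 0;
    pair->y = 0;
}

/* What a variable system could do without knowing the concrete type. */
static int example_var_set(example_serializable_t *var, const char *value)
{
    return var->from_string(var, value);
}

static char *example_var_get(const example_serializable_t *var)
{
    return var->to_string(var);
}
```

The variable layer only ever sees strings going in and out through `example_var_set` and `example_var_get`, while the owning component reads `pair.x` and `pair.y` directly. An include list would be one more derived type under the same interface.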

@hjelmn
Member Author

hjelmn commented Feb 26, 2025

@rhc54 Best example I have is to modify btl_uct_memory_domains to be an include list. This will make it automatically parse the list and support regex matching. I can code that all in the btl but having mca/var do the parsing is better.

3 participants