Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-16645 cart: Bump file descriptor limit #15224

Merged
merged 29 commits into from
Oct 21, 2024
Merged
Changes from 4 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
6f723ee
DAOS-16645 cart: Bump file descriptor limit
jolivier23 Sep 30, 2024
cce4e6d
clang-format
jolivier23 Sep 30, 2024
c7f7403
One more checkpatch issue
jolivier23 Sep 30, 2024
6c49a8e
Address review comments
jolivier23 Sep 30, 2024
28b70ad
Address review comment
jolivier23 Sep 30, 2024
5df6142
Allow-unstable-test: true
jolivier23 Sep 30, 2024
772d239
Address review comments
jolivier23 Oct 1, 2024
861b4cf
Since this adds a new warning, empty commit for pragma
jolivier23 Oct 1, 2024
e7c32be
Just a test to see if we can get around the extra NLT warning (#15231)
jolivier23 Oct 5, 2024
d9cee76
Merge branch 'master' into jvolivie/setrlimit
jolivier23 Oct 5, 2024
c63e298
Still fails in once case and not sure why
jolivier23 Oct 5, 2024
39bc7d9
change behavior for super user
jolivier23 Oct 5, 2024
9b5439d
fix typo
jolivier23 Oct 5, 2024
2312afc
clang format fix
jolivier23 Oct 5, 2024
5449989
Ok, not sure what is going on here. I have yet to
jolivier23 Oct 5, 2024
72bb00a
Merge branch 'master' into jvolivie/setrlimit
jolivier23 Oct 14, 2024
92c3afc
try another
jolivier23 Oct 15, 2024
e85614a
Bump rlimit to max if using valgrind
jolivier23 Oct 16, 2024
10e786c
Merge branch 'master' into jvolivie/setrlimit
jolivier23 Oct 16, 2024
c80b533
add extra commit
jolivier23 Oct 16, 2024
63b1496
remove superfluous parens
jolivier23 Oct 17, 2024
ed1d48a
Merge branch 'master' into jvolivie/setrlimit
jolivier23 Oct 17, 2024
bf5aee8
Move rlimit setting to main
jolivier23 Oct 17, 2024
2465445
retrigger
jolivier23 Oct 17, 2024
89f6a11
try again
jolivier23 Oct 17, 2024
99dbd33
Test again
jolivier23 Oct 18, 2024
808030f
Merge branch 'master' into jvolivie/setrlimit
jolivier23 Oct 18, 2024
560f030
test again
jolivier23 Oct 18, 2024
782c700
try again
jolivier23 Oct 18, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 33 additions & 1 deletion src/cart/crt_init.c
Original file line number Diff line number Diff line change
Expand Up @@ -492,10 +492,14 @@ check_grpid(crt_group_id_t grpid)
return rc;
}

#define MIN_TCP_FD 131072
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please prefix internal macros with CRT_

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


static void
prov_settings_apply(bool primary, crt_provider_t prov, crt_init_options_t *opt)
{
uint32_t mrc_enable = 0;
struct rlimit rlim;
int rc;
uint32_t mrc_enable = 0;

/* Avoid applying same settings multiple times */
if (g_prov_settings_applied[prov] == true)
Expand All @@ -510,6 +514,34 @@ prov_settings_apply(bool primary, crt_provider_t prov, crt_init_options_t *opt)
d_setenv("FI_OFI_RXM_DEF_TCP_WAIT_OBJ", "pollfd", 0);
}

if (prov == CRT_PROV_OFI_TCP || prov == CRT_PROV_OFI_TCP_RXM) {
/* Bump file descriptor limit if low and if possible */
rc = getrlimit(RLIMIT_NOFILE, &rlim);
if (rc != 0) {
DS_WARN(errno, "getrlimit() failed. Unable to check file descriptor limit");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this a warn ? to me it sounds like it should be an error with appropriate errno checking ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. Looking at the man page, it doesn't look like it should ever fail in our case.

goto next;
}

if (rlim.rlim_cur < MIN_TCP_FD) {
if (rlim.rlim_max < MIN_TCP_FD) {
D_ERROR("File descriptor hard limit should be at least %d\n",
MIN_TCP_FD);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider that as a warning, not as an error, if let's say someone wanted for any reason to have a lower limit...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this is a warning - in effect it's asking the admin to reconfigure the node but there's no error here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

goto next;
}

rlim.rlim_cur = rlim.rlim_max;
rc = setrlimit(RLIMIT_NOFILE, &rlim);
if (rc != 0) {
DS_ERROR(errno,
"setrlimit() failed. Unable to bump file descriptor"
" limit to value >= %d, limit is %lu",
MIN_TCP_FD, rlim.rlim_max);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

goto next; missing here? ele on error you will print d_info below:)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, good catch. I added the print later.

soumagne marked this conversation as resolved.
Show resolved Hide resolved
}
D_INFO("Updated soft file descriptor limit to %lu\n", rlim.rlim_max);
}
}
next:

jolivier23 marked this conversation as resolved.
Show resolved Hide resolved
if (prov == CRT_PROV_OFI_CXI)
mrc_enable = 1;

Expand Down
Loading