DAOS-16877 client: implement utilities for shared memory #15613
base: feature/dfs_dcache
Conversation
Features: shm
1. use tlsf as memory allocator
2. shared memory create/destroy
3. robust mutex in shared memory
4. hash table in shared memory
Required-githooks: true
Skipped-githooks: codespell
Signed-off-by: Lei Huang <[email protected]>
Ticket title is 'To implement node-wise caching with shared memory'
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15613/1/execution/node/1210/log
Features: shm Required-githooks: true Skipped-githooks: codespell Signed-off-by: Lei Huang <[email protected]>
…hm_mutex Features: shm Required-githooks: true Skipped-githooks: codespell Signed-off-by: Lei Huang <[email protected]>
Features: shm Required-githooks: true Skipped-githooks: codespell Signed-off-by: Lei Huang <[email protected]>
Some components (e.g., hash table record reference count) will be revised later when we add more use cases.
Currently we use the tlsf memory allocator. It will be replaced by our own allocator in the future.
src/tests/ftest/daos_test/shm.py
Outdated
job.assign_hosts(cmocka_utils.hosts)
job.assign_environment(daos_test_env)

cmocka_utils.run_cmocka_test(self, job)
Normally we run the cmocka tests, like daos_test, via DaosCoreBase.run_subtest(), which sets up additional environment variables and configures the dmg command. Do we need any of that here? Note: In its current form the run_subtest() method uses Orterun to run the daos_test command remotely.
As a requirement for adding this test we should also run it with the faults-enabled: false commit pragma to ensure that it will run when we attempt a release build.
@phender Thank you very much! I wrote shm.py and shm.yaml with dfuse.py and dfuse.yaml as templates. "shm_test" does not need to configure the dmg command. Additional environment variables might be added in future tests with "daos_test_env" here.
Thank you for your tip about using "faults-enabled: false". I will use it in the future. I previously used "Features: shm" to run the new test.
src/tests/ftest/daos_test/shm.yaml
Outdated
pool:
  scm_size: 1G
container:
  type: POSIX
  control_method: daos
This isn't used by the test. A typical cmocka test would use the pool entry information, but only when the test is run via DaosCoreBase.run_subtest().
The new test "shm_test" does not need scm_size and nvme_size information.
I think what @phender means is that since this test does not use DaosCoreBase.run_subtest(), the entire pool and container keys here are not used and can be removed.
... or if we use DaosCoreBase.run_subtest() we can keep it.
@daltonbohning @phender Thank you very much! I will try removing the pool and container keys locally to make sure it works. I thought they were required.
@daltonbohning @phender You are right. We can remove the pool and container keys in the yaml file as you suggested. I will update it in the next commit. Thank you!
I have not yet finished the review process, but I still have several concerns and questions regarding this PR.
src/tests/ftest/daos_test/shm.py
Outdated
@@ -0,0 +1,44 @@
"""
From my understanding this test is more of a unit test and thus should probably be run with the utils/run_utest.py Python script instead of by the functional test framework.
@phender and @daltonbohning, what is your opinion on this point?
In general, yes. If the same test can be run as a unit test (low cost, quick) instead of a functional test (higher cost, slower) then we should run it as a unit test.
@knard38 @daltonbohning Thank you very much! I will look into running this test as a unit test.
@wiliamhuang , If it can help, I recently added some unit tests in the following PR:
https://github.com/daos-stack/daos/pull/14713/files#diff-294ea4ccb7880cabe2a9a4ffadd3c709916da304a39a819c82f23bfe06197a61
Yes. The current tests are simple and could fit as unit tests. More complex tests will be added in the future. We can add shared-memory-related ftests later when needed.
@wiliamhuang , If it can help, I recently added some unit tests in the following PR: https://github.com/daos-stack/daos/pull/14713/files#diff-294ea4ccb7880cabe2a9a4ffadd3c709916da304a39a819c82f23bfe06197a61
@knard38 Thank you very much! It's very helpful.
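For context, a gurt-style cmocka unit test is just a small standalone binary; below is a minimal sketch of the shape such a test could take (the test and function names are hypothetical, not the actual shm_test contents):

/*
 * Illustrative sketch only: a cmocka unit test skeleton of the kind that
 * could run under utils/run_utest.py without a DAOS server.
 */
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <setjmp.h>
#include <cmocka.h>

static void
test_shm_create_destroy(void **state)
{
	(void)state;
	/* placeholder for shm create/alloc/free/destroy checks */
	assert_true(1);
}

int
main(void)
{
	const struct CMUnitTest tests[] = {
		cmocka_unit_test(test_shm_create_destroy),
	};

	return cmocka_run_group_tests(tests, NULL, NULL);
}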
src/gurt/shm_alloc.c
Outdated
	/* failed to open */
	if (shm_ht_fd == -1) {
		if (errno == ENOENT) {
			goto create_shm;
NIT: could it improve readability to put this code in a dedicated function instead of using a goto?
You are right. I will create a function for shm creation. Thank you!
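A sketch of what that refactor could look like, with hypothetical names: the creation path moves into its own helper, and the open path calls it on ENOENT instead of jumping to a goto label.

/* Illustrative sketch only; names and error handling are hypothetical. */
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static int
shm_create(const char *name, size_t size)
{
	int fd;

	fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
	if (fd == -1)
		return -1;
	if (ftruncate(fd, (off_t)size) == -1) {
		close(fd);
		shm_unlink(name);
		return -1;
	}
	return fd;
}

static int
shm_open_or_create(const char *name, size_t size)
{
	int fd;

	fd = shm_open(name, O_RDWR, 0600);
	if (fd == -1 && errno == ENOENT)
		fd = shm_create(name, size);
	return fd;
}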
src/include/gurt/shm_alloc.h
Outdated
	uint64_t shm_pool_size;
	/* reserved for future usage */
	char     reserved[256];
};
This should be needed if we want the mmapped address space to be well aligned. Suggested change:
-};
+} __attribute__((aligned(PAGE_SIZE)));
Thank you! The address returned by mmap() is always page aligned.
src/gurt/shm_alloc.c
Outdated
	shm_pool_size = shm_size / N_SHM_POOL;
	if (shm_pool_size % 4096)
		/* make shm_pool_size 4K aligned */
		shm_pool_size += (4096 - (shm_pool_size % 4096));
To support different architectures and page size configurations, the value 4096 should probably be defined with a macro such as PAGE_SIZE. Suggested change:
-shm_pool_size += (4096 - (shm_pool_size % 4096));
+shm_pool_size += (PAGE_SIZE - (shm_pool_size % PAGE_SIZE));
Thank you! I will update it as you suggested to make it more portable.
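A minimal sketch of the portable rounding, assuming a power-of-two page size queried at runtime via sysconf(); the helper names are illustrative:

#include <stddef.h>
#include <unistd.h>

/* round x up to a multiple of a; a must be a power of two */
#define ALIGN_UP(x, a)	(((x) + (a) - 1) & ~((size_t)(a) - 1))

static size_t
shm_pool_size_aligned(size_t shm_size, size_t n_pool)
{
	size_t page_size = (size_t)sysconf(_SC_PAGESIZE);

	return ALIGN_UP(shm_size / n_pool, page_size);
}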
	/* map existing shared memory */
	shm_addr = mmap(FIXED_SHM_ADDR, shm_size, PROT_READ | PROT_WRITE,
			MAP_SHARED | MAP_FIXED, shm_ht_fd, 0);
Using a fixed memory location seems to be strongly discouraged by the man page. I am not sure I understand why it is needed, or how we can be sure that it will not overlap existing mappings.
I agree. The fixed memory location is a strong limitation. It comes from the memory allocator we use. We could eliminate this limitation later once we have our own memory allocator supporting shared memory management.
I also don't get why we use MAP_FIXED and the fixed address. Using MAP_FIXED can cause undefined behavior if the address is actually in use by something else, no?
Maybe I don't get the requirement for why you need this.
You are right.
The requirement to use the same fixed address across processes comes from the memory allocator we use. This is a quick and dirty way to let us use an existing memory allocator for now. In the future we need to implement our own memory allocator that natively supports shared memory management; then the limitation can be removed.
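For reference, a common way to drop the fixed-address requirement later is to store offsets from the mapping base instead of raw pointers inside the shared region; a minimal sketch with hypothetical names:

#include <stdint.h>

typedef uint64_t shm_off_t;	/* offset into the shared region; 0 == NULL */

static void *shm_base;		/* this process's base from mmap(NULL, ...) */

static inline void *
shm_off2ptr(shm_off_t off)
{
	return off == 0 ? NULL : (char *)shm_base + off;
}

static inline shm_off_t
shm_ptr2off(const void *ptr)
{
	return ptr == NULL ? 0 :
	       (shm_off_t)((const char *)ptr - (const char *)shm_base);
}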
src/gurt/shm_alloc.c
Outdated
	char daos_shm_file_name[128];

	sprintf(daos_shm_file_name, "/dev/shm/%s_%d", daos_shm_name, getuid());
	unlink(daos_shm_file_name);
NIT: from my understanding unlink() and shm_unlink() are equivalent here, but using shm_unlink() with the name used for shm_open() seems more understandable to me: it explicitly indicates that we are removing an object created with shm_open().
You are right. I will replace unlink() with shm_unlink(). Thank you!
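A sketch of the agreed change: the removal uses shm_unlink() on the same name passed to shm_open(), instead of unlink() on a hand-built /dev/shm path (shm_remove is a hypothetical helper name):

#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

static void
shm_remove(const char *daos_shm_name)
{
	char name[128];

	/* the name only; the /dev/shm mapping is the implementation's job */
	snprintf(name, sizeof(name), "%s_%d", daos_shm_name, (int)getuid());
	shm_unlink(name);
}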
	uint64_t oldref;

	if (idx_small < 0) {
		tid = syscall(SYS_gettid);
Instead of calling syscall(SYS_gettid) (which needs to trap into the kernel), would a call to a userland function such as pthread_self() be better suited?
I have never used "pthread_self" before.
https://man7.org/linux/man-pages/man3/pthread_self.3.html
The thread ID returned by pthread_self() is not the same thing as
the kernel thread ID returned by a call to gettid.
This syscall will be called only once. I would not worry about the overhead.
I do not fully agree with you on this: this function will be called for each memory allocation, and thus its performance has some impact. However, this point is not a blocker for me.
I tested pthread_self() and syscall(SYS_gettid).
Hello from thread 1! tid_pthread_self = 140335788001024 SYS_gettid = 3023160
Hello from thread 2! tid_pthread_self = 140335779608320 SYS_gettid = 3023161
Hello from thread 3! tid_pthread_self = 140335771215616 SYS_gettid = 3023162
Hello from thread 4! tid_pthread_self = 140335762822912 SYS_gettid = 3023163
It looks like all the tid_pthread_self values are even numbers here; they do not look evenly distributed to me. syscall(SYS_gettid) is only called once per thread, not on every memory allocation.
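One way to make the once-per-thread cost explicit is to cache the kernel tid in a thread-local variable; a minimal sketch (shm_gettid is a hypothetical name; on glibc 2.30+ the gettid() wrapper could be used instead of the raw syscall):

#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static __thread pid_t cached_tid;

static pid_t
shm_gettid(void)
{
	/* trap into the kernel only on a thread's first call */
	if (cached_tid == 0)
		cached_tid = (pid_t)syscall(SYS_gettid);
	return cached_tid;
}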
	atomic_fetch_add_relaxed(&(d_shm_head->ref_count), -1);
	if (pid != pid_shm_creator)
		munmap(d_shm_head, d_shm_head->size);
From my understanding, all the processes should unmap and then close the file. The shared mmapped file content should be kept until unlink() or shm_unlink() is called. Thus, skipping the unmap and close in the process that created the shared memory file seems to be useless.
I am also concerned that it could be seen as a memory leak by valgrind or other memory checker tools.
Considering the cache in kernel space, I thought we might want to keep our cache persistent too. Otherwise, shared memory needs to be initialized again and again. Ideally, the space for caching would be freed after the content expires. We need our own shared memory allocator to dynamically expand/shrink the shared memory region. It will be a long way to get there.
Yes, I did have a small concern about whether valgrind can detect a memory leak in shared memory usage here. I could play with valgrind to find out with a simple test.
I am not sure that keeping a dangling mapping when the process dies will change anything for the kernel cache. From my understanding, what makes the difference is unlinking the file.
@mchaarawi do you have an opinion on this?
In one extreme case, the caching has no benefit at all if a user runs jobs serially and we destroy the cache once the application ends.
Maybe I am missing something, but if you remove the shared file, then keeping a dangling mapping will not help once the file is unlinked. On the other hand, if you do not unlink the file, the cache will still be available even if you do not keep a mapping at the end of your applications. From my understanding the life of the cache is managed by the kernel, and it will be removed when the file is unlinked.
However, it is perfectly possible that I am missing something obvious.
shm_destroy() is not called by regular applications; it is called only in shm_test. The file associated with the shared memory will not be unlinked.
I talked to Mohamad recently. He suggested that daos_agent initialize and destroy the shared memory region. We will update this later.
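A sketch of the lifecycle this thread converges on, with hypothetical names: every process unmaps and closes its own view, while only a designated owner (e.g., daos_agent, as suggested) ever unlinks the name; the kernel frees the content once the file is unlinked and the last mapping is gone.

#include <sys/mman.h>
#include <unistd.h>

static void
shm_detach(void *addr, size_t size, int fd)
{
	munmap(addr, size);	/* drops only this process's mapping */
	close(fd);		/* region persists; it is still linked */
}

static void
shm_destroy_region(const char *name)
{
	shm_unlink(name);	/* owner only: content released after last detach */
}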
	if (shm_pool_size % 4096)
		/* make shm_pool_size 4K aligned */
		shm_pool_size += (4096 - (shm_pool_size % 4096));
	shm_size = shm_pool_size * N_SHM_POOL + sizeof(struct d_shm_hdr);
In fact my remark on aligning struct d_shm_hdr was to make its sizeof a multiple of PAGE_SIZE; otherwise shm_size may not be a multiple of PAGE_SIZE, from my understanding. Then the shared pools (following the header) may not be aligned on PAGE_SIZE.
However, I have not checked that the aligned attribute properly changes the sizeof. I will check this asap.
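For what it's worth, the aligned attribute does round sizeof up, since sizeof is always a multiple of a type's alignment; a quick check one could compile (struct names are illustrative, PAGE_SIZE assumed to be 4096):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096

struct hdr_plain  { uint64_t shm_pool_size; char reserved[256]; };
struct hdr_padded { uint64_t shm_pool_size; char reserved[256]; }
	__attribute__((aligned(PAGE_SIZE)));

int
main(void)
{
	/* prints "264 4096" on a typical x86_64 build */
	printf("%zu %zu\n", sizeof(struct hdr_plain), sizeof(struct hdr_padded));
	return 0;
}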
"shm_size" does not have be a multiple of PAGE_SIZE. mmap() does not require size to be a multiple of PAGE_SIZE. The memory allocator will use some space in the pool too. I am not sure making shm_size a multiple of PAGE_SIZE will bring noticeable benefit. Maybe some performance tests could help to clarify later.
From what I know it is indeed always better to have aligned memory for performance. Moreover, from what I understand, the size really allocated by mmap() will be the same. The only difference is that the padding will be added at the end of the allocated memory instead of between struct d_shm_hdr and the first shm_pool.
In any case, this is not a blocker from my side.
Just some quick comments; I still need to review more closely and will do that soon.
	return 0;
}

rc = d_getenv_uint64_t("DAOS_SHM_SIZE", &shm_size);
We probably should have something different than an env variable to determine the size. I previously implemented a utility that grabs that from the agent; I can integrate that into the branch later and replace this.
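A sketch of the interim behavior under discussion, assuming d_getenv_uint64_t() leaves the output unchanged when the variable is absent (SHM_SIZE_DEFAULT is a hypothetical macro; this fragment assumes the surrounding file's includes):

/* start from a compile-time default, override from the environment */
uint64_t shm_size = SHM_SIZE_DEFAULT;

(void)d_getenv_uint64_t("DAOS_SHM_SIZE", &shm_size);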
	/* the shared memory only accessible for individual user for now */
	sprintf(daos_shm_name_buf, "%s_%d", daos_shm_name, getuid());
open_rw:
Would it work if we just use O_CREAT without O_EXCL, so you don't need the try-open-then-try-create semantics here?
I used "O_CREAT | O_EXCL" to make sure only one process will initialize shared memory. Without "O_EXCL" would allow more than one concurrent processes to initialize shared memory.
src/tests/suite/SConscript
Outdated
shm_test_env = base_env.Clone()
shm_test_env.compiler_setup()
shm_test_env.AppendUnique(LIBPATH=[Dir('../../gurt')])
shm_test_env.AppendUnique(LIBPATH=[Dir('../../common')])
shm_test_env.AppendUnique(LIBPATH=[Dir('../../cart')])
shmtest = shm_test_env.d_program(File("shm_test.c"), LIBS=['gurt', 'daos_common', 'cart',
                                                           'cmocka', 'rt', 'pthread'])
denv.Install('$PREFIX/bin/', shmtest)
I don't think you need a DAOS server to run those tests, right?
So adding those as unit tests in gurt will probably be more appropriate.
Right. The current tests are quite simple and a DAOS server is not needed. I will change the tests into unit tests. We will add ftests later when necessary. Thank you!
src/include/gurt/shm_alloc.h
Outdated
#ifndef __DAOS_SHM_ALLOC_H__
#define __DAOS_SHM_ALLOC_H__

#include <stdint.h>
It's probably better to create one header for this rather than multiple public headers that users of this module always need to include.
Thank you very much! Just fixed it. Now only necessary APIs are exposed in this header file.
Required-githooks: true Skipped-githooks: codespell Signed-off-by: Lei Huang <[email protected]>
Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15613/6/testReport/
Required-githooks: true Skipped-githooks: codespell Signed-off-by: Lei Huang <[email protected]>
Required-githooks: true Skipped-githooks: codespell Signed-off-by: Lei Huang <[email protected]>
Required-githooks: true Skipped-githooks: codespell Signed-off-by: Lei Huang <[email protected]>
Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15613/14/testReport/
Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
From what I understand of DAOS best practice, an internal header should not be located in the src/include directory, so that it is not visible to the end user. But I could be wrong on this point.
ok. I will move it to src/gurt. Thank you!
	_Atomic int      ref_count;
	/* global counter used for round robin picking memory allocator for large memory request */
	_Atomic uint64_t large_mem_count;
	/* array of pointors to memory allocators */
NIT, suggested change:
-/* array of pointors to memory allocators */
+/* array of pointers to memory allocators */
A good catch. Thank you! Will fix it.
Instead of creating a new hash map, would it not be possible to update the current DAOS htable implementation with your new shared memory features?
I agree. At the beginning I was inclined to take the current DAOS htable implementation and modify it to fit shared memory. I decided to implement the hash table in shared memory from scratch after I realized many parts were not compatible.
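One illustration of the incompatibility, with hypothetical types: a process-local d_hash-style table chains records with raw pointers, which are meaningless in another process's address space, so a shared memory table has to chain with offsets and lock with process-shared robust mutexes instead.

#include <pthread.h>
#include <stdint.h>

/* process-local style: pointer-based chaining */
struct ht_rec_local {
	struct ht_rec_local *next;	/* valid only in this address space */
};

/* shared memory style: offset-based chaining */
struct ht_rec_shm {
	uint64_t next_off;		/* offset from the shm base; 0 == end */
};

struct ht_bucket_shm {
	pthread_mutex_t lock;		/* PTHREAD_PROCESS_SHARED + robust */
	uint64_t        head_off;	/* first record, as an offset */
};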
Fair enough, I will thus have a more in-depth look at this file.
Features: shm
Required-githooks: true
Skipped-githooks: codespell
Before requesting gatekeeper:
Features: (or Test-tag*) commit pragma was used, or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: