
DAOS-16877 client: implement utilities for shared memory #15613

Open · wants to merge 11 commits into base: feature/dfs_dcache
Conversation

wiliamhuang
Contributor

Features: shm

  1. use tlsf as memory allocator
  2. shared memory create/destroy
  3. robust mutex in shared memory
  4. hash table in shared memory

Required-githooks: true
Skipped-githooks: codespell

Before requesting gatekeeper:

  • The PR has two review approvals and all prior change requests have been resolved.
  • Testing is complete and all tests passed, or there is a reason documented in the PR why it should be force-landed and the forced-landing tag is set.
  • The Features: (or Test-tag*) commit pragma was used, or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follow the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that the user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is the master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket?
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

Features: shm

1. use tlsf as memory allocator
2. shared memory create/destroy
3. robust mutex in shared memory
4. hash table in shared memory

Required-githooks: true
Skipped-githooks: codespell

Signed-off-by: Lei Huang <[email protected]>

Ticket title is 'To implement node-wise caching with shared memory '
Status is 'Open'
https://daosio.atlassian.net/browse/DAOS-16877

@daosbuild1
Collaborator

Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15613/1/execution/node/1210/log

@wiliamhuang wiliamhuang marked this pull request as ready for review December 23, 2024 15:15
@wiliamhuang wiliamhuang requested review from a team as code owners December 23, 2024 15:15
@wiliamhuang
Contributor Author

Some components (e.g., the hash table record reference count) will be revised later when we add more use cases.

@wiliamhuang
Contributor Author

Currently we use the tlsf memory allocator. It will be replaced by our own allocator in the future.

job.assign_hosts(cmocka_utils.hosts)
job.assign_environment(daos_test_env)

cmocka_utils.run_cmocka_test(self, job)
Contributor

Normally we run the cmocka tests, like daos_test, via DaosCoreBase.run_subtest(), which sets up additional environment variables and configures the dmg command. Do we need any of that here? Note: In its current form the run_subtest() method uses Orterun to run the daos_test command remotely.

Contributor

As a requirement for adding this test we should also run it with the faults-enabled: false commit pragma to ensure that it will run when we attempt a release build.

Contributor Author

@phender Thank you very much! I wrote shm.py and shm.yaml with dfuse.py and dfuse.yaml as templates. "shm_test" does not need to configure the dmg command. Additional environment variables might be added in future tests via "daos_test_env" here.
Thank you for the tip about using "faults-enabled: false". I will use it in the future. I used "Features: shm" previously to run the new test.

Comment on lines 16 to 20
pool:
scm_size: 1G
container:
type: POSIX
control_method: daos
Contributor

This isn't used by the test. A typical cmocka test would use the pool entry information, but only when the test is run via DaosCoreBase.run_subtest().

Contributor Author

The new test "shm_test" does not need the scm_size and nvme_size information.

Contributor

I think what @phender means is since this test does not use DaosCoreBase.run_subtest(), the entire pool and container keys here are not used and can be removed

Contributor

... or if we use DaosCoreBase.run_subtest() we can keep it

Contributor Author

@daltonbohning @phender Thank you very much! I will try removing the pool and container keys locally to make sure it works. I thought they were required.

Contributor Author

@daltonbohning @phender You are right. We can remove the pool and container keys in the yaml file as you suggested. I will update it in the next commit. Thank you!

Contributor

@knard38 knard38 left a comment

I have not yet finished the review process, but I still have several concerns and questions regarding this PR.

@@ -0,0 +1,44 @@
"""
Contributor

@knard38 knard38 Jan 8, 2025

From my understanding this test is more of a unit test and thus should probably be run with the utils/run_utest.py python script instead of by the functional test framework.
@phender and @daltonbohning, what is your opinion on this point?

Contributor

In general, yes. If the same test can be run as a unit test (low cost, quick) instead of a functional test (higher cost, slower), then we should run it as a unit test.

Contributor Author

@knard38 @daltonbohning Thank you very much! I will look into running this test as a unit test.

Contributor Author

Yes. The current tests are simple; they could fit as unit tests. More complex tests will be added in the future. We can add shared-memory-related ftests later when needed.

Contributor

@wiliamhuang , If it can help, I recently added some unit tests in the following PR: https://github.com/daos-stack/daos/pull/14713/files#diff-294ea4ccb7880cabe2a9a4ffadd3c709916da304a39a819c82f23bfe06197a61

@knard38 Thank you very much! It's very helpful.

/* failed to open */
if (shm_ht_fd == -1) {
if (errno == ENOENT) {
goto create_shm;
Contributor

NIT: could it improve readability to put this code in a dedicated function instead of using goto?

Contributor Author

You are right. I will create a function for shm creation. Thank you!

uint64_t shm_pool_size;
/* reserved for future usage */
char reserved[256];
};
Contributor

This should be needed if we want the mmapped address space to be well aligned.

Suggested change
};
} __attribute__((aligned(PAGE_SIZE)));

Contributor Author

Thank you! The address returned by mmap() is always page-aligned.

shm_pool_size = shm_size / N_SHM_POOL;
if (shm_pool_size % 4096)
/* make shm_pool_size 4K aligned */
shm_pool_size += (4096 - (shm_pool_size % 4096));
Contributor

@knard38 knard38 Jan 8, 2025

To support different architectures and page-size configurations, the value 4096 should probably be defined with a macro such as PAGE_SIZE.

Suggested change
shm_pool_size += (4096 - (shm_pool_size % 4096));
shm_pool_size += (PAGE_SIZE - (shm_pool_size % PAGE_SIZE));

Contributor Author

Thank you! I will update it as you suggested to make it more portable.
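The round-up discussed above can be written once as a helper. A small sketch, using sysconf(_SC_PAGESIZE) at runtime instead of a hard-coded 4096; the function names are illustrative, not the PR's code:

```c
#include <stdint.h>
#include <unistd.h>

/* Round sz up to the next multiple of align (align must be a power of two). */
static inline uint64_t align_up(uint64_t sz, uint64_t align)
{
	return (sz + align - 1) & ~(align - 1);
}

/* Example: compute a page-aligned pool size without hard-coding 4096. */
uint64_t shm_pool_size_aligned(uint64_t shm_size, unsigned int n_pools)
{
	uint64_t page = (uint64_t)sysconf(_SC_PAGESIZE);

	return align_up(shm_size / n_pools, page);
}
```

The mask form avoids the modulo branch in the original snippet and behaves identically for sizes that are already aligned.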


/* map existing shared memory */
shm_addr = mmap(FIXED_SHM_ADDR, shm_size, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_FIXED, shm_ht_fd, 0);
Contributor

Using a fixed memory location seems to be strongly discouraged by the man page.
I am not sure I understand why it is needed, or how we can be sure that it will not overlap existing mappings.

Contributor Author

I agree. A fixed memory location is a strong limitation. It comes from the memory allocator we use. We could eliminate this limitation later once we have our own memory allocator supporting shared memory management.

Contributor

I also don't get why we are using MAP_FIXED and the fixed address. Using MAP_FIXED can cause undefined behavior if the address is actually in use by something else, no?
Maybe I don't get the requirement for why you need this.

Contributor Author

You are right.
The requirement to use the same fixed address across processes comes from the memory allocator we use. This is a quick-and-dirty way to let us use an existing memory allocator for now. In the future we need to implement our own memory allocator that natively supports shared memory management; then the limitation can be removed.
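For reference, the safer default the reviewers are pointing at is to let the kernel choose the address and have the allocator work with offsets relative to the mapping base rather than raw pointers. A hedged sketch of that alternative (not the PR's code; function name illustrative):

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a shared-memory fd at a kernel-chosen address.  Without MAP_FIXED
 * the mapping can never clobber an existing one, but the base address
 * may differ per process, so any cross-process data structure must
 * store offsets instead of absolute pointers. */
void *shm_map(int fd, size_t size)
{
	void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE,
			  MAP_SHARED, fd, 0);

	return addr == MAP_FAILED ? NULL : addr;
}
```

This is exactly the trade-off discussed above: tlsf stores absolute pointers, which is what forces the fixed-address mapping until an offset-based allocator replaces it.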

char daos_shm_file_name[128];

sprintf(daos_shm_file_name, "/dev/shm/%s_%d", daos_shm_name, getuid());
unlink(daos_shm_file_name);
Contributor

NIT:
From my understanding unlink() and shm_unlink() are equivalent here, but using shm_unlink() with the name passed to shm_open() seems more understandable to me: it explicitly indicates that we are removing an object created with shm_open().

Contributor Author

You are right. I will replace unlink() with shm_unlink(). Thank you!
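A sketch of the agreed change: pair shm_unlink() with the same name that was passed to shm_open(), rather than unlink()ing the /dev/shm path directly. The base name and helper name below are illustrative, though the "%s_%d" per-uid naming scheme matches the snippet above:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Remove a POSIX shared-memory object by the name it was created with,
 * instead of unlink()ing the "/dev/shm/..." path behind shm_open()'s back. */
int shm_remove(const char *base_name)
{
	char name[128];

	/* one object per user, matching the "%s_%d" naming scheme */
	snprintf(name, sizeof(name), "/%s_%d", base_name, getuid());
	return shm_unlink(name);
}
```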

uint64_t oldref;

if (idx_small < 0) {
tid = syscall(SYS_gettid);
Contributor

@knard38 knard38 Jan 8, 2025

Instead of calling syscall(SYS_gettid) (which needs to trap into the kernel), would a call to a userland function such as pthread_self() be more suited?

Contributor Author

I had never used "pthread_self" before.
https://man7.org/linux/man-pages/man3/pthread_self.3.html

The thread ID returned by pthread_self() is not the same thing as
the kernel thread ID returned by a call to gettid.

This syscall will be called only once. I would not worry about the overhead.

Contributor

I don't fully agree with you on this: this function would be called for each memory allocation, and thus its performance has some impact. However, this point is not a blocker for me.

Contributor Author

I tested pthread_self() and syscall(SYS_gettid).

Hello from thread 1! tid_pthread_self = 140335788001024 SYS_gettid = 3023160
Hello from thread 2! tid_pthread_self = 140335779608320 SYS_gettid = 3023161
Hello from thread 3! tid_pthread_self = 140335771215616 SYS_gettid = 3023162
Hello from thread 4! tid_pthread_self = 140335762822912 SYS_gettid = 3023163

It looks like all the tid_pthread_self values are even numbers here; they do not look evenly distributed to me. syscall(SYS_gettid) is only called once per thread, not on every memory allocation.
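The "once per thread" cost described here is typically achieved by caching the kernel tid in thread-local storage, so only the first call from each thread traps into the kernel. A small sketch (not the PR's code; the function name is illustrative):

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Cache the kernel thread ID per thread: syscall(SYS_gettid) is paid
 * only on the first call from each thread, not on every allocation.
 * (Note: the cached value would go stale across fork().) */
static __thread pid_t cached_tid;

pid_t shm_gettid(void)
{
	if (cached_tid == 0)
		cached_tid = (pid_t)syscall(SYS_gettid);
	return cached_tid;
}
```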


atomic_fetch_add_relaxed(&(d_shm_head->ref_count), -1);
if (pid != pid_shm_creator)
munmap(d_shm_head, d_shm_head->size);
Contributor

@knard38 knard38 Jan 8, 2025

From my understanding, all the processes should unmap and then close the file.
The shared mmapped file content should be kept until unlink() or shm_unlink() is called.
Thus, skipping the unmap and close in the process that created the shared memory file seems unnecessary.
I am also concerned that it could be seen as a memory leak by valgrind or other memory checker tools.

Contributor Author

Considering the cache in kernel space, I thought we might want to keep our cache persistent too. Otherwise, shared memory needs to be initialized again and again. Ideally, the space used for caching would be freed after the content expires. We need our own shared memory allocator to dynamically expand/shrink the shared memory region; it will be a long way to get there.
Yes, I did have a little concern about whether valgrind can detect the memory leak in the shared memory usage here. I could play with valgrind to find out with a simple test.

Contributor

I am not sure that keeping a dangling mapping when the process dies will change anything for the kernel cache.
From my understanding, what makes the difference is whether the file is unlinked.
@mchaarawi do you have an opinion on this?

Contributor Author

As an extreme case, the caching has no benefit at all for a user who runs jobs serially if we destroy the cache once the application ends.

Contributor

@knard38 knard38 Jan 17, 2025

Maybe I am missing something, but if you remove the shared file, then keeping a dangling mapping will not help once the file is unlinked. On the other hand, if you do not unlink the file, the cache will still be available even if you do not keep a dangling mapping at the end of your applications. From my understanding the life of the cache is managed by the kernel, and it will be removed when the file is unlinked.
However, it is perfectly possible that I am missing something obvious.

Contributor Author

shm_destroy() is not called by regular applications; it is called only in shm_test. The file associated with the shared memory will not be unlinked.
I talked to Mohamad recently. He suggested that daos_agent initialize and destroy the shared memory region. We will update this later.

if (shm_pool_size % 4096)
/* make shm_pool_size 4K aligned */
shm_pool_size += (4096 - (shm_pool_size % 4096));
shm_size = shm_pool_size * N_SHM_POOL + sizeof(struct d_shm_hdr);
Contributor

In fact my remark on aligning struct d_shm_hdr was to get a sizeof which is a multiple of PAGE_SIZE; otherwise shm_size may not be a multiple of PAGE_SIZE, and then the shared pools (following the header) may not be aligned on PAGE_SIZE (from my understanding).
However, I have not checked that the align attribute properly changes the sizeof. I will check this asap.

Contributor Author

"shm_size" does not have be a multiple of PAGE_SIZE. mmap() does not require size to be a multiple of PAGE_SIZE. The memory allocator will use some space in the pool too. I am not sure making shm_size a multiple of PAGE_SIZE will bring noticeable benefit. Maybe some performance tests could help to clarify later.

Contributor

From what I know, it is indeed always better to have aligned memory for performance.
Moreover, from what I understand, the size really allocated by mmap() will be the same.
The only difference is that the padding will be added at the end of the allocated memory instead of between the struct d_shm_hdr and the first shm_pool.
In any case, this is not a blocker from my side.

Contributor

@mchaarawi mchaarawi left a comment

Just some quick comments.
I still need to review more closely; will do that soon.

return 0;
}

rc = d_getenv_uint64_t("DAOS_SHM_SIZE", &shm_size);
Contributor

We probably should have something different from an env variable to determine the size. I previously implemented a utility that grabs that from the agent. I can integrate that into the branch later and replace this.
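Until the agent-provided size lands, the env lookup pattern looks roughly like this. d_getenv_uint64_t is DAOS's own helper, so this sketch uses plain getenv() as a stand-in; the default value and function name are illustrative assumptions:

```c
#include <stdint.h>
#include <stdlib.h>

#define SHM_SIZE_DEFAULT (1ULL << 30) /* illustrative 1 GiB fallback */

/* Read the segment size from DAOS_SHM_SIZE, falling back to a default
 * when the variable is unset, unparsable, or zero. */
uint64_t shm_size_from_env(void)
{
	const char *val = getenv("DAOS_SHM_SIZE");
	char       *end;
	uint64_t    size;

	if (val == NULL)
		return SHM_SIZE_DEFAULT;
	size = strtoull(val, &end, 0);
	if (end == val || *end != '\0' || size == 0)
		return SHM_SIZE_DEFAULT;
	return size;
}
```

Fetching the size from the agent instead, as suggested, would keep every process on the node agreeing on one segment size without relying on each user's environment.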




/* the shared memory only accessible for individual user for now */
sprintf(daos_shm_name_buf, "%s_%d", daos_shm_name, getuid());
open_rw:
Contributor

Would it work if we just use O_CREAT without O_EXCL, so you don't need the try-open-then-try-create semantics here?

Contributor Author

I used "O_CREAT | O_EXCL" to make sure only one process will initialize shared memory. Without "O_EXCL" would allow more than one concurrent processes to initialize shared memory.

Comment on lines 16 to 24
shm_test_env = base_env.Clone()
shm_test_env.compiler_setup()
shm_test_env.AppendUnique(LIBPATH=[Dir('../../gurt')])
shm_test_env.AppendUnique(LIBPATH=[Dir('../../common')])
shm_test_env.AppendUnique(LIBPATH=[Dir('../../cart')])
shmtest = shm_test_env.d_program(File("shm_test.c"), LIBS=['gurt', 'daos_common', 'cart',
'cmocka', 'rt', 'pthread'])
denv.Install('$PREFIX/bin/', shmtest)

Contributor

I don't think you need a DAOS server to run those tests, right?
So probably adding them as unit tests in gurt would be more appropriate.

Contributor Author

Right. The current tests are quite simple, and a DAOS server is not needed. I will change the tests to unit tests. We will add ftests later when necessary. Thank you!

#ifndef __DAOS_SHM_ALLOC_H__
#define __DAOS_SHM_ALLOC_H__

#include <stdint.h>
Contributor

It's probably better to create one header for this, rather than multiple public headers that users of this module need to always include.

Contributor Author

Thank you very much! Just fixed it. Now only the necessary APIs are exposed in this header file.

@daosbuild1
Collaborator

Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15613/6/testReport/

@daosbuild1
Collaborator

Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15613/14/testReport/

Contributor

From what I understand of DAOS best practice, internal headers should not be located in the src/include directory, so they are not visible to the end user. But I could be wrong on this point.

Contributor Author

OK. I will move it to src/gurt. Thank you!

_Atomic int ref_count;
/* global counter used for round robin picking memory allocator for large memory request */
_Atomic uint64_t large_mem_count;
/* array of pointors to memory allocators */
Contributor

NIT

Suggested change
/* array of pointors to memory allocators */
/* array of pointers to memory allocators */

Contributor Author

Good catch. Thank you! Will fix it.

Contributor

Instead of creating a new hash map, would it not be possible to update the current DAOS htable implementation with your new shared memory features?

Contributor Author

I agree. I was inclined to take the current DAOS htable implementation and modify it to fit shared memory at the beginning. I decided to implement the hash table in shared memory from scratch after I realized many parts are not compatible.

Contributor

Fair enough, I will thus have a more in-depth look at this file.

@wiliamhuang wiliamhuang requested a review from mchaarawi January 15, 2025 16:19