Skip to content
This repository has been archived by the owner on Nov 4, 2024. It is now read-only.

Merging PMDebugger into Pmemcheck #83

Open
wants to merge 11 commits into
base: pmem-3.15
Choose a base branch
from

Conversation

dibang2008
Copy link

@dibang2008 dibang2008 commented May 11, 2021

This change is Reviewable

Copy link
Contributor

@krzycz krzycz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Please, squash your commits.
  2. Please, format your code uniformly and try to follow the same coding style as in the existing code.
    Yes, it matters.
  3. Is there any documentation on what has changed and why?
  4. Any unit tests for that?

Reviewed 2 of 3 files at r1.
Reviewable status: 2 of 3 files reviewed, 2 unresolved discussions (waiting on @dibang2008)


pmemcheck/pmc_include.h, line 35 at r1 (raw file):

Bool is_delete;

I'd suggest to move it to the end of the struct. Otherwise you waste 7 bytes for padding in each instance of this struct. And if you preallocate an array of 100000000 such structs it's quite a lot of wasted memory.
In general, it would be nice to arrange elements in this and arr_md structures in a way that we don't waste too much memory.


pmemcheck/pmc_main.c, line 63 at r1 (raw file):

100000000UL

Could it be configurable?

@pbalcer
Copy link
Member

pbalcer commented May 11, 2021

@krzycz this is context of https://dl.acm.org/doi/abs/10.1145/3445814.3446744?sid=SCITRUS @dibang2008 has agreed to upstream those changes back.
But yes, I agree with your points. But instead of squashing the commits, I would recommend simply improving the commit message to accurately reflect the why and how of the change.

… why; (3) move is_delete to the end of the struct and change 100000000UL to 1000UL
@dibang2008
Copy link
Author

I have done following things according to your comments:

  1. format my codes according to existing codes;
  2. add a document (named PMDebuggerToPmemcheck.md) to simply illustrate what has changed and why;
  3. move is_delete to the end of the structure;
  4. change 100000000UL to 1000UL to save memory.

For unit tests, I have used the existing tests in Pmemcheck to verify my code (perl tests/vg_regtest pmemcheck) and I am not sure that is suitable for you. If you want more comprehensive unit tests, please give me more details (I am not familiar with this area).

Thanks

Copy link
Member

@pbalcer pbalcer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done an initial high-level pass of this PR. In general, I think this change should be done behind an abstraction that hides the details of the two data structures used for pmem stores. Right now, all this code is intertwined and hard to read for me.
I'm also worried about the maintainability of the implementation if there are two data structures that need to be updated - a nice abstraction should be able to solve this problem.

pmemcheck/pmc_main.c Outdated Show resolved Hide resolved
pmemcheck/pmc_main.c Outdated Show resolved Hide resolved
pmemcheck/pmc_main.c Outdated Show resolved Hide resolved
pmemcheck/pmc_main.c Outdated Show resolved Hide resolved
@@ -32,6 +32,7 @@ struct arr_md {

/** Single store to memory. */
struct pmem_st {
Bool is_delete;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, please ensure that is data structure is as compact as possible. If the Bool type is 1 byte, then, as @krzycz pointed out, this wastes tons of memory for alignment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this 'is_delete' flag and now I use 'address size = 0' to mark a persistent memory info as deleted

pmemcheck/pmc_main.c Outdated Show resolved Hide resolved
pmemcheck/pmc_main.c Outdated Show resolved Hide resolved
pmemcheck/pmc_main.c Outdated Show resolved Hide resolved
@@ -308,12 +309,28 @@ print_multiple_stores(void)
static void
print_store_stats(void)
{
/* print store states of the array*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this debug code?

pmemcheck/pmc_main.c Outdated Show resolved Hide resolved
@dibang2008
Copy link
Author

OK, I will try to solve these problems next week, because I have to prepare my dissertation oral defense this week

@pbalcer
Copy link
Member

pbalcer commented May 19, 2021

No worries, take your time.
Good luck with your dissertation!

…move the is_delete flag; (4) abstract the process of inserting data;
@@ -294,16 +339,33 @@ print_multiple_stores(void)
static void
print_store_stats(void)
{
/* print store states of the array */
struct pmem_st *tmp;
for (int s_index=0; s_index<=pInfo.info_array.m_index; s_index++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make a temporary variable for pInfo.info_array. This will allow you to shorten this code quite a bit.

for (int s_index=0; s_index<=pInfo.info_array.m_index; s_index++) {
if (cmp_with_arr_minandmax(region, s_index) != -1) {
for (int i = pInfo.info_array.m_data[s_index].start_index; i < pInfo.info_array.m_data[s_index].end_index; i++){
//VG_(umsg)("remove before=%d\n",VG_(OSetGen_Size)(curr_node->pmem_stores));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove this (and other comments like this)

Bool valid_flush = False;

for(int s_index=0; s_index <= pInfo.info_array.m_index; s_index++) {
if (flush_max >= pInfo.info_array.m_data[s_index].max_addr && pInfo.info_array.m_data[s_index].min_addr >= flush_info.addr &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a temp variable for pInfo.info_array.m_data[s_index]

@pbalcer
Copy link
Member

pbalcer commented Jun 1, 2021

Thanks for cleaning up this code. Looks much nicer now, but there are still some stylistic changes that make it a bit hard to read. I've pointed out a few in my comments.

I'm still a bit worried that this change is going to make pmemcheck a bit harder to maintain - so I'll try to think of a way to mitigate that.

@dibang2008
Copy link
Author

I have fixed these issues. Thanks for your valuable suggestions!

Copy link
Member

@pbalcer pbalcer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are going to run regression testing over this and I also asked for someone else in my team to take a look at your patch.
Thanks.

Reviewable status: 1 of 4 files reviewed, 6 unresolved discussions (waiting on @dibang2008 and @krzycz)


pmemcheck/pmc_include.h, line 20 at r4 (raw file):

/** Metadata structure for store information array */
struct arr_md {
    UWord end_index; //Index of current free metadata

can you explain in a comment how the end_index and start_index fields work? And in general, what's the relation between this structure and pmem_stores in the pmem_stores_array.

If my understanding is correct, there can be more than one pmem_st per arr_md, and this structure has a slice into the stores array. Would it be possible to reorganize it so that this is more obvious, like so:


struct pmem_stores_span {
  Addr min_addr;
  Addr max_addr;
  Vec<pmem_st> stores;
  enum flushed_state state;
};

struct pmem_stores_array {
  Vec<pmem_stores_span> store_spans;
}

where Vec<T> is just a dynamic array of elements of type T.

Or am I wrong entirely?:D


pmemcheck/pmc_main.c, line 901 at r4 (raw file):

        if (cmp_with_arr_minandmax(store, s_index) != -1) {
            for (int i = pInfo.info_array.m_data[s_index].start_index; i < pInfo.info_array.m_data[s_index].end_index; i++) {
                if (LIKELY(pInfo.info_array.pmem_stores[i].size != 0)) {

please simplify control flow:

if (... == 0)
  continue;

  existing = ...;
  

pmemcheck/pmc_main.c, line 1312 at r4 (raw file):

    struct pmem_st *being_fenced = NULL;
    for(int s_index=0; s_index <= pInfo.info_array.m_index; s_index++) {
        if (pInfo.info_array.m_data[s_index].state != ALL_FLUSHED) {      

please use a temp variable for pInfo.info_array.m_data[s_index]

And please try to keep lines to under 80.

@dibang2008
Copy link
Author

Yes, your understanding is correct and we can make our structures more obvious. But there is one problem: a dynamic structure (such as Oset in Valgrind) may introduce extra overheads because a dynamic structure requires reallocations, deletions, and rebalancing. These overheads can be very large as the number of instructions increase.

We can try a dynamic structure but we need to find a more lightweight structure supported by Valgrind (The Oset structure is not a good choice). Any suggestion? Thanks.

@pbalcer
Copy link
Member

pbalcer commented Jun 29, 2021

Well, you are already using a reallocating array - so my suggestion would be to wrap it in a small abstraction called "Ovec" or something like that.
EDIT: This already exists in valgrind and is called XArray.

Btw, we've run regression testing over PMDK using valgrind from this patch and many of our tests have failed with this error:

[line 55881] obj_basic_integration/TEST1 pmemcheck1.log valgrind: m_mallocfree.c:305 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.
[line 55882] obj_basic_integration/TEST1 pmemcheck1.log valgrind: Heap block lo/hi size mismatch: lo = 48032, hi = 8.
[line 55883] obj_basic_integration/TEST1 pmemcheck1.log This is probably caused by your program erroneously writing past the
[line 55884] obj_basic_integration/TEST1 pmemcheck1.log end of a heap block and corrupting heap metadata.  If you fix any
[line 55885] obj_basic_integration/TEST1 pmemcheck1.log invalid writes reported by Memcheck, this assertion failure will
[line 55886] obj_basic_integration/TEST1 pmemcheck1.log probably go away.  Please try that before reporting this as a bug.
[line 55887] obj_basic_integration/TEST1 pmemcheck1.log
[line 55888] obj_basic_integration/TEST1 pmemcheck1.log
[line 55889] obj_basic_integration/TEST1 pmemcheck1.log host stacktrace:
[line 55890] obj_basic_integration/TEST1 pmemcheck1.log ==1747976==    at 0x58008F28: show_sched_status_wrk (in /usr/local/lib/valgrind/pmemcheck-amd64-linux)
[line 55891] obj_basic_integration/TEST1 pmemcheck1.log ==1747976==    by 0x58009037: report_and_quit (in /usr/local/lib/valgrind/pmemcheck-amd64-linux)
[line 55892] obj_basic_integration/TEST1 pmemcheck1.log ==1747976==    by 0x580091BE: vgPlain_assert_fail (in /usr/local/lib/valgrind/pmemcheck-amd64-linux)
[line 55893] obj_basic_integration/TEST1 pmemcheck1.log ==1747976==    by 0x5800FA01: blockSane (in /usr/local/lib/valgrind/pmemcheck-amd64-linux)
[line 55894] obj_basic_integration/TEST1 pmemcheck1.log ==1747976==    by 0x580136A6: vgPlain_arena_realloc (in /usr/local/lib/valgrind/pmemcheck-amd64-linux)
[line 55895] obj_basic_integration/TEST1 pmemcheck1.log ==1747976==    by 0x580018C3: pInfo_AllocNodeInArray (in /usr/local/lib/valgrind/pmemcheck-amd64-linux)
[line 55896] obj_basic_integration/TEST1 pmemcheck1.log ==1747976==    by 0x58002954: trace_pmem_store (in /usr/local/lib/valgrind/pmemcheck-amd64-linux)
[line 55897] obj_basic_integration/TEST1 pmemcheck1.log ==1747976==    by 0x1002A1D107: ???
[line 55898] obj_basic_integration/TEST1 pmemcheck1.log ==1747976==    by 0x100288DF2F: ???
[line 55899] obj_basic_integration/TEST1 pmemcheck1.log ==1747976==    by 0xB7DA: ???

Can you reproduce this issue on your end?

@dibang2008
Copy link
Author

Can you send me the regression testing? It can help me reproduce this bug. I am not sure if the regression testing can be open-source. If not, can you give some hints of input data? I will try to reproduce this bug. Thanks :).

@pbalcer
Copy link
Member

pbalcer commented Jul 2, 2021

We've simply ran the entire PMDK test suite under this version of pmemcheck. The specific output example was from obj_basic_integration TEST1. You can run it by going into the directory of that test (src/test/obj_basic_integration) and then ./TESTS.py -u 1.

@dibang2008
Copy link
Author

I also have a bug for this test but it is not the same as yours. It is as follow:

obj_basic_integration/TEST1: SETUP	(medium/pmem/debug/pmemcheck)

Error 1
Last 0 lines of /home/aim/bangdi/pmdk/src/test/obj_basic_integration/err1.log below (whole file has 0 lines):
Last 2 lines of /home/aim/bangdi/pmdk/src/test/obj_basic_integration/out1.log below (whole file has 2 lines):
obj_basic_integration/TEST1: START: obj_basic_integration
 /home/aim/bangdi/pmdk/src/test/obj_basic_integration/obj_basic_integration /mnt/dbpmemfs/obj_basic_integration_1/testfile1
Last 2 lines of /home/aim/bangdi/pmdk/src/test/obj_basic_integration/trace1.log below (whole file has 2 lines):
{obj_basic_integration.c:654 main} obj_basic_integration/TEST1: START: obj_basic_integration
 /home/aim/bangdi/pmdk/src/test/obj_basic_integration/obj_basic_integration /mnt/dbpmemfs/obj_basic_integration_1/testfile1
Last 30 lines of /home/aim/bangdi/pmdk/src/test/obj_basic_integration/pmemcheck1.log below (whole file has 54 lines):
==6698==    by 0x1002A8DF17: ???
==6698==    by 0x1002A8DF2F: ???
==6698==    by 0x1002A8DF3F: ???

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 6698)
==6698==    at 0x4C81A94: ulog_construct (ulog.c:178)
==6698==    by 0x4C5DA64: lane_init_data (lane.c:345)
==6698==    by 0x4C6A7CE: obj_descr_create (obj.c:905)
==6698==    by 0x4C6C3BD: pmemobj_createU (obj.c:1411)
==6698==    by 0x4C6C57F: pmemobj_create (obj.c:1452)
==6698==    by 0x10FC27: main (obj_basic_integration.c:666)
client stack range: [0x1FFEFFB000 0x1FFF000FFF] client SP: 0x1FFEFFFA30
valgrind stack range: [0x100298E000 0x1002A8DFFF] top usage: 7088 of 1048576


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.

This bug has been fixed by aligning struct arr_md in pmc_include.h.
You can try my new commit. If bugs still exist, I think your testconfig.py may help us reproduce your bugs.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants