Increase MAX_PATTERN_LEN and STRING_SIZE constants #164

Merged
merged 4 commits into from
Dec 1, 2023

Conversation

JDTruj2018
Collaborator

Simply increasing these constants to account for some of the larger LANL patterns.

@JDTruj2018
Collaborator Author

Interestingly, I am not able to reproduce the segfaults when running make test

@jyoung3131
Contributor

The build failure is weird, but I think maybe it's overflowing some string(s) in the test environment. I'm going to try and test locally as well.

The build log has this output:

/home/runner/work/spatter/spatter/src/parse-args.c: In function ‘parse_runs’:
/home/runner/work/spatter/spatter/src/parse-args.c:392:45: warning: ‘%s’ directive writing up to 63999999 bytes into a region of size 63999985 [-Wformat-overflow=]
392 | sprintf(output, "Invalid kernel %s\n", kernel_name);
| ^~ ~~~~~~~~~~~
In file included from /usr/include/stdio.h:867,
from /home/runner/work/spatter/spatter/src/parse-args.c:3:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:36:10: note: ‘__builtin___sprintf_chk’ output between 17 and 64000016 bytes into a destination of size 64000000
36 | return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
37 | __bos (__s), __fmt, __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
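The warning above says the formatted message can be longer than the destination, since the fixed prefix is written in front of a `kernel_name` that may itself be nearly as large as `output`. A minimal sketch of a bounded alternative follows. The helper name `format_invalid_kernel` is hypothetical, and this is not the change made in this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of Spatter): formats the error message into
 * `out` without ever writing more than `out_size` bytes.
 * Returns 1 if the whole message fit, 0 if it was truncated or on error. */
static int format_invalid_kernel(char *out, size_t out_size,
                                 const char *kernel_name)
{
    assert(out != NULL && out_size > 0 && kernel_name != NULL);

    /* snprintf always NUL-terminates when out_size > 0 and returns the
     * length the full message would have had. */
    int n = snprintf(out, out_size, "Invalid kernel %s\n", kernel_name);
    return n >= 0 && (size_t)n < out_size;
}
```

Checking the return value lets the caller decide whether a truncated message is acceptable or should be reported as an error.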

@jyoung3131
Contributor

Adding some more info - I pulled this change, built it and tested it locally with the serial backend. I can recreate this failure.

Program received signal SIGSEGV, Segmentation fault.
0x0000555555589c4c in parse_runs (argc=<error reading variable: Cannot access memory at address 0x7ffff48e1f6c>,
    argv=<error reading variable: Cannot access memory at address 0x7ffff48e1f60>) at ../spatter/src/parse-args.c:355
355     {

(gdb) bt full
#0  0x0000555555589c4c in parse_runs (argc=<error reading variable: Cannot access memory at address 0x7ffff48e1f6c>,
    argv=<error reading variable: Cannot access memory at address 0x7ffff48e1f60>) at ../spatter/src/parse-args.c:355
        pattern_found = <error reading variable pattern_found (Cannot access memory at address 0x7ffff48e1f74)>
        pattern_scatter_found = <error reading variable pattern_scatter_found (Cannot access memory at address 0x7ffff48e1f78)>
        pattern_gather_found = <error reading variable pattern_gather_found (Cannot access memory at address 0x7ffff48e1f7c)>
        rc = <error reading variable rc (Cannot access memory at address 0x7ffff48e1ff8)>
#1  0x0000555555589a46 in parse_args (argc=2, argv=0x602000000010, nrc=0x7fffffffd490, rc=0x7fffffffd4d0) at ../spatter/src/parse-args.c:345
        nerrors = 0
        json = 0

@jyoung3131
Contributor

In my tests, it seems the max STRING_SIZE that doesn't crash with GCC 8/10/12 is 8300000.

One suggestion might be that we specify which variable(s) need to be extended and make sure we use a separate global to set the size of just those array inputs.
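A crash at function entry with a threshold just above 8,300,000 bytes is consistent with large `STRING_SIZE` local arrays exceeding a default stack limit of about 8 MiB, though that cause is an inference and is not confirmed in this thread. The sketch below shows one way the suggestion could look: a separate size constant used only for pattern input, with the large buffer taken from the heap. The names `SMALL_STRING_SIZE`, `PATTERN_STRING_SIZE`, and `alloc_pattern_buffer`, and both values, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical constants: a small size for ordinary strings and a separate,
 * larger one used only for pattern input. Neither value is from Spatter. */
#define SMALL_STRING_SIZE   1024
#define PATTERN_STRING_SIZE (64u * 1000u * 1000u)

/* Allocate the large pattern buffer on the heap instead of declaring
 * `char buf[PATTERN_STRING_SIZE]` as a local, which would live on the stack.
 * Returns NULL on allocation failure; the caller frees the result. */
static char *alloc_pattern_buffer(size_t size)
{
    assert(size > 0);
    char *buf = malloc(size);
    if (buf != NULL)
        buf[0] = '\0'; /* start as an empty string */
    return buf;
}
```

Ordinary strings could keep using the small constant on the stack, so only the pattern inputs pay for the larger allocation.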

@jyoung3131
Contributor

We talked about possible solutions:

  1. Determine pattern length when reading in patterns to allocate buffers.
  2. Use included "pattern length cutoffs" in some JSON inputs to set the pattern length before allocating memory

@JDTruj2018
Collaborator Author

@jyoung3131 I've gone with option 1 for now: the pattern length is determined when reading in the patterns and is then used to allocate the buffers. The JSON parser does some of this work and has a field with the length of the read-in buffer, so using that instead of STRING_SIZE took care of some of those issues.

The other tweaks this required were:

  1. adding a second safestrcopy function that takes a string size as a parameter rather than using the STRING_SIZE constant
  2. changing the generator field to be dynamically allocated rather than allocated on the stack with STRING_SIZE. It is now malloc'ed right before use, at the correct size.
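A size-aware copy of the kind described in item 1 might look roughly like the sketch below. The name `safestrcopy_n`, its parameter order, and its return convention are assumptions made for illustration; the actual function added in this PR may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of a safe string copy: the destination size is a
 * parameter instead of a compile-time constant. Copies at most
 * dest_size - 1 bytes and always NUL-terminates.
 * Returns 1 if src fit entirely, 0 if it was truncated. */
static int safestrcopy_n(char *dest, const char *src, size_t dest_size)
{
    assert(dest != NULL && src != NULL && dest_size > 0);

    size_t len = strlen(src);
    size_t n = len < dest_size ? len : dest_size - 1;
    memcpy(dest, src, n);
    dest[n] = '\0';
    return len < dest_size;
}
```

With a helper like this, a dynamically allocated field can be sized as `strlen(src) + 1` and filled without relying on a global maximum.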

@JDTruj2018
Collaborator Author

@jyoung3131
Is the GPU runner operational at the moment? Looks like it has been hanging for about an hour.

@jyoung3131
Contributor

Hi Jered - our GPU runner is down due to some unexpected upgrades that broke Slurm for a few of our GPU nodes. I'll fix it and rerun this PR for merging.

@jyoung3131
Contributor

Hi Jered - unfortunately it looks like the GPU Ustride test passed but the AMG test did not. Do you think this is related to some of our other issues like #154?

Min         25%          Med          75%          Max
31013.2      67950        107861       226297       932613
H.Mean       H.StdErr
81553.1      16133.3
*** buffer overflow detected ***: terminated
Aborted (core dumped)
Test failure on ../spatter -pFILE=../../standard-suite/app-traces/amg_gpu.json
<end of output>
Test time =  32.45 sec
----------------------------------------------------------
Test Failed.
"standard_suite_gpu_test" end time: Nov 29 21:49 EST
"standard_suite_gpu_test" time elapsed: 00:00:32
----------------------------------------------------------

End testing: Nov 29 21:49 EST

@JDTruj2018
Collaborator Author

JDTruj2018 commented Nov 30, 2023

Hi Jeff

Thanks for re-running that!

I do not think it is related to #154. It looks like there is some code in the parse_p function that is still allocating based on MAX_PATTERN_LEN which is very large now.

        ssize_t *mypat;

        size_t psize;
        if (rc->pattern_size > 0)
            psize = rc->pattern_size;
        else
            psize = MAX_PATTERN_LEN;

        mypat = sp_malloc(sizeof(spIdx_t), psize, ALIGN_CACHE);

I am working on a few tweaks now to count the number of elements in the pattern before this allocation and I'll push that later this afternoon.

It now looks like this

        char *copy_optarg = sp_malloc(sizeof(char), strlen(optarg) + 1, ALIGN_CACHE);
        strcpy(copy_optarg, optarg);

        char *delim = ",";
        char *ptr = strtok(copy_optarg, delim);
        if (!ptr)
            error("Pattern not found", 1);

        size_t sz = 0;
        while (ptr != NULL) {
          sz++;
          ptr = strtok(NULL, delim);
        }
        free(copy_optarg);

        ssize_t *mypat;

        size_t psize = 0;
        if (rc->pattern_size > 0)
            psize = rc->pattern_size;
        else
            psize = MAX_PATTERN_LEN; 

        if (psize > sz)
          psize = sz;

        mypat = sp_malloc(sizeof(spIdx_t), psize, ALIGN_CACHE);

        ptr = strtok(optarg, delim);

        size_t read = 0;
        if (sscanf(ptr, "%zu", &(mypat[read++])) < 1)
            error("Failed to parse first pattern element in custom mode", 1);

	@@ -1300,6 +1317,9 @@ void parse_p(char* optarg, struct run_config *rc, int mode)
            if (sscanf(ptr, "%zu", &(mypat[read++])) < 1)
                error("Failed to parse pattern", 1);
        }

        assert(psize == read);

        *pattern = mypat;
        *pattern_len = read;
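The count-then-allocate idea in the snippet above can be shown in a self-contained form. The sketch below is not the Spatter code: `count_fields` and `parse_pattern` are hypothetical names, it uses `strtol` and `long` instead of `strtok`, `sscanf`, and `spIdx_t`, and unlike `strtok` it rejects empty fields rather than skipping them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Count comma-separated fields without modifying the input. */
static size_t count_fields(const char *s)
{
    size_t n = 1;
    for (; *s != '\0'; s++)
        if (*s == ',')
            n++;
    return n;
}

/* Hypothetical parser: sizes the array from the input itself rather than
 * from a fixed maximum. Returns a malloc'ed array of *len values, or NULL
 * on malformed input or allocation failure. Value range is not checked. */
static long *parse_pattern(const char *s, size_t *len)
{
    assert(s != NULL && len != NULL);

    size_t n = count_fields(s);
    long *pat = malloc(n * sizeof *pat);
    if (pat == NULL)
        return NULL;

    const char *p = s;
    for (size_t i = 0; i < n; i++) {
        char *end;
        pat[i] = strtol(p, &end, 10);
        /* Each field must contain digits and be followed by ',' or, for
         * the last field, by the end of the string. */
        if (end == p || *end != (i + 1 < n ? ',' : '\0')) {
            free(pat);
            return NULL;
        }
        p = end + 1;
    }
    *len = n;
    return pat;
}
```

Counting first means the allocation is proportional to the actual pattern, so raising the maximum pattern length does not by itself increase memory use for short patterns.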

@jyoung3131
Contributor

Thanks, Jered - this looks excellent! @plavin can you add your review (confirm there is no issue with your ongoing work) and we will merge this PR.

@jyoung3131 jyoung3131 self-assigned this Dec 1, 2023
Contributor

@jyoung3131 jyoung3131 left a comment


Reviewed code updates. Tests seem to have caught many of our corner cases, and those all definitely pass now. Approved for merge.

@plavin plavin merged commit 0870dc7 into main Dec 1, 2023
3 checks passed
@jyoung3131 jyoung3131 deleted the increase-pattern-size branch August 15, 2024 21:37