
Fix infinite loop due to integer overflow when reading large strings #1350

Closed
wants to merge 3 commits

Conversation

@adamjshook

If the StreamReadConstraints are set to Integer.MAX_VALUE, reading a large string results in an infinite loop: the value of max overflows to a negative value, so while (ptr < max) is always false and break ascii_loop is never reached.

This change uses Math.addExact to throw an ArithmeticException if an overflow would occur, and in that case sets max to Integer.MAX_VALUE, allowing the loop to break and large strings to be properly deserialized.
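
To make the failure mode concrete, here is a minimal sketch of the wrap-around (the values below are hypothetical, chosen only to force the overflow; they are not taken from an actual parser run):

    public class OverflowDemo {
        public static void main(String[] args) {
            int ptr = 2_000_000_000;          // input pointer, near the 2 GB mark
            int outPtr = 0;                   // write position in the output buffer
            int outBufLength = 1_000_000_000; // stand-in for outBuf.length

            // 3_000_000_000 does not fit in an int and wraps to -1_294_967_296
            int max = ptr + (outBufLength - outPtr);
            System.out.println(max);       // negative
            System.out.println(ptr < max); // false: the ascii_loop body never runs
        }
    }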

This can be reproduced like so, where the call to nextTextValue will trigger the infinite loop. After applying the patch, the code completes. I can add the test case somewhere (or put it in a new test class); however, it requires a large JVM heap to run and I'm not sure what size the JVMs used for testing have.

    // Note: needs a large heap (~16 GB in my runs); RandomString is assumed to
    // come from a test utility such as net.bytebuddy.utility.RandomString.
    @Test
    void testLargeStringDeserialization() throws Exception {
        String largeString = RandomString.make(Integer.MAX_VALUE - 1024);
        JsonFactory f = JsonFactory.builder()
                .streamReadConstraints(StreamReadConstraints.builder()
                        .maxStringLength(Integer.MAX_VALUE)
                        .build())
                .build();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        JsonGenerator g = f.createGenerator(bytes);
        g.writeString(largeString);
        g.close();

        JsonParser parser = f.createParser(bytes.toByteArray());
        String parsedString = parser.nextTextValue();

        assertEquals(largeString, parsedString);
    }

@pjfanning
Member

Fair enough. Can't you add the test in the PR?

    try {
        max = Math.min(_inputEnd, Math.addExact(ptr, outBuf.length - outPtr));
    } catch (ArithmeticException ex) {
        max = Integer.MAX_VALUE;
    }
Member

Is there any way that this could be set to something like the StreamReadConstraints max number size? For people who might set a bigger max number size but who don't go for the absolute max.

Member

E.g., would _inputEnd work?

Author

Yes, either _inputEnd or _streamReadConstraints.getMaxStringLength() will fix the issue. Do you have a preference?

Member

min of both?

Member

Actually, looking at the code, Integer.MAX_VALUE looks like a good fallback.

Author

Okay, I think Integer.MAX_VALUE would be fine, since we're just trying to prevent an overflow here; max would generally be larger than the max string length or _inputEnd. Verification that we don't go beyond the max string length happens when the data is read.

@adamjshook
Author

> Fair enough. Can't you add the test in the PR?

Added the test case, but I see the jobs are failing with Java heap space errors. The JVMs would need to be scaled up for the test, and I'm not sure there's appetite for that for this change. I needed to set the JVM args to 16G to get it to pass.

@pjfanning
Member

@cowtowncoder is there some kind of annotation or pattern that we could use to mark a test as too big to run normally?

@pjfanning
Member

pjfanning commented Oct 21, 2024

@adamjshook The same issue appears to occur in 2 places in NonBlockingUtf8JsonParserBase - https://github.com/search?q=repo%3AFasterXML%2Fjackson-core%20%22ptr%20%2B%20(outBuf.length%20-%20outPtr)%22&type=code

And one more:

int max2 = _inputPtr + (outBuf.length - outPtr);

@adamjshook
Author

> @adamjshook The same issue appears to occur in 2 places in NonBlockingUtf8JsonParserBase - https://github.com/search?q=repo%3AFasterXML%2Fjackson-core%20%22ptr%20%2B%20(outBuf.length%20-%20outPtr)%22&type=code
>
> And one more:
>
> int max2 = _inputPtr + (outBuf.length - outPtr);

Sure, I can update these other places as well.

@cowtowncoder
Member

cowtowncoder commented Oct 21, 2024

> @cowtowncoder is there some kind of annotation or pattern that we could use to mark a test as too big to run normally?

I don't think there is a specific annotation.

But I think there is already one test that is skipped by default, for running long; but can then be run from IDE manually. I'll see if I can find it.

EDIT: it's in src/test/java/com/fasterxml/jackson/core/io/NumberOutputTest.java, but it uses a basic @Disabled, so locally you need to comment that out.

@cowtowncoder
Member

Ok, the basic idea is sound, even if we have no way to automate test runs; I'm ok with that.

Couple of notes...

First, I think this would be good to get in 2.18(.1); while unlikely to occur, infinite loops are nasty. So I think PR should be against 2.18.

Second: I'd prefer not using an exception or Math.addExact() -- we can simply check that the result is a positive number (in this case it cannot roll over to 0).
But I think it could/should be extracted as a helper method in a base class (I can help find the space).
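
A minimal sketch of that sign-check idea (the helper name and its placement are hypothetical; the actual method and base class were settled later, outside this thread):

    // Hypothetical helper: computes the end bound for the copy loop without
    // Math.addExact. Both operands are non-negative ints no larger than
    // Integer.MAX_VALUE, so an overflowing sum wraps to a negative value
    // (never to 0) and a simple sign check suffices.
    static int pointerBound(int ptr, int remaining) {
        int max = ptr + remaining;
        return (max < 0) ? Integer.MAX_VALUE : max;
    }

A call site would then look roughly like max = Math.min(_inputEnd, pointerBound(ptr, outBuf.length - outPtr)); with no exception handling needed.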

@cowtowncoder added the "2.18 Issues planned at earliest for 2.18" label Oct 22, 2024
@adamjshook
Author

> Ok, the basic idea is sound, even if we have no way to automate test runs; I'm ok with that.

I can add @Disabled to the existing test so at least it is available should someone choose to run it.

> First, I think this would be good to get in 2.18(.1); while unlikely to occur, infinite loops are nasty. So I think PR should be against 2.18.

Agreed, I can rebase on 2.18 and change the base of this PR to 2.18 as well.

> Second: I'd prefer not using an exception or Math.addExact() -- we can simply check that the result is a positive number (in this case it cannot roll over to 0). But I think it could/should be extracted as a helper method in a base class (I can help find the space).

Sure, I can check after the fact if it is negative and set it to Integer.MAX_VALUE if so. Let me know what base class you had in mind and I can add a function there and update the locations pointed out here to use it. I'll get it updated tomorrow morning and push.

As an aside, I'll be unavailable Oct 24th to Oct 29th. I can continue addressing any additional comments when I return on the 30th, but feel free to finish it up if there is some urgency to do so.

@cowtowncoder
Member

Thank you @adamjshook! I'll see if I can play with this PR wrt base class/method to use (or if a shared method even makes sense).

@ShemTovYosef

Hi @adamjshook, we have started to see similar behavior since the upgrade to version 2.17.2.
Is there a way to work around this issue programmatically until the fix is merged?

@pjfanning
Member

> Hi @adamjshook, we have started to see similar behavior since the upgrade to version 2.17.2. Is there a way to work around this issue programmatically until the fix is merged?

@ShemTovYosef you should provide more details if you need help. If this problem only happened since you upgraded, can't you revert to the old version that you used? Can you tell us that version number so others know?

It still isn't clear if this is an old issue or one that was introduced recently.

@pjfanning
Member

I created #1352 targeting 2.17 branch because this issue seems to have bad consequences.

@pjfanning
Member

I tried the test case with Jackson 2.14 and the issue appears there. So this does not appear to be a new issue and, in particular, does not appear to relate to the recent refactoring of jackson-core to support StreamReadConstraints.

@pjfanning
Member

@adamjshook could you test the change in #1352? It appears to fix the issue that you reported (in my testing), but it would be good to get it verified by more people.

@cowtowncoder
Member

Yes, this is probably a long-time bug, but due to its nature a rare one (unlikely to occur) -- and a pretty nasty one if it ever happens.
After the fix, such a case would probably fail due to other problems, but at least it would not get into a busy loop and hog the CPU.

@pjfanning
Member

As @cowtowncoder has pointed out elsewhere, anyone hitting this is not far off from an even bigger issue: Jackson uses byte arrays internally. This issue appears to only happen if you have strings in the JSON that are approaching this limit (about 2 GB). Even if we fix this, users will soon be risking going over the 2 GB limit and hitting a new array size exception.

I really can't understand how anyone would want such big strings in JSON docs. There have got to be better data structures for this.

@cowtowncoder
Member

cowtowncoder commented Oct 28, 2024

I agree w/ @pjfanning that use of 2+ gig String values seems ill-advised.

But just for the sake of completeness, on the size limits imposed: Jackson does not actually limit things to 2 gigs, except where basic JDK types are the limiting factor due to int length markers (namely, String and Java arrays are both limited to 2^31 chars/elements).

So:

  1. Input length not limited when reading using InputStream or Reader (or with Async provider)
  2. Output length is similarly unlimited w/ OutputStream or Writer
  3. Even long JSON String values can be read, as content is accumulated using TextBuffer, which uses segments -- 2 GB is not the limit -- however, accessing this is trickier as it cannot be accessed as a String (... since String does have a 2-gig char max).

Wrt (3) one can use:

    public int getText(Writer writer)

which will provide the whole JSON String value, if the caller can provide a Writer that can accumulate content longer than 2 billion chars.

So technically it is possible to handle humongous content. :)
Not convenient but possible.
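
A rough sketch of that approach (the FileWriter target and the surrounding setup are illustrative assumptions, not from this thread):

    import com.fasterxml.jackson.core.JsonFactory;
    import com.fasterxml.jackson.core.JsonParser;
    import com.fasterxml.jackson.core.JsonToken;
    import java.io.FileWriter;
    import java.io.InputStream;
    import java.io.Writer;

    public class HugeValueReader {
        // Streams a huge JSON String value to a Writer instead of building a
        // java.lang.String (which is capped at 2^31-1 chars).
        static void copyHugeValue(JsonFactory f, InputStream in) throws Exception {
            try (JsonParser p = f.createParser(in);
                 Writer out = new FileWriter("huge-value.txt")) {
                if (p.nextToken() == JsonToken.VALUE_STRING) {
                    int chars = p.getText(out); // returns the number of chars written
                    System.out.println("chars written: " + chars);
                }
            }
        }
    }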

EDIT: Actually... I may need to take that back. Due to MAX_STRING_LENGTH, we now do limit the longest JSON String value to 2 gigs, I think. Hmmh. That was not an intentionally added limitation but an oversight.

EDIT/2: Or does it? In fact, setting maxStringLength to Integer.MAX_VALUE does prevent the exception, since no length can exceed it.

@cowtowncoder
Member

Closing this in favor of #1352.

@cowtowncoder
Member

Thank you for reporting this and suggesting the fix, @adamjshook! I merged #1352 so this will be fixed in:

  • 2.17.3
  • 2.18.1

@adamjshook
Author

> Thank you for reporting this and suggesting the fix, @adamjshook!

Thank you for finishing it up!
