WIP: slightly improve substitutions #562

Draft
wants to merge 1 commit into base: master

Contributor

@carenas carenas commented Nov 13, 2024

Avoid at least one crash introduced by recent changes to the substitute code, and clarify what the expected offset value should be when the provided buffer overflows.

While at it, make sure that the returned string is always NUL terminated, and do some minor cleanup.

NOTE: at least the truncation is wrong, so I am posting this mainly as an FYI in the hope that someone else might give it some love.

@zherczeg
Collaborator

Could you update this patch?

@NWilson
Member

NWilson commented Nov 20, 2024

Could you also split out the unrelated changes into their own PRs? It should be quick to do, and would let us merge the cosmetic changes and behaviour-altering changes in their own commits.

@@ -1113,7 +1112,7 @@ in the decoded tables. */

 if ((code->flags & PCRE2_DEREF_TABLES) != 0)
 {
-ref_count = (PCRE2_SIZE *)(code->tables + TABLES_LENGTH);
+PCRE2_SIZE *ref_count = (PCRE2_SIZE *)(code->tables + TABLES_LENGTH);
Member

Personally, I'm very happy with these changes.

I know Philip likes the old style of defining variables high up, at the top of a scope, and with a blank line after variable definitions.

But I don't see any benefit to having variables available for use, but not yet initialised. Much better to define & initialise at the same time (safer).

The compiler will typically hoist all the variables up to the top anyway (it bumps the stack pointer just once on entry, rather than bumping it again each time it sees a new variable).

Collaborator

Partly it's because I'm a dinosaur from the age when one had to define variables like that, but partly also I find it makes it easier when looking back up some code to find where a variable is defined. However, I am not going to try to impose my own preferences on the future. I can certainly see the advantage of always initializing at definition time. So please don't worry about me too much.

Contributor Author

The funny thing is that this change is still valid C89 code, and the main motivation wasn't to go against Philip's advice of defining variables at the beginning of blocks, but just to reduce the scope of this variable to where it is actually needed/used.

Since we have at least one CI job with -Wshadow and I wanted to minimize churn, I didn't rename the variable to reflect its status as a "temp" holder (it might even be optimized out).
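
For readers following the scoping point, here is a minimal sketch of why a declaration moved into the `if` body is still C89: it sits at the start of a block, so it is both initialised at definition time and invisible outside that block. Everything prefixed `demo_` or `DEMO_` is a hypothetical stand-in and not PCRE2's actual structures or code; the layout (a table area followed by a reference count) is only assumed from the hunk above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real PCRE2 definitions. */
#define DEMO_DEREF_TABLES 0x1u
#define DEMO_TABLES_LENGTH (2 * sizeof(size_t)) /* keeps the count aligned */

typedef struct demo_code {
  unsigned int flags;
  unsigned char *tables; /* DEMO_TABLES_LENGTH bytes, then a size_t count */
} demo_code;

/* Drops one reference to the tables.
   Returns 1 when the last reference was dropped, otherwise 0. */
int demo_release_tables(demo_code *code)
{
  assert(code != NULL);

  if ((code->flags & DEMO_DEREF_TABLES) != 0)
    {
    /* Declared at the start of the block: valid C89, initialised at
       definition time, and not visible outside this block. */
    size_t *ref_count = (size_t *)(code->tables + DEMO_TABLES_LENGTH);

    if (*ref_count > 0)
      {
      (*ref_count)--;
      if (*ref_count == 0) return 1;
      }
    }

  return 0;
}
```

The variable keeps its name, so nothing in an enclosing scope is shadowed as long as the outer declaration is removed in the same change.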

extra_needed++;
lengthleft = 0;
}
if (!overflowed || lengthleft == 0) buffer[buff_offset] = 0; else extra_needed++;
Member

Could you explain why you need to inline the CHECKMEMCPY here for the trailing NUL?

What was wrong before? Do you want the returned string to be NUL-terminated, even if the function returns an error?

Contributor Author

> Do you want the returned string to be NUL-terminated, even if the function returns an error?

Correct. I found the way this function behaves strange, and the fact that it returns non-NUL-terminated strings on overflow potentially risky.
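
To make the concern concrete, here is a minimal sketch of a bounded copy that always leaves the output NUL-terminated and still reports how much space a complete copy would need. It is not the code in this PR and not PCRE2's behaviour; `demo_copy_terminated` is a hypothetical helper, and truncating to `cap - 1` characters is an assumption made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of PCRE2.
   Copies src into buffer, whose capacity cap counts the trailing NUL.
   When cap > 0 the buffer is always left NUL-terminated, even if src
   had to be truncated. Returns the size a complete copy needs,
   including the NUL, so the caller detects overflow with
   (result > cap). */
size_t demo_copy_terminated(char *buffer, size_t cap, const char *src)
{
  size_t srclen, needed, ncopy;

  assert(src != NULL);
  srclen = strlen(src);
  needed = srclen + 1;

  /* No room at all: only report the required size. */
  if (cap == 0) return needed;

  assert(buffer != NULL);
  ncopy = (srclen < cap) ? srclen : cap - 1;
  memcpy(buffer, src, ncopy);
  buffer[ncopy] = '\0'; /* terminate on success and on overflow alike */

  return needed;
}
```

With this shape a caller that ignores the return value still gets a string that is safe to pass to `strlen` or `printf`, which is the risk being described; whether a truncated result is desirable on error is a separate design question.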

123abc123\=substitute_overflow_length,replace=[9]XYZ
123abc123\=substitute_overflow_length,replace=[6]XYZ
123abc123\=substitute_overflow_length,replace=[1]XYZ
123abc123\=substitute_overflow_length,replace=[0]XYZ
Member

I'd be curious to run these new tests against the old code, just to see which (if any) of the test outputs have changed.

Contributor Author

None do, but I could split the tests into a "setup" patch of their own so that it is obvious.

@NWilson NWilson modified the milestone: 10.45-RC1 Dec 8, 2024