Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve ByteUtil printable char validation of multi-byte charset #1001

Conversation

fbruton
Copy link
Collaborator

@fbruton fbruton commented Oct 24, 2024

Planning to add additional unit-tests.

@fbruton fbruton changed the title improve ByteUtil printable character validation of multi-byte character set improve ByteUtil printable char validation of multi-byte charset Oct 24, 2024
@fbruton fbruton marked this pull request as draft October 24, 2024 17:44
@fbruton fbruton marked this pull request as ready for review October 24, 2024 18:20
* @param bytes the bytes to be scanned.
* @return whether or not there were non-printable values.
* @param utf8Bytes the bytes to be scanned.
* @return regardless of whether there were non-printable values.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@return true if the byte array contains non-printable bytes as defined as .....

@Test
void testHasnonPrintableValues() {
String newLineCarriageTab = "\tThis is line one\r\nThis is line two\nThis is line three\n\nEnding with a tab";
assertFalse(ByteUtil.hasNonPrintableValues(newLineCarriageTab.getBytes(StandardCharsets.UTF_8)));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible there is more than one definition, but I thought non-printing characters were tab, newline, etc. Maybe I need more concise definition of what we're trying to identify.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is it supposed to end with a tab? \t?

Copy link
Collaborator Author

@fbruton fbruton Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it should be called binary or valid text.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous code allowed tabs, new lines and carriage returns, which makes sense.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is it supposed to end with a tab? \t?

The tab was at the beginning of the sentence, and now it aligns with the description.

@fbruton fbruton marked this pull request as draft October 24, 2024 19:19
@fbruton fbruton marked this pull request as ready for review October 24, 2024 20:05
@jpdahlke jpdahlke added this to the v8.17.0 milestone Oct 24, 2024
@fbruton fbruton requested a review from ldhardy October 28, 2024 23:52
@ldhardy ldhardy self-requested a review November 1, 2024 18:17
@dev-mlb
Copy link
Collaborator

dev-mlb commented Nov 7, 2024

I'm thinking we should merge the ideas from this pr and #1006

@jpdahlke jpdahlke modified the milestones: v8.17.0, v8.18.0 Nov 7, 2024
Copy link
Collaborator

@cfkoehler cfkoehler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My experience in this area is pretty minimal. But the code makes sense to me and seems to do what is needed.

@fbruton fbruton closed this Nov 12, 2024
@jpdahlke jpdahlke removed this from the v8.18.0 milestone Dec 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants