improve ByteUtil printable char validation of multi-byte charset #1001
* @param bytes the bytes to be scanned.
* @return whether or not there were non-printable values.
* @param utf8Bytes the bytes to be scanned.
* @return regardless of whether there were non-printable values.
@return true if the byte array contains non-printable bytes as defined as .....
@Test
void testHasnonPrintableValues() {
    String newLineCarriageTab = "\tThis is line one\r\nThis is line two\nThis is line three\n\nEnding with a tab";
    assertFalse(ByteUtil.hasNonPrintableValues(newLineCarriageTab.getBytes(StandardCharsets.UTF_8)));
It's possible there is more than one definition, but I thought non-printing characters were tab, newline, etc. Maybe I need a more concise definition of what we're trying to identify.
Also, is it supposed to end with a tab? \t?
Maybe it should be called binary or valid text.
The previous code allowed tabs, new lines and carriage returns, which makes sense.
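One possible reading of the definition being discussed, as a rough sketch only: decode the bytes strictly as UTF-8, treat malformed input as non-printable, and flag any ISO control character other than tab, line feed, and carriage return. The class `PrintableCheckSketch` and the method `hasNonPrintable` are hypothetical names for illustration; this is not the `ByteUtil` code from this PR.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class PrintableCheckSketch {

    // Hypothetical helper, not the ByteUtil implementation under review.
    // Returns true if the bytes are not valid UTF-8 or contain a control
    // character other than tab, line feed, or carriage return.
    static boolean hasNonPrintable(byte[] utf8Bytes) {
        String text;
        try {
            // Strict decoding: report malformed sequences instead of replacing them.
            text = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(utf8Bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Assumption: bytes that do not decode as UTF-8 count as binary.
            return true;
        }
        // Work on code points so multi-byte characters are checked as a whole.
        return text.codePoints().anyMatch(cp ->
                cp != '\t' && cp != '\n' && cp != '\r' && Character.isISOControl(cp));
    }
}
```

Under these assumptions, the test string above (tabs, `\r\n`, and `\n` only) would pass as printable, while a NUL byte or a truncated multi-byte sequence would not.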
"Also, is it supposed to end with a tab? \t?"
The tab was at the beginning of the sentence, and now it aligns with the description.
I'm thinking we should merge the ideas from this PR and #1006.
My experience in this area is pretty minimal. But the code makes sense to me and seems to do what is needed.
Planning to add additional unit tests.