-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add MIR_scan_string_s #341
base: v0_master
Are you sure you want to change the base?
Conversation
I am not sure about this PR. I am trying not to change API w/o significant advantages. But I'll think more about this. |
it would appear this PR makes code slower for no good reason other than to appear as "safe" as Annex K |
I made this PR because I don't want to append the final \0 for mmaped files. \0 can appear inside a normal file. The current parser stop parsing at first \0. Maybe it should be an error, or ignore the \0? |
Not all strings are nul-terminated. Raw text files are a common case, as mentioned above. And this comes up a lot in FFI: I’ve worked with several languages whose strings are passed into C as unterminated (pointer, length) pairs. IIRC both Go and Python do this. And then there’s C++’s If the overhead of calling strlen is a problem, that could be removed; instead, just keep the nul check, and set the max len to a huge value in the existing function so the nul byte will be hit first. (There’s no valid reason to have a 00 byte in either ASCII or UTF-8 text.) On the other hand, I see the string API as mostly for debugging, so is it really important to save the overhead of copying the text into a nul-terminated buffer? |
I have applied the suggestion in the last commit. |
Pass all test.
I'm not sure what the best API would be. At least
MIR_scan_string_s
should be safer.More improvement opportunities: Once it is sure that no \0 is in input string, maybe other code can be simplified too?