Support streaming code unit sequences by saving incomplete code unit sequences as encoding state #15
I have been thinking about this topic (I wrote two things about it, including https://github.com/ruoso/u5e/blob/master/StreamVsIterators.md). I believe it's best if there is a clearer "firewall" between raw data and text. The code handling the specific streamed protocol (HTTP or IRC, for instance) is in a much better position to validate the data before "declaring" it to be text. Doing that in the iterator itself places an undue burden on everyone who handles that kind of code.
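A minimal sketch of the "firewall" idea, assuming UTF-8 input: the protocol layer holds back an incomplete trailing sequence and hands only complete code unit sequences on as text. The names `complete_utf8_prefix_length` and `utf8_framer` are hypothetical and are not part of text_view or u5e.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: number of leading bytes of `bytes` that do not end in
// the middle of a UTF-8 code unit sequence. Only sequence lengths are checked;
// full validation (overlongs, surrogates, stray continuation bytes) is left to
// the text layer.
std::size_t complete_utf8_prefix_length(std::string_view bytes) {
    const std::size_t n = bytes.size();
    // A sequence is at most 4 bytes, so only the last 3 bytes can belong to
    // an incomplete trailing sequence.
    const std::size_t lookback = n < 3 ? n : 3;
    for (std::size_t i = 1; i <= lookback; ++i) {
        const auto b = static_cast<unsigned char>(bytes[n - i]);
        if ((b & 0xC0) == 0x80) continue;  // continuation byte: look further back
        std::size_t len = 1;               // lead byte (or ASCII): find its length
        if ((b & 0xE0) == 0xC0) len = 2;
        else if ((b & 0xF0) == 0xE0) len = 3;
        else if ((b & 0xF8) == 0xF0) len = 4;
        // Cut before the lead byte if its sequence needs more bytes than remain.
        return len > i ? n - i : n;
    }
    return n;  // nothing to hold back; a decoder will report any malformed data
}

// Hypothetical protocol-side framer: raw chunks go in, complete text comes out.
class utf8_framer {
public:
    std::string feed(std::string_view chunk) {
        held_.append(chunk);
        const std::size_t n = complete_utf8_prefix_length(held_);
        std::string text = held_.substr(0, n);
        held_.erase(0, n);
        assert(held_.size() < 4);  // at most 3 bytes of a 4-byte sequence held back
        return text;
    }
    // True if the stream stopped mid-sequence; the protocol decides how to report it.
    bool has_incomplete_tail() const { return !held_.empty(); }

private:
    std::string held_;
};
```

In this design, the text iterators never see a partial sequence; deciding what a truncated stream means stays with the protocol code.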
I agree that ensuring proper data boundaries in packet-oriented protocols is best practice. I think there will always be cases where that isn't possible, though. In those cases, the only solutions I've found so far are for the iterator to throw an exception, to block (on advancement of the underlying code unit iterator), or to take the approach described in the first comment of this issue. The initial email thread where I requested feedback on text_view discussed some of these options. You can find it at: https://groups.google.com/a/isocpp.org/d/msg/std-proposals/Tu84_TQOlhc/lV0MdIq1HQAJ
My point is that introducing that support is counterproductive. It is a … In practice, the industry consensus is that the only reasonable way to … Any library support that weakens that firewall not only is not useful …

(Sent by email in reply to Tom Honermann's message of Sat, 24 Sep 2016, 22:07.)
I think there are legitimate use cases. People stream text across command-line pipes all the time. Granted, blocking and data loss tend not to be issues in those cases. At any rate, addressing this issue is not high on my priority list. This issue was opened because of concerns raised in the email thread linked in my earlier comment on this issue.
Consuming code unit sequences from a streaming source may result in attempts to decode a partial code unit sequence. At present, an exception is thrown when such underflow occurs. An alternative would be to store the partial code unit sequence in the iterator state and then have the iterator compare equal to the end iterator. This would let code that decodes a stream one buffer at a time work correctly even when buffer ends do not fall on code unit sequence boundaries.
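A minimal sketch of buffer-by-buffer decoding with this behavior, assuming UTF-8 and using a free function rather than text_view's iterators. `utf8_state` and `decode_buffer` are hypothetical names. Stopping at an incomplete tail and saving it in `utf8_state` plays the role of the iterator comparing equal to the end iterator; the next call completes the sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical encoding state: bytes of a sequence cut off at the end of the
// previous buffer.
struct utf8_state {
    unsigned char pending[4] = {};
    std::size_t pending_len = 0;
};

// Sequence length implied by a lead byte (0 = not a valid lead byte).
std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

// Decodes one buffer; an incomplete trailing sequence is saved in `state`
// instead of raising an error. Sketch only: no checks for overlong forms,
// surrogates or malformed continuation bytes.
std::u32string decode_buffer(std::string_view buf, utf8_state& state) {
    // Prepend whatever was saved from the previous buffer.
    std::string bytes(reinterpret_cast<const char*>(state.pending), state.pending_len);
    bytes.append(buf);
    state.pending_len = 0;

    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const auto lead = static_cast<unsigned char>(bytes[i]);
        const std::size_t len = utf8_sequence_length(lead);
        if (len == 0) throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > bytes.size()) {
            // Underflow: stash the partial sequence as encoding state and stop.
            const std::size_t rest = bytes.size() - i;
            assert(rest < 4);
            for (std::size_t k = 0; k < rest; ++k)
                state.pending[k] = static_cast<unsigned char>(bytes[i + k]);
            state.pending_len = rest;
            break;
        }
        // Payload bits of the lead byte, then 6 bits per continuation byte.
        char32_t cp = (len == 1) ? lead : (lead & (0xFF >> (len + 1)));
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

For example, feeding `"h\xC3"` and then `"\xA9"` yields `U"h"` followed by `U"\u00E9"`, with the lone `0xC3` carried between the two calls in `state`.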
A problem with this approach is that it leaves open the possibility for trailing code units (e.g., garbage at the end of the encoded text) to go unnoticed. Because of this, the behavior above probably shouldn't be the default behavior, but it should be possible for code to opt in to it; perhaps via a policy class as suggested in #14.
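A minimal sketch of the opt-in idea, assuming UTF-8 and a compile-time policy parameter. The policy names and `utf8_stream_decoder` are hypothetical; #14 does not fix an interface. The default policy keeps the current behavior (throw on underflow). The opt-in policy saves the partial sequence instead, so an explicit `finish()` check is needed to catch trailing garbage that would otherwise go unnoticed.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical underflow policies.
struct throw_on_underflow        { static constexpr bool save_partial = false; };
struct save_partial_on_underflow { static constexpr bool save_partial = true; };

template <typename UnderflowPolicy = throw_on_underflow>
class utf8_stream_decoder {
public:
    // Decodes `buf`, appending code points to `out`. Sketch only: no checks
    // for overlong forms, surrogates or malformed continuation bytes.
    void feed(std::string_view buf, std::u32string& out) {
        pending_.append(buf);
        std::size_t i = 0;
        while (i < pending_.size()) {
            const auto lead = static_cast<unsigned char>(pending_[i]);
            const std::size_t len = lead < 0x80 ? 1
                                  : (lead & 0xE0) == 0xC0 ? 2
                                  : (lead & 0xF0) == 0xE0 ? 3
                                  : (lead & 0xF8) == 0xF0 ? 4 : 0;
            if (len == 0) throw std::runtime_error("invalid UTF-8 lead byte");
            if (i + len > pending_.size()) {
                if constexpr (UnderflowPolicy::save_partial) {
                    break;  // opt-in: keep the tail as state for the next feed()
                } else {
                    throw std::runtime_error("incomplete code unit sequence");
                }
            }
            char32_t cp = (len == 1) ? lead : (lead & (0xFF >> (len + 1)));
            for (std::size_t k = 1; k < len; ++k)
                cp = (cp << 6) | (static_cast<unsigned char>(pending_[i + k]) & 0x3F);
            out.push_back(cp);
            i += len;
        }
        pending_.erase(0, i);
        assert(pending_.size() < 4);  // only a partial sequence can remain
    }

    // Call at end of input: with the opt-in policy, leftover bytes (e.g.
    // garbage at the end of the encoded text) are reported here.
    void finish() const {
        if (!pending_.empty())
            throw std::runtime_error("trailing incomplete code unit sequence");
    }

private:
    std::string pending_;
};
```

Making the strict policy the default means callers must name `save_partial_on_underflow` explicitly to get the streaming behavior, which matches the suggestion that it should not be the default.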