sixels: last line cut/truncated on terminal emulators with "correct" text cursor placement #192
Hi @dnkl. The VT340's algorithm should not be followed too strictly in modern terminals. Despite what the documentation implied, it uses a fast heuristic which relies on the character cell being 20 pixels tall. That algorithm was faster but included a glitch which should not be copied. If you are designing a terminal that uses characters that are not 20 pixels tall, the algorithm does not apply and will have to be adapted in one of two ways:
I strongly believe the first method is the correct one for most modern terminals. It lets programmers easily create software that integrates graphics with character cell text interfaces, which is to me what makes sixels useful. If you've read my discussion with j4james about whether this VT340 behavior is a "glitch", you'll see that even though he believes it is the historical behavior and thus correct for any terminal that claims to emulate a VT340, neither of us could come up with an easy solution for application programmers who want to just splat a sixel image on the screen and show some text underneath it. Since a workaround requires the application to model the internal state of the VT340, no sane program will ever intentionally use this odd behavior, whether it is technically a glitch or not. |
@hackerb9 I don't mind changing foot to always put the cursor on the last row touched by the sixel (i.e. the bottom pixel of the last sixel). What I don't want is slightly different behavior in modern terminals, and I was under the impression that the other "correct" terminals also followed the DEC algorithm? If not, I'd be more than happy to update foot. |
That said, it looks like chafa isn't emitting a newline at all, so even with the tweaked cursor placement (always put it on the last row touched by the sixel), the image is sometimes cut off. |
@PerBothner @christianparpart @wez I was hoping we could all agree on how to implement cursor placement after emitting a sixel. As far as I can tell, foot, DomTerm, Contour and Wezterm all place the cursor on the same row as the last sixel. But do you follow the DEC algorithm, and place it on the same row as the upper pixel of the last sixel, or do you place it on the last row touched by the sixel (i.e. the row containing the bottom pixel of the last sixel)? I know at least some of you have been following the discussions between @hackerb9 and j4james, but I don't know what you ended up implementing. From an application point of view, I think it would be beneficial if we all implemented the same cursor placement algorithm... Foot currently implements the DEC algorithm, but I think it would be easier for applications if I changed it to just place the cursor on the last row. Then, to print text under the sixel, you know all that's needed is (always) a single newline. Not one or two. But I think it's a bad idea to change foot if all other sixel terminals implement the DEC algorithm and don't want to change. |
I agree that putting the cursor on the row containing the bottom sixel row makes more sense, and I can certainly change it if that's the consensus. I prefer to match xterm.js for various reasons. https://github.com/jerch - what do you think? |
@PerBothner Imho xterm.js currently keeps the text cursor at the row of the bottom-most pixel drawn from the last sixel band. This means that if the last band contains only "fiftels" (the 6th pixel is never set), the 5th pixel counts as the last one, not the 6th. |
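To make the "bottom-most pixel drawn from the last sixel band" rule concrete, here is a minimal Python sketch. It relies only on the standard sixel encoding (data characters `?` through `~`, value = character code minus 63, bit 0 is the top pixel, bit 5 the bottom). The function name `lowest_pixel_in_band` is made up for illustration and is not taken from xterm.js or any other terminal.

```python
def lowest_pixel_in_band(band: str) -> int:
    """Return the index (0 = top, 5 = bottom) of the lowest pixel set
    anywhere in one sixel band, or -1 if the band draws nothing.

    Colour introducers (#), repeat counts (!) and their digits all have
    character codes below '?' and are simply skipped; a repeat count
    changes the width of a run, not its vertical extent.
    """
    lowest = -1
    for ch in band:
        code = ord(ch)
        if 0x3F <= code <= 0x7E:          # sixel data characters '?'..'~'
            bits = code - 0x3F            # six-bit column pattern
            if bits:
                lowest = max(lowest, bits.bit_length() - 1)
    return lowest
```

With this rule a band made only of `^` characters (bits 0 to 4 set) reports 4, so the image would end one pixel higher than with a band of `~` characters.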
I'm open to tweaking wezterm to be more sane, assuming that there are a couple of test cases with examples of where the cursor should end up. FWIW, I think the current cursor placement in wezterm may well be a bit of a fluke arising from re-using the iterm2 image protocol logic that preceded it rather than a conscious effort to implement the vt340 algorithm. wezterm's logic for this (shared by iterm2, kitty and sixel handling) can be found here: the vertical position: the horizontal position: |
It may be a good idea to get @arakiken and the other mlterm developers on board too. I've been testing with it, since it had one of the first implementations, and is still one of the fastest. It currently (as of version 3.9.3) places the cursor on the row immediately after (that is, the first character row not touched by any sixel, transparent or not). My main concerns as an application writer are a) consistency between terminals and b) simplicity of design. I'll happily support any consensus terminal developers arrive at.
I favored this approach at first, but it has the minor annoyance of deliberate image transparency being cut off. It also means applications must inspect the image data in order to know where the cursor'll end up, which is a slightly bigger problem. Correct me if I'm wrong and there's a way around this. |
How about this:
For sixels without an explicit width/height (no raster attributes), assume all sixels are 6 pixels tall. I.e don't bother inspecting the image looking for transparency.
For sixels with an explicit width/height, use the specified height.
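A small Python sketch of the two rules proposed above, as I read them. `effective_height` and its parameters are hypothetical names; `raster_height` stands for the Pv value of the raster attributes, or `None` when the image has none.

```python
from typing import Optional

SIXEL_BAND_HEIGHT = 6  # every sixel band is six pixels tall


def effective_height(band_count: int, raster_height: Optional[int]) -> int:
    """Height in pixels used for cursor placement.

    - With raster attributes: trust the declared height.
    - Without: assume every band is a full 6 pixels tall, so the
      image data never has to be inspected for transparency.
    """
    if raster_height is not None:
        return raster_height
    return band_count * SIXEL_BAND_HEIGHT
```

For example, four bands without raster attributes count as 24 pixels, while the same data with a declared height of 20 counts as 20 pixels.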
|
following along for notcurses, good to see this effort taking place |
Sorry to intrude.... I just want to add that if it's possible to also consider the horizontal cursor position, it'd be really good (from the perspective of an application developer). A unified vertical position is good enough for aligning images with text or other images vertically but not horizontally (i.e. side-by-side). Yes, it's probably possible to work around this using absolute cursor positioning or save/restore but these are not always viable options, plus I believe the purpose of a consensus includes eliminating the need for workarounds in applications anyways. Thank you all. |
My understanding is the text cursor's horizontal position isn't changed at all. It only moves vertically.
Put another way, it is positioned "at the beginning of the sixel", i.e in the bottom **left** corner of the sixel.
|
@hpjansson related, but perhaps worth its own issue; chafa currently ends the sixel with a GNL ('-'). Is this intentional?
It adds an extra, empty, graphical row. I think it would be better to use a textual newline instead.
Fwiw, this behavior has a (very) minor performance impact on foot, for sixels with an explicit width/height, as we're forced to reallocate and enlarge the backing image buffer, and then initialize it to the background color. I'm not really bothered by it, but thought it might be worth mentioning at least.
Be happy to move this to a separate issue if you'd prefer that.
|
i went and looked at what we do in notcurses, and we do a hard cursor position after emission of any sixel. i imagine any application wanting to be portably correct will have to do the same thing, no? since they might be dealing with old terminals, or noncompliant ones, and it's not indicated via term queries? i don't want to disrupt unification, but from an app/toolkit author's perspective, i don't see how this helps...? |
I agree, mostly. It's important we unify the vertical placement since it affects scrolling.
Horizontal placement isn't as important, *except* if a terminal places it *after* the image, in which case it _might_ affect scrolling if it ends up being beyond the last column.
|
No, you are right. This "bottom-most colored pixel" behavior cuts off a fully transparent line of pixels at the bottom, treating it as not being part of the original image. If an image has that line intentionally, it will get stripped. That's for level 1 sixel.
Well, I put another warning into the docs not to use level 1 sixel on the encoder side anymore, but to go with level 2 with explicit raster attributes denoting the width and height extents. DEC STD 070 also tells us that the graphics extents in the raster attributes should never be exceeded by encoders, thus my decoder uses these to trim the graphics, which also solves the issue of non-multiple-of-6 image heights in a more deterministic way.
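As a rough illustration of using raster attributes as a clipping box, here is a Python sketch. It assumes the usual `"Pan;Pad;Ph;Pv` form of the raster attributes command. The helper names `raster_size` and `clip_to_raster` are invented for this example, and the sketch only caps the dimensions; it is not how xterm.js or any other decoder is actually written.

```python
import re
from typing import Optional, Tuple

# Raster attributes: '"' Pan ';' Pad ';' Ph ';' Pv
# (Pan/Pad = pixel aspect ratio, Ph/Pv = width/height in pixels).
_RASTER = re.compile(r'"(\d*);(\d*);(\d+);(\d+)')


def raster_size(sixel_body: str) -> Optional[Tuple[int, int]]:
    """Return (width, height) from the first raster attributes command,
    or None if the body has none (a "level 1" style image)."""
    match = _RASTER.search(sixel_body)
    if match is None:
        return None
    return int(match.group(3)), int(match.group(4))


def clip_to_raster(drawn_w: int, drawn_h: int,
                   raster: Optional[Tuple[int, int]]) -> Tuple[int, int]:
    """Cap the drawn size at the declared size; leave it alone otherwise."""
    if raster is None:
        return drawn_w, drawn_h
    return min(drawn_w, raster[0]), min(drawn_h, raster[1])
```

A 50-pixel-tall image needs nine bands (54 pixels of data); with a declared height of 50 the extra four rows are simply dropped.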
That's not possible with sixel level 1, since it has no notion of width. Every sixel band can leave the sixel cursor at a different width to the right (an image might be ragged on the right), so which one should be chosen? Btw, xterm.js also uses the VT340 cursor for IIP as the only supported cursor mode, to level out differences between image sequences. While that cursor mode is more annoying to deal with as an app dev if you want to place text to the right of the image, its handling is always the same:
|
Foot currently allows "level 2" sixels to be extended, both vertically and horizontally. That is, the image will be resized, if necessary, to accommodate whatever the encoder is emitting.
I'd be more than happy to change it, to instead truncate the image to the width/height specified in the raster attributes. It'd just make everything simpler on the terminal side.
|
Yep, it reduces code complexity a lot, and on the perf side it is actually ~40% faster during sixel decoding, because the upper bounds are known beforehand in my decoder. |
Thanks (@dnkl, @dankamongmen and @jerch) for the clarifications and suggestions. I guess I can work with those. EDIT: ... as regards cursor horizontal placement. |
I don't remember exactly how intentional it was, but when I wrote most of the encoder back in 2018 I had to work around issues in existing decoders. For instance, I specify the raster dimensions but still make sure to pad every sixel row to the full width, since I noticed a case where the terminal would have garbage in the image buffer otherwise. It's possible the GNL was required by a decoder at some point. That said, after testing it again now, it seems e.g. mlterm behaves the same with and without the GNL; I think it opens a new sixel row only when its pixel data starts arriving. I don't know of anything that needs the final GNL anymore, so I'll remove it. I'm also partial to the idea that raster attributes should preempt dynamic resizing. It makes things more predictable for everyone. |
I'm glad to see all the terminal developers here working together! If I can summarize, it sounds like everyone is in agreement that modern terminals should allow what I will refer to as splat-nl-print: Applications may send sixels to a screen and simply send a newline before any text if they do not wish to overlap the graphics. Although VT340 compatibility is not the highest priority, I can add that my tests show splat-nl-print as the algorithm of choice even on a real VT340 as the occasional glitch is vanishingly rare in actual usage. Additional points brought up:
|
If you're going to define your own version of Sixel, can you please make it something that apps can opt into or out of with a mode. Worst case, if you don't want to implement both standard and non-standard cursor placement, you could still report the mode as permanently set, and then apps can at least tell what behavior to expect from the terminal. |
Have you tested recent versions of xterm? I think it is desirable to be compatible with xterm. It may be a good idea to contact Thomas E. Dickey, the maintainer of xterm. He has tweaked the handling of Sixels in the past, and may be open to (if necessary) doing so to match the "saner" behavior. |
@j4james I don't believe there is a "standard version" of Sixel. That is part of the problem: Different implementations act differently. Is "standard Sixel" whatever DEC implemented in their terminals? Are all such terminals consistent? What about the specifications (manuals) from DEC? What about corner cases not covered in the manuals? What about xterm - and which version of xterm? If all of these were consistent, I'd consider that as "standard sixel" - but I'm pretty certain that is not the case. |
UPDATE I have determined that I was mistaken about Xterm's behavior regressing. In fact, it is now almost precisely correct. The one thing it is missing, however, is moving the text cursor down on Graphic New Lines, which just happens to be the default output from ImageMagick's |
Here is a new script, textcursor2.sh, which shows how a TEXT NEW LINE (or, equivalently, CURSOR DOWN) separates a sixel image from any following text on a VT340 with its 20 pixel high character cell. It also shows what happens when GRAPHIC NEW LINE is used; the most important feature of which seems to be that it acts exactly like a single text new line whenever the image height is a multiple of the character cell height. |
That's an interesting proposal, though ideally I'd wish for something that can be used without returning to the first column, and which doesn't require a blank row. Chafa (by request) has a |
Fortunately, there is not much genuine controversy on that point: just send a single newline. Hackerb9's humble opinion:
The reason there seems to be controversy is that the VT340 can have an extremely rare quirk where the text will overwrite a few pixels at the bottom of the image. Depending upon your goal for the terminal emulator you are writing, this may or may not be important. It's not a major graphical problem and it almost never happens. Still, a terminal which wants to be a faithful clone of the VT340's behaviours would of course care about this nuance. The costs of attempting to replicate it are high, causing other design trade-offs and adding complexity not just for the terminal developers but also for application programmers. Terminals which aim to be useful in modern times would be well advised to skip the quirk.
I believe it was not a design goal but a compromise. The VT340's "top pixel" heuristic is a quick approximation for a calculation that was too expensive at the time: "bottom pixel". Fortunately -- or perhaps by design -- the VT340's character cell height of 20 pixels makes that heuristic work just like "bottom pixel" in nearly every case. I had thought this glitch was a bug in the VT340 but, after looking into it deeply, I am actually extremely impressed with the engineers from DEC. They came up with a clever solution that nobody noticed at the time was any different from the correct calculation. DEC's lack of documentation on this point would be surprising given how thorough the manuals are, until one realizes it was probably omitted on purpose. If people knew the trick the VT340 was using, they might start relying on the quirky behaviour and future terminals would be obligated to support it.
Modern terminals have no need to approximate the calculation of the bottom-most opaque pixel, as processors are not as limited as they were in the 1980s. Even if they wanted to, there is no benefit to trying to extrapolate what the VT340 heuristic would be in modern times. Whatever it is, it is certainly not just picking the top pixel, as that doesn't work for other character cell heights. Trying to salvage it by presuming all character cells are 10x20 regardless of the font size causes a cascade of other problems, the worst being that high res images require making the font size imperceptibly small.
And, even if some terminal did implement a heuristic that worked at any font size, it would be useless to application programmers. Calculating when to send two newlines is unnecessarily complicated and sometimes not even possible. Consider the case where a program wants to display files that contain sixel screen dumps, perhaps captured by the VT340's MediaCopy. Since each file can contain an arbitrarily sized region, the program doesn't know ahead of time how high the image is in pixels. The only sane thing to do would be to send a single newline and presume one is enough to get the text cursor to a free line. This works on a VT340 so close to always that it isn't worth it to try to work around the occasional glitch.
In summary: We're talking about a very minor and rare graphical glitch that can occur on the VT340. While interesting from a historical perspective, only a precise VT340 emulator needs to care about such quirks. There is no benefit to copying this behaviour of the VT340 to modern terminals and much harm.
@PerBothner: Although not appropriate for sixel graphics, I could see your fresh-line proposal being useful for other situations, such as to make sure the prompt is located correctly after a program dies abruptly.
@hpjansson: To not return to the first column after displaying sixels, use IND, |
Right - but assuming a DEC-faithful TE, I would have to emit IND once or twice, depending - or rely on some extension such as @PerBothner's suggestion. The central question is "can we conserve DEC sixels but do something else to obviate the need to know where the last sixel band fell in relation to text cells?" |
Just emit IND once, same as a newline. This conserves DEC's sixel design. |
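For application authors, the "sixel, then one IND" idiom might look like the following Python sketch. The escape sequences are the standard 7-bit forms (DCS = `ESC P`, ST = `ESC \`, IND = `ESC D`). The function name `sixel_then_text` and the `keep_column` flag are illustrative only and not taken from chafa or any other tool.

```python
DCS = "\x1bP"    # Device Control String introducer
ST = "\x1b\\"    # String Terminator
IND = "\x1bD"    # Index: move down one row, keep the column


def sixel_then_text(sixel_body: str, text: str, keep_column: bool = True) -> str:
    """Build the byte sequence for an image followed by text below it.

    Exactly one line motion is sent after the image: IND if the text
    should stay in the current column, a plain newline otherwise.
    """
    separator = IND if keep_column else "\n"
    return f"{DCS}q{sixel_body}{ST}{separator}{text}"


if __name__ == "__main__":
    # A 6x6 red square: raster attributes, colour 0 = RGB 100/0/0,
    # then six repeats of a full column ('~').
    body = '"1;1;6;6#0;2;100;0;0#0!6~'
    print(sixel_then_text(body, "caption"))
```

On a terminal that follows the DEC placement, the caption may still overlap a few pixels of an image whose last band straddles two text rows, which is the trade-off accepted in this exchange.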
Okay - I'll do that (and unless I've misunderstood something, accept that a few pixels may get cut off). I'll get out of your hair now so you can discuss the other aspects (e.g. should raster attributes define a clipping rectangle? :-) Enjoying the conversation. |
The final GNL could cause extra space to be emitted in some circumstances. Also fix an issue causing more bands to be padded than necessary when multithreaded. See #192 (GitHub).
This positions the cursor correctly ~everywhere. See #192 (GitHub).
maybe i'm misunderstanding the need, but in notcurses i handle what i believe to be your problem by getting the terminal size in pixels, dividing that out by the number of rows and cols, and using those as the cell pixel dimensions. doesn't this provide you enough? |
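A minimal Python sketch of that approach on a POSIX system, assuming the terminal fills in the pixel fields of `TIOCGWINSZ` (some terminals report 0 there, in which case this returns `None`). The function names are invented for illustration and this is not the notcurses implementation.

```python
import fcntl
import struct
import sys
import termios
from typing import Optional, Tuple


def cell_size(rows: int, cols: int,
              xpixel: int, ypixel: int) -> Optional[Tuple[int, int]]:
    """Divide the window's pixel size by its size in cells.

    Returns (cell_width, cell_height) in pixels, or None when the
    terminal did not report usable values.
    """
    if rows <= 0 or cols <= 0 or xpixel <= 0 or ypixel <= 0:
        return None
    return xpixel // cols, ypixel // rows


def query_cell_size(fd: Optional[int] = None) -> Optional[Tuple[int, int]]:
    """Ask the tty for its window size (struct winsize: four unsigned shorts)."""
    if fd is None:
        fd = sys.stdout.fileno()
    packed = fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\0" * 8)
    rows, cols, xpixel, ypixel = struct.unpack("HHHH", packed)
    return cell_size(rows, cols, xpixel, ypixel)
```

An 80x24 window reported as 800x480 pixels gives a 10x20 cell, which happens to match the VT340's character cell.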
Good question. I've already said I think it's a reasonable, if not ideal, optimization even though it clearly violates both DEC's documentation and actual hardware behaviour. I should ask, though, does anyone have a good hypothesis for why DEC repeatedly stated that sixel images can extend beyond the rectangle defined by RA? What is lost by taking this optimization? My working theory had been that DEC probably wanted RA to define a clipping box but their hardware wasn't up to the task. However, that kinda falls apart when I look into it as their "GPU" (DRAGON) actually featured multiple viewports that might have done the job in no time. And, if jerch's results apply, using clipping could have actually made the VT340 run quite a bit faster, not slower. But do they apply? Would the VT340 have seen a significant speed benefit? @jerch, when you say you get a 40% speed boost, what exactly was the bottleneck? Memory pressure from dynamic allocation of large rectangles? |
@hackerb9 how do trailing GNLs interact with the last transparent rows being clipped? One way of looking at a final, trailing GNL is that it is a completely transparent sixel row (and thus that it should be removed). But perhaps it's more correct to say that a GNL should be treated as a fully opaque row until you start printing sixels; then you start tracking the bottom-most opaque pixel.
I can obviously not speak for @jerch, and I, too, am very curious. However, for me, there's no 40% speed boost just from allowing the raster attributes to act like a clipping region. Foot allocates the entire backing memory when the raster attributes are set. We still have to check for "overflows" (either increase the image size if the sixel cursor goes beyond the raster attributes, or ignore the sixel). Thus, it makes very little difference while processing sixel characters. There would be a small performance gain, in that we wouldn't have to reallocate the backing image when we encounter "sloppy" encoders that emit a trailing GNL that triggers a vertical resize. Treating it as a clipping region does simplify things though. And it almost removes the need to scan for the last opaque sixel row ;) But I'm fine with either way. |
Infinite scroll would be the most obvious example (I'm sure we discussed this somewhere before but I can't find it in your repo right now). You'd also lose some bandwidth saving tricks that could be beneficial when working with non-rectangular output. You can see the sort of thing I mean in the raster dimension tests. |
Before I get into the weeds about a trailing graphic newline, I do want to say that I think GNL is not as important as getting the text newline behaviour consistent across modern terminals.
Click to see hackerb9's pondering of GNL
@j4james is most knowledgeable of precise VT340 behaviour and may even know the exact algorithm for 20 pixel tall fonts off-hand. For modern terminals, I think perhaps a better question would be why did DEC choose the algorithm they did for the VT340? We've already seen that sometimes they developed fast but inexact algorithms to overcome hardware limitations, so what benefits did the algorithm they chose for the VT340's GNL provide to programmers and users at that time? With the caveat that I haven't thought this out as deeply as I have text newlines, here's my current take on GNL: EFFECT OF A TRAILING GRAPHIC NEW LINE ON TEXT CURSOR POSITION
It seems that a trailing GNL is practically useless to current programmers as the following text will almost always overlap. The one case it is sure to give a fresh line is not terribly useful since a text newline works the same and is more general. I don't know the design parameters DEC was constrained by, but it looks an awful lot like an attempt at backwards compatibility. Historically, sixels were designed for printers and teletypewriters in which GNL represented advancing the paper by a fraction of the usual line height. Excerpt from LJ250 Printer Programmer's Reference Manual
Since the fractions can add up, it makes sense that some programmers may have relied on printing images at a multiple of the line height and sending a final GNL to move the printhead to the next (whole) text line instead of using an explicit LF. Perhaps this was a common programming idiom and DEC wanted to make sure it still worked on video terminals. A possible critique and response
One problem with this theory is that printer-terminals, unlike video-terminals, might have been able to print a fraction of a line down so sizing images to a multiple of the line-height might not matter. Response: It's also possible that being aligned to whole lines was important if not 100% necessary. For example, the manual for the DEC LA100 printer has a caution about using Partial Line Down:
Another possible reason aligning to whole lines may have been important back in those days was that green bar paper was common, but that seems weak to me. Even if my above theory is correct, one thing I don't get is why not always advance the text cursor? What, if any, benefit is there to have a trailing GNL stay on the same line? My first thought was that perhaps the fractional page motion was saved and would be used to align any following sixel images, but no, they overwrite the previous image just like text does. Speed of calculation is likely part of it, but what exactly were they trying to calculate? I suspect this is a historical mystery which won't be solved until someone documents the actual behaviour of something even older than a VT340, perhaps a DECwriter IV printer-terminal. |
Alright, I now have three open PRs for foot, addressing the following:
Is this something you all (though I guess it's pretty clear where @j4james stands on this) would consider implementing in your TEs? Just to make it clear. I don't intend to merge any of the above (1640 being the exception) unless we can reach at least some level of consensus here. @hackerb9 thanks again for your detailed explanation. What I ended up doing (in 1640), is to let trailing GNLs move the text cursor as if you had at least one fully opaque 6-pixel sixel on that row, but as soon as you start printing sixels, I switch to tracking whatever the actual bottom pixel is. In other words, a trailing GNL will not be trimmed out when we remove trailing, transparent sixel rows. |
It's sufficient, but not ideal (click to expand summary):
To be clear, I'm not asking for anything in particular to be done about this, just that it's taken into account if terminal maintainers are making changes anyway. IMO, a broad consensus is more important than any of these concerns. Also, as @hackerb9 suggested, we should probably leave points 2 and 3 for a separate issue :-) |
Considering images actually having trailing transparent rows, I have a couple concerns/questions as regards trimming trailing transparent rows:
|
@hpjansson thanks for the explanation. i work around these three issues, but they're all valid concerns. |
Yes, and that's kind of the whole idea. If we don't trim, all images will be forced to have a height that is a multiple of 6. If we choose to truncate images with raster attributes, we could also choose to not trim trailing transparent rows. But if we don't truncate the image, I think trimming should be done regardless of whether the image has raster attributes or not. Otherwise, an image with raster attributes would still be forced to have a height that is a multiple of 6.
That's a valid question. Not sure if @hackerb9 has any insights on what the real VT340 does? It would kind of make sense to only trim when |
Honestly, I think this approach results in the most reliable/consistent/predictable behaviour and is technically the most straightforward and efficient to implement... both for TE and app developers. |
Definitely a good question, though straying a bit from the issue nominally at hand (newlines: graphical and otherwise). I just ran a test of p2 effects on overlaying graphics and the results surprised me. The rules for overlaying graphics seem to be:
№ 3 was the most surprising to me, but I guess it makes sense for a sixel parser: if you don't have any guess what size the graphics actually are, but you know there's an opaque background that must be cleared first, set the RA size to maximum. This behaviour also fits with how the documentation talks about the RA size parameter not being the actual geometry of the sixel image but rather an easy way to clear a rectangle. (You can see that in my test because I made the 20x20 image have a 60x60 RA size, which matters when transparency is off, P2=0). It was also interesting to me that the Raster Attribute size had no effect on the final cursor position. I'm not sure what the benefit is, but I think perhaps it makes sense since multiple Raster Attributes are allowed in a single sixel DCS string. If you want to test your terminal emulator of choice, you can get my script from here: https://raw.githubusercontent.com/hackerb9/vt340test/main/sixeltests/p2effect.sh . I'm curious to know the results. Footnote: I think of the VT340 as lacking the rectangle operations that existed in later terminals like the VT4x0. I'm not sure if I ever quite grasped before that there is actually an easy way to clear rectangles on the VT340. (And the rectangle doesn't even have to align to the character cell! --- not sure if that's a bug or a feature.) |
Wow! Ain't that something... Now, i kinda regret asking. |
@hackerb9 thanks! That's some interesting results. I'll be doing a couple of changes in foot to better match the VT340. I'm also inclined to not make RA truncate images, but instead continue allowing images to extend beyond their RA. But combine that with trimming trailing transparent sixel rows. |
@wez I think it's your system that is a bit wonky! Maybe an incompatibility with the shell? Because even the Xterm image has a whole bunch of mistakes that I'm not seeing when I run the script myself. For example, where are all those |
@wez Your Xterm screenshot still looks wrong to me. Are you sure you're using the latest version? This is what I get: It doesn't get the cursor position right when raster attributes are set, and it doesn't set the opaque background correctly when raster attributes are not set, but otherwise it seems OK to me. And note that you need to use a 10x20 font if you want to fully emulate the VT340, otherwise the cursor position tests are going to be misleading. |
Sixel-capable terminal emulators have gotten cursor placement (after emitting the sixel) wrong since the beginning. They usually put the cursor on a new line under the sixel. This means the terminal content may scroll if a sixel is printed on the last row.
However, it's not how the VT340 did it. The simplified explanation is that it places the cursor on the last line of the sixel. Thus, if you want to print text under the sixel, you first have to print a newline.
The real algorithm is slightly more complex than that. A sixel is 6 pixels tall. This means it can cover two text rows. The DEC cursor placement algorithm puts the text cursor where the top pixel is. This means there are times when two newlines are required to print text under the sixel.
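The difference between the two placements can be sketched in a few lines of Python. This follows only the simplified description above (cursor on the row of the top pixel of the last sixel band versus the row of its bottom pixel) and is not a model of the real VT340 firmware. The function names are hypothetical and `band_count` is assumed to be at least 1.

```python
from typing import Tuple

SIXEL_BAND_HEIGHT = 6


def rows_after_sixel(band_count: int, cell_height: int) -> Tuple[int, int]:
    """Return (top_rule_row, bottom_rule_row), counted from the text row
    on which the image starts, for an image of `band_count` bands."""
    top_px = (band_count - 1) * SIXEL_BAND_HEIGHT    # first pixel row of last band
    bottom_px = band_count * SIXEL_BAND_HEIGHT - 1   # last pixel row of last band
    return top_px // cell_height, bottom_px // cell_height


def newlines_needed(band_count: int, cell_height: int) -> int:
    """Newlines an application must send under the top-pixel rule to reach
    the first text row not touched by the image."""
    top_row, bottom_row = rows_after_sixel(band_count, cell_height)
    return bottom_row - top_row + 1
```

With a 20-pixel cell, three bands (18 pixels) stay within the first text row and one newline is enough. Four bands (24 pixels) put the last band across rows 0 and 1, so the top-pixel rule leaves the cursor on row 0 and two newlines are needed.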
A number of terminals have started to implement the correct behavior. Terminals that implement the DEC placement algorithm are foot, contour, DomTerm and WezTerm. There may be more that I'm not aware of. XTerm is close to correct, but last time I checked, it placed the cursor on the bottom pixel (i.e. you always need a single newline).
Right now, running
chafa <image> && echo "XXXXXX"
will look something like this in e.g. foot (the picture shows a part of my dog's paw...):
A bit more information here: