
Special LaTeX characters in alt-text #804

Open
lstonys opened this issue Feb 25, 2025 · 17 comments
lstonys commented Feb 25, 2025

In alt text we need some encoding that lets us hide any character from TeX (like HTML, which has entities through which every character can be typed as a Unicode reference).
For example, what happens if I want to describe brace characters:

alt ={%
{ - small variant of curly brace....
{ - large variant ...
}%

I can hide it with \{, but in alt text the backslash is useless and needs to be stripped.
So I tried adding the common special characters directly, and I see that:

  • ~ is changed to space
  • % is gone (of course)
  • # is doubled

Example:

\DocumentMetadata{testphase=phase-III}
\documentclass{article}
\usepackage{graphicx}

\begin{document}
    \begin{figure}[h]
    \centering
    \includegraphics[alt={Alternative, `'@~#$^_{}%
text},width=\textwidth]{example-image}
    \end{figure}
some text
\end{document}

[Image: screenshot of the result]

u-fischer (Member) commented

Well, tilde and the superscript character are a bit special, but besides those:


\DocumentMetadata{testphase=phase-III}
\documentclass{article}
\usepackage{graphicx}

\begin{document}
\begin{figure}
    \centering
    \includegraphics[alt={Alternative, `'@\string~\#\$\string^\_\{\}\% text},width=\textwidth]{example-image}
\end{figure}
some text
\end{document}

[Image: screenshot of the result]

u-fischer transferred this issue from latex3/tagpdf on Feb 25, 2025

lstonys (Author) commented Feb 25, 2025

OK, I tried \detokenize, \string, and \, but didn't mix them. Could you update the docs, section 4 "Alternative text, ActualText and text-to-speech software", with some notes about special characters? Thanks a lot!


FrankMittelbach (Member) commented Feb 25, 2025

Arguably, standard commands like \textbraceleft etc. should work too in this place, but they don't as far as I can see.
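
For concreteness, the kind of input that arguably should work here but currently does not (a minimal sketch, with made-up wording):

\includegraphics[alt={A \textbraceleft\ is the small variant of
  the curly brace},width=\textwidth]{example-image}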


lstonys (Author) commented Feb 25, 2025

Some publishers now require adding alternative texts in their TeX files, and authors who don't care much about tags just write verbatim text in the alt={} field. Mostly their goal is to get a correct PDF view. They could escape characters, but we need clear instructions. Better still would be raw text input (for alt text), but we can't do that in a key=val system.

An alternative:

\defineAltText{ID}
some verbatim text
\enddefineAltText
\includegraphics[alttextid={ID}]{...}

where \defineAltText drops all catcodes and reads every character until \enddefineAltText; it's easy to guess what I'm pointing at :) Of course \defineAltText couldn't appear inside another macro.

Another idea: maybe the \defineAltText ID could be the image name (the same as the mandatory \includegraphics argument); then we wouldn't need to pass alttextid={ID}, and \includegraphics could load these alt texts automatically.
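
A rough sketch of how that interface could look (command names as proposed above; using an xparse verbatim argument instead of scanning for \enddefineAltText is a shortcut of mine):

\ExplSyntaxOn
\prop_new:N \g_alttext_prop

% store text under an ID; the "+v" argument is read with all
% catcodes disabled, delimited by a pair of identical characters
\NewDocumentCommand \defineAltText { m +v }
  { \prop_gput:Nnn \g_alttext_prop {#1} {#2} }

% retrieve the stored text by ID
\NewDocumentCommand \getAltText { m }
  { \prop_item:Nn \g_alttext_prop {#1} }
\ExplSyntaxOff

% usage: the specials survive literally (but, as noted, such a
% command cannot appear inside the argument of another macro)
\defineAltText{fig1}|small { and large { variants, % and # too|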

u-fischer (Member) commented

> Arguably, standard commands like \textbraceleft etc. should work too in this place, but they don't as far as I can see.

Hm, no. \text_purify:n leaves them in the stream. It works with \text_declare_purify_equivalent:Nn \textbraceleft {\{}, but it would be a bit of an overkill to do that for every such command. @josephwright, any thoughts about this?
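
So as a document-level workaround, one can declare the equivalents oneself (the \textbraceright line is the obvious companion):

\ExplSyntaxOn
\text_declare_purify_equivalent:Nn \textbraceleft  { \{ }
\text_declare_purify_equivalent:Nn \textbraceright { \} }
\ExplSyntaxOff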

FrankMittelbach (Member) commented

> Hm, no. \text_purify:n leaves them in the stream. It works with \text_declare_purify_equivalent:Nn \textbraceleft {\{}, but it would be a bit of an overkill to do that for every such command. @josephwright, any thoughts about this?

There are a few dozen such commands, but for what it is worth they are standard input methods, and if they appear in, say, a heading, they should be replaced by something suitable when the text is moved to the bookmark, for example. So I think there is some argument that purify should handle them. On the other hand, it is a noticeable overhead for a marginal use case.

If you could get all the special chars using \string, that would be a relatively easy way to input them, but unfortunately that isn't the case for % { # }, where you really need \% etc. So it is somewhat awkward in any case, and it would need documentation whatever is done about it (if anything).

josephwright (Member) commented

I can certainly adjust \text_purify:n to cover more chars; it would be sensible to collect a proper list. There are several (partial) lists of this form about, but one clear one from the team would likely be best.

A related issue for me is that currently \text_expand:n leaves the input as far as possible unchanged, leaving 'Unicode-ification' to \text_purify:n. My feeling is that it would be a lot easier if that were handled by \text_expand:n, as then all text functions would get as much as possible 'just chars'. But that is a change, in that at present \text_expand:n is described as similar to \protected@edef. Thoughts?
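
To make the split concrete (a sketch; behavior as I understand the current documentation):

\ExplSyntaxOn
% \text_expand:n acts like \protected@edef, so a robust text
% command such as \textbraceleft stays in the stream ...
\tl_set:Ne \l_tmpa_tl { \text_expand:n { \textbraceleft } }
\tl_show:N \l_tmpa_tl   % -> \textbraceleft

% ... while \text_purify:n is the step that is supposed to
% reduce everything to plain characters
\tl_set:Ne \l_tmpb_tl { \text_purify:n { \textbraceleft } }
\tl_show:N \l_tmpb_tl
\ExplSyntaxOff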

FrankMittelbach (Member) commented

TLC3 pages I-768 to I-776 document the encoding-specific commands.

It's a mouthful already, plus packages (e.g. babel) might add more, so you would also need an interface for adding to whatever list is handled automatically.

Redefining all of them is not feasible. Instead, I think what should be done is to define a PU encoding, provide definitions in that encoding, and change to that encoding during purification. That uses the encoding-change approach to avoid doing all the redefinitions up front and only does them on the fly when the commands actually show up in the input (handwaving, may not work easily with the purify approach).

josephwright (Member) commented

@FrankMittelbach Surely that's not much worse than loading puenc.def, just a question of where you store the data? (The \text_... functions all work in expansion contexts, so we can't read data as-and-when.)

FrankMittelbach (Member) commented

Much worse: puenc.def is loaded once, but the redefinitions for the encoding-specific commands would happen each time purify is done. In contrast, a text-encoding-specific command checks whether it already has a definition suitable for the current encoding and, if not, changes to the one in the right encoding. But that happens only for the encoding-specific commands that are actually used (so only a few, if any, not a few hundred each time), and it is all done expandably.

josephwright (Member) commented

Note that special chars are already covered:

\text_declare_purify_equivalent:Nn \\ { }
\tl_map_inline:nn
  { \{ \} \# \$ \% \_ }
  { \text_declare_purify_equivalent:Ne #1 { \cs_to_str:N #1 } }
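
A quick test of those declarations (my sketch) gives the plain characters back:

\ExplSyntaxOn
\tl_set:Ne \l_tmpa_tl { \text_purify:n { \{ \# \$ \% \_ \} } }
\tl_show:N \l_tmpa_tl   % -> {#$%_} (catcode-12 characters)
\ExplSyntaxOff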


FrankMittelbach (Member) commented Feb 25, 2025

But this is what I mean: you do these mappings each time, even if none of the commands show up. Not a problem for 5, but a bit different for a few hundred.

In contrast, if we are in a PU encoding, the expansion of \{ would check, find that its current definition is for T1, and so change to \PU\{ and run that.


josephwright (Member) commented Feb 25, 2025

@FrankMittelbach I have a feeling we are talking at cross-purposes here! When you pass something like \textbraceleft to \text_purify:n, we see exactly that token and then just need to check whether there is an equivalent 'purification' definition; there is no encoding change. As I said, my personal preference would be to move this to \text_expand:n (along with things like composing accent commands), but the data loading doesn't worry me at all. See the latter part of l3text-purify.dtx for what we load ATM.

FrankMittelbach (Member) commented

Ah OK, so your purify does something similar to what the encoding-specific command mechanism does (which makes me wonder if it could have used that mechanism in the first place; probably not, as you have to get rid of other stuff).

josephwright (Member) commented

> But this is what I mean: you do these mappings each time, even if none of the commands show up. Not a problem for 5, but a bit different for a few hundred.
>
> In contrast, if we are in a PU encoding, the expansion of \{ would check, find that its current definition is for T1, and so change to \PU\{ and run that.

No, it's more-or-less the same as PU. There, we have hundreds of

\DeclareTextCommand ...

which store the data (once) and are then looked up in the hash table. For \text_purify:n we need the same idea but with lots of \text_declare_purify_equivalent:Nn, which again stores the data in control sequences, so we look up in the hash table. At the point of use, it's just a question of \cs_if_exist_use:cF with the right name.
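
Schematically, the point-of-use side amounts to something like this (a sketch, not the actual kernel code; the storage-name convention is invented):

\ExplSyntaxOn
% stored once, at load time: one "equivalent" per known command
\cs_new:cpn { purify~eq~\cs_to_str:N \textbraceleft } { \{ }

% at point of use: expandably use the stored equivalent if it
% exists, otherwise leave the token untouched
\cs_new:Npn \my_purify_token:N #1
  {
    \cs_if_exist_use:cF { purify~eq~\cs_to_str:N #1 }
      { \exp_not:N #1 }
  }
\ExplSyntaxOff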

josephwright (Member) commented

> Ah OK, so your purify does something similar to what the encoding-specific command mechanism does (which makes me wonder if it could have used that mechanism in the first place; probably not, as you have to get rid of other stuff).

Yes, very similar to the encoding mechanism, and of course even more similar to what hyperref already had for the same idea. But as this is a generic expl3 function, it works using just expl3's own data structures etc., it's expandable, and it does try to cover more stuff.


lstonys (Author) commented Feb 26, 2025

I don't think we need to cover all LaTeX input in alt text. TeX in \section{...} has to deal with macros because the same string also goes to the bookmarks. Alt text doesn't go to the typeset output, so we only need to deal with % { } #: simply do \detokenize{#1} and later replace these few patterns with a regex. tex4ht could do the same replacement.
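
A rough sketch of that (the details are my guess; the regex strips the escape backslashes that detokenizing leaves in front of the specials):

\ExplSyntaxOn
% detokenize the alt text, then turn e.g. "\{" into plain "{"
\tl_set:Ne \l_tmpa_tl { \tl_to_str:n { Braces~\{...\},~\%~and~\# } }
\regex_replace_all:nnN { \\([\{\}\%\#]) } { \1 } \l_tmpa_tl
\tl_show:N \l_tmpa_tl   % -> Braces {...}, % and #
\ExplSyntaxOff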
