Poor cacheing properties #119
Comments
Related: |
There are two types of caching to consider:
The spec currently talks about this briefly in the performance considerations section. I can expand that section to also cover the caching aspect. |
Duplicate of #93 |
@martinthomson we closed this as a duplicate, but you are tagged on the other issue as well. |
AIUI that issue was resolved by relying on redirects (which are horrible for performance) and QUERY (which doesn't exist yet, may not be cacheable in these use cases, and if it is, won't be implemented broadly for some time, if ever). I don't think this issue is closed. |
Not mentioned in the other issue, but caching will also be supported via GET requests so long as the response includes a "Vary: Font-Patch-Request" header. That said, direct caching of the responses is unlikely to be all that effective due to the large cache key space, as you originally mentioned, though it may help in cases where there's a common initial request from a widely visited page. Instead it's probably better for caching to be done in the server implementation, by matching incoming requests to a nearby superset of codepoints that the server has a cached response for. This is supported in the spec currently. If strong regional caching that does not rely on caching in the server implementation is a requirement for a particular use case, then serving fonts via range request is likely a better fit. The range request approach provides very good support for caching. |
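To illustrate the server-side matching described in the comment above, here is a minimal sketch. It assumes a hypothetical in-memory cache keyed by codepoint sets; the function name `find_cached_superset` and the cache layout are illustrative only and are not taken from the spec. It also ignores the variable axis space and layout features, which a real subset definition would include.

```python
from typing import Optional


def find_cached_superset(
    cache: dict[frozenset[int], bytes], requested: set[int]
) -> Optional[bytes]:
    """Return the cached response covering the smallest superset of `requested`."""
    best: Optional[frozenset[int]] = None
    for covered in cache:
        # A cached entry is usable only if it covers every requested codepoint.
        if requested <= covered and (best is None or len(covered) < len(best)):
            best = covered
    return cache[best] if best is not None else None


# Hypothetical cache contents (the byte strings stand in for font subsets).
cache = {
    frozenset(map(ord, "abc")): b"subset-abc",
    frozenset(map(ord, "abcdefghij")): b"subset-a-to-j",
}
hit = find_cached_superset(cache, set(map(ord, "ab")))
miss = find_cached_superset(cache, {ord("a"), 0x4E00})
```

Here `hit` is the tighter "abc" entry, while `miss` is `None` because no cached entry covers U+4E00, so the server would have to generate a fresh subset.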
The HTTP WG has been exploring ways to improve Vary efficiency; the latest iteration (not yet adopted, but being discussed next week) is here. |
Thanks for pointing this out, I was not aware of it. |
Reopening as there are a couple of additions we could make to the spec:
|
You need to use However, that's a bare minimum for safety only - it's unlikely you'll get many cache hits because of the extremely dynamic nature of the requests. Architecturally, it'd be much better to cast this as a new range-unit, so that it reuses the framework that range requests offer (which are semantically very similar to what you're doing). E.g.,
... with the response being a Recommending how server-side caching could be done isn't helpful; have you engaged with any CDNs to see how likely they are to implement this? |
That's an interesting idea. I hadn't realized that range units are extensible. It seems the current range specification allows for us to have a pretty arbitrary range identifier (specifically other-range). At first glance I don't see any reason why we couldn't switch to using a custom range unit specific to font subsets instead of the font-patch-request header. Though I will have to spend some time reviewing the relevant HTTP range-request specs to make sure what we are trying to do would be a good fit for that framework.
Also, it's probably worth mentioning that we've recently reworked the specification to work within the compression-dictionary-transport framework. The relevant part for this issue is that the response (after content-encoding has been decoded) is now a valid font subset file (where previously the response was a patch that needed further decoding). The patching part is now handled as part of the content-encoding. With this in mind I think we end up with something like this:
What do you think? |
That sounds interesting, but we'd need to carefully consider the interaction of content encoding and range requests (which have never played very nicely together).
Also, it seems like you're doing something very different than compression dictionary transport. There, the dictionary is a separate resource on the server, identified by a URI and relatively static. Here, the dictionary is the current state of the client's local cache (effectively). So (if I understand the proposal correctly) I'm wondering how much reuse you'll actually get beyond syntax, keeping in mind that we often find trouble happens when protocol syntax is reused but semantics diverge.
I was thinking about reusing ranges because it seems to me that you could encode the entire patch request into the range identifier. It's not particularly elegant, but it is in keeping with how ranges work, conceptually. |
I’ve spent some time reviewing the specs relating to range requests and unfortunately, according to my interpretation, expressing the font patch request as a range request probably won’t be a good fit. Overall there’s an assumption that runs through the existing specification that the resource is divided into some number of units and the result of the request is some subset of those units. This manifests in a couple of places which would cause issues when trying to utilize this for font subsets, which are not formed as a set of units from the original font resource:
That said, these are not necessarily insurmountable but I’m currently leaning towards sticking with using the “font-patch-request” + “vary”. Also after further thought I think it’s best to keep the entirety of the font-patch-request message in one place instead of splitting part of it out into the compression dictionary transport header (so sticking with how we currently have it specified).
The compression dictionary transport specification specifically allows past versions of a resource to be used to encode future versions (see delta compression under use cases). I’ve been in close contact with the folks developing the compression dictionary transport proposal and they’re OK with its use for IFT. The IFT spec needs a couple of updates to sync up with the latest version of the proposal but the plan is to fully follow the semantics laid out in the proposal. I’ve implemented a prototype in Chrome of incremental transfer which utilizes the separate prototype compression dictionary transport implementation so I can confirm it’s possible to re-use the generic compression dictionary transport mechanism as part of a client side IFT implementation. It works roughly like this:
“font-patch-request” communicates two things:
Note: if the dictionary can’t be found/recreated then the server will respond with the requested font subset from (1) but will not use “sbr” encoding and everything will still work as normal. |
I don't see why Unicode codepoints can't be used for range units. That's what The superset requirement for combination is a little surprising to me. The delta encoding stuff is maybe OK, but it seems like you might have some difficulty with non-linearity when clients have already made some number of partial requests and have synthesized something from those requests. Maybe it can be made to work, but it would be extremely fragile. That said, delta encoding seems like a great idea for simpler scenarios, like the case where you start with Latin script and want to expand in some way from there. That's a case where you might just expand iteratively, either from a baseline (Latin) or what you already have (Latin + Math, Latin + Greek, Latin + Line Drawing, Latin + Emoji Subset 1, etc...). |
A font subset definition is currently made up of a set of Unicode codepoints, the variable axis space, and the set of layout features being requested. So using just codepoints doesn’t fully capture what is covered by a response. The other issue is that “content-range” can specify only one contiguous range of units. To carry more than one range, the range request specification currently encodes the result as a multipart response, which doesn’t fit with incxfer, which always uses a single part response.
In fonts there are various mechanisms which associate data with combinations/sequences of codepoints. A common example is the “fi” ligature where if text has a ‘f’ followed by an ‘i’ it will be substituted with a special “fi” glyph. Now consider a case where you have two subsets where one contains ‘f’ and the other contains ‘i’. Neither subset would contain the ‘fi’ ligature since it’s not reachable. If you tried to combine those two subsets then the merged font won't render the same as the original font on account of the “fi” ligature glyph being missing. This is one of the main shortcomings of the unicode-range approach to serving fonts.
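The “fi” case above can be shown with a toy model. This is only a sketch: `CMAP`, `LIGATURES`, and `subset_glyphs` are made-up names standing in for a real font's character map, substitution rules, and subsetter, and glyphs are represented as plain strings.

```python
# Toy font: codepoint -> glyph name, plus ligature rules keyed by codepoint sequence.
CMAP = {ord("f"): "f", ord("i"): "i"}
LIGATURES = {(ord("f"), ord("i")): "f_i"}


def subset_glyphs(codepoints: set[int]) -> set[str]:
    """Return the glyphs reachable from `codepoints` in the toy font."""
    glyphs = {CMAP[cp] for cp in codepoints if cp in CMAP}
    for sequence, ligature in LIGATURES.items():
        # A ligature glyph is reachable only if every input codepoint is in the subset.
        if all(cp in codepoints for cp in sequence):
            glyphs.add(ligature)
    return glyphs


# Two independent subsets, merged after the fact.
merged = subset_glyphs({ord("f")}) | subset_glyphs({ord("i")})
# One subset built for both codepoints together.
joint = subset_glyphs({ord("f"), ord("i")})
```

In this model `merged` lacks the `f_i` glyph while `joint` contains it, which is the rendering difference described above.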
Yes, a bit of care needs to be taken here but in my prototype in Chrome this didn’t end up being too difficult:
This of course is just an example of how it could be done. There are other approaches, such as having the network process coordinate requests and ensure only one is in flight at a time across tabs. For our implementation we decided the added complexity is not worth it given the small downside of potentially re-requesting data in what should be a relatively infrequent occurrence.
The IFT spec is pretty open ended about what the server is allowed to do. The only requirement is that responses contain at least what was asked for. So this type of approach is absolutely something that can be done and likely makes a lot of sense for scripts which don’t have large codepoint counts (i.e. everything other than CJK, emoji, and icon fonts). From a server implementation perspective I think it would be pretty reasonable to define a pretty compact Latin core (basically just ASCII) and then several extended Latin subsets for various Latin-based scripts (for example Vietnamese, and sets for European languages which need specific diacritic sets). Outside of Latin you could do similar things for other languages/scripts. Once you have these defined the server could always augment in units of the defined subsets based on what codepoints have been requested. This would give performance that’s better than the unicode-range based solutions in use today (by way of having tighter subsets and not duplicating data between subsets) while avoiding breaking rendering across subsets. All the while getting it done in a pretty similar number of font loads. |
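A rough sketch of the "augment in units of defined subsets" idea follows. The group names and codepoint ranges in `GROUPS` are illustrative assumptions, not groupings defined by the spec, and `groups_to_send` is a hypothetical helper.

```python
# Hypothetical predefined groups; the ranges are illustrative only.
GROUPS = {
    "latin-core": frozenset(range(0x20, 0x7F)),          # printable ASCII
    "latin-vietnamese": frozenset(range(0x1EA0, 0x1EFA)),
    "greek": frozenset(range(0x370, 0x400)),
}


def groups_to_send(requested: set[int], already_have: set[int]) -> tuple[list[str], set[int]]:
    """Pick whole groups covering what the client still needs.

    Returns the chosen group names and any codepoints no group covers.
    """
    needed = requested - already_have
    chosen = [name for name, cps in GROUPS.items() if cps & needed]
    covered = already_have.union(*(GROUPS[name] for name in chosen))
    return chosen, needed - covered


first = groups_to_send({ord("a"), 0x1EA1}, set())
repeat = groups_to_send({ord("a")}, set(GROUPS["latin-core"]))
```

Because the server always answers in whole groups, the response contains at least what was asked for, and the number of distinct responses stays small enough to cache well.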
#153 adds "Vary" to the response. |
Some updates on this post TPAC. I've proposed an alternative version of IFT where the references to patches are embedded in the font file (see: https://lists.w3.org/Archives/Public/public-webfonts-wg/2023Sep/0003.html). Most importantly this eliminates the dynamically constructed patch request message and associated custom header in favour of using regular old URLs pointed to by a mapping in the font file. This would allow fully statically hosted implementations (and hence easy cacheability) while still leaving the door open for dynamic implementations if desired. We're also currently exploring the possibility of merging this new approach and the IFTB approach into a single unified IFT mechanism. |
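As a loose illustration of why a mapping to plain URLs caches well, the sketch below looks up patch URLs for a piece of text. The `PATCH_MAP` structure, the file paths, and `patch_urls_for` are all hypothetical; the actual mapping format is defined by the proposal linked above, not by this example.

```python
# Hypothetical mapping, as it might be decoded from a font file: codepoint set -> patch URL.
PATCH_MAP = [
    (frozenset(range(0x370, 0x400)), "patches/greek.patch"),
    (frozenset(range(0x400, 0x500)), "patches/cyrillic.patch"),
]


def patch_urls_for(text: str, mapping=PATCH_MAP) -> list[str]:
    """Return the URLs of patches whose codepoint sets intersect the text."""
    needed = {ord(ch) for ch in text}
    return [url for codepoints, url in mapping if codepoints & needed]


urls = patch_urls_for("αβγ")
```

Every client that needs Greek fetches the same URL with an ordinary GET, so the cache key space is bounded by the number of patches rather than by the number of distinct requests.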
This has now been done, so the whole "produce a complete font in response to a query" issue is no longer applicable. @garretrieger what do you think, close? |
For reference, here's an early draft of the new approach: https://garretrieger.github.io/IFT/Overview.html This allows the patches to be hosted as regular files and uses no special headers or HTTP extensions. So caching now works normally. |
Originally raised by Martin Thomson