General lack of clarity about input/output/context languages #16

domenic · 2024-11-25T06:40:47Z

If you try to summarize Japanese text, should you expect a Japanese summary? Or an English summary?

What if you provide your { context } or { sharedContext } in a third or fourth language?

How do the answers to these questions interact with summarizerCapabilities.languageAvailable()? Currently it's only intended to give an answer for input language support.

Should we allow web developers to specify the output language more tightly? If so, how could we guarantee the result? Would we pass it through translation APIs behind the scenes? Or just fail if it's not supported, and let developers do the translation themselves?

These solve the problem discussed in webmachinelearning/prompt-api#29 and #16. They provide a mechanism for web developers to tell the browser to download additional material to support additional languages, and for web developers to get early errors if they know they will be trying to use a language that isn't supported. It also clearly separates input, context, and output languages, with a requirement on how the output language is produced by default (match the input). This removes the languageAvailable() API, folding it into createOptionsAvailable(). Further work might remove the AISummarizerCapabilities object altogether, since now it's mostly a wrapper around the single createOptionsAvailable() method.
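
As a rough illustration of the shape described above, the sketch below models a createOptionsAvailable()-style check that treats input, context, and output languages as separate lists. Everything here is an assumption for illustration rather than the specified API: the option name expectedContextLanguages, the three return values, the helper functions, and the invented support tables (where output support is the narrowest set).

```typescript
// Hypothetical model of a createOptionsAvailable()-style check.
// All names and language tables here are illustrative assumptions.
type Availability = "readily" | "after-download" | "no";

interface LanguageOptions {
  expectedInputLanguages?: string[];
  expectedContextLanguages?: string[];
  outputLanguage?: string;
}

interface LanguageSupport {
  ready: string[];        // usable immediately
  downloadable: string[]; // usable once extra material is downloaded
}

// Invented support tables; output is deliberately the narrowest set.
const inputSupport: LanguageSupport = { ready: ["en", "ja"], downloadable: ["es"] };
const contextSupport: LanguageSupport = { ready: ["en"], downloadable: ["ja"] };
const outputSupport: LanguageSupport = { ready: ["en"], downloadable: [] };

// Availability of one language against one support table.
function checkOne(lang: string, support: LanguageSupport): Availability {
  if (support.ready.indexOf(lang) !== -1) return "readily";
  if (support.downloadable.indexOf(lang) !== -1) return "after-download";
  return "no";
}

// The worst answer wins: "no" beats "after-download" beats "readily".
function combine(a: Availability, b: Availability): Availability {
  if (a === "no" || b === "no") return "no";
  if (a === "after-download" || b === "after-download") return "after-download";
  return "readily";
}

function createOptionsAvailable(options: LanguageOptions): Availability {
  let result: Availability = "readily";
  for (const lang of options.expectedInputLanguages || []) {
    result = combine(result, checkOne(lang, inputSupport));
  }
  for (const lang of options.expectedContextLanguages || []) {
    result = combine(result, checkOne(lang, contextSupport));
  }
  if (options.outputLanguage !== undefined) {
    result = combine(result, checkOne(options.outputLanguage, outputSupport));
  }
  return result;
}
```

With these made-up tables, asking for Japanese input with Japanese output yields "no" even though Japanese input alone is supported, which is the kind of early signal the change is meant to give developers.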

etiennenoel · 2024-12-09T22:17:03Z

I assume that this would use the LLM and not the translate API to do the translation, right?

In this case, what is the difference between the rewrite API and the translate API if you can use the rewrite API simply for translation purposes?

domenic · 2024-12-10T04:24:11Z

I agree this is confusing and unsatisfactory.

One could argue that there's a difference between "rewriting" and "translating", similar to the difference between "summarizing" and "rewriting". But I'm not sure the argument is very solid.

I think the more practical issue is just about expected language support in current implementations, and how that affects the combinations. Currently we'd expect:

- Rewrite supports the various options (tone, format, length, context). It supports 1-5 input and output languages. It supports multiple input languages in the same string. This kind of capability is naturally emergent from the language model we plan to use.
- Translate supports zero options. It supports many languages. It only supports a single input language per string. This kind of capability is naturally emergent from the translation model we plan to use.

Our current strategy is to signal this clearly via different API entrypoints: translate doesn't have any configurable options, for example, and expectedInputLanguages is optional for rewriter.

You could imagine an alternate strategy where we try to fit everything into the rewriter API. This would have some sharp edges, though. For example:

- Even if both the language and the translation models support a given language pair, you could see dramatically different translations by just tweaking the options slightly. E.g., if you use the { context } option, which only the language model supports, your translation will suddenly change in ways that are not related to the context.
- Translating between a given language pair might be supported using the translation model, but then when you ask it to make the result shorter, we either fail (because we can't do everything with the language model) or we have to do something like translate to English, make shorter, translate to destination language, which could introduce unexpected artifacts.
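
To make the "translate to English, make shorter, translate to destination language" fallback concrete, here is a minimal sketch of that pivot pipeline. The function shortenViaPivot and both callback types are hypothetical names introduced for illustration; real translation and rewriting calls would be asynchronous, and they are kept synchronous here only to keep the sketch short.

```typescript
// Both callbacks are hypothetical stand-ins; real APIs would be asynchronous.
type TranslateFn = (text: string, from: string, to: string) => string;
type ShortenFn = (text: string) => string; // assumed to handle only the pivot language

function shortenViaPivot(
  text: string,
  sourceLanguage: string,
  targetLanguage: string,
  translate: TranslateFn,
  shorten: ShortenFn,
  pivotLanguage: string = "en"
): string {
  // Step 1: bring the input into the language the language model handles.
  const pivotText =
    sourceLanguage === pivotLanguage ? text : translate(text, sourceLanguage, pivotLanguage);
  // Step 2: apply the language-model-only option (here, shortening).
  const shortened = shorten(pivotText);
  // Step 3: translate the result to the destination language.
  return targetLanguage === pivotLanguage
    ? shortened
    : translate(shortened, pivotLanguage, targetLanguage);
}
```

Each hop through a model is a chance for meaning to drift, so a pipeline like this can involve up to two translation passes plus a rewrite, which is where the unexpected artifacts would come from.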

Do we think it might be worth pursuing this road anyway?

An alternate strategy would be to get rid of the outputLanguage setting for the writing assistance APIs and say that the output language is always derived from the input language. Then, web developers would have to explicitly use the translation APIs if they want output in a different language. But that runs into the issue where it's not clear what kind of results should occur for multilingual input; as #22 states,

If the outputLanguage is not supplied, the default behavior is to produce the output in "the same language as the input". For the multilingual input case, what this means is left implementation-defined for now, and implementations should err on the side of rejecting with a "NotSupportedError" DOMException. For this reason, it's strongly recommended that developers supply outputLanguage.
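
The default described in that quote can be sketched as a small resolution step. This is only one possible reading: resolveOutputLanguage and notSupported are hypothetical helpers, language detection is assumed to have happened elsewhere, a plain Error with its name set stands in for the "NotSupportedError" DOMException, and rejecting when no language is detected at all is an added assumption.

```typescript
// Stand-in for a "NotSupportedError" DOMException (assumption for portability).
function notSupported(message: string): Error {
  const error = new Error(message);
  error.name = "NotSupportedError";
  return error;
}

// Hypothetical helper: decide which language the output should be in.
function resolveOutputLanguage(
  outputLanguage: string | undefined,
  detectedInputLanguages: string[]
): string {
  // An explicit outputLanguage always wins.
  if (outputLanguage !== undefined) return outputLanguage;
  // Default: match the input, which is only well-defined for a single language.
  const [first, ...rest] = detectedInputLanguages;
  if (first !== undefined && rest.length === 0) return first;
  // Multilingual (or undetected) input: err on the side of rejecting.
  throw notSupported("Cannot infer an output language; supply outputLanguage.");
}
```

For example, resolveOutputLanguage(undefined, ["ja"]) returns "ja", while passing ["en", "ja"] without an explicit outputLanguage throws.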

michaelwasserman · 2024-12-12T18:35:11Z

1. WDYT about combining input, context, and shared context languages into a single list?
Those lists will likely undergo identical impl support checks, and most dev usage will likely have identical lists. Is there a clear user/dev/impl benefit to splitting them up?
2. Also, WDYT about handling a list for output languages?
Perhaps a single response could be multi-lingual, or separate responses could be in separate languages? This might also help coalesce dev inquiries and creation requests for multiple output languages, say for translation.
3. It might be nice if responses include a string description or codes regarding incompatibilities (e.g. "No multi-lingual output", or NotSupportedInputLanguage, even NotSupportedLengthAndToneCombination or similar).

domenic · 2024-12-20T00:02:58Z

1. WDYT about combining input, context, and shared context languages into a single list?
Those lists will likely undergo identical impl support checks, and most dev usage will likely have identical lists. Is there a clear user/dev/impl benefit to splitting them up?

I don't think it's true that they will have identical implementation support checks. For example, in Chromium we have specific output languages we support because we've done sufficient safety-checking on responses in those languages. That set is a subset of the supported input languages.

It's less clear that there might be cases where input and context language support differs. But, I'm thinking if you create a fine-tuning for summarizing Japanese, that doesn't necessarily mean you've fine-tuned for following Japanese instructions (i.e. context). And it seems simpler for web developers to have a 1:1 correspondence between text-valued options to the API, and corresponding supportedXYZLanguages properties.

2. Also, WDYT about handling a list for output languages?
Perhaps a single response could be multi-lingual, or separate responses could be in separate languages? This might also help coalesce dev inquiries and creation requests for multiple output languages, say for translation.

I thought about this, but was unsure how we'd implement it. I guess we would prompt the model with something like "output a mix of English or Japanese"? Or "you can output either English or Japanese as appropriate"?

If you think this is implementable, then I'd be happy to update the API with it.

3. It might be nice if responses include a string description or codes regarding incompatibilities (e.g. "No multi-lingual output", or NotSupportedInputLanguage, even NotSupportedLengthAndToneCombination or similar)

Do you mean, the error messages on the "NotSupportedError" DOMException should be clear? Or do you think we should provide programmatically distinguishable error names (beyond a blanket "NotSupportedError") which programmers could use to react differently depending on the error cases?

In general the web platform doesn't go very granular with its error names. But, if we have concrete cases where we expect developers to write different logic paths for different error cases, instead of generally bubbling up to some sort of "the API was not supported in your browser" error message, then we can definitely do this.
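
For a sense of what "different logic paths for different error cases" might look like on the developer side, here is a minimal sketch. It assumes a hypothetical machine-readable reason field on the error; no such field is proposed in this thread's spec text, and the reason values and fallback names are invented for illustration.

```typescript
// Hypothetical: a NotSupportedError that carries a machine-readable reason.
type UnsupportedReason = "input-language" | "output-language" | "option-combination";

interface DetailedNotSupportedError {
  name: "NotSupportedError";
  message: string;
  reason?: UnsupportedReason; // assumed extension, absent on today's errors
}

type Fallback =
  | "translate-input-first"
  | "translate-output-after"
  | "retry-with-defaults"
  | "feature-unavailable";

// Pick a recovery path per reason instead of one blanket failure message.
function chooseFallback(error: DetailedNotSupportedError): Fallback {
  switch (error.reason) {
    case "input-language":
      return "translate-input-first";
    case "output-language":
      return "translate-output-after";
    case "option-combination":
      return "retry-with-defaults";
    default:
      // No reason available: behave as with a blanket NotSupportedError.
      return "feature-unavailable";
  }
}
```

If most callers would only ever take the default branch, that suggests a single error name with a clear message is enough.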

michaelwasserman · 2024-12-20T00:38:15Z

Thanks for answering those qs; I hope we can chat more about design after the holidays (happy holidays!).

1. Separate i/o languages makes sense for sure, but input and context language support discrepancies would surprise me (as a novice). That said, xyz and supportedXYZLanguages might be a reasonable pattern.
2. An output list is just forward-looking, even if some models today don't support multi-lingual output.
3. Yeah, pair messages or codes with NotSupportedError, to discern fallback options instead of spamming queries.

domenic mentioned this issue Dec 6, 2024

Overhaul availability testing and add expected language options #22

Merged

domenic closed this as completed in #22 Jan 17, 2025

domenic closed this as completed in da9ac67 Jan 17, 2025
