General lack of clarity about input/output/context languages #16

Closed
domenic opened this issue Nov 25, 2024 · 5 comments · Fixed by #22

Comments

@domenic
Collaborator

domenic commented Nov 25, 2024

If you try to summarize Japanese text, should you expect a Japanese summary? Or an English summary?

What if you provide your { context } or { sharedContext } in a third or fourth language?

How do the answers to these questions interact with summarizerCapabilities.languageAvailable()? Currently it's only intended to give an answer for input language support.

Should we allow web developers to specify the output language more tightly? If so, how could we guarantee the result? Would we pass it through translation APIs behind the scenes? Or would we just fail if it's not supported, and let developers do the translation themselves?

domenic added a commit that referenced this issue Dec 6, 2024
These solve the problem discussed in webmachinelearning/prompt-api#29 and #16. They provide a mechanism for web developers to tell the browser to download additional material to support additional languages, and for web developers to get early errors if they know they will be trying to use a language that isn't supported. It also clearly separates input, context, and output languages, with a requirement on how the output language is produced by default (match the input).

This removes the languageAvailable() API, folding it into createOptionsAvailable(). Further work might remove the AISummarizerCapabilities object altogether, since now it's mostly a wrapper around the single createOptionsAvailable() method.
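As a rough illustration of folding languageAvailable() into createOptionsAvailable(), here is a minimal TypeScript sketch in which a single method inspects every language in the option bag and reports one availability value. The option names (expectedInputLanguages, expectedContextLanguages, outputLanguage), the "readily" / "after-download" / "no" values, and the per-language table are assumptions for illustration, not the spec's definitions.

```typescript
// Hypothetical availability values; "after-download" means extra material
// (e.g. language support) must be downloaded before creation can succeed.
type Availability = "readily" | "after-download" | "no";

// Hypothetical option bag with input, context, and output languages kept separate.
interface SummarizerCreateOptions {
  expectedInputLanguages?: string[];
  expectedContextLanguages?: string[];
  outputLanguage?: string;
}

// Illustrative per-language status known to the browser. For brevity this
// sketch uses one table for all three kinds; a real implementation may not.
const languageStatus: Partial<Record<string, Availability>> = {
  en: "readily",
  ja: "after-download",
};

// One method answers for the whole option bag, so an unsupported language
// anywhere in the options produces an early "no".
function createOptionsAvailable(options: SummarizerCreateOptions): Availability {
  const requested = [
    ...(options.expectedInputLanguages ?? []),
    ...(options.expectedContextLanguages ?? []),
    ...(options.outputLanguage !== undefined ? [options.outputLanguage] : []),
  ];
  let result: Availability = "readily";
  for (const lang of requested) {
    const status = languageStatus[lang] ?? "no";
    if (status === "no") return "no";
    if (status === "after-download") result = "after-download";
  }
  return result;
}
```

For example, `createOptionsAvailable({ expectedInputLanguages: ["en"], outputLanguage: "ja" })` would report "after-download" in this sketch, because one requested language still needs a download.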
@etiennenoel

I assume that this would use the LLM, and not the Translator API, to do the translation, right?

In that case, what is the difference between the Rewriter API and the Translator API, if you can use the Rewriter API simply for translation?

@domenic
Collaborator Author

domenic commented Dec 10, 2024

I agree this is confusing and unsatisfactory.

One could argue that there's a difference between "rewriting" and "translating", similar to the difference between "summarizing" and "rewriting". But I'm not sure the argument is very solid.

I think the more practical issue is just about expected language support in current implementations, and how that affects the combinations. Currently we'd expect:

  • Rewrite supports the various options (tone, format, length, context). It supports 1-5 input and output languages. It supports multiple input languages in the same string. This kind of capability is naturally emergent from the language model we plan to use.
  • Translate supports zero options. It supports many languages. It only supports a single input language per string. This kind of capability is naturally emergent from the translation model we plan to use.

Our current strategy is to signal this clearly via different API entrypoints: translate doesn't have any configurable options, for example, and expectedInputLanguages is optional for rewriter.
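A hypothetical TypeScript sketch of the two entrypoints' option shapes helps show how narrow the overlap is. The field names and enum values are assumptions loosely modeled on this discussion, and asPureTranslation is an illustrative helper, not part of either API: only a request with default tone, format, and length, no shared context, exactly one input language, and an explicit output language maps cleanly onto a translator-style language pair.

```typescript
// Hypothetical rewriter options: many knobs, and input languages are optional.
interface RewriterOptions {
  tone?: "more-formal" | "as-is" | "more-casual";
  format?: "as-is" | "plain-text" | "markdown";
  length?: "shorter" | "as-is" | "longer";
  sharedContext?: string;
  expectedInputLanguages?: string[]; // may list several languages for one string
  outputLanguage?: string;
}

// Hypothetical translator options: no tone/format/length/context, just a pair.
interface TranslatorOptions {
  sourceLanguage: string;
  targetLanguage: string;
}

// Returns translator options only when the rewriter request is a "pure"
// single-language translation; otherwise returns null.
function asPureTranslation(opts: RewriterOptions): TranslatorOptions | null {
  const usesRewriteOnlyOptions =
    (opts.tone !== undefined && opts.tone !== "as-is") ||
    (opts.format !== undefined && opts.format !== "as-is") ||
    (opts.length !== undefined && opts.length !== "as-is") ||
    opts.sharedContext !== undefined;
  const inputs = opts.expectedInputLanguages ?? [];
  if (usesRewriteOnlyOptions || inputs.length !== 1 || opts.outputLanguage === undefined) {
    return null;
  }
  return { sourceLanguage: inputs[0], targetLanguage: opts.outputLanguage };
}
```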

You could imagine an alternate strategy where we try to fit everything into the rewriter API. This would have some sharp edges, though. For example:

  • Even if both the language and the translation models support a given language pair, you could see dramatically different translations by just tweaking the options slightly. E.g., if you use the { context } option, which only the language model supports, your translation will suddenly change in ways that are not related to the context.
  • Translating between a given language pair might be supported using the translation model, but then when you ask it to make the result shorter, we either fail (because we can't do everything with the language model) or we have to do something like translate to English, make shorter, translate to destination language, which could introduce unexpected artifacts.
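To make the second sharp edge concrete, here is a hypothetical planner for a unified design in which the "make it shorter" step can only run in languages the language model handles (assumed to be just English here, purely for illustration). When the target language is outside that set, the plan pivots through English, and every extra translation hop is a place where artifacts can creep in.

```typescript
// One step of a hypothetical pipeline.
type Step =
  | { kind: "translate"; from: string; to: string }
  | { kind: "rewrite"; language: string; length: "shorter" };

// Assumption for illustration: the language model can only shorten English text.
const languageModelLanguages = new Set(["en"]);

// Plans "translate source -> target and make it shorter".
function planShortenedTranslation(source: string, target: string, pivot = "en"): Step[] {
  const steps: Step[] = [];
  // Shorten in the target language if possible; otherwise fall back to the pivot.
  const rewriteLanguage = languageModelLanguages.has(target) ? target : pivot;
  if (source !== rewriteLanguage) {
    steps.push({ kind: "translate", from: source, to: rewriteLanguage });
  }
  steps.push({ kind: "rewrite", language: rewriteLanguage, length: "shorter" });
  if (rewriteLanguage !== target) {
    // Extra hop back out of the pivot language: a likely source of artifacts.
    steps.push({ kind: "translate", from: rewriteLanguage, to: target });
  }
  return steps;
}
```

Under these assumptions, Japanese to French needs three steps (ja to en, shorten, en to fr), while Japanese to English needs only two.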

Do we think it might be worth pursuing this road anyway?

An alternate strategy would be to get rid of the outputLanguage setting for the writing assistance APIs and say that the output language is always derived from the input language. Then, web developers have to explicitly use the translation APIs if they want. But that runs into the issue where it's not clear what kind of results should occur for multilingual input; as #22 states,

If the outputLanguage is not supplied, the default behavior is to produce the output in "the same language as the input". For the multilingual input case, what this means is left implementation-defined for now, and implementations should err on the side of rejecting with a "NotSupportedError" DOMException. For this reason, it's strongly recommended that developers supply outputLanguage.
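A minimal sketch of the default behavior quoted from #22, assuming the implementation already knows which languages the input contains (detectedInputLanguages is a hypothetical parameter). The NotSupportedError class is a stand-in for a DOMException with that name, used so the sketch runs outside a browser. It takes the conservative route the quote recommends and rejects multilingual input when no outputLanguage is given.

```typescript
// Stand-in for a DOMException whose name is "NotSupportedError".
class NotSupportedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "NotSupportedError";
  }
}

// An explicit outputLanguage always wins; otherwise output matches the input,
// which is only well-defined when the input is in exactly one language.
function resolveOutputLanguage(
  detectedInputLanguages: readonly string[],
  outputLanguage?: string,
): string {
  if (outputLanguage !== undefined) return outputLanguage;
  const distinct = Array.from(new Set(detectedInputLanguages));
  if (distinct.length === 1) return distinct[0];
  // Multilingual (or empty) input: "same language as the input" is ambiguous.
  throw new NotSupportedError("Supply outputLanguage for multilingual input.");
}
```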

domenic added a commit that referenced this issue Dec 11, 2024
@michaelwasserman

  1. WDYT about combining input, context, and shared context languages into a single list?
    Those lists will likely undergo identical impl support checks, and most dev usage will likely have identical lists. Is there a clear user/dev/impl benefit to splitting them up?

  2. Also, WDYT about handling a list for output languages?
    Perhaps a single response could be multi-lingual, or separate responses could be in separate languages? This might also help coalesce dev inquiries and creation requests for multiple output languages, say for translation.

  3. It might be nice if responses include a string description or codes regarding incompatibilities (e.g. "No multi-lingual output", or NotSupportedInputLanguage, even NotSupportedLengthAndToneCombination or similar)

@domenic
Collaborator Author

domenic commented Dec 20, 2024

  1. WDYT about combining input, context, and shared context languages into a single list?
    Those lists will likely undergo identical impl support checks, and most dev usage will likely have identical lists. Is there a clear user/dev/impl benefit to splitting them up?

I don't think it's true that they will have identical implementation support checks. For example, in Chromium we have specific output languages we support because we've done sufficient safety-checking on responses in those languages. That set is a subset of the supported input languages.

It's less clear whether there will be cases where input and context language support differ. But, for example, if you create a fine-tuning for summarizing Japanese, that doesn't necessarily mean you've fine-tuned for following Japanese instructions (i.e., context). And it seems simpler for web developers to have a 1:1 correspondence between text-valued API options and the corresponding supportedXYZLanguages properties.
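The 1:1 correspondence can be sketched as one supported*Languages list per text-valued option, so a failed check can name the exact option at fault. The property names follow the supportedXYZLanguages pattern but are assumptions, and the language lists are illustrative data (with output support a subset of input support, as in the Chromium example), not Chromium's actual lists.

```typescript
// Hypothetical capabilities: one supported list per text-valued option.
interface LanguageCapabilities {
  supportedInputLanguages: readonly string[];
  supportedContextLanguages: readonly string[];
  supportedOutputLanguages: readonly string[];
}

interface LanguageOptions {
  expectedInputLanguages?: readonly string[];
  expectedContextLanguages?: readonly string[];
  outputLanguage?: string;
}

// Returns the name of the first option that requests an unsupported language,
// or null if every requested language is supported for its role.
function findUnsupportedOption(caps: LanguageCapabilities, opts: LanguageOptions): string | null {
  const checks: Array<[string, readonly string[], readonly string[]]> = [
    ["expectedInputLanguages", opts.expectedInputLanguages ?? [], caps.supportedInputLanguages],
    ["expectedContextLanguages", opts.expectedContextLanguages ?? [], caps.supportedContextLanguages],
    ["outputLanguage", opts.outputLanguage !== undefined ? [opts.outputLanguage] : [], caps.supportedOutputLanguages],
  ];
  for (const [name, requested, supported] of checks) {
    if (requested.some((lang) => !supported.includes(lang))) return name;
  }
  return null;
}

// Illustrative data: Spanish is accepted as input but not produced as output.
const exampleCapabilities: LanguageCapabilities = {
  supportedInputLanguages: ["en", "ja", "es"],
  supportedContextLanguages: ["en", "ja", "es"],
  supportedOutputLanguages: ["en", "ja"],
};
```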

  2. Also, WDYT about handling a list for output languages?
    Perhaps a single response could be multi-lingual, or separate responses could be in separate languages? This might also help coalesce dev inquiries and creation requests for multiple output languages, say for translation.

I thought about this, but was unsure how we'd implement it. I guess we would prompt the model with something like "output a mix of English and Japanese"? Or "you can output either English or Japanese as appropriate"?

If you think this is implementable, then I'd be happy to update the API with it.
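If a list of output languages were supported, one hypothetical way to implement it is to turn the list into an instruction like the ones suggested above. The helper name, wording, and hardcoded display names below are assumptions; a real implementation might derive names with Intl.DisplayNames and would still need to check that the model actually follows the instruction.

```typescript
// Illustrative display names; unknown codes fall back to the code itself.
const displayNames: Partial<Record<string, string>> = {
  en: "English",
  ja: "Japanese",
  es: "Spanish",
};

// Builds a hypothetical prompt fragment allowing any of the listed languages.
function outputLanguageInstruction(outputLanguages: readonly string[]): string {
  const names = outputLanguages.map((code) => displayNames[code] ?? code);
  if (names.length === 0) return "";
  if (names.length === 1) return `Respond in ${names[0]}.`;
  const last = names[names.length - 1];
  return `Respond in ${names.slice(0, -1).join(", ")} or ${last}, as appropriate.`;
}
```

For example, `outputLanguageInstruction(["en", "ja"])` returns "Respond in English or Japanese, as appropriate."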

  3. It might be nice if responses include a string description or codes regarding incompatibilities (e.g. "No multi-lingual output", or NotSupportedInputLanguage, even NotSupportedLengthAndToneCombination or similar)

Do you mean that the error messages on the "NotSupportedError" DOMException should be clear? Or do you think we should provide programmatically distinguished error names (beyond a blanket "NotSupportedError") which programmers could use to react differently depending on the error case?

In general the web platform doesn't go very granular with its error names. But, if we have concrete cases where we expect developers to write different logic paths for different error cases, instead of generally bubbling up to some sort of "the API was not supported in your browser" error message, then we can definitely do this.
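A sketch of what pairing a programmatic reason with a "NotSupportedError" might look like from the developer's side. The reason field and its values are hypothetical (they are not part of DOMException), and the fallback strings are placeholders; the point is that code can branch on a stable value instead of parsing the error message.

```typescript
// Hypothetical machine-readable reasons attached to the error.
type UnsupportedReason = "input-language" | "output-language" | "option-combination";

// Stand-in for a "NotSupportedError" DOMException with an extra reason field.
class NotSupportedError extends Error {
  readonly reason: UnsupportedReason;
  constructor(reason: UnsupportedReason, message: string) {
    super(message);
    this.name = "NotSupportedError";
    this.reason = reason;
  }
}

// Developer-side handling: choose a fallback per reason. Checks the name and
// the presence of "reason" rather than using instanceof on the subclass.
function chooseFallback(error: unknown): string {
  if (error instanceof Error && error.name === "NotSupportedError" && "reason" in error) {
    switch ((error as NotSupportedError).reason) {
      case "output-language":
        return "use a supported output language, then translate separately";
      case "input-language":
        return "translate the input first";
      case "option-combination":
        return "retry with default options";
    }
  }
  return "report that the API is not supported";
}
```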

@michaelwasserman

Thanks for answering those qs; I hope we can chat more about design after the holidays (happy holidays!).

  1. Separate i/o languages make sense for sure, but input and context language support discrepancies would surprise me (as a novice). That said, xyz and supportedXYZLanguages might be a reasonable pattern.

  2. An output list is just forward-looking, even if some models today don't support multi-lingual output.

  3. Yeah, pair messages or codes with NotSupportedError, to discern fallback options instead of spamming queries.

3 participants