
Browser AI API for Utilizing On-Device Models #178

Open
kenzic opened this issue Oct 25, 2024 · 9 comments

Comments

@kenzic commented Oct 25, 2024

Introduction

The Browser AI API is a proposal for a new browser feature that exposes AI models running directly on users' devices. By offering a simple API on the window object, it would allow websites to leverage AI without sending data to the cloud, preserving privacy, reducing latency, and enabling offline functionality.

This is about empowering developers to integrate advanced AI into web apps—without needing heavy infrastructure—while giving users control over their data and the models they choose to run. Imagine a world where on-device AI enhances web apps in real-time, with no data leaving the device and no reliance on external servers.

The API would let developers:

  1. Query available AI models on a user's device.
  2. Request user permission to access specific models.
  3. Create sessions with the models.
  4. Perform common tasks like text generation, embeddings, and chat.

By running models directly on the user's hardware, we’re opening up new possibilities for AI-driven web apps while keeping things secure, private, and available offline.

Prototype

I created a prototype of this concept for review: https://github.com/kenzic/browser.ai

API Overview

The proposed API would be exposed on the window.ai object with the following high-level structure:

window.ai = {
  permissions: {
    models: () => Promise<AIModel[]>,
    request: (options: RequestOptions) => Promise<boolean>
  },
  model: {
    info: (options: ModelInfoOptions) => Promise<ModelInfo>,
    connect: (options: ConnectSessionOptions) => Promise<ModelSession>
  }
}

Permissions

Before using any models, websites must first query available models and request permission:

// Get list of available models
const models = await window.ai.permissions.models();

// Request permission for a specific model
const granted = await window.ai.permissions.request({
  model: "llama3.2"
});

Model Sessions

Once permission is granted, websites can create sessions to interact with models:

const session = await window.ai.model.connect({
  model: "llama3.2"  
});

Chat

const session = await window.ai.model.connect({ model: 'llama3.2' });

const response = await session.chat({
  messages: [
    { role: 'user', content: 'Tell me a joke' }
  ],
  options: { temperature: 0.7 }
});

console.log(response.choices[0].message.content);

Embed

const session = await window.ai.model.connect({ model: 'llama3.2' });
const embedding = await session.embed({
  input: 'my text to encode',
});

console.log(embedding.embeddings);
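
For illustration, here is one way a site might use the returned vectors, such as comparing two inputs with cosine similarity. This sketch assumes the session from the example above and the EmbedResponse shape defined in the WebIDL below; the cosineSimilarity helper is not part of the proposal:

// Embed two texts in one call; per the proposed EmbedResponse,
// `embeddings` is an array of number arrays, one per input.
const { embeddings } = await session.embed({
  input: ['first text', 'second text'],
});

// Plain helper, not part of the proposal: cosine similarity of two vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity(embeddings[0], embeddings[1]));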

WebIDL

dictionary Message {
  required DOMString role;
  required DOMString content;
};

typedef DOMString ModelName;

dictionary ModelDetails {
  required DOMString parent_model;
  required DOMString format;
  required DOMString family;
  required sequence<DOMString> families;
  required DOMString parameter_size;
  required DOMString quantization_level;
};

dictionary ModelInfo {
  required ModelName model;
  required DOMString license;
  required ModelDetails details;
};

dictionary Options {
  double? temperature = null;
  (DOMString or sequence<DOMString>)? stop = null;
  unsigned long? seed = null;
  double repeat_penalty;
  double presence_penalty;
  double frequency_penalty;
  unsigned long top_k;
  double top_p;
};

dictionary EmbedOptions {
  required (DOMString or sequence<DOMString>) input;
  boolean truncate = false;
  (DOMString or unsigned long)? keep_alive = null;
  Options options = {};
};

dictionary EmbedResponse {
  required DOMString model;
  required sequence<sequence<double>> embeddings;
};

dictionary ChatOptions {
  required sequence<Message> messages;
  DOMString? format = null;
  Options options = {};
};

dictionary ModelInfoOptions {
  required ModelName model;
};

dictionary ConnectSessionOptions {
  required ModelName model;
};

enum FinishReason {
  "stop",
  "length",
  "tool_calls",
  "content_filter",
  "function_call"
};

dictionary ChatChoice {
  required Message message;
  required FinishReason finish_reason;
};

dictionary ChatResponseUsage {
  required double total_duration;
  required double load_duration;
  required unsigned long prompt_eval_count;
  required double prompt_eval_duration;
  required unsigned long eval_count;
  required double eval_duration;
};

dictionary RequestOptions {
  required ModelName model;
  boolean silent = false;
};

dictionary ChatResponse {
  required DOMString id;
  required sequence<ChatChoice> choices;
  required EpochTimeStamp created;
  required ModelName model;
  required ChatResponseUsage usage;
};

interface ModelSession {
  Promise<ChatResponse> chat(ChatOptions options);
  Promise<EmbedResponse> embed(EmbedOptions options);
};

dictionary AIModel {
  required ModelName model;
  required boolean enabled;
};

interface Permissions {
  Promise<sequence<AIModel>> models();
  Promise<boolean> request(RequestOptions options);
};

interface Model {
  Promise<ModelInfo> info(ModelInfoOptions options);
  Promise<ModelSession> connect(ConnectSessionOptions options);
};

interface AIInterface {
  readonly attribute Permissions permissions;
  readonly attribute Model model;
};

// Expose the AIInterface on the window's ai property
partial interface Window {
  readonly attribute AIInterface ai;
};
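
Because this API would ship incrementally across browsers, sites would need to feature-detect it before use. A minimal sketch, assuming the shape above:

// Feature-detect the proposed window.ai API before using it.
if (typeof window.ai !== 'undefined' && window.ai.permissions) {
  const models = await window.ai.permissions.models();
  // ... request permission and connect as shown above
} else {
  // Fall back to a server-hosted model, or disable AI features.
}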

Key Benefits

  1. Privacy: User data stays on their device—nothing gets sent to remote servers.
  2. Low Latency: No server round-trips mean faster responses.
  3. Offline Capability: AI apps work even without an internet connection.
  4. Reduced Costs: Developers don’t need expensive infrastructure to serve models.
  5. User Control: Users can decide which models to enable, and they have the power to revoke permissions at any time.

Use Cases

This API opens the door for all kinds of innovative web applications:

  • AI-powered text editors that assist with writing without sacrificing privacy.
  • Language translation tools that run locally.
  • Intelligent form auto-completion to streamline data entry.
  • Creative tools that help users generate images, music, or video without needing a connection.
  • Offline chatbots or virtual assistants that don’t depend on cloud services.

Technical Considerations

  • Model Distribution: Models could be distributed at the OS level or through the browser itself. There are pros and cons to both approaches. OS-level distribution would allow broader access and updates, while browser-based distribution would be easier to roll out without coordination with OS teams.
  • Security: We need to prevent malicious sites from misusing models, and ensure permission requests are transparent and easy to manage for users.
  • Performance: Running models in the browser has its challenges, especially on low-power devices. The API should be designed to support fallback mechanisms, where smaller models can be used if needed (see the sketch after this list).
  • Cross-Browser Support: This API needs to work consistently across all major browsers.
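
As a sketch of the fallback idea above (the model names are purely illustrative and the helper is not part of the proposal), a site might walk a preference list from largest to smallest and connect to the first model the user grants:

// Hypothetical fallback helper: try preferred models in order and
// connect to the first one that is available and permitted.
async function connectWithFallback(preferredModels) {
  const available = await window.ai.permissions.models();
  for (const name of preferredModels) {
    if (!available.some((m) => m.model === name)) continue;
    const granted = await window.ai.permissions.request({ model: name });
    if (granted) {
      return window.ai.model.connect({ model: name });
    }
  }
  throw new Error('No suitable on-device model available');
}

// Illustrative model names only.
const session = await connectWithFallback(['llama3.2', 'llama3.2:1b']);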

Other Considerations

There are similar proposals, such as the Prompt API proposal.

I see this as an alternative approach to implementing AI APIs, one that makes them more open and flexible. This proposal isn’t about focusing on specific tasks like prompt generation or summarization. Instead, it’s about creating a bridge to a model runtime that gives access to models suited to each specific use case. I think building separate APIs for each task—like prompt-specific or translation-specific APIs—is the wrong direction. It’s better to have an API that lets the developer or user choose the model with the right capabilities for their needs.


Feedback

Please provide all feedback below.

@tomayac commented Oct 25, 2024

(Meta comment: Are you aware of the Prompt API proposal?)

@tomayac commented Oct 25, 2024

(Also, did you see #147 and #163 (both accepted) for more concrete tasks like translation, language detection, summarization, writing, and rewriting?)

@kenzic (Author) commented Oct 25, 2024

Yes, I saw those, and I have worked with the Prompt API, which is what inspired this proposal. I see this as an alternative approach to implementing AI APIs, one that makes them more open and flexible. This proposal isn’t about focusing on specific tasks like prompt generation or summarization. Instead, it’s about creating a bridge to a model runtime that gives access to models suited to each specific use case. I think building separate APIs for each task—like prompt-specific or translation-specific APIs—is the wrong direction. It’s better to have an API that lets the developer or user choose the model with the right capabilities for their needs.

@tomayac commented Oct 25, 2024

(Great, was just wondering, since your proposal didn't mention these previous efforts.)

@kenzic (Author) commented Oct 25, 2024

Thanks for the feedback. I updated the description to include "Other Considerations" which discusses this.

@AdamSobieski commented Jan 7, 2025

@kenzic, hello. I have two quick feedback items.

Firstly, your proposed API includes models' names and versions together in ModelName, e.g., "llama3.2". What about separating models' monikers from their versions?

// Connect to a specific model
const session = await window.ai.model.connect({
  model: "llama", version: "3.2"
});

Or, perhaps, adding more parameters?

// Connect to a specific model
const session = await window.ai.model.connect({
  model: "llama", version: "3.2", size: "11B", lang: "en", publicKeyToken: "..."
});

Secondly, it appears that your API allows iterating available models:

// Get list of available models
const models = await window.ai.permissions.models();

and that the Prompt API presently doesn't.

I think that this feature would be useful for an eventual API to enable content negotiation between computers. A client might have multiple models available with which to, for instance, create an embedding vector, and a server might recognize embedding vectors from multiple models. The two computers could perform content negotiation to select which available model to use when generating the embedding vector. Did I explain that well?

@kenzic (Author) commented Jan 7, 2025

Hi @AdamSobieski - Thanks for the feedback.

I agree that having a version parameter could be helpful. The reason I didn't include it is that most providers, such as OpenAI, Ollama, and Anthropic, include the version in the name. For example, gpt-4o-2024-11-20. However, I'm open to changing this. Do you think it's worth departing from this convention?

Did I explain that well?

From what I understand, you're saying this API allows the website accessing the API to see which models have been made available to it by the browser, which is helpful. This is something the Prompt API doesn't do, and would be helpful to add to it as well?

@AdamSobieski commented Jan 8, 2025

One benefit of separating out the version is that the semantics of connecting to a model without an explicit version could be defined to request the newest available version. So:

// Connect to a specific model
const session = await window.ai.model.connect({
  model: "llama"
});

could describe a request to connect to the highest version of llama available on the user's device. This would also simplify requiring at least a certain version, for instance "3.2". A Web developer might simply request to connect to the latest version and then check its version value, as in the sketch below.
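
For instance, a sketch of that pattern (the version field checked here is hypothetical and not part of the current proposal):

// Connect without a version to get the newest available model, then
// verify it meets a minimum. `info.version` is hypothetical here.
const session = await window.ai.model.connect({ model: "llama" });
const info = await window.ai.model.info({ model: "llama" });
if (Number(info.version) < 3.2) {
  // Fall back or inform the user that a newer model is required.
}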

However, there is also the number of model parameters to consider. With respect to "llama", at version "3.2", there are different sizes, "1B", "3B", "11B", and "90B", available for download.

From what I understand, you're saying this API allows the website accessing the API to see which models have been made available to it by the browser, which is helpful. This is something the Prompt API doesn't do, and would be helpful to add to it as well?

Yes. Thank you. In my opinion, being able to list the available models would be useful – both those locally available on a client device and those available to it for download.

I'm trying to envision how two computers might negotiate which model or models to utilize when each could make use of more than one model (e.g., to exchange embedding vectors).

It could be that one party, e.g., a server, would provide a set of content-negotiation headers with values describing model aspects (like the Accept, Accept-Language HTTP headers) and that the other party, e.g., a client, could use these headers and their values, through a method resembling connect(), without having to list, iterate, or enumerate all of the models available on or to the computer.
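
To make that concrete, here is a rough sketch of such a flow; the header name and the negotiation scheme are purely hypothetical:

// Purely hypothetical: the server advertises acceptable embedding
// models in a made-up response header, and the client picks the
// first one it has available locally.
const res = await fetch('/embeddings/handshake');
const accepted = (res.headers.get('X-Accept-Embedding-Models') || '')
  .split(',')
  .map((s) => s.trim());

const local = await window.ai.permissions.models();
const shared = accepted.find((name) =>
  local.some((m) => m.model === name)
);

if (shared && (await window.ai.permissions.request({ model: shared }))) {
  const session = await window.ai.model.connect({ model: shared });
  // The two sides now agree on the model producing the vectors.
}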

@kenzic (Author) commented Jan 8, 2025

You raise some good points about versions and sizes. I'd like to keep the API as consistent with other providers as possible. Let me think about it.

Regarding negotiating models with the Prompt API: I don't manage that spec, but I'd recommend you reach out to them. It's a great idea, and one I thought important to add to this spec.

To do it with this implementation you'd do something like:

// Get list of available models
const models = await window.ai.permissions.models();

// models is a list of AIModel dictionaries: { model, enabled }
if (models.some((m) => m.model === 'llama3.2')) {
  // Request permission for a specific model
  const granted = await window.ai.permissions.request({
    model: 'llama3.2'
  });

  if (!granted) {
    // handle denial
  } else {
    const session = await window.ai.model.connect({
      model: 'llama3.2'
    });
  }
}
