Browser AI API for Utilizing On-Device Models #178

Open
kenzic opened this issue Oct 25, 2024 · 5 comments

kenzic commented Oct 25, 2024

Introduction

The Browser AI API is a proposal for a new browser feature that makes AI models accessible directly on users' devices through the browser. By offering a simple API available on the window object, this approach would allow websites to leverage AI without sending data to the cloud, preserving privacy, reducing latency, and enabling offline functionality.

This is about empowering developers to integrate advanced AI into web apps without heavy infrastructure, while giving users control over their data and the models they choose to run. Imagine a world where on-device AI enhances web apps in real time, with no data leaving the device and no reliance on external servers.

The API would let developers:

  1. Query available AI models on a user's device.
  2. Request user permission to access specific models.
  3. Create sessions with the models.
  4. Perform common tasks like text generation, embeddings, and chat.

By running models directly on the user's hardware, we’re opening up new possibilities for AI-driven web apps while keeping things secure, private, and available offline.

Prototype

I created a prototype of this concept for review: https://github.com/kenzic/browser.ai

API Overview

The proposed API would be exposed on the window.ai object with the following high-level structure:

// Type outline of window.ai in TypeScript notation (see the WebIDL below for the full types)
interface Window {
  readonly ai: {
    permissions: {
      models(): Promise<AIModel[]>;
      request(options: RequestOptions): Promise<boolean>;
    };
    model: {
      info(options: ModelInfoOptions): Promise<ModelInfo>;
      connect(options: ConnectSessionOptions): Promise<ModelSession>;
    };
  };
}
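Because `window.ai` is only a proposal, a page would need to feature-detect it before use. The sketch below is one hypothetical way to do that: `BrowserAI`, `isBrowserAI`, and `getBrowserAI` are illustrative names, not part of the proposal, and the check only confirms that the four functions in the outline above exist.

```typescript
// Hypothetical structural type for the parts of the proposed window.ai we check.
// This mirrors the proposal's outline; it is not a shipped browser API.
interface BrowserAI {
  permissions: {
    models: () => Promise<unknown[]>;
    request: (options: { model: string; silent?: boolean }) => Promise<boolean>;
  };
  model: {
    info: (options: { model: string }) => Promise<unknown>;
    connect: (options: { model: string }) => Promise<unknown>;
  };
}

// Returns true only if `candidate` exposes the four functions the proposal describes.
function isBrowserAI(candidate: unknown): candidate is BrowserAI {
  if (typeof candidate !== "object" || candidate === null) return false;
  const ai = candidate as Record<string, any>;
  return (
    typeof ai.permissions?.models === "function" &&
    typeof ai.permissions?.request === "function" &&
    typeof ai.model?.info === "function" &&
    typeof ai.model?.connect === "function"
  );
}

// Looks up `ai` on a global-like object (in a browser, pass globalThis).
function getBrowserAI(scope: unknown): BrowserAI | null {
  if (typeof scope !== "object" || scope === null) return null;
  const ai = (scope as { ai?: unknown }).ai;
  return isBrowserAI(ai) ? ai : null;
}
```

In a page this might be called as `getBrowserAI(globalThis)`, falling back to a server-side path when it returns `null`.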

Permissions

Before using any models, websites must first query available models and request permission:

// Get list of available models
const models = await window.ai.permissions.models();

// Request permission for a specific model
const granted = await window.ai.permissions.request({
  model: "llama3.2"
});
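The two calls above can be combined so that the user is only prompted when necessary. This sketch rests on an assumption the proposal leaves open: that `AIModel.enabled` means the current site can already use the model. `planModelAccess`, `ensureModelAccess`, and `AIPermissionsLike` are hypothetical names.

```typescript
// Shape of one entry returned by permissions.models(), per the proposal's AIModel dictionary.
interface AIModel {
  model: string;
  enabled: boolean;
}

type AccessPlan = "unavailable" | "ready" | "needs-request";

// Decide what to do for `wanted`. Assumption: `enabled: true` means this site can
// already use the model; the proposal does not define `enabled` precisely.
function planModelAccess(models: AIModel[], wanted: string): AccessPlan {
  const entry = models.find((m) => m.model === wanted);
  if (!entry) return "unavailable";
  return entry.enabled ? "ready" : "needs-request";
}

// Hypothetical typing of the window.ai.permissions surface used below.
interface AIPermissionsLike {
  models(): Promise<AIModel[]>;
  request(options: { model: string; silent?: boolean }): Promise<boolean>;
}

// Query first, and only prompt when the model exists but is not yet enabled.
async function ensureModelAccess(permissions: AIPermissionsLike, wanted: string): Promise<boolean> {
  const plan = planModelAccess(await permissions.models(), wanted);
  if (plan === "unavailable") return false;
  if (plan === "ready") return true;
  return permissions.request({ model: wanted });
}
```

Keeping the decision in a pure function makes it testable without a browser; `ensureModelAccess(window.ai.permissions, "llama3.2")` would then wrap the real calls.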

Model Sessions

Once permission is granted, websites can create sessions to interact with models:

const session = await window.ai.model.connect({
  model: "llama3.2"  
});
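Each example in this section calls `connect()` again. A page that issues many requests might instead share one session per model. The cache below is a hypothetical helper, not part of the proposal, and it assumes a `ModelSession` can safely be reused across calls, which the proposal does not state.

```typescript
// Anything with a connect() like the proposal's window.ai.model (typing assumed here).
interface ModelConnector<S> {
  connect(options: { model: string }): Promise<S>;
}

// Hypothetical helper. Assumption: one session per model can be reused.
// Stores the connect() promise so concurrent callers share a single session,
// and evicts failed attempts so a later call can retry.
class SessionCache<S> {
  private readonly sessions = new Map<string, Promise<S>>();
  private readonly connector: ModelConnector<S>;

  constructor(connector: ModelConnector<S>) {
    this.connector = connector;
  }

  get(model: string): Promise<S> {
    let pending = this.sessions.get(model);
    if (!pending) {
      pending = this.connector.connect({ model });
      this.sessions.set(model, pending);
      // Drop a rejected attempt so the next get() reconnects.
      pending.catch(() => {
        if (this.sessions.get(model) === pending) this.sessions.delete(model);
      });
    }
    return pending;
  }
}
```

Caching the promise rather than the resolved session means two calls made before the first `connect()` settles still share one connection.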

Chat

const session = await window.ai.model.connect({ model: 'llama3.2' });

const response = await session.chat({
  messages: [
    { role: 'user', content: 'Tell me a joke' }
  ],
  options: { temperature: 0.7 }
});

console.log(response.choices[0].message.content);
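The proposal does not say whether a session remembers earlier turns. Assuming it does not, a multi-turn chat would resend the full history each time, appending the assistant's reply from `choices[0].message`. `appendReply` and `wasTruncated` are illustrative helpers; the types are trimmed copies of the WebIDL `Message` and `ChatResponse` definitions.

```typescript
// Subset of the proposal's Message / ChatResponse definitions needed here.
interface Message {
  role: string;
  content: string;
}

interface ChatResponse {
  choices: { message: Message; finish_reason: string }[];
}

// Returns a new history with the first choice's reply appended, so the next
// session.chat({ messages }) call carries the whole conversation.
// Assumption: the session keeps no memory of earlier chat() calls.
function appendReply(history: Message[], response: ChatResponse): Message[] {
  const choice = response.choices[0];
  if (!choice) throw new Error("chat response contained no choices");
  return [...history, choice.message];
}

// A finish_reason of "length" means the reply hit a length limit and is likely cut off.
function wasTruncated(response: ChatResponse): boolean {
  return response.choices[0]?.finish_reason === "length";
}
```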

Embed

const session = await window.ai.model.connect({ model: 'llama3.2' });
const embedding = await session.embed({
  input: 'my text to encode',
});

console.log(embedding.embeddings);
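Embeddings are typically compared with cosine similarity, for example to rank passages against a query for local search. The sketch below assumes that when `input` is an array, `embeddings` holds one vector per input string in input order; the WebIDL allows `DOMString or sequence<DOMString>` but does not spell out the ordering. `cosineSimilarity` and `rankBySimilarity` are hypothetical helper names.

```typescript
// Cosine similarity between two vectors: dot(a, b) / (|a| * |b|).
// Values near 1 mean the vectors point in a similar direction.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) throw new Error("cannot compare a zero vector");
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Ranks candidates against a query. Assumption: embeddings[0] is the query and
// embeddings[i + 1] belongs to candidates[i], matching the order passed to embed().
function rankBySimilarity(candidates: string[], embeddings: number[][]): { text: string; score: number }[] {
  const [query, ...rest] = embeddings;
  if (!query || rest.length !== candidates.length) throw new Error("expected one embedding per input");
  return candidates
    .map((text, i) => ({ text, score: cosineSimilarity(query, rest[i]) }))
    .sort((x, y) => y.score - x.score);
}
```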

WebIDL

dictionary Message {
  required DOMString role;
  required DOMString content;
};

typedef DOMString ModelName;

dictionary ModelDetails {
  required DOMString parent_model;
  required DOMString format;
  required DOMString family;
  required sequence<DOMString> families;
  required DOMString parameter_size;
  required DOMString quantization_level;
};

dictionary ModelInfo {
  required ModelName model;
  required DOMString license;
  required ModelDetails details;
};

dictionary Options {
  double? temperature = null;
  sequence<DOMString>? stop = null;
  unsigned long? seed = null;
  double repeat_penalty;
  double presence_penalty;
  double frequency_penalty;
  unsigned long top_k;
  double top_p;
};

dictionary EmbedOptions {
  required ModelName model;
  required (DOMString or sequence<DOMString>) input;
  boolean truncate = false;
  (DOMString or unsigned long)? keep_alive = null;
  Options options;
};

dictionary EmbedResponse {
  required DOMString model;
  required sequence<sequence<double>> embeddings;
};

dictionary ChatOptions {
  required ModelName model;
  required sequence<Message> messages;
  DOMString? format = null;
  Options options;
};

dictionary ModelInfoOptions {
  required ModelName model;
};

dictionary ConnectSessionOptions {
  required ModelName model;
};

enum FinishReason {
  "stop",
  "length",
  "tool_calls",
  "content_filter",
  "function_call"
};

dictionary ChatChoice {
  required Message message;
  required FinishReason finish_reason;
};

dictionary ChatResponseUsage {
  required double total_duration;
  required double load_duration;
  required unsigned long prompt_eval_count;
  required double prompt_eval_duration;
  required unsigned long eval_count;
  required double eval_duration;
};

dictionary RequestOptions {
  required ModelName model;
  boolean silent = false;
};

dictionary ChatResponse {
  required DOMString id;
  required sequence<ChatChoice> choices;
  required EpochTimeStamp created;
  required ModelName model;
  required ChatResponseUsage usage;
};

[Exposed=Window]
interface ModelSession {
  Promise<ChatResponse> chat(ChatOptions options);
  Promise<EmbedResponse> embed(EmbedOptions options);
};

dictionary AIModel {
  required ModelName model;
  required boolean enabled;
};

[Exposed=Window]
interface AIPermissions {
  Promise<sequence<AIModel>> models();
  Promise<boolean> request(RequestOptions options);
};

[Exposed=Window]
interface Model {
  Promise<ModelInfo> info(ModelInfoOptions options);
  Promise<ModelSession> connect(ConnectSessionOptions options);
};

[Exposed=Window]
interface AIInterface {
  readonly attribute AIPermissions permissions;
  readonly attribute Model model;
};

// Expose the AIInterface on the window's ai property
partial interface Window {
  readonly attribute AIInterface ai;
};
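The `ChatResponseUsage` fields can be turned into a throughput figure, which a page could use to notice when a device is too slow for a given model. The WebIDL gives no unit for the durations; this sketch assumes nanoseconds because the field names match those of Ollama's API, which reports durations in nanoseconds. `tokensPerSecond` and `promptTokensPerSecond` are illustrative names.

```typescript
// Subset of the proposal's ChatResponseUsage dictionary.
interface ChatResponseUsage {
  total_duration: number;
  load_duration: number;
  prompt_eval_count: number;
  prompt_eval_duration: number;
  eval_count: number;
  eval_duration: number;
}

// Assumption: durations are in nanoseconds (as in Ollama's API); the proposal does not say.
// Pass a different divisor if the unit turns out to differ.
const NANOSECONDS_PER_SECOND = 1e9;

// Generation throughput: output tokens divided by the time spent generating them.
function tokensPerSecond(usage: ChatResponseUsage, unitsPerSecond = NANOSECONDS_PER_SECOND): number {
  if (usage.eval_duration <= 0) return 0;
  return usage.eval_count / (usage.eval_duration / unitsPerSecond);
}

// Prompt-processing throughput: prompt tokens divided by the time spent evaluating them.
function promptTokensPerSecond(usage: ChatResponseUsage, unitsPerSecond = NANOSECONDS_PER_SECOND): number {
  if (usage.prompt_eval_duration <= 0) return 0;
  return usage.prompt_eval_count / (usage.prompt_eval_duration / unitsPerSecond);
}
```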

Key Benefits

  1. Privacy: User data stays on their device—nothing gets sent to remote servers.
  2. Low Latency: No server round-trips mean faster responses.
  3. Offline Capability: AI apps work even without an internet connection.
  4. Reduced Costs: Developers don’t need expensive infrastructure to serve models.
  5. User Control: Users can decide which models to enable, and they have the power to revoke permissions at any time.

Use Cases

This API opens the door for all kinds of innovative web applications:

  • AI-powered text editors that assist with writing without sacrificing privacy.
  • Language translation tools that run locally.
  • Intelligent form auto-completion to streamline data entry.
  • Creative tools that help users generate images, music, or video without needing a connection.
  • Offline chatbots or virtual assistants that don’t depend on cloud services.

Technical Considerations

  • Model Distribution: Models could be distributed at the OS level or through the browser itself. There are pros and cons to both approaches. OS-level distribution would allow broader access and updates, while browser-based distribution would be easier to roll out without coordination with OS teams.
  • Security: We need to prevent malicious sites from misusing models, and ensure permission requests are transparent and easy to manage for users.
  • Performance: Running models in the browser is challenging, especially on low-power devices. The API should support fallback mechanisms, such as switching to a smaller model when the device cannot run the preferred one.
  • Cross-Browser Support: This API needs to work consistently across all major browsers.
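One way to read the fallback point above: query each model's `details.parameter_size` via `model.info()` and choose the largest model under a budget the page picks for the device. This is a hypothetical policy, not something the proposal specifies, and it assumes sizes are strings like `"3.2B"` or `"770M"`. `ModelCandidate`, `parseParameterSize`, and `pickModel` are illustrative names.

```typescript
// Per-model data taken from ModelInfo.details (hypothetical trimmed shape).
interface ModelCandidate {
  model: string;
  parameter_size: string; // Assumed format such as "3.2B"; the proposal does not specify it.
}

// Parses sizes such as "3.2B" or "770M" into a parameter count; returns NaN if unrecognized.
function parseParameterSize(size: string): number {
  const match = /^(\d+(?:\.\d+)?)\s*([KMB])$/i.exec(size.trim());
  if (!match) return Number.NaN;
  const scale: Record<string, number> = { K: 1e3, M: 1e6, B: 1e9 };
  return Number(match[1]) * scale[match[2].toUpperCase()];
}

// Hypothetical fallback policy: prefer the largest model within the budget,
// otherwise return null so the page can degrade gracefully.
function pickModel(candidates: ModelCandidate[], maxParameters: number): string | null {
  let best: { model: string; params: number } | null = null;
  for (const c of candidates) {
    const params = parseParameterSize(c.parameter_size);
    if (Number.isNaN(params) || params > maxParameters) continue;
    if (!best || params > best.params) best = { model: c.model, params };
  }
  return best ? best.model : null;
}
```

How a page chooses `maxParameters` for a given device is left open here, as it is in the proposal.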

Other Considerations

There are similar proposals, such as the Prompt API proposal.

I see this as an alternative approach to implementing AI APIs, one that makes them more open and flexible. This proposal isn’t about focusing on specific tasks like prompt generation or summarization. Instead, it’s about creating a bridge to a model runtime that gives access to models suited to each specific use case. I think building separate APIs for each task—like prompt-specific or translation-specific APIs—is the wrong direction. It’s better to have an API that lets the developer or user choose the model with the right capabilities for their needs.


Feedback

Please provide all feedback below.

tomayac commented Oct 25, 2024

(Meta comment: Are you aware of the Prompt API proposal?)

tomayac commented Oct 25, 2024

(Also, did you see #147 and #163 (both accepted) for more concrete tasks like translation, language detection, summarization, writing, and rewriting?)

kenzic commented Oct 25, 2024

Yes, I saw those, and have worked with the prompt api, which is what inspired this proposal. I see this as an alternative approach to implementing AI APIs, one that makes them more open and flexible. This proposal isn’t about focusing on specific tasks like prompt generation or summarization. Instead, it’s about creating a bridge to a model runtime that gives access to models suited to each specific use case. I think building separate APIs for each task—like prompt-specific or translation-specific APIs—is the wrong direction. It’s better to have an API that lets the developer or user choose the model with the right capabilities for their needs.

tomayac commented Oct 25, 2024

(Great, was just wondering, since your proposal didn't mention these previous efforts.)

kenzic commented Oct 25, 2024

Thanks for the feedback. I updated the description to include "Other Considerations" which discusses this.
