Background Description

Ref: #7553, required for supporting future vision models (#8010)
I initially planned to make a proposal PR for this, but it turns out to be quite a bit more complicated than I thought. Therefore, I'm creating this issue for further discussion before actually implementing it.
Possible Refactor Approaches
The problem can be divided into 2 parts:
- How can the llama_batch be constructed?
- How should the cgraph be modified?
For the second part (how the cgraph should be modified), it should be simple: llm_build_inp_embd can be modified to concat tensors from the "learned" embd and the input embd (a rough sketch of the idea follows below). The attention mask also needs to be updated accordingly.
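A minimal sketch of the idea, assuming ggml-style graph building. The helper name build_inp_embd_mixed, the tensor names, and the exact ggml_concat signature are assumptions for illustration; this is not the actual llm_build_inp_embd code.

```c
#include "ggml.h"

// Sketch: embed the text tokens via the model's token embedding matrix,
// pass externally supplied embeddings through as-is, and concatenate the
// two along the token dimension before the first transformer block.
static struct ggml_tensor * build_inp_embd_mixed(
        struct ggml_context * ctx,
        struct ggml_tensor  * tok_embd,    // [n_embd, n_vocab] learned token embeddings
        struct ggml_tensor  * inp_tokens,  // [n_text_tokens]   token IDs of the text part
        struct ggml_tensor  * inp_embd) {  // [n_embd, n_extra] externally provided embeddings
    // "learned" embd for the text tokens
    struct ggml_tensor * cur = ggml_get_rows(ctx, tok_embd, inp_tokens);

    // append the external embeddings after the text tokens
    // (dim 1 is the token dimension; ggml_concat's signature differs across ggml versions)
    cur = ggml_concat(ctx, cur, inp_embd, 1);

    // note: the KQ mask and positions must also cover the extra embedding tokens
    return cur;
}
```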
For the first part (how the llama_batch can be constructed), the problem is that there are many different possible approaches:

Proposal 1: Add n_embds to llama_batch

The downside of this approach is that it's quite messy to keep track of n_seq_id, seq_id and logits.

Proposal 2: Add an overloaded version of llama_decode/encode

The downside would be that this is kind of hacky (not intuitive for developers), because one batch is now represented by 2 llama_batch objects.

Proposal 3: Keep llama_batch the same, but token IDs < 0 are embeddings

This seems to be easier to implement than all the other proposals; one possible encoding is sketched below. The only thing I'm not sure about is whether we expect negative token IDs to be a reserved use case.
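To make Proposal 3 more concrete, here is one hypothetical encoding (not current llama.cpp behavior, and not necessarily the encoding the proposal had in mind): a token ID of -(i+1) in llama_batch.token means "use row i of the externally supplied embd buffer instead of a learned embedding".

```c
#include <stdbool.h>
#include "llama.h" // for llama_token (int32_t)

// Hypothetical encoding: IDs >= 0 are normal vocab tokens,
// ID -(i+1) refers to row i of an external embeddings buffer.
static inline bool token_is_embd(llama_token id) {
    return id < 0;
}

static inline int token_embd_row(llama_token id) {
    return -(id + 1); // -1 -> row 0, -2 -> row 1, ...
}

// Example batch contents: "tok tok [embd row 0] [embd row 1] tok"
//   llama_token tokens[5] = { 15043, 3186, -1, -2, 29889 };
//   batch.token = tokens; batch.embd = img_embd; batch.n_tokens = 5;
```

With a scheme like this, the existing per-token metadata (pos, seq_id, logits) keeps working unchanged, which is presumably why this proposal is the easiest to implement.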
Proposal 4: Completely refactor llama_batch to accept a sequence list instead of a token list
This was actually proposed by @slaren, but I'm not sure what it would look like in the real world. Could you please explain it further?
I'm also tagging @ggerganov and @abetlen for further discussion. Thank you!
> This was actually proposed by @slaren, but I'm not sure what it would look like in the real world. Could you please explain it further?
struct llama_batch_seq {
    int           n_tokens;
    llama_token * token;   // only one of token and embd can be non-null
    float       * embd;
    enum { none, all, last } logits;
    int           seq_id;
    int           pos;     // -1 = end
};

struct llama_batch {
    int               n_seqs;
    llama_batch_seq * seqs;
};
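To get a feel for what this API shape would mean for callers, here is a hypothetical usage sketch. None of this exists in llama.cpp today: the structs are repeated from the proposal above so the snippet is self-contained, the token IDs and n_embd = 4096 are arbitrary, and llama_decode would still need to be adapted to accept such a batch.

```c
#include <stdint.h>
#include <stddef.h>

typedef int32_t llama_token; // matches the typedef in llama.h

// Proposed (not yet existing) sequence-based batch API from the comment above,
// repeated here only so that this sketch is self-contained.
struct llama_batch_seq {
    int           n_tokens;
    llama_token * token;   // only one of token and embd can be non-null
    float       * embd;
    enum { none, all, last } logits;
    int           seq_id;
    int           pos;     // -1 = end
};

struct llama_batch {
    int                      n_seqs;
    struct llama_batch_seq * seqs;
};

int main(void) {
    // sequence 0: plain text tokens, logits only for the last one (IDs are arbitrary examples)
    static llama_token text_tokens[3] = { 15043, 3186, 29889 };
    // sequence 1: 16 "tokens" of precomputed (e.g. vision) embeddings, assuming n_embd = 4096
    static float img_embd[16 * 4096];

    struct llama_batch_seq seqs[2] = {
        { 3,  text_tokens, NULL,     last, /*seq_id*/ 0, /*pos*/ -1 },
        { 16, NULL,        img_embd, none, /*seq_id*/ 1, /*pos*/ -1 },
    };

    struct llama_batch batch = { 2, seqs };
    // llama_decode(ctx, batch);  // would require llama_decode to accept this new layout
    (void) batch;
    return 0;
}
```

Compared to the current token-level llama_batch, this shape states seq_id, pos and logits once per sequence instead of once per token.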