Add JAIS model(s) #8118

Merged
merged 6 commits into from
Jul 2, 2024

@fmz fmz commented Jun 25, 2024

Add support for Jais and Jais-chat, new state-of-the-art Arabic-centric open generative large language models (a foundation model and an instruction-tuned variant): https://arxiv.org/abs/2308.16149

The model is essentially GPT2 with some modifications:

  1. Brand new vocabulary.
  2. ALiBi positional encoding.
  3. SwiGLU activations.
  4. Some random scaling factors

In this PR I only added support for the 13B and 1.3B Jais models.
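For readers unfamiliar with items 2 and 3 in the list above, here is a minimal sketch of ALiBi and SwiGLU in plain Python. It is not taken from this PR: the function names are made up for illustration, the slope formula is the standard ALiBi recipe and only handles power-of-two head counts, and the default `max_bias` of 8.0 is an assumption that mirrors the value discussed in the review below. Which of the two SwiGLU inputs goes through SiLU varies between implementations, so treat the argument order as an assumption too.

```python
import math


def alibi_slopes(n_head: int, max_bias: float = 8.0) -> list[float]:
    """Per-head ALiBi slopes (sketch: power-of-two head counts only)."""
    if n_head <= 0 or n_head & (n_head - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    # Head h gets slope 2^(-max_bias * (h + 1) / n_head).
    return [2.0 ** (-max_bias * (h + 1) / n_head) for h in range(n_head)]


def alibi_bias(slope: float, n_tokens: int) -> list[list[float]]:
    """Causal bias added to attention scores before the softmax.

    Visible keys are penalized in proportion to their distance from the
    query; future keys are masked out with -inf.
    """
    return [
        [slope * (j - i) if j <= i else -math.inf for j in range(n_tokens)]
        for i in range(n_tokens)
    ]


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def swiglu(gate: list[float], up: list[float]) -> list[float]:
    """Element-wise SwiGLU on two already-projected vectors."""
    return [silu(g) * u for g, u in zip(gate, up)]
```

With 8 heads and `max_bias = 8.0` the slopes run from 1/2 down to 1/256, so the first heads attend locally while the last ones see almost no distance penalty.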

@@ -427,9 +427,6 @@ def get_vocab_base_pre(self, tokenizer) -> str:
# NOTE: if you get an error here, you need to update the convert-hf-to-gguf-update.py script
# or pull the latest version of the model from Huggingface
# don't edit the hashes manually!
if chkhsh == "0ef9807a4087ebef797fc749390439009c3b9eda9ad1a097abbe738f486c01e5":
Contributor Author

Not really sure why this happened...
I ran convert-hf-to-gguf-update.py as instructed

Owner

Most likely you don't have access to the HF repos


return tensors


Contributor Author

too many new lines

@@ -733,7 +733,6 @@ int main(int argc, char ** argv) {

// Console/Stream Output
fprintf(stdout, "%s", token_str.c_str());

Contributor Author

revert

ggml.c Outdated
@@ -13516,13 +13516,13 @@ static void ggml_compute_forward_soft_max_f32(
} else {
    for (int i = 0; i < nc; ++i) {
        wp[i] += slope*mp_f32[i];

Contributor Author

revert changes in this file

llama.cpp Outdated
@@ -6700,7 +6733,6 @@ static bool llm_load_tensors(
case LLM_ARCH_BITNET:
    {
        model.tok_embd = ml.create_tensor(ctx_input, tn(LLM_TENSOR_TOKEN_EMBD, "weight"), {n_embd, n_vocab});

Contributor Author

revert

@mofosyne mofosyne added the Review Complexity : Medium Generally require more time to grok but manageable by beginner to medium expertise level label Jun 25, 2024
@github-actions github-actions bot added the python python script changes label Jun 25, 2024
@fmz fmz force-pushed the jais branch 2 times, most recently from 22a6497 to 1a51b36 Compare June 26, 2024 16:16

fmz commented Jun 26, 2024

@ggerganov @slaren
This PR is good to be reviewed. Please lmk if you have any comments/suggestions, especially w.r.t convert-hf-to-gguf.py

Thanks in advance!


fmz commented Jun 28, 2024

@slaren @ggerganov
Thanks for your comments! I addressed everything so far. Let me know if there is anything else


fmz commented Jun 28, 2024

Here is some sample output from Jais-13-chat (ignoring proper system-prompting):
1.

Instruction: Translate قهوة into English.

What is the English translation for the word قهوة؟

Pronunciation:

/kah-wah/

Defintion:

A dark, bitter fluid produced by the roasted seeds of several species of shrub of the genus Coffea. [end of text]

Instruction: جاوب باللغة الانجليزية: ما هو مشروبك المفضل؟

إجابة: My favorite drink is tea. [end of text]

src/llama.cpp Outdated
Comment on lines 4905 to 4906
// TODO: become GGUF KV parameter
hparams.f_max_alibi_bias = 8.0f;
Collaborator

There is already a <arch>.attention.max_alibi_bias parameter. I know that this was copied from other archs that still hardcode this parameter, but I don't think we should do this for new archs. Instead, it should be added as metadata in convert-hf-to-gguf.py with gguf_writer.add_max_alibi_bias.

Contributor Author

Ah perfect
I'll update it shortly
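To illustrate the suggestion in this thread, here is a rough sketch of the converter writing the ALiBi bias as metadata so the loader does not have to hardcode it. Only the method name `add_max_alibi_bias` and the key pattern `<arch>.attention.max_alibi_bias` come from the review comment. `RecordingWriter` is a hypothetical stand-in for the real GGUF writer so the snippet runs on its own, `set_alibi_metadata` is an illustrative helper, and the `"jais"` arch string is an assumption. The value 8.0 is the one from the hardcoded line under review.

```python
class RecordingWriter:
    """Hypothetical stand-in for the GGUF writer; it only records key/value pairs."""

    def __init__(self, arch: str) -> None:
        self.arch = arch
        self.kv: dict[str, float] = {}

    def add_max_alibi_bias(self, bias: float) -> None:
        # Key pattern taken from the review comment: <arch>.attention.max_alibi_bias
        self.kv[f"{self.arch}.attention.max_alibi_bias"] = float(bias)


def set_alibi_metadata(writer: RecordingWriter, max_bias: float = 8.0) -> None:
    """Store the ALiBi max bias in the model file instead of in loader code."""
    writer.add_max_alibi_bias(max_bias)


writer = RecordingWriter("jais")  # arch name assumed for illustration
set_alibi_metadata(writer)
print(writer.kv)
```

The design point is that the value travels with the model file, so a future variant with a different bias needs no loader change.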

@fmz fmz requested a review from slaren July 1, 2024 18:20

fmz commented Jul 2, 2024

@slaren @ggerganov Let me know if you have any more comments on this. If not, can you please merge it when you get the chance (I don't have permission)?

@slaren slaren merged commit 9689673 into ggerganov:master Jul 2, 2024
54 checks passed
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request Jul 2, 2024
* Add `JAIS` model(s)

* cleanup

* address review comments

* remove hack

* un-hardcode max-alibi-bias

* minor tweaks

---------

Co-authored-by: fmz <[email protected]>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jul 3, 2024
* Add `JAIS` model(s)

* cleanup

* address review comments

* remove hack

* un-hardcode max-alibi-bias

* minor tweaks

---------

Co-authored-by: fmz <[email protected]>