
[pull] main from mlc-ai:main #315

Merged
merged 2 commits on Dec 19, 2024
Conversation


@pull pull bot commented Dec 19, 2024

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

hrishi121 and others added 2 commits December 19, 2024 20:03
This PR adds support for Nemotron architecture, and is in reference
to #2901 [Request for Nemotron-Mini-4B-Instruct]

Based on my analysis of the Nemotron architecture in
the Hugging Face repository, it appears largely similar
to the Llama architecture, with the following key distinctions:

- The activation function used in the MLP is `relu2` (squared ReLU).
- The MLP includes `up_proj` and `down_proj`, but does not have
a `gate_proj` as seen in Llama.
- It uses `layernorm1p`, and the normalization layer incorporates a bias term.
- The architecture employs a `partial_rotary_factor`, which is similar
to the approach used in the Phi architecture.
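The distinctions above can be sketched in plain NumPy. This is a hedged illustration, not the PR's actual implementation: the function names (`relu2`, `layernorm1p`, `nemotron_mlp`) mirror the terms used in the list, while the shapes and the `up_proj`/`down_proj` weight layout are assumptions following Llama-style conventions.

```python
# Sketch of the Nemotron-style feed-forward block described above (assumed
# shapes and naming; illustrative only, not the mlc-ai implementation).
import numpy as np

def relu2(x):
    # Squared ReLU: relu(x) ** 2
    return np.square(np.maximum(x, 0.0))

def layernorm1p(x, weight, bias, eps=1e-5):
    # "1p" LayerNorm variant: scales by (1 + weight) rather than weight,
    # and the normalization layer carries a bias term (unlike Llama's RMSNorm).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    normed = (x - mean) / np.sqrt(var + eps)
    return normed * (1.0 + weight) + bias

def nemotron_mlp(x, up_proj, down_proj):
    # No gate_proj: just up-projection -> relu2 -> down-projection.
    return relu2(x @ up_proj) @ down_proj

rng = np.random.default_rng(0)
hidden, inter = 8, 16
x = rng.standard_normal((2, hidden))
w, b = np.zeros(hidden), np.zeros(hidden)
h = layernorm1p(x, w, b)
y = nemotron_mlp(h, rng.standard_normal((hidden, inter)),
                 rng.standard_normal((inter, hidden)))
print(y.shape)  # (2, 8)
```

A `partial_rotary_factor` (as in Phi) would additionally mean rotary position embeddings are applied to only a leading fraction of each attention head's dimensions, with the remainder passed through unrotated.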
This PR adds tensor parallelism (TP) support for the GPTJ model and fixes a minor typo in the OLMo model.
@pull pull bot added the ⤵️ pull label Dec 19, 2024
@pull pull bot merged commit 1825fed into kp-forks:main Dec 19, 2024