Add Doge model #35891

Open

wants to merge 39 commits into base: main

Changes from all commits (39 commits)
2a38123
Add Doge Model
LoserCheems Jan 25, 2025
5c96118
Fix code quality
LoserCheems Jan 25, 2025
0f689d6
Rollback an error commit
LoserCheems Jan 25, 2025
229cdca
Fix config for open-source weights
LoserCheems Jan 26, 2025
749dbcd
Revert "Fix config for open-source weights"
LoserCheems Jan 26, 2025
e2f5c36
Merge branch 'huggingface:main' into add-doge-model
LoserCheems Jan 27, 2025
e66977e
Merge branch 'main' into add-doge-model
LoserCheems Jan 27, 2025
ca7630a
Add modular_doge
LoserCheems Jan 27, 2025
17388cf
Merge branch 'main' into add-doge-model
LoserCheems Jan 27, 2025
79c0659
Update Doge inherits from Llama
LoserCheems Jan 28, 2025
f4d895c
Merge branch 'main' into add-doge-model
LoserCheems Jan 28, 2025
941d6b5
Fix import bug
LoserCheems Jan 28, 2025
4958ff1
Merge branch 'main' into add-doge-model
LoserCheems Jan 28, 2025
1466142
[docs] Add usage of doge model
LoserCheems Jan 28, 2025
aa4fcfd
Fix Doge import pretrainedconfig from modeling_utils to configuration…
LoserCheems Jan 28, 2025
c346728
Merge branch 'main' into add-doge-model
LoserCheems Jan 28, 2025
7cbea89
[docs] remove trust remote code from doge
LoserCheems Jan 28, 2025
cdcbd34
Merge branch 'main' into add-doge-model
LoserCheems Jan 28, 2025
c935266
Fix dynamo bug in doge model
LoserCheems Jan 29, 2025
3ab3187
Merge branch 'main' into add-doge-model
LoserCheems Jan 30, 2025
2c7e1c8
Merge branch 'main' into add-doge-model
LoserCheems Jan 30, 2025
9612ddb
Update docstrings
LoserCheems Jan 31, 2025
5f7545d
Merge branch 'main' into add-doge-model
LoserCheems Jan 31, 2025
decc891
Merge branch 'main' into add-doge-model
LoserCheems Feb 1, 2025
2e14753
Merge branch 'main' into add-doge-model
LoserCheems Feb 3, 2025
69f65d6
Merge branch 'main' into add-doge-model
LoserCheems Feb 4, 2025
8e09475
Import apply_rotary_pos_emb and repeat_kv from Llama
LoserCheems Feb 4, 2025
7ae953c
Merge branch 'main' into add-doge-model
LoserCheems Feb 4, 2025
ee6de6b
Merge branch 'main' into add-doge-model
LoserCheems Feb 5, 2025
49bca1f
Fix all nits
LoserCheems Feb 6, 2025
f9e6581
Merge branch 'main' into add-doge-model
LoserCheems Feb 6, 2025
d33af94
Fix code quality
LoserCheems Feb 6, 2025
e064520
Fix some bugs
LoserCheems Feb 6, 2025
408418d
Merge branch 'main' into add-doge-model
LoserCheems Feb 6, 2025
be9afcd
Fix code quality
LoserCheems Feb 6, 2025
1c99852
Remove inherited `_update_causal_mask` from Llama
LoserCheems Feb 6, 2025
6ae4982
Fix the wrong tensor orderings in DogeCDMoE
LoserCheems Feb 6, 2025
0835250
Merge branch 'main' into add-doge-model
LoserCheems Feb 6, 2025
b88f2de
Merge branch 'main' into add-doge-model
LoserCheems Feb 10, 2025
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -388,6 +388,8 @@
title: DiffLlama
- local: model_doc/distilbert
title: DistilBERT
- local: model_doc/doge
title: Doge
- local: model_doc/dpr
title: DPR
- local: model_doc/electra
1 change: 1 addition & 0 deletions docs/source/en/index.md
@@ -133,6 +133,7 @@ Flax), PyTorch, and/or TensorFlow.
| [DINOv2 with Registers](model_doc/dinov2_with_registers) | ✅ | ❌ | ❌ |
| [DistilBERT](model_doc/distilbert) | ✅ | ✅ | ✅ |
| [DiT](model_doc/dit) | ✅ | ❌ | ✅ |
| [Doge](model_doc/doge) | ✅ | ❌ | ❌ |
| [DonutSwin](model_doc/donut) | ✅ | ❌ | ❌ |
| [DPR](model_doc/dpr) | ✅ | ✅ | ❌ |
| [DPT](model_doc/dpt) | ✅ | ❌ | ❌ |
103 changes: 103 additions & 0 deletions docs/source/en/model_doc/doge.md
@@ -0,0 +1,103 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Doge


## Overview

Doge is a series of small language models based on the [Doge](https://github.com/LoserCheems/WonderfulMatrices) architecture. It aims to combine the advantages of state-space and self-attention algorithms, computing dynamic masks from cached value states with the zero-order hold method to address the problem of existing mainstream language models getting lost in context. The models are pre-trained on the `smollm-corpus` with the `wsd_scheduler` learning-rate scheduler, and from stable-stage checkpoints they can continue training on new datasets or add sparse activation feedforward networks.

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/refs%2Fpr%2F426/transformers/model_doc/doge_architecture.png" alt="drawing" width="600"/>

As shown in the figure above, the sequence transformation part of the Doge architecture uses `Dynamic Mask Attention`, which can be understood as self-attention conditioned on the value states during training, and as a state space without past state decay during inference, to address the problem of existing Transformers or SSMs getting lost in long text. The state transformation part uses `Cross Domain Mixture of Experts`, which consists of dense linear layers and sparse embedding layers; additional sparse parameters can be added so that training can continue from dense weight checkpoints without retraining the entire model, reducing the cost of continuously iterating on the model. In addition, Doge uses `RMSNorm` and `Residual` with learnable parameters to adapt the gradient range of deep models.
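
To make the `Dynamic Mask Attention` idea more concrete, here is a minimal, illustrative sketch rather than the actual `DogeAttention` implementation: it treats the dynamic mask as a per-position bias derived from the cached value states (the `dt_proj_weight` projection and the exact gating are assumptions made here for illustration) and adds it to the attention scores before the softmax. See `modeling_doge.py` for the authoritative code; the shipped module also applies rotary position embeddings and `repeat_kv`, which this sketch omits.

```python
import torch
import torch.nn.functional as F


def dynamic_mask_attention(query, key, value, dt_proj_weight, base_mask=None):
    # query/key/value: (batch, num_heads, seq_len, head_dim)
    # dt_proj_weight:  (num_heads, head_dim), a hypothetical learned projection
    batch, num_heads, seq_len, head_dim = value.shape

    # Derive one gate value per (head, key position) from the value states and
    # hold it constant across all query positions (zero-order hold style).
    gate = torch.einsum("bhsd,hd->bhs", value, dt_proj_weight)  # (batch, heads, seq)
    dynamic_bias = F.logsigmoid(gate).unsqueeze(2)              # broadcast over query positions

    scores = query @ key.transpose(-2, -1) / head_dim**0.5      # (batch, heads, seq, seq)
    if base_mask is not None:                                    # e.g. the usual causal mask
        scores = scores + base_mask
    scores = scores + dynamic_bias                               # value-dependent dynamic mask
    attn = scores.softmax(dim=-1)
    return attn @ value
```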

Check out all Doge model checkpoints [here](https://huggingface.co/collections/SmallDoge/doge-slm-679cc991f027c4a3abbded4a).


## Usage

<details>
<summary>Using Doge-Base for text generation</summary>

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-20M")
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-20M")
inputs = tokenizer("Hey how are you doing?", return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.batch_decode(outputs))
```
</details>

<details>
<summary>Using Doge-Instruct for question answering</summary>

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig, TextStreamer

tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-20M-Instruct")
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-20M-Instruct")

generation_config = GenerationConfig(
    max_new_tokens=100,
    use_cache=True,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    repetition_penalty=1.0,
)
streamer = TextStreamer(tokenizer=tokenizer, skip_prompt=True)

prompt = "Hi, how are you doing today?"
conversation = [
    {"role": "user", "content": prompt},
]
inputs = tokenizer.apply_chat_template(
    conversation=conversation,
    tokenize=True,
    return_tensors="pt",
)

outputs = model.generate(
    inputs,
    tokenizer=tokenizer,
    generation_config=generation_config,
    streamer=streamer,
)
```
</details>
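
If you prefer to collect the generated text as a string rather than streaming it to stdout, you can drop the `streamer` argument and decode the new tokens yourself. A small follow-up sketch, reusing the `tokenizer`, `inputs`, and `outputs` variables from the example above:

```python
# Keep only the newly generated tokens (everything after the prompt).
generated = outputs[:, inputs.shape[-1]:]
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```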

## DogeConfig

[[autodoc]] DogeConfig

## DogeModel

[[autodoc]] DogeModel
- forward

## DogeForCausalLM

[[autodoc]] DogeForCausalLM
- forward

## DogeForSequenceClassification

[[autodoc]] DogeForSequenceClassification
- forward
1 change: 1 addition & 0 deletions docs/source/en/perf_infer_gpu_one.md
@@ -249,6 +249,7 @@ For now, Transformers supports SDPA inference and training for the following architectures:
* [Dinov2](https://huggingface.co/docs/transformers/en/model_doc/dinov2)
* [Dinov2_with_registers](https://huggingface.co/docs/transformers/en/model_doc/dinov2)
* [DistilBert](https://huggingface.co/docs/transformers/model_doc/distilbert#transformers.DistilBertModel)
* [Doge](https://huggingface.co/docs/transformers/model_doc/doge#transformers.DogeModel)
* [Dpr](https://huggingface.co/docs/transformers/model_doc/dpr#transformers.DprReader)
* [EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder_decoder#transformers.EncoderDecoderModel)
* [Emu3](https://huggingface.co/docs/transformers/model_doc/emu3)
16 changes: 16 additions & 0 deletions src/transformers/__init__.py
@@ -412,6 +412,7 @@
"DistilBertTokenizer",
],
"models.dit": [],
"models.doge": ["DogeConfig"],
"models.donut": [
"DonutProcessor",
"DonutSwinConfig",
@@ -2244,6 +2245,14 @@
"DistilBertPreTrainedModel",
]
)
_import_structure["models.doge"].extend(
[
"DogeForCausalLM",
"DogeForSequenceClassification",
"DogeModel",
"DogePreTrainedModel",
]
)
_import_structure["models.donut"].extend(
[
"DonutSwinModel",
@@ -5518,6 +5527,7 @@
DistilBertConfig,
DistilBertTokenizer,
)
from .models.doge import DogeConfig
from .models.donut import (
DonutProcessor,
DonutSwinConfig,
@@ -7237,6 +7247,12 @@
DistilBertModel,
DistilBertPreTrainedModel,
)
from .models.doge import (
DogeForCausalLM,
DogeForSequenceClassification,
DogeModel,
DogePreTrainedModel,
)
from .models.donut import (
DonutSwinModel,
DonutSwinPreTrainedModel,
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -83,6 +83,7 @@
dinov2_with_registers,
distilbert,
dit,
doge,
donut,
dpr,
dpt,
2 changes: 2 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -99,6 +99,7 @@
("dinov2", "Dinov2Config"),
("dinov2_with_registers", "Dinov2WithRegistersConfig"),
("distilbert", "DistilBertConfig"),
("doge", "DogeConfig"),
("donut-swin", "DonutSwinConfig"),
("dpr", "DPRConfig"),
("dpt", "DPTConfig"),
@@ -426,6 +427,7 @@
("dinov2_with_registers", "DINOv2 with Registers"),
("distilbert", "DistilBERT"),
("dit", "DiT"),
("doge", "Doge"),
("donut-swin", "DonutSwin"),
("dpr", "DPR"),
("dpt", "DPT"),
3 changes: 3 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -97,6 +97,7 @@
("dinov2", "Dinov2Model"),
("dinov2_with_registers", "Dinov2WithRegistersModel"),
("distilbert", "DistilBertModel"),
("doge", "DogeModel"),
("donut-swin", "DonutSwinModel"),
("dpr", "DPRQuestionEncoder"),
("dpt", "DPTModel"),
@@ -506,6 +507,7 @@
("data2vec-text", "Data2VecTextForCausalLM"),
("dbrx", "DbrxForCausalLM"),
("diffllama", "DiffLlamaForCausalLM"),
("doge", "DogeForCausalLM"),
("electra", "ElectraForCausalLM"),
("emu3", "Emu3ForCausalLM"),
("ernie", "ErnieForCausalLM"),
@@ -991,6 +993,7 @@
("deberta-v2", "DebertaV2ForSequenceClassification"),
("diffllama", "DiffLlamaForSequenceClassification"),
("distilbert", "DistilBertForSequenceClassification"),
("doge", "DogeForSequenceClassification"),
("electra", "ElectraForSequenceClassification"),
("ernie", "ErnieForSequenceClassification"),
("ernie_m", "ErnieMForSequenceClassification"),
28 changes: 28 additions & 0 deletions src/transformers/models/doge/__init__.py
@@ -0,0 +1,28 @@
# coding=utf-8
# Copyright 2024 Jingze Shi and the HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import _LazyModule
from ...utils.import_utils import define_import_structure


if TYPE_CHECKING:
    from .configuration_doge import *
    from .modeling_doge import *
else:
    import sys

    _file = globals()["__file__"]
    sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__)