Add Doge model #35891

Open

wants to merge 39 commits into base: main

Changes from all commits (39 commits)
2a38123
Add Doge Model
LoserCheems Jan 25, 2025
5c96118
Fix code quality
LoserCheems Jan 25, 2025
0f689d6
Rollback an error commit
LoserCheems Jan 25, 2025
229cdca
Fix config for open-source weights
LoserCheems Jan 26, 2025
749dbcd
Revert "Fix config for open-source weights"
LoserCheems Jan 26, 2025
e2f5c36
Merge branch 'huggingface:main' into add-doge-model
LoserCheems Jan 27, 2025
e66977e
Merge branch 'main' into add-doge-model
LoserCheems Jan 27, 2025
ca7630a
Add modular_doge
LoserCheems Jan 27, 2025
17388cf
Merge branch 'main' into add-doge-model
LoserCheems Jan 27, 2025
79c0659
Update Doge inherits from Llama
LoserCheems Jan 28, 2025
f4d895c
Merge branch 'main' into add-doge-model
LoserCheems Jan 28, 2025
941d6b5
Fix import bug
LoserCheems Jan 28, 2025
4958ff1
Merge branch 'main' into add-doge-model
LoserCheems Jan 28, 2025
1466142
[docs] Add usage of doge model
LoserCheems Jan 28, 2025
aa4fcfd
Fix Doge import pretrainedconfig from modeling_utils to configuration…
LoserCheems Jan 28, 2025
c346728
Merge branch 'main' into add-doge-model
LoserCheems Jan 28, 2025
7cbea89
[docs] remove trust remote code from doge
LoserCheems Jan 28, 2025
cdcbd34
Merge branch 'main' into add-doge-model
LoserCheems Jan 28, 2025
c935266
Fix dynamo bug in doge model
LoserCheems Jan 29, 2025
3ab3187
Merge branch 'main' into add-doge-model
LoserCheems Jan 30, 2025
2c7e1c8
Merge branch 'main' into add-doge-model
LoserCheems Jan 30, 2025
9612ddb
Update docstrings
LoserCheems Jan 31, 2025
5f7545d
Merge branch 'main' into add-doge-model
LoserCheems Jan 31, 2025
decc891
Merge branch 'main' into add-doge-model
LoserCheems Feb 1, 2025
2e14753
Merge branch 'main' into add-doge-model
LoserCheems Feb 3, 2025
69f65d6
Merge branch 'main' into add-doge-model
LoserCheems Feb 4, 2025
8e09475
Import apply_rotary_pos_emb and repeat_kv from Llama
LoserCheems Feb 4, 2025
7ae953c
Merge branch 'main' into add-doge-model
LoserCheems Feb 4, 2025
ee6de6b
Merge branch 'main' into add-doge-model
LoserCheems Feb 5, 2025
49bca1f
Fix all nits
LoserCheems Feb 6, 2025
f9e6581
Merge branch 'main' into add-doge-model
LoserCheems Feb 6, 2025
d33af94
Fix code quality
LoserCheems Feb 6, 2025
e064520
Fix some bugs
LoserCheems Feb 6, 2025
408418d
Merge branch 'main' into add-doge-model
LoserCheems Feb 6, 2025
be9afcd
Fix code quality
LoserCheems Feb 6, 2025
1c99852
Remove inherited `_update_causal_mask` from Llama
LoserCheems Feb 6, 2025
6ae4982
Fix the wrong tensor orderings in DogeCDMoE
LoserCheems Feb 6, 2025
0835250
Merge branch 'main' into add-doge-model
LoserCheems Feb 6, 2025
b88f2de
Merge branch 'main' into add-doge-model
LoserCheems Feb 10, 2025
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -388,6 +388,8 @@
title: DiffLlama
- local: model_doc/distilbert
title: DistilBERT
- local: model_doc/doge
title: Doge
- local: model_doc/dpr
title: DPR
- local: model_doc/electra
1 change: 1 addition & 0 deletions docs/source/en/index.md
@@ -133,6 +133,7 @@ Flax), PyTorch, and/or TensorFlow.
| [DINOv2 with Registers](model_doc/dinov2_with_registers) | ✅ | ❌ | ❌ |
| [DistilBERT](model_doc/distilbert) | ✅ | ✅ | ✅ |
| [DiT](model_doc/dit) | ✅ | ❌ | ✅ |
| [Doge](model_doc/doge) | ✅ | ❌ | ❌ |
| [DonutSwin](model_doc/donut) | ✅ | ❌ | ❌ |
| [DPR](model_doc/dpr) | ✅ | ✅ | ❌ |
| [DPT](model_doc/dpt) | ✅ | ❌ | ❌ |
103 changes: 103 additions & 0 deletions docs/source/en/model_doc/doge.md
@@ -0,0 +1,103 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Doge


## Overview

Doge is a series of small language models based on the [Doge](https://github.com/LoserCheems/WonderfulMatrices) architecture. It aims to combine the advantages of state-space and self-attention algorithms, computing dynamic masks from cached value states with the zero-order hold method to address the problem of existing mainstream language models getting lost in context. The models are pre-trained on the `smollm-corpus` with the `wsd_scheduler` learning-rate scheduler, and from stable-stage checkpoints they can continue training on new datasets or add sparse activation feedforward networks.

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/refs%2Fpr%2F426/transformers/model_doc/doge_architecture.png" alt="drawing" width="600"/>

As shown in the figure above, the sequence transformation part of the Doge architecture uses `Dynamic Mask Attention`, which can be understood as self-attention conditioned on the value states during training, and as a state space without past state decay during inference, to address the problem of existing Transformers or SSMs getting lost in long text. The state transformation part uses `Cross Domain Mixture of Experts`, which consists of dense linear layers and sparse embedding layers; additional sparse parameters can be added so that training can continue from dense weight checkpoints without retraining the entire model, reducing the cost of continuously iterating on the model. In addition, Doge uses `RMSNorm` and `Residual` with learnable parameters to adapt the gradient range of deep models.
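
To make the `Dynamic Mask Attention` idea more concrete, here is a minimal, illustrative sketch rather than the actual `DogeAttention` implementation: it treats the dynamic mask as a per-position bias derived from the cached value states (the `dt_proj_weight` projection and the exact gating are assumptions made here for illustration) and adds it to the attention scores before the softmax. See `modeling_doge.py` for the authoritative code; the shipped module also applies rotary position embeddings and `repeat_kv`, which this sketch omits.

```python
import torch
import torch.nn.functional as F


def dynamic_mask_attention(query, key, value, dt_proj_weight, base_mask=None):
    # query/key/value: (batch, num_heads, seq_len, head_dim)
    # dt_proj_weight:  (num_heads, head_dim), a hypothetical learned projection
    batch, num_heads, seq_len, head_dim = value.shape

    # Derive one gate value per (head, key position) from the value states and
    # hold it constant across all query positions (zero-order hold style).
    gate = torch.einsum("bhsd,hd->bhs", value, dt_proj_weight)  # (batch, heads, seq)
    dynamic_bias = F.logsigmoid(gate).unsqueeze(2)              # broadcast over query positions

    scores = query @ key.transpose(-2, -1) / head_dim**0.5      # (batch, heads, seq, seq)
    if base_mask is not None:                                    # e.g. the usual causal mask
        scores = scores + base_mask
    scores = scores + dynamic_bias                               # value-dependent dynamic mask
    attn = scores.softmax(dim=-1)
    return attn @ value
```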

Check out all Doge model checkpoints [here](https://huggingface.co/collections/SmallDoge/doge-slm-679cc991f027c4a3abbded4a).


## Usage

<details>
<summary>Using Doge-Base for text generation</summary>

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-20M")
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-20M")
inputs = tokenizer("Hey how are you doing?", return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.batch_decode(outputs))
```
</details>

<details>
<summary>Using Doge-Instruct for question answering</summary>

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig, TextStreamer

tokenizer = AutoTokenizer.from_pretrained("SmallDoge/Doge-20M-Instruct")
model = AutoModelForCausalLM.from_pretrained("SmallDoge/Doge-20M-Instruct")

generation_config = GenerationConfig(
    max_new_tokens=100,
    use_cache=True,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    repetition_penalty=1.0,
)
streamer = TextStreamer(tokenizer=tokenizer, skip_prompt=True)

prompt = "Hi, how are you doing today?"
conversation = [
    {"role": "user", "content": prompt},
]
inputs = tokenizer.apply_chat_template(
    conversation=conversation,
    tokenize=True,
    return_tensors="pt",
)

outputs = model.generate(
    inputs,
    tokenizer=tokenizer,
    generation_config=generation_config,
    streamer=streamer,
)
```
</details>
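
If you prefer to collect the generated text as a string rather than streaming it to stdout, you can drop the `streamer` argument and decode the new tokens yourself. A small follow-up sketch, reusing the `tokenizer`, `inputs`, and `outputs` variables from the example above:

```python
# Keep only the newly generated tokens (everything after the prompt).
generated = outputs[:, inputs.shape[-1]:]
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```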

## DogeConfig

[[autodoc]] DogeConfig

## DogeModel

[[autodoc]] DogeModel
- forward

## DogeForCausalLM

[[autodoc]] DogeForCausalLM
- forward

## DogeForSequenceClassification

[[autodoc]] DogeForSequenceClassification
- forward
1 change: 1 addition & 0 deletions docs/source/en/perf_infer_gpu_one.md
@@ -249,6 +249,7 @@ For now, Transformers supports SDPA inference and training for the following architectures:
* [Dinov2](https://huggingface.co/docs/transformers/en/model_doc/dinov2)
* [Dinov2_with_registers](https://huggingface.co/docs/transformers/en/model_doc/dinov2)
* [DistilBert](https://huggingface.co/docs/transformers/model_doc/distilbert#transformers.DistilBertModel)
* [Doge](https://huggingface.co/docs/transformers/model_doc/doge#transformers.DogeModel)
* [Dpr](https://huggingface.co/docs/transformers/model_doc/dpr#transformers.DprReader)
* [EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder_decoder#transformers.EncoderDecoderModel)
* [Emu3](https://huggingface.co/docs/transformers/model_doc/emu3)
16 changes: 16 additions & 0 deletions src/transformers/__init__.py
@@ -412,6 +412,7 @@
"DistilBertTokenizer",
],
"models.dit": [],
"models.doge": ["DogeConfig"],
"models.donut": [
"DonutProcessor",
"DonutSwinConfig",
@@ -2244,6 +2245,14 @@
"DistilBertPreTrainedModel",
]
)
_import_structure["models.doge"].extend(
[
"DogeForCausalLM",
"DogeForSequenceClassification",
"DogeModel",
"DogePreTrainedModel",
]
)
_import_structure["models.donut"].extend(
[
"DonutSwinModel",
@@ -5518,6 +5527,7 @@
DistilBertConfig,
DistilBertTokenizer,
)
from .models.doge import DogeConfig
from .models.donut import (
DonutProcessor,
DonutSwinConfig,
@@ -7237,6 +7247,12 @@
DistilBertModel,
DistilBertPreTrainedModel,
)
from .models.doge import (
DogeForCausalLM,
DogeForSequenceClassification,
DogeModel,
DogePreTrainedModel,
)
from .models.donut import (
DonutSwinModel,
DonutSwinPreTrainedModel,
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -83,6 +83,7 @@
dinov2_with_registers,
distilbert,
dit,
doge,
donut,
dpr,
dpt,
2 changes: 2 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -99,6 +99,7 @@
("dinov2", "Dinov2Config"),
("dinov2_with_registers", "Dinov2WithRegistersConfig"),
("distilbert", "DistilBertConfig"),
("doge", "DogeConfig"),
("donut-swin", "DonutSwinConfig"),
("dpr", "DPRConfig"),
("dpt", "DPTConfig"),
@@ -426,6 +427,7 @@
("dinov2_with_registers", "DINOv2 with Registers"),
("distilbert", "DistilBERT"),
("dit", "DiT"),
("doge", "Doge"),
("donut-swin", "DonutSwin"),
("dpr", "DPR"),
("dpt", "DPT"),
3 changes: 3 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -97,6 +97,7 @@
("dinov2", "Dinov2Model"),
("dinov2_with_registers", "Dinov2WithRegistersModel"),
("distilbert", "DistilBertModel"),
("doge", "DogeModel"),
("donut-swin", "DonutSwinModel"),
("dpr", "DPRQuestionEncoder"),
("dpt", "DPTModel"),
@@ -506,6 +507,7 @@
("data2vec-text", "Data2VecTextForCausalLM"),
("dbrx", "DbrxForCausalLM"),
("diffllama", "DiffLlamaForCausalLM"),
("doge", "DogeForCausalLM"),
("electra", "ElectraForCausalLM"),
("emu3", "Emu3ForCausalLM"),
("ernie", "ErnieForCausalLM"),
@@ -991,6 +993,7 @@
("deberta-v2", "DebertaV2ForSequenceClassification"),
("diffllama", "DiffLlamaForSequenceClassification"),
("distilbert", "DistilBertForSequenceClassification"),
("doge", "DogeForSequenceClassification"),
("electra", "ElectraForSequenceClassification"),
("ernie", "ErnieForSequenceClassification"),
("ernie_m", "ErnieMForSequenceClassification"),
28 changes: 28 additions & 0 deletions src/transformers/models/doge/__init__.py
@@ -0,0 +1,28 @@
# coding=utf-8
# Copyright 2024 Jingze Shi and the HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import _LazyModule
from ...utils.import_utils import define_import_structure


if TYPE_CHECKING:
    from .configuration_doge import *
    from .modeling_doge import *
else:
    import sys

    _file = globals()["__file__"]
    sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__)