actually, defaulting to None is the right call; throughput still suffers

stanford-crfm · Nov 12, 2024 · f8ab86b
1 parent c8ea018
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/src/levanter/models/lm_model.py b/src/levanter/models/lm_model.py
@@ -71,7 +71,7 @@ def Pos(self) -> Axis:
     def Embed(self) -> Axis:
         pass
 
-    cross_entropy_block_size: Optional[int] = 64000
+    cross_entropy_block_size: Optional[int] = None
     """
     The block size for computing cross-entropy loss. This is the number of tokens that are processed together
     in a single block. This can be adjusted to fit within memory constraints. It's deliberately set to a large