Showing 6 changed files with 35 additions and 7 deletions.
@@ -1,5 +1,23 @@
# Full Finetuning Llama2/Llama3 Optimized Configuration

In the pre-training section, you saw how to pre-train a model with MaxText. To perform full fine-tuning, you pass an existing checkpoint to the training script.

The following parameter assigns a checkpoint to the training script:

- `load_parameters_path`: Path to the checkpoint directory.
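
As a minimal sketch, a fine-tuning run might be launched as follows. The `gs://` paths, run name, and step count are placeholders, and the exact set of flags should be verified against your MaxText checkout:

```bash
# Launch full fine-tuning from a MaxText-formatted checkpoint.
# All gs:// paths and the run name are placeholders.
python MaxText/train.py MaxText/configs/base.yml \
    model_name=llama2-7b \
    load_parameters_path=gs://my-bucket/llama2-7b/maxtext-ckpt/0/items \
    base_output_directory=gs://my-bucket/finetune-output \
    dataset_path=gs://my-bucket/datasets \
    run_name=llama2-7b-finetune \
    per_device_batch_size=1 \
    steps=100
```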

The high-level steps are:

- Convert the model checkpoints to MaxText-formatted checkpoints (a conversion sketch follows this list).
- Prepare the dataset so that it can be fed into the training script. MaxText provides sample pipelines that load data via tf.data or PyGrain from disk or a GCS bucket, or stream it directly from a Hugging Face dataset.
- Run the training script with the checkpoint.
- Note: You may need to adjust the training parameters to fit the model on your TPU or GPU shape and to obtain optimal performance.

## Parameters to achieve high MFU

This content is in progress.
@@ -0,0 +1,13 @@
# Terminologies

- **FLOP**: Floating Point Operation.
- **FLOPS**: Plural form of FLOP.
- **FLOP/s** (also written **FLOPs**): Floating Point Operations Per Second.
- **MFU**: Model FLOP/s Utilization.
- **ICI**: Inter-Chip Interconnect.
- **HBM**: High Bandwidth Memory. Built with DRAM technology; each chip usually has XX GiB of HBM.
- **VMEM**: Vector Memory. Built with SRAM technology; each chip usually has XX MiB of VMEM.
- **DCN**: Data Center Network.
- **PCIe**: Peripheral Component Interconnect Express; the interface over which TPUs communicate with the host CPU.
- **AI**: Arithmetic Intensity.
- **Rank**: Position or ID of a worker within a group of workers. This is not the same as the rank of a matrix in linear algebra.
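
As an illustrative example of MFU and Arithmetic Intensity (the numbers below are made up and not tied to any particular chip): if a training step sustains 150 TFLOP/s per chip on hardware whose peak is 300 TFLOP/s, the MFU is 150 / 300 = 50%. Arithmetic Intensity is the ratio of FLOPs performed to bytes moved to and from memory; a kernel that performs 4 × 10⁹ FLOPs while transferring 2 × 10⁸ bytes has an AI of 20 FLOPs/byte.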