melissawm committed Oct 29, 2024
1 parent 4d0ac8e commit 3c1ea57
Showing 6 changed files with 35 additions and 7 deletions.
@@ -15,7 +15,7 @@
-->


- # How to run MaxText with XPK?
+ # How to run MaxText with XPK

This document focuses on the steps required to set up XPK on a TPU VM and assumes you have gone through the [README](https://github.com/google/xpk/blob/main/README.md) to understand XPK basics.
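
As an illustration (not part of this commit's diff), launching a MaxText training
job through XPK typically looks something like the sketch below; the cluster name,
workload name, TPU type, and Docker image are placeholders, and the exact flags may
differ across XPK versions.

```bash
# Hypothetical sketch: cluster, workload, TPU type, and image names are placeholders.
python3 xpk.py workload create \
  --cluster <your-cluster> \
  --workload maxtext-test-run \
  --tpu-type=v5litepod-16 \
  --num-slices=1 \
  --base-docker-image maxtext_base_image \
  --command "python3 MaxText/train.py MaxText/configs/base.yml run_name=maxtext-test-run"
```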

1 change: 0 additions & 1 deletion docs/advanced_usage.md
@@ -3,7 +3,6 @@
```{toctree}
getting_started/Run_MaxText_via_multihost_job.md
getting_started/Run_MaxText_via_multihost_runner.md
getting_started/Run_MaxText_via_xpk.md
getting_started/Use_Vertex_AI_Tensorboard.md
getting_started/Run_Llama2.md
```
20 changes: 19 additions & 1 deletion docs/full_finetuning.md
@@ -1,5 +1,23 @@
# Full Finetuning Llama2/Llama3 Optimized Configuration

In the pre-training section you saw how to run pre-training with MaxText. To
perform full fine-tuning, you need to pass a checkpoint to the training
script.

The following parameter assigns a checkpoint to the training script:

- `load_parameters_path`: Path to the checkpoint directory

The high-level steps involve:
- Converting the model checkpoints to MaxText-formatted checkpoints.
- Preparing the dataset so that data can be fed into the training script.
MaxText provides sample input pipelines to load data via tf.data or Grain from
disk or a GCS bucket, and it can also load data directly from a Hugging Face
dataset.
- Running the training script with the checkpoint (a sketch is shown below).
- Note: You may need to adjust the training parameters to fit the model on the
TPU or GPU shape and to obtain optimized performance.
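
For illustration, a fine-tuning run could then be launched roughly as in the
sketch below. `load_parameters_path` is the parameter described above; the run
name, bucket paths, and other flag values are placeholders that you would
replace with your own setup.

```bash
# Hypothetical sketch: bucket paths, run name, and model/dataset settings are
# placeholders and depend on your own setup.
python3 MaxText/train.py MaxText/configs/base.yml \
  run_name=llama2-7b-finetune \
  model_name=llama2-7b \
  load_parameters_path=gs://<your-bucket>/llama2-7b/maxtext-checkpoint/0/items \
  base_output_directory=gs://<your-bucket>/maxtext-output \
  dataset_path=gs://<your-bucket>/dataset \
  per_device_batch_size=1 \
  steps=100
```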

## Parameters to achieve high MFU

- This page is in progress.
+ This content is in progress.
3 changes: 0 additions & 3 deletions docs/gce_gke_xpk.md

This file was deleted.

3 changes: 2 additions & 1 deletion docs/index.md
@@ -212,6 +212,7 @@ checkpointing.md
profiling.md
full_finetuning.md
inference.md
gce_gke_xpk.md
Run_MaxText_via_xpk.md
advanced_usage.md
terminologies.md
```
13 changes: 13 additions & 0 deletions docs/terminologies.md
@@ -0,0 +1,13 @@
# Terminologies

- **FLOP**: Floating Point Operation
- **FLOPS**: Plural form of FLOP
- **FLOP/s** or **FLOPs**: Floating Point Operations Per Second.
- **MFU**: Model FLOP/s Utilization
- **ICI**: Inter-chip interconnect.
- **HBM**: High Bandwidth Memory. Built with DRAM technology. Each chip usually has XX GiBs of HBM.
- **VMEM**: Vector Memory. Built with SRAM technology. Each chip usually has XX MiBs of VMEM.
- **DCN**: Data Center Network
- **PCIe**: Peripheral Component Interconnect Express. How the TPUs communicate with the CPU.
- **AI**: Arithmetic Intensity
- **Rank**: Position or ID of a worker within a group of workers. This is not the same as Rank in linear algebra.
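
As a rough illustration (not part of the original file), MFU is commonly
computed as the ratio of the model FLOP/s actually achieved during training to
the peak FLOP/s the hardware can deliver:

$$
\text{MFU} = \frac{\text{model FLOPs per training step} / \text{step time}}{\text{peak hardware FLOP/s}}
$$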
