Showing 6 changed files with 35 additions and 7 deletions.
@@ -1,5 +1,23 @@
# Full Finetuning Llama2/Llama3 Optimized Configuration

In the pre-training section, you saw how to pre-train a model with MaxText. To perform full fine-tuning, you pass an existing checkpoint to the training script.

The following parameter assigns a checkpoint to the training script:

- `load_parameters_path`: Path to the checkpoint directory.
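
As a minimal sketch, a fine-tuning run might be launched as follows. The `gs://` paths, run name, and step count are placeholders, and the exact set of flags should be verified against your MaxText checkout:

```bash
# Launch full fine-tuning from a MaxText-formatted checkpoint.
# All gs:// paths and the run name are placeholders.
python MaxText/train.py MaxText/configs/base.yml \
    model_name=llama2-7b \
    load_parameters_path=gs://my-bucket/llama2-7b/maxtext-ckpt/0/items \
    base_output_directory=gs://my-bucket/finetune-output \
    dataset_path=gs://my-bucket/datasets \
    run_name=llama2-7b-finetune \
    per_device_batch_size=1 \
    steps=100
```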

The high-level steps are:

- Convert the model checkpoints to MaxText-formatted checkpoints (a conversion sketch follows this list).
- Prepare the dataset so that it can be fed into the training script. MaxText provides sample pipelines that load data via tf.data or PyGrain from disk or a GCS bucket, or stream it directly from a Hugging Face dataset.
- Run the training script with the checkpoint.
- Note: You may need to adjust the training parameters to fit the model on your TPU or GPU shape and to obtain optimal performance.

## Parameters to achieve high MFU

This content is in progress.
@@ -0,0 +1,13 @@
# Terminologies

- **FLOP**: Floating Point Operation.
- **FLOPS**: Plural form of FLOP.
- **FLOP/s** (also written **FLOPs**): Floating Point Operations Per Second.
- **MFU**: Model FLOP/s Utilization.
- **ICI**: Inter-Chip Interconnect.
- **HBM**: High Bandwidth Memory. Built with DRAM technology; each chip usually has XX GiB of HBM.
- **VMEM**: Vector Memory. Built with SRAM technology; each chip usually has XX MiB of VMEM.
- **DCN**: Data Center Network.
- **PCIe**: Peripheral Component Interconnect Express; the interface over which TPUs communicate with the host CPU.
- **AI**: Arithmetic Intensity.
- **Rank**: Position or ID of a worker within a group of workers. This is not the same as the rank of a matrix in linear algebra.
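
As an illustrative example of MFU and Arithmetic Intensity (the numbers below are made up and not tied to any particular chip): if a training step sustains 150 TFLOP/s per chip on hardware whose peak is 300 TFLOP/s, the MFU is 150 / 300 = 50%. Arithmetic Intensity is the ratio of FLOPs performed to bytes moved to and from memory; a kernel that performs 4 × 10⁹ FLOPs while transferring 2 × 10⁸ bytes has an AI of 20 FLOPs/byte.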