Added Colab TPU support with Colab Notebook and modified repo #47

Status: Open · wants to merge 54 commits into base: master

Commits (54)
- 7b5ca13 Update run_classifier for Colab Notebook (aditya-malte, Jun 24, 2019)
- c6940a0 Update README.md (aditya-malte, Jun 24, 2019)
- 2c99d47 Update README.md (aditya-malte, Jun 24, 2019)
- 1b611c7 Created using Colaboratory (aditya-malte, Jun 24, 2019)
- 1148285 Delete End_To_End_Finetuning_of_XLNet_Using_TPU.ipynb (aditya-malte, Jun 24, 2019)
- 8cf416a Created using Colaboratory (aditya-malte, Jun 24, 2019)
- 6fff1c4 Update README.md (aditya-malte, Jun 24, 2019)
- aa4fca0 Update README.md (aditya-malte, Jun 24, 2019)
- eb281a5 Update README.md (aditya-malte, Jun 24, 2019)
- 6dff4dd Update README.md (aditya-malte, Jun 24, 2019)
- e39d69c Delete End_To_End_Finetuning_of_XLNet_Using_TPU.ipynb (aditya-malte, Jun 24, 2019)
- b4dbbd1 Created using Colaboratory (aditya-malte, Jun 24, 2019)
- 6d29c2b Created using Colaboratory (aditya-malte, Jun 24, 2019)
- cf2e34b Update Colab_XLNet_FineTuning.ipynb (aditya-malte, Jun 24, 2019)
- 153ec6e Update Colab_XLNet_FineTuning.ipynb (aditya-malte, Jun 24, 2019)
- 9dd3cec Solved MODEL_DIR, OUTPUT_DIR glitch (aditya-malte, Jun 24, 2019)
- 8846486 final working version (aditya-malte, Jun 24, 2019)
- 12ce751 Update README.md (aditya-malte, Jun 27, 2019)
- 17331a4 Update run_classifier.py (aditya-malte, Jun 27, 2019)
- c5a4f67 Update README.md (aditya-malte, Jul 2, 2019)
- c0f03ce Update run_classifier.py (aditya-malte, Jul 2, 2019)
- b2d565e Update run_classifier.py (aditya-malte, Jul 2, 2019)
- 4ae4e67 Update model_utils.py (aditya-malte, Jul 2, 2019)
- 2162652 Created using Colaboratory (aditya-malte, Jul 2, 2019)
- d68f71a Update run_classifier.py (aditya-malte, Jul 2, 2019)
- 02a2d12 Created using Colaboratory (aditya-malte, Jul 2, 2019)
- d4021b8 Created using Colaboratory (aditya-malte, Jul 2, 2019)
- ad5589f Created using Colaboratory (aditya-malte, Jul 2, 2019)
- 6152808 Created using Colaboratory (aditya-malte, Jul 2, 2019)
- 63498aa Created using Colaboratory (aditya-malte, Jul 2, 2019)
- 303a5a0 Changed "my_bert_5" to "YOUR_BUCKET_NAME" (aditya-malte, Jul 2, 2019)
- 3af75af Created using Colaboratory (aditya-malte, Jul 2, 2019)
- 9d8a93f Update Colab_XLNet_FineTuning.ipynb (aditya-malte, Jul 2, 2019)
- f9e5722 Created using Colaboratory (aditya-malte, Jul 2, 2019)
- 55b4348 Created using Colaboratory (aditya-malte, Jul 2, 2019)
- fe7f6aa Created using Colaboratory (aditya-malte, Jul 2, 2019)
- c2755f3 Created using Colaboratory (aditya-malte, Jul 2, 2019)
- 20527b3 Minor Update (aditya-malte, Jul 3, 2019)
- 0989600 Merge pull request #1 from zihangdai/master (aditya-malte, Jul 7, 2019)
- 56bf1df Created using Colaboratory (aditya-malte, Jul 7, 2019)
- eb70138 Rename Colab_XLNet_FineTuning.ipynb to colab_imdb_tpu (aditya-malte, Jul 7, 2019)
- cb3bd8f Delete Colab_XLNet_FineTuning.ipynb (aditya-malte, Jul 7, 2019)
- 7be4590 Rename colab_imdb_tpu to colab_imdb_tpu.ipynb (aditya-malte, Jul 7, 2019)
- f84f08c Update run_classifier.py (aditya-malte, Jul 7, 2019)
- ade6ab7 Update run_classifier.py (aditya-malte, Jul 7, 2019)
- 58b930f Update run_classifier.py (aditya-malte, Jul 7, 2019)
- 6efef37 Update README.md (aditya-malte, Jul 7, 2019)
- 6485463 Fixed colab link (aditya-malte, Jul 14, 2019)
- e73f0bd Update (aditya-malte, Jul 14, 2019)
- c5b0d01 Update README.md (aditya-malte, Jul 17, 2019)
- 659b7e0 Sync readme with original (aditya-malte, Jul 20, 2019)
- 6f86c65 Merge pull request #2 from zihangdai/master (aditya-malte, Jul 20, 2019)
- 14bf36a Update train.py (cedspam, Oct 22, 2019)
- 2bd42dd Merge pull request #6 from cedspam/patch-1 (aditya-malte, Feb 14, 2020)
README.md (3 changes: 2 additions & 1 deletion)

@@ -277,9 +277,10 @@ To run the code:
 - The SOTA performance (accuracy 81.75) of RACE is produced using XLNet-Large with sequence length 512 and batch size 32, which requires a large TPU v3-32 in the pod setting. Please refer to the script `script/tpu_race_large_bsz32.sh` for this setting.
 - Using XLNet-Large with sequence length 512 and batch size 8 on a TPU v3-8 can give you an accuracy of around 80.3 (see `script/tpu_race_large_bsz8.sh`).
 
-### Using Google Colab
+### Using Google Colab (TPU/GPU)
 
 [An example](notebooks/colab_imdb_gpu.ipynb) of using Google Colab with GPUs has been provided. Note that since the hardware is constrained in the example, the results are worse than the best we can get. It mainly serves as an example and should be modified accordingly to maximize performance.
+[An example](notebooks/colab_imdb_tpu.ipynb) of using Google Colab with TPUs has also been provided. The TPU's higher memory capacity allows us to achieve better performance.
 
 
 ## Custom Usage of XLNet
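Putting the README change in context: based on the `FLAGS.use_colab_tpu` and `FLAGS.tpu_address` fields referenced in this PR's `model_utils.py` diff, a Colab cell could launch fine-tuning roughly as sketched below. Only the two Colab flags come from the PR's code; the bucket name, output path, and remaining flags are illustrative placeholders, not values taken from the PR.

```shell
# Hypothetical launch command. --use_colab_tpu and --tpu_address mirror the
# FLAGS fields added in model_utils.py; everything else is a placeholder.
python run_classifier.py \
  --use_colab_tpu=True \
  --tpu_address="grpc://${COLAB_TPU_ADDR}" \
  --output_dir="gs://YOUR_BUCKET_NAME/imdb/output" \
  --do_train=True
```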
model_utils.py (16 changes: 16 additions & 0 deletions)

@@ -41,6 +41,8 @@ def configure_tpu(FLAGS):
                   strategy.num_replicas_in_sync)
 
   per_host_input = tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2
+
+
   run_config = tf.contrib.tpu.RunConfig(
       master=master,
       model_dir=FLAGS.model_dir,
@@ -54,6 +56,20 @@
       save_checkpoints_steps=FLAGS.save_steps,
       train_distribute=strategy
   )
+
+  if FLAGS.use_colab_tpu:
+    tpu_cluster_resolver = tf.contrib.cluster_resolver.TPUClusterResolver(FLAGS.tpu_address)
+    is_per_host = tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2
+    run_config = tf.contrib.tpu.RunConfig(
+        cluster=tpu_cluster_resolver,
+        model_dir=FLAGS.output_dir,
+        save_checkpoints_steps=FLAGS.save_steps,
+        keep_checkpoint_max=FLAGS.max_save,
+        tpu_config=tf.contrib.tpu.TPUConfig(
+            iterations_per_loop=FLAGS.iterations,
+            num_shards=8,
+            per_host_input_for_training=is_per_host))
+
   return run_config


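For context on the Colab branch in the diff above: `TPUClusterResolver` needs the TPU worker's gRPC address, which Colab TPU runtimes of that era exposed as `host:port` in the `COLAB_TPU_ADDR` environment variable. A minimal, TensorFlow-free sketch of deriving the address string that `FLAGS.tpu_address` would carry (the helper name and the simulated address are illustrative, not part of the PR):

```python
import os

def colab_tpu_address():
    """Return the gRPC address of the Colab TPU, or None when absent.

    Colab TPU runtimes (TF 1.x era) expose the TPU worker as
    "host:port" in the COLAB_TPU_ADDR environment variable.
    """
    addr = os.environ.get("COLAB_TPU_ADDR")
    if addr is None:
        return None  # not running on a Colab TPU runtime
    return "grpc://" + addr

# Simulate the Colab environment for illustration only.
os.environ["COLAB_TPU_ADDR"] = "10.0.0.2:8470"
print(colab_tpu_address())  # grpc://10.0.0.2:8470
```

The resulting string is what a notebook would pass as `--tpu_address`, and it is the same value the diff's `TPUClusterResolver(FLAGS.tpu_address)` call consumes.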