diff --git a/.gitignore b/.gitignore
index 36e2df362..5b1de5ab8 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,3 @@
 __pycache__
+.mypy_cache/
 models/
diff --git a/DEVELOPERS.md b/DEVELOPERS.md
index 078999b0e..57fd519f0 100644
--- a/DEVELOPERS.md
+++ b/DEVELOPERS.md
@@ -28,6 +28,7 @@ pip3 install -r requirements.txt
 Download the model data
 ```
 python3 download_model.py 117M
+python3 download_model.py 345M
 ```
 
 ## Docker Installation
diff --git a/Dockerfile.cpu b/Dockerfile.cpu
index bb4bcb712..a02d2b320 100644
--- a/Dockerfile.cpu
+++ b/Dockerfile.cpu
@@ -6,3 +6,4 @@ WORKDIR /gpt-2
 ADD . /gpt-2
 RUN pip3 install -r requirements.txt
 RUN python3 download_model.py 117M
+RUN python3 download_model.py 345M
diff --git a/Dockerfile.gpu b/Dockerfile.gpu
index b7b013bd3..b3f87db14 100644
--- a/Dockerfile.gpu
+++ b/Dockerfile.gpu
@@ -15,3 +15,4 @@ WORKDIR /gpt-2
 ADD . /gpt-2
 RUN pip3 install -r requirements.txt
 RUN python3 download_model.py 117M
+RUN python3 download_model.py 345M
diff --git a/README.md b/README.md
index c1be039bc..2b83621dc 100644
--- a/README.md
+++ b/README.md
@@ -2,23 +2,23 @@
 
 Code and samples from the paper ["Language Models are Unsupervised Multitask Learners"](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf).
 
-For now, we have only released a smaller (117M parameter) version of GPT-2.
+We have so far released small (117M parameter) and medium (345M parameter) versions of GPT-2.
 
 See more details in our [blog post](https://blog.openai.com/better-language-models/).
 
 ## Usage
 
-This repository is meant to be a starting point for researchers and engineers to experiment with GPT-2-117M.  While GPT-2-117M is less proficient than GPT-2-1.5B, it is useful for a wide range of research and applications which could also apply to larger models.
+This repository is meant to be a starting point for researchers and engineers to experiment with GPT-2.
 
 ### Some caveats
 
-- GPT-2-117M robustness and worst case behaviors are not well-understood.  As with any machine-learned model, carefully evaluate GPT-2-117M for your use case, especially if used without fine-tuning or in safety-critical applications where reliability is important.
-- The dataset our GPT-2-117M was trained on contains many texts with [biases](https://twitter.com/TomerUllman/status/1101485289720242177) and factual inaccuracies, and thus GPT-2-117M is likely to be biased and inaccurate as well.
+- GPT-2 models' robustness and worst-case behaviors are not well understood.  As with any machine-learned model, carefully evaluate GPT-2 for your use case, especially if used without fine-tuning or in safety-critical applications where reliability is important.
+- The dataset our GPT-2 models were trained on contains many texts with [biases](https://twitter.com/TomerUllman/status/1101485289720242177) and factual inaccuracies, and thus GPT-2 models are likely to be biased and inaccurate as well.
 - To avoid having samples mistaken as human-written, we recommend clearly labeling samples as synthetic before wide dissemination.  Our models are often incoherent or inaccurate in subtle ways, which takes more than a quick read for a human to notice.
 
 ### Work with us
 
-Please [let us know](mailto:languagequestions@openai.com) if you’re doing interesting research with or working on applications of GPT-2-117M!  We’re especially interested in hearing from and potentially working with those who are studying
+Please [let us know](mailto:languagequestions@openai.com) if you’re doing interesting research with or working on applications of GPT-2!  We’re especially interested in hearing from and potentially working with those who are studying:
 - Potential malicious use cases and defenses against them (e.g. the detectability of synthetic text)
 - The extent of problematic content (e.g. bias) being baked into the models and effective mitigations