Fix documentation links and images, add more info on customizing attention.

PiperOrigin-RevId: 649267636
danieldjohnson authored and Penzai Developers committed Jul 4, 2024
1 parent 2c7bee8 commit 308d7b4
Showing 4 changed files with 37 additions and 3 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -75,7 +75,7 @@ Documentation on Penzai can be found at
> boilerplate. It also includes a more flexible transformer implementation with
> support for more pretrained model variants. You can read about the
> differences between the two APIs in the
> ["Changes in the V2 API"](v2_differences) overview.
> ["Changes in the V2 API"][v2_differences] overview.
>
> We plan to stabilize the V2 API and move it out of experimental in release
> ``0.2.0``, replacing the V1 API. If you wish to keep the V1 behavior, we
Binary file modified docs/_static/readme_teaser.png
34 changes: 34 additions & 0 deletions docs/guides/howto_reference.md
@@ -457,6 +457,40 @@ patched_model = (
where `target` is the layer to linearize, `linearize_around` computes the input that the layer should be linearized at (e.g. by modifying its input activation or returning a constant), and `evaluate_at` computes the input that the linear approximation should be evaluated at (usually the same as the original input, but can also be different).


### Customizing attention masks in `TransformerLM`

By default, most `TransformerLM` architecture variants are specialized to causal attention masks, using the `pz.nn.ApplyCausalAttentionMask` layer (or sometimes `pz.nn.ApplyCausalSlidingWindowAttentionMask`). These layers use the token positions input to build a causal attention mask and apply it to the attention logits.
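
If you are not sure which of these layers a given model variant uses, you can list them with a selector. Here is a minimal sketch, assuming `model` is an already-loaded `TransformerLM` and that your Penzai version exposes both layer classes:

```python
from penzai import pz

# List the attention-masking layers in the model to see whether it uses plain
# causal masks, sliding-window masks, or a mix of both.
mask_layers = (
    pz.select(model)
    .at_instances_of(
        pz.nn.ApplyCausalAttentionMask
        | pz.nn.ApplyCausalSlidingWindowAttentionMask
    )
    .get_sequence()
)
for layer in mask_layers:
  print(type(layer).__name__)
```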

If you would like to customize the attention mask computation, you can swap out these layers for `pz.nn.ApplyExplicitAttentionMask` layers, using something like

```python
explicit_attn_model = (
    pz.select(model)
    .at_instances_of(
        pz.nn.ApplyCausalAttentionMask
        | pz.nn.ApplyCausalSlidingWindowAttentionMask
    )
    .apply(lambda old: pz.nn.ApplyExplicitAttentionMask(
        mask_input_name="attn_mask",
        masked_out_value=old.masked_out_value,
    ))
)
```

This will create a copy of the model that expects a side input called `attn_mask` and uses it to mask the attention logits. You can call it using something like

```python
# tokens should have named shape {..., "seq": n_seq}
# token_positions should have named shape {..., "seq": n_seq}
# attn_mask should be a boolean array with named shape
# {..., "seq": n_seq, "kv_seq": n_seq}
token_logits = explicit_attn_model(
    tokens, token_positions=token_positions, attn_mask=attn_mask
)
```
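
As a concrete example of building that side input, the sketch below reconstructs an ordinary causal mask from token positions using Penzai's named arrays; you could adjust the condition to implement prefix-LM attention, sliding windows, or anything else. The variable names here are illustrative:

```python
import jax.numpy as jnp
from penzai import pz

n_seq = tokens.named_shape["seq"]  # sequence length of the tokens being scored

# Query positions live on the "seq" axis, key/value positions on "kv_seq".
query_positions = pz.nx.wrap(jnp.arange(n_seq)).tag("seq")
kv_positions = pz.nx.wrap(jnp.arange(n_seq)).tag("kv_seq")

# A plain causal mask: each query may attend to keys at or before its position.
attn_mask = pz.nx.nmap(jnp.greater_equal)(query_positions, kv_positions)

# Example tweak (with a hypothetical `prefix_length`): also allow bidirectional
# attention inside a prompt prefix, as in a prefix-LM:
# attn_mask = pz.nx.nmap(jnp.logical_or)(
#     attn_mask, pz.nx.nmap(jnp.less)(kv_positions, prefix_length))
```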

For more control, you can also define your own layer and insert it in place of the attention masking logic.
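
As a rough sketch of what that could look like, here is a hypothetical segment-masking layer that only lets tokens attend within their own document, reading segment IDs from a new side input. The class name, the `segment_ids` side input, and the exact `__call__` signature are illustrative assumptions rather than part of Penzai's API; compare against the implementation of `pz.nn.ApplyExplicitAttentionMask` when writing your own:

```python
import jax.numpy as jnp
from penzai import pz

@pz.pytree_dataclass
class ApplySegmentAttentionMask(pz.nn.Layer):
  """Sketch of a custom mask layer: tokens attend only within their own segment."""
  masked_out_value: float

  def __call__(self, attn_logits, **side_inputs):
    # Assumed side input: integer segment IDs with named shape {..., "seq": n_seq}.
    segment_ids = side_inputs["segment_ids"]
    kv_segment_ids = segment_ids.untag("seq").tag("kv_seq")
    # Allow attention only between tokens that share a segment ID. (Combine
    # with a causal condition here if you also need left-to-right masking.)
    mask = pz.nx.nmap(jnp.equal)(segment_ids, kv_segment_ids)
    # Masked-out logits are replaced so they vanish after the softmax.
    return pz.nx.nmap(jnp.where)(mask, attn_logits, self.masked_out_value)
```

You could then substitute this layer for the existing masking layers with the same `pz.select(...).apply(...)` pattern shown above, and pass `segment_ids=...` as an extra keyword argument when calling the model.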

----------------------------
## Training and Fine-Tuning Models (V2 API)

4 changes: 2 additions & 2 deletions docs/index.rst
@@ -141,8 +141,8 @@ Here's how you could initialize and visualize a simple neural network::
To learn more about how to build and manipulate neural networks with Penzai,
we recommend starting with the
"How to Think in Penzai" tutorial
- (`V1 API version <notebooks/how_to_think_in_penzai>`,
- `V2 API version <notebooks/v2_how_to_think_in_penzai>`),
+ (:doc:`V1 API version <notebooks/how_to_think_in_penzai>`,
+ :doc:`V2 API version <notebooks/v2_how_to_think_in_penzai>`),
which gives a high-level overview of how to think about and use Penzai
models. Afterward, you could:

