Update wording in the tutorials #48

Merged
24 changes: 19 additions & 5 deletions docs/tutorials/basics.ipynb
@@ -6,7 +6,14 @@
"source": [
"# Learn the basics\n",
"\n",
"This notebook walks you through the basics of PyTorch/Zuko distributions and transformations, how to parametrize probabilistic models, how to instantiate pre-built normalizing flows and finally how to create custom flow architectures. Training is covered in other tutorials."
"This notebook walks you through \n",
"\n",
"- the basics of PyTorch/Zuko distributions and transformations, \n",
"- how to parametrize probabilistic models, \n",
"- how to instantiate pre-built normalizing flows and finally \n",
"- how to create custom flow architectures. \n",
"\n",
"Training is covered in subsequent tutorials. This tutorial requires two central imports:"
]
},
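For reference, the two central imports referred to above are presumably the following (a minimal sketch; the rendered notebook may list them slightly differently):

```python
import torch
import zuko
```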
{
@@ -94,7 +101,7 @@
"\n",
"$$ p(X = x) = p(Z = f(x)) \\left| \\det \\frac{\\partial f(x)}{\\partial x} \\right| $$\n",
"\n",
"and sampling from $p(X)$ can be performed by first drawing realizations $z \\sim p(Z)$ and then applying the inverse transformation $x = f^{-1}(z)$. Such combination of a base distribution and a bijective transformation is sometimes called a *normalizing flow* as the base distribution is often standard normal."
"and sampling from $p(X)$ can be performed by first drawing realizations $z \\sim p(Z)$ and then applying the inverse transformation $x = f^{-1}(z)$. Such combination of a base distribution and a bijective transformation is sometimes called a *normalizing flow*. The name indicates that the base distribution is a standard *normal* distribution."
]
},
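As a concrete illustration of the change-of-variables formula above (not part of the notebook diff), a base distribution and an affine bijection can be combined with plain PyTorch; the numbers and the transformation are illustrative assumptions:

```python
import torch
from torch.distributions import AffineTransform, Normal, TransformedDistribution

# Base distribution p(Z) and the sampling map x = 2 * z + 1,
# i.e. f(x) = (x - 1) / 2 with |det df/dx| = 1/2.
base = Normal(0.0, 1.0)
transform = AffineTransform(loc=1.0, scale=2.0)
p_x = TransformedDistribution(base, [transform])

x = torch.tensor(0.5)
z = transform.inv(x)  # z = f(x)

# log p(X = x) = log p(Z = f(x)) + log |det df/dx|
manual = base.log_prob(z) + torch.log(torch.tensor(0.5))
print(p_x.log_prob(x), manual)  # both are approximately -1.643
```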
{
@@ -130,7 +137,7 @@
"\n",
"When designing the distributions module, the PyTorch team decided that distributions and transformations should be lightweight objects that are used as part of computations but destroyed afterwards. Consequently, the [`Distribution`](torch.distributions.distribution.Distribution) and [`Transform`](torch.distributions.transforms.Transform) classes are not sub-classes of [`torch.nn.Module`](torch.nn.Module), which means that we cannot retrieve their parameters with `.parameters()`, send their internal tensor to GPU with `.to('cuda')` or train them as regular neural networks. In addition, the concepts of conditional distribution and transformation, which are essential for probabilistic inference, are impossible to express with the current interface.\n",
"\n",
"To solve these problems, [`zuko`](zuko) defines two concepts: the [`LazyDistribution`](zuko.flows.core.LazyDistribution) and the [`LazyTransform`](zuko.flows.core.LazyTransform), which are modules whose forward pass returns a distribution or transformation, respectively. These components hold the parameters of the distributions/transformations as well as the recipe to build them, such that the actual distribution/transformation objects are lazily built and destroyed when necessary. Importantly, because the creation of the distribution/transformation object is delayed, an eventual condition can be easily taken into account. This design enables lazy distributions to act like distributions while retaining features inherent to modules, such as trainable parameters."
"To solve these problems, [`zuko`](zuko) defines two concepts: the [`LazyDistribution`](zuko.flows.core.LazyDistribution) and the [`LazyTransform`](zuko.flows.core.LazyTransform), which are modules whose forward pass returns a distribution or transformation, respectively. These components hold the parameters of the distributions/transformations as well as the recipe to build them. This way, the actual distribution/transformation objects are lazily constructed and destroyed when necessary. Importantly, because the creation of the distribution/transformation object is delayed, an eventual condition can be easily taken into account. This design enables lazy distributions to act like distributions while retaining features inherent to modules, such as trainable parameters."
]
},
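To make the lazy-construction idea concrete, here is a minimal stand-in (not Zuko's actual implementation) for a module whose forward pass builds and returns a fresh distribution from trainable parameters:

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class LazyNormal(nn.Module):
    """Toy lazy distribution: the module stores the parameters and the
    recipe; its forward pass builds a regular Normal on demand."""

    def __init__(self):
        super().__init__()
        self.loc = nn.Parameter(torch.zeros(()))
        self.log_scale = nn.Parameter(torch.zeros(()))

    def forward(self):
        return Normal(self.loc, self.log_scale.exp())

lazy = LazyNormal()
dist = lazy()                                # a plain torch.distributions.Normal
loss = -dist.log_prob(torch.tensor(1.0))
loss.backward()                              # gradients reach lazy.parameters()
```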
{
"source": [
"### Variational inference\n",
"\n",
"Let's say we have a dataset of pairs $(x, c) \\sim p(X, C)$ and want to model the distribution of $X$ given $c$, that is $p(X | c)$. The goal of variational inference is to find the model $q_{\\phi^\\star}(X | c)$ that is most similar to $p(X | c)$ among a family of (conditional) distributions $q_\\phi(X | c)$ distinguished by their parameters $\\phi$. Expressing the dissimilarity between two distributions as their [Kullback-Leibler](https://wikipedia.org/wiki/Kullback–Leibler_divergence) (KL) divergence, the variational inference objective becomes\n",
"Let's say we have a dataset of pairs $(x, c) \\sim p(X, C)$ and want to model the distribution of $X$ given $c$, that is $p(X | c)$. The goal of variational inference is to find the model $q_{\\phi^\\star}(X | c)$ that is most similar to $p(X | c)$ among a family of (conditional) distributions $q_\\phi(X | c)$ distinguished by their parameters $\\phi$. Expressing the dissimilarity between two distributions as their [Kullback-Leibler](https://wikipedia.org/wiki/Kullback\u2013Leibler_divergence) (KL) divergence, the variational inference objective becomes\n",
"\n",
"$$\n",
" \\begin{align}\n",
@@ -324,6 +331,13 @@
" optimizer.zero_grad()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note, `model(c)` calls the `forward` method as described above."
]
},
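For readers skimming the diff, a self-contained sketch of the training pattern being described, using a stand-in conditional model (the notebook's actual `model` is defined earlier in the tutorial and differs):

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class ConditionalNormal(nn.Module):
    """Stand-in lazy conditional distribution: forward(c) returns q(X | c)."""

    def __init__(self, context=1, features=2):
        super().__init__()
        self.hyper = nn.Linear(context, 2 * features)

    def forward(self, c):
        loc, log_scale = self.hyper(c).chunk(2, dim=-1)
        return Normal(loc, log_scale.exp())

model = ConditionalNormal()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x, c = torch.randn(256, 2), torch.randn(256, 1)   # placeholder pairs (x, c)

loss = -model(c).log_prob(x).mean()   # model(c) invokes forward() and returns q(X | c)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```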
{
"cell_type": "markdown",
"metadata": {},
@@ -512,7 +526,7 @@
"source": [
"### Custom architecture\n",
"\n",
"Alternatively, a flow can be built as a custom [`Flow`](zuko.flows.core.Flow) object given a sequence of lazy transformations and a base lazy distribution. Follows a condensed example of many things that are possible in Zuko. But remember, with great power comes great responsibility (and great bugs)."
"Alternatively, a flow can be built as a custom [`Flow`](zuko.flows.core.Flow) object given a sequence of lazy transformations and a base lazy distribution. The following demonstrates a condensed example of many things that are possible in Zuko. But remember, with great power comes great responsibility (and great bugs)."
]
},
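A sketch of what such a custom composition can look like; the class names (`Flow`, `MaskedAutoregressiveTransform`, `Unconditional`, `DiagNormal`) are taken from Zuko's documentation and should be checked against the installed version:

```python
import torch
import zuko

# One masked autoregressive transform over 2 features with a 1-dimensional
# context, on top of a standard diagonal Gaussian base distribution.
flow = zuko.flows.Flow(
    transform=zuko.flows.MaskedAutoregressiveTransform(features=2, context=1),
    base=zuko.flows.Unconditional(
        zuko.distributions.DiagNormal,
        torch.zeros(2),
        torch.ones(2),
        buffer=True,
    ),
)

c = torch.randn(1)
x = flow(c).sample((4,))   # the forward pass returns a conditional distribution
```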
{
44 changes: 23 additions & 21 deletions docs/tutorials/forward_kl.ipynb
@@ -27,7 +27,7 @@
"source": [
"## Dataset\n",
"\n",
"We consider the Two Moons dataset."
"We consider the *Two Moons* dataset for demonstrative purposes."
]
},
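The data-generation cell is not shown in this diff; a typical way to produce the dataset (an assumption, the notebook may differ) is scikit-learn's `make_moons`:

```python
import torch
from sklearn.datasets import make_moons

samples, labels = make_moons(16384, noise=0.05)
x = torch.from_numpy(samples).float()   # positions, shape (16384, 2)
c = torch.from_numpy(labels).float()    # moon label in {0, 1}, shape (16384,)
```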
{
@@ -88,7 +88,7 @@
"source": [
"## Unconditional flow\n",
"\n",
"We use a neural spline flow (NSF) as density estimator $q_\\phi(x)$."
"We use a neural spline flow (NSF) as density estimator $q_\\phi(x)$. The goal of the unconditional flow is to sample the Two Moons \"distribution\" entirely."
]
},
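A condensed sketch of the unconditional estimator and one maximum-likelihood (forward KL) step; the hyperparameters are illustrative, not the notebook's exact values:

```python
import torch
import zuko

flow = zuko.flows.NSF(features=2, transforms=3, hidden_features=(64, 64))
optimizer = torch.optim.Adam(flow.parameters(), lr=1e-3)

x = torch.randn(512, 2)               # placeholder for a batch of Two Moons samples

loss = -flow().log_prob(x).mean()     # forward KL amounts to negative log-likelihood
loss.backward()
optimizer.step()
optimizer.zero_grad()
```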
{
@@ -173,14 +173,14 @@
"name": "stdout",
"output_type": "stream",
"text": [
"(0) 1.3520090579986572 ± 0.25871574878692627\n",
"(1) 1.147993564605713 ± 0.1022777259349823\n",
"(2) 1.1174802780151367 ± 0.09858577698469162\n",
"(3) 1.0956673622131348 ± 0.1021992415189743\n",
"(4) 1.0934643745422363 ± 0.09762168675661087\n",
"(5) 1.0758651494979858 ± 0.09098420292139053\n",
"(6) 1.0708422660827637 ± 0.09713941812515259\n",
"(7) 1.0695130825042725 ± 0.09372557699680328\n"
"(0) 1.3520090579986572 \u00b1 0.25871574878692627\n",
"(1) 1.147993564605713 \u00b1 0.1022777259349823\n",
"(2) 1.1174802780151367 \u00b1 0.09858577698469162\n",
"(3) 1.0956673622131348 \u00b1 0.1021992415189743\n",
"(4) 1.0934643745422363 \u00b1 0.09762168675661087\n",
"(5) 1.0758651494979858 \u00b1 0.09098420292139053\n",
"(6) 1.0708422660827637 \u00b1 0.09713941812515259\n",
"(7) 1.0695130825042725 \u00b1 0.09372557699680328\n"
]
}
],
"\n",
" losses = torch.stack(losses)\n",
"\n",
" print(f'({epoch})', losses.mean().item(), '±', losses.std().item())"
" print(f'({epoch})', losses.mean().item(), '\u00b1', losses.std().item())"
]
},
{
@@ -234,7 +234,7 @@
"source": [
"## Conditional flow\n",
"\n",
"We use a neural spline flow (NSF) as density estimator $q_\\phi(x | c)$, where $c$ is the label."
"We use a neural spline flow (NSF) as density estimator $q_\\phi(x | c)$, where $c$ is the label referencing a specific part of the Two Moons 'distribution'."
]
},
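For concreteness, a sketch of how the label can enter the flow as a one-dimensional context; the exact encoding used by the notebook is an assumption:

```python
import torch
import zuko

# Conditional NSF: 2 features, 1-dimensional context (the moon label as a float).
flow = zuko.flows.NSF(features=2, context=1, transforms=3, hidden_features=(64, 64))

x = torch.randn(8, 2)                                   # placeholder positions
c = torch.tensor([0.0, 1.0]).repeat(4).unsqueeze(-1)    # labels 0.0 / 1.0, shape (8, 1)

log_q = flow(c).log_prob(x)                             # one log-density per (x, c) pair
```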
{
"name": "stdout",
"output_type": "stream",
"text": [
"(0) 0.7147052884101868 ± 0.4756987988948822\n",
"(1) 0.40776583552360535 ± 0.10716820508241653\n",
"(2) 0.3866541087627411 ± 0.10318031907081604\n",
"(3) 0.37453195452690125 ± 0.10870178788900375\n",
"(4) 0.3634893000125885 ± 0.10033125430345535\n",
"(5) 0.36492055654525757 ± 0.10960724204778671\n",
"(6) 0.3537733554840088 ± 0.09780355542898178\n",
"(7) 0.3559333086013794 ± 0.1038535088300705\n"
"(0) 0.7147052884101868 \u00b1 0.4756987988948822\n",
"(1) 0.40776583552360535 \u00b1 0.10716820508241653\n",
"(2) 0.3866541087627411 \u00b1 0.10318031907081604\n",
"(3) 0.37453195452690125 \u00b1 0.10870178788900375\n",
"(4) 0.3634893000125885 \u00b1 0.10033125430345535\n",
"(5) 0.36492055654525757 \u00b1 0.10960724204778671\n",
"(6) 0.3537733554840088 \u00b1 0.09780355542898178\n",
"(7) 0.3559333086013794 \u00b1 0.1038535088300705\n"
]
}
],
"\n",
" losses = torch.stack(losses)\n",
"\n",
" print(f'({epoch})', losses.mean().item(), '±', losses.std().item())"
" print(f'({epoch})', losses.mean().item(), '\u00b1', losses.std().item())"
]
},
{
Expand All @@ -305,6 +305,7 @@
}
],
"source": [
"# sample the flow while conditioning on the 'top' part of two moons\n",
"samples = flow(torch.tensor([0.0])).sample((16384,))\n",
"\n",
"plt.figure(figsize=(4.8, 4.8))\n",
}
],
"source": [
"# sample the flow while conditioning on the 'bottom' part of two moons\n",
"samples = flow(torch.tensor([1.0])).sample((16384,))\n",
"\n",
"plt.figure(figsize=(4.8, 4.8))\n",
20 changes: 10 additions & 10 deletions docs/tutorials/reverse_kl.ipynb
@@ -84,7 +84,7 @@
"source": [
"## Flow\n",
"\n",
"We use a neural spline flow (NSF) as density estimator $q_\\phi(x)$. However, we inverse the transformation(s), which makes sampling more efficient as the inverse call of an autoregressive transformation is $D$ (where $D$ is the number of features) times slower than its forward call."
"We use a neural spline flow (NSF) as density estimator $q_\\phi(x)$. However, we invert the transformation(s), which makes sampling more efficient as the inverse call of an autoregressive transformation is $D$ (where $D$ is the number of features) times slower than its forward call."
]
},
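To connect the inverted parametrization to the reverse-KL objective trained later in the notebook, a minimal sketch; the target `log_p` is a stand-in, and `rsample_and_log_prob` is assumed to be available on Zuko's flow distribution (otherwise `rsample` followed by `log_prob` yields the same objective):

```python
import torch
import zuko

flow = zuko.flows.NSF(features=2, transforms=3, hidden_features=(64, 64))
optimizer = torch.optim.Adam(flow.parameters(), lr=1e-3)

def log_p(x):  # stand-in for the (unnormalized) target log-density
    return -(x ** 2).sum(-1) / 2

q = flow()
x, log_q = q.rsample_and_log_prob((512,))   # sampling is the cheap (forward) direction here
loss = (log_q - log_p(x)).mean()            # reverse KL(q || p) up to a constant
loss.backward()
optimizer.step()
optimizer.zero_grad()
```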
{
@@ -174,14 +174,14 @@
"name": "stdout",
"output_type": "stream",
"text": [
"(0) -1.012157678604126 ± 1.0205215215682983\n",
"(1) -1.5622574090957642 ± 0.03264421969652176\n",
"(2) -1.5753192901611328 ± 0.033491406589746475\n",
"(3) -1.5814640522003174 ± 0.025743382051587105\n",
"(4) -1.5768922567367554 ± 0.04906836897134781\n",
"(5) -1.5749255418777466 ± 0.13962876796722412\n",
"(6) -1.5877153873443604 ± 0.015589614398777485\n",
"(7) -1.5886530876159668 ± 0.029878195375204086\n"
"(0) -1.012157678604126 \u00b1 1.0205215215682983\n",
"(1) -1.5622574090957642 \u00b1 0.03264421969652176\n",
"(2) -1.5753192901611328 \u00b1 0.033491406589746475\n",
"(3) -1.5814640522003174 \u00b1 0.025743382051587105\n",
"(4) -1.5768922567367554 \u00b1 0.04906836897134781\n",
"(5) -1.5749255418777466 \u00b1 0.13962876796722412\n",
"(6) -1.5877153873443604 \u00b1 0.015589614398777485\n",
"(7) -1.5886530876159668 \u00b1 0.029878195375204086\n"
]
}
],
"\n",
" losses = torch.stack(losses)\n",
"\n",
" print(f'({epoch})', losses.mean().item(), '±', losses.std().item())"
" print(f'({epoch})', losses.mean().item(), '\u00b1', losses.std().item())"
]
},
{
Expand Down