Increase BatchNorm1d test tol and fix notebook
eric-zheng committed Oct 4, 2022
1 parent f7ed26e commit 80ed86b
Showing 2 changed files with 11 additions and 9 deletions.
18 changes: 10 additions & 8 deletions hw2.ipynb
@@ -156,14 +156,16 @@
"\n",
"Be careful to explicitly broadcast the bias term to the correct shape -- Needle does not support implicit broadcasting.\n",
"\n",
"Additionally note that, for all layers including this one, you should initialize the weight Tensor before the bias Tensor, and should initialize all Parameters using only functions from 'init'.\n",
"\n",
"##### Parameters\n",
"- `in_features` - size of each input sample\n",
"- `out_features` - size of each output sample\n",
"- `bias` - If set to `False`, the layer will not learn an additive bias.\n",
"\n",
"##### Variables\n",
"- `weight` - the learnable weights of shape (`in_features`, `out_features`). The values should be initialized with the Kaiming Uniform initialization with `fan_in = in_features`\n",
"- `bias` - the learnable bias of shape (`out_features`). The values should be initialized with the Kaiming Uniform initialize with `fan_in = out_features`. **Note the different in fan_in choice, due to their relative sizes**"
"- `bias` - the learnable bias of shape (`out_features`). The values should be initialized with the Kaiming Uniform initialize with `fan_in = out_features`. **Note the different in fan_in choice, due to their relative sizes**. "
]
},
{
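A minimal sketch of one way to satisfy the Linear spec above is shown below. This is illustrative only, not the assignment's reference solution; the `init.kaiming_uniform` signature and the explicit `broadcast_to` call are assumptions about the course framework.

```python
import needle.init as init
import needle.nn as nn


class Linear(nn.Module):
    def __init__(self, in_features, out_features, bias=True, device=None, dtype="float32"):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # Initialize the weight Tensor before the bias Tensor, using only `init` functions.
        self.weight = nn.Parameter(
            init.kaiming_uniform(in_features, out_features, device=device, dtype=dtype))
        if bias:
            # fan_in = out_features for the bias; reshape it into a (1, out_features) row.
            self.bias = nn.Parameter(
                init.kaiming_uniform(out_features, 1, device=device, dtype=dtype)
                .reshape((1, out_features)))
        else:
            self.bias = None

    def forward(self, X):
        out = X @ self.weight
        if self.bias is not None:
            # Broadcast the bias explicitly -- needle does not broadcast implicitly.
            out = out + self.bias.broadcast_to(out.shape)
        return out
```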
@@ -347,7 +349,7 @@
"Applies layer normalization over a mini-batch of inputs as described in the paper [Layer Normalization](https://arxiv.org/abs/1607.06450).\n",
"\n",
"\\begin{equation}\n",
"y = w \\circ \\frac{z_i - \\textbf{E}[x]}{(\\textbf{Var}[x]+\\epsilon)^{1/2})} + b\n",
"y = w \\circ \\frac{x_i - \\textbf{E}[x]}{((\\textbf{Var}[x]+\\epsilon)^{1/2})} + b\n",
"\\end{equation}\n",
"\n",
"where $\\textbf{E}[x]$ denotes the empirical mean of the inputs, $\\textbf{Var}[x]$ denotes their empirical variance (not that here we are using the \"unbiased\" estimate of the variance, i.e., dividing by $N$ rather than by $N-1$), and $w$ and $b$ denote learnable scalar weights and biases respectively. Note you can assume the input to this layer by be a 2D tensor, with batches in the first dimension and features on the second.\n",
@@ -358,7 +360,7 @@
"\n",
"##### Variables\n",
"- `weight` - the learnable weights of size `dim`, elements initialized to 1.\n",
"- `bias` - the learnable bias of shape `dim`, elements initialized to 1.\n",
"- `bias` - the learnable bias of shape `dim`, elements initialized to 0 **(changed from 1)**.\n",
"___"
]
},
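For reference, a minimal LayerNorm1d forward pass consistent with the formula above might look like the following sketch. It assumes 2D `(batch, features)` inputs and needle-style `sum`/`reshape`/`broadcast_to` ops; it is not the reference solution.

```python
import needle.init as init
import needle.nn as nn


class LayerNorm1d(nn.Module):
    def __init__(self, dim, eps=1e-5, device=None, dtype="float32"):
        super().__init__()
        self.dim = dim
        self.eps = eps
        self.weight = nn.Parameter(init.ones(dim, device=device, dtype=dtype))
        self.bias = nn.Parameter(init.zeros(dim, device=device, dtype=dtype))

    def forward(self, x):
        n, d = x.shape
        # Per-example mean and (divide-by-d) variance over the feature axis.
        mean = (x.sum(axes=(1,)) / d).reshape((n, 1)).broadcast_to(x.shape)
        var = (((x - mean) ** 2).sum(axes=(1,)) / d).reshape((n, 1)).broadcast_to(x.shape)
        norm = (x - mean) / ((var + self.eps) ** 0.5)
        # Broadcast the learnable weight and bias across the batch dimension.
        w = self.weight.reshape((1, d)).broadcast_to(x.shape)
        b = self.bias.reshape((1, d)).broadcast_to(x.shape)
        return w * norm + b
```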
@@ -419,13 +421,13 @@
"Applies batch normalization over a mini-batch of inputs as described in the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167).\n",
"\n",
"\\begin{equation}\n",
"y = w \\circ \\frac{z_i - \\textbf{E}[x]}{(\\textbf{Var}[x]+\\epsilon)^{1/2})} + b\n",
"y = w \\circ \\frac{z_i - \\textbf{E}[x]}{((\\textbf{Var}[x]+\\epsilon)^{1/2})} + b\n",
"\\end{equation}\n",
"\n",
"but where here the mean and variance refer to to the mean and variance over the _batch_dimensions. The function also computes a running average of mean/variance for all features at each layer $\\hat{\\mu}, \\hat{\\sigma}^2$, and at test time normalizes by these quantities:\n",
"\n",
"\\begin{equation}\n",
"y = \\frac{(x - \\hat{mu}}{((\\hat{\\sigma}^2_{i+1})_j+\\epsilon)^{1/2}}\n",
"y = \\frac{(x - \\hat{mu})}{((\\hat{\\sigma}^2_{i+1})_j+\\epsilon)^{1/2}}\n",
"\\end{equation}\n",
"\n",
"\n",
@@ -745,7 +747,7 @@
"source": [
"### Dataloader\n",
"\n",
"The Dataloader class provides an interface for assembling mini-batches of examples suitable for training using SGD-based approaches, backed by a Dataset object. In order to build the typical Dataloader interface (allowing users to iterate over all the mini-batches in the dataset), you will need the implement the `__iter__()` and `__next__()` calls in the class: `__iter__()` is called at the start of iteration, \n",
"The Dataloader class provides an interface for assembling mini-batches of examples suitable for training using SGD-based approaches, backed by a Dataset object. In order to build the typical Dataloader interface (allowing users to iterate over all the mini-batches in the dataset), you will need the implement the `__iter__()` and `__next__()` calls in the class: `__iter__()` is called at the start of iteration, while `__next__()` is called to grab the next mini-batch. Please note that subsequent calls to next will require you to return the following batches, so next is not a pure function.\n",
"___\n",
"\n",
"### Dataloader\n",
@@ -807,7 +809,7 @@
"___\n",
"\n",
"### MLPResNet\n",
"`ResidualBlock(dim, hidden_dim=100, num_blocks=3, num_classes=10, norm=nn.BatchNorm1d, drop_prob=0.1)`\n",
"`MLPResNet(dim, hidden_dim=100, num_blocks=3, num_classes=10, norm=nn.BatchNorm1d, drop_prob=0.1)`\n",
"\n",
"Implements an MLP ResNet as follows:\n",
"\n",
@@ -830,7 +832,7 @@
"\n",
"`epoch(dataloader, model, opt=None)`\n",
"\n",
"Executes one epoch of training or evaluation, iterating over the entire training dataset once (just like `nn_epoch` from previous homeworks). Returns the average accuracy (as a *float*) and the average loss over all samples (as a *float*). Set the model to `training` mode at the beginning of the function if `opt` is given; set the model to `eval` if `opt` is not given (i.e. `None`).\n",
"Executes one epoch of training or evaluation, iterating over the entire training dataset once (just like `nn_epoch` from previous homeworks). Returns the average error rate **(changed from accuracy)** (as a *float*) and the average loss over all samples (as a *float*). Set the model to `training` mode at the beginning of the function if `opt` is given; set the model to `eval` if `opt` is not given (i.e. `None`).\n",
"\n",
"##### Parameters\n",
"- `dataloader` (*`needle.data.DataLoader`*) - dataloader returning samples from the training dataset\n",
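A rough sketch of the epoch loop described above, returning (error rate, average loss), is shown here. The use of `nn.SoftmaxLoss`, `opt.reset_grad()`, and the `(X, y)` batch format are assumptions about the surrounding framework.

```python
import needle.nn as nn


def epoch(dataloader, model, opt=None):
    loss_fn = nn.SoftmaxLoss()
    # Training mode when an optimizer is supplied, eval mode otherwise.
    model.train() if opt is not None else model.eval()
    total_err, total_loss, n_samples = 0.0, 0.0, 0
    for X, y in dataloader:
        logits = model(X)
        loss = loss_fn(logits, y)
        if opt is not None:
            opt.reset_grad()
            loss.backward()
            opt.step()
        # Accumulate misclassifications and loss weighted by batch size.
        total_err += (logits.numpy().argmax(axis=1) != y.numpy()).sum()
        total_loss += float(loss.numpy()) * y.shape[0]
        n_samples += y.shape[0]
    # Average error rate (1 - accuracy) and per-sample average loss.
    return total_err / n_samples, total_loss / n_samples
```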
2 changes: 1 addition & 1 deletion tests/test_nn_and_optim.py
@@ -768,7 +768,7 @@ def test_nn_batchnorm_backward_affine_1():
[ 4.6386719e-03, -8.9883804e-05, -4.5776367e-05, 4.3869019e-05],
[-7.7133179e-03, 2.7418137e-05, 6.6757202e-05, 7.4386597e-05],
[ 6.1874390e-03, 5.2213669e-05, 2.8610229e-05, -1.9073486e-06]],
dtype=np.float32), rtol=1e-5, atol=1e-5)
dtype=np.float32), rtol=1e-5, atol=1e-4)


def test_nn_batchnorm_running_mean_1():
