Increase BatchNorm1d test tol and fix notebook
eric-zheng committed Oct 4, 2022
1 parent f7ed26e commit 80ed86b
Showing 2 changed files with 11 additions and 9 deletions.
18 changes: 10 additions & 8 deletions hw2.ipynb
@@ -156,14 +156,16 @@
"\n",
"Be careful to explicitly broadcast the bias term to the correct shape -- Needle does not support implicit broadcasting.\n",
"\n",
"Additionally note that, for all layers including this one, you should initialize the weight Tensor before the bias Tensor, and should initialize all Parameters using only functions from 'init'.\n",
"\n",
"##### Parameters\n",
"- `in_features` - size of each input sample\n",
"- `out_features` - size of each output sample\n",
"- `bias` - If set to `False`, the layer will not learn an additive bias.\n",
"\n",
"##### Variables\n",
"- `weight` - the learnable weights of shape (`in_features`, `out_features`). The values should be initialized with the Kaiming Uniform initialization with `fan_in = in_features`\n",
"- `bias` - the learnable bias of shape (`out_features`). The values should be initialized with the Kaiming Uniform initialize with `fan_in = out_features`. **Note the different in fan_in choice, due to their relative sizes**"
"- `bias` - the learnable bias of shape (`out_features`). The values should be initialized with the Kaiming Uniform initialize with `fan_in = out_features`. **Note the different in fan_in choice, due to their relative sizes**. "
]
},
{
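A minimal sketch of one way to satisfy the Linear spec above is shown below. This is illustrative only, not the assignment's reference solution; the `init.kaiming_uniform` signature and the explicit `broadcast_to` call are assumptions about the course framework.

```python
import needle.init as init
import needle.nn as nn


class Linear(nn.Module):
    def __init__(self, in_features, out_features, bias=True, device=None, dtype="float32"):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # Initialize the weight Tensor before the bias Tensor, using only `init` functions.
        self.weight = nn.Parameter(
            init.kaiming_uniform(in_features, out_features, device=device, dtype=dtype))
        if bias:
            # fan_in = out_features for the bias; reshape it into a (1, out_features) row.
            self.bias = nn.Parameter(
                init.kaiming_uniform(out_features, 1, device=device, dtype=dtype)
                .reshape((1, out_features)))
        else:
            self.bias = None

    def forward(self, X):
        out = X @ self.weight
        if self.bias is not None:
            # Broadcast the bias explicitly -- needle does not broadcast implicitly.
            out = out + self.bias.broadcast_to(out.shape)
        return out
```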
@@ -347,7 +349,7 @@
"Applies layer normalization over a mini-batch of inputs as described in the paper [Layer Normalization](https://arxiv.org/abs/1607.06450).\n",
"\n",
"\\begin{equation}\n",
"y = w \\circ \\frac{z_i - \\textbf{E}[x]}{(\\textbf{Var}[x]+\\epsilon)^{1/2})} + b\n",
"y = w \\circ \\frac{x_i - \\textbf{E}[x]}{((\\textbf{Var}[x]+\\epsilon)^{1/2})} + b\n",
"\\end{equation}\n",
"\n",
"where $\\textbf{E}[x]$ denotes the empirical mean of the inputs, $\\textbf{Var}[x]$ denotes their empirical variance (not that here we are using the \"unbiased\" estimate of the variance, i.e., dividing by $N$ rather than by $N-1$), and $w$ and $b$ denote learnable scalar weights and biases respectively. Note you can assume the input to this layer by be a 2D tensor, with batches in the first dimension and features on the second.\n",
@@ -358,7 +360,7 @@
"\n",
"##### Variables\n",
"- `weight` - the learnable weights of size `dim`, elements initialized to 1.\n",
"- `bias` - the learnable bias of shape `dim`, elements initialized to 1.\n",
"- `bias` - the learnable bias of shape `dim`, elements initialized to 0 **(changed from 1)**.\n",
"___"
]
},
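For reference, a minimal LayerNorm1d forward pass consistent with the formula above might look like the following sketch. It assumes 2D `(batch, features)` inputs and needle-style `sum`/`reshape`/`broadcast_to` ops; it is not the reference solution.

```python
import needle.init as init
import needle.nn as nn


class LayerNorm1d(nn.Module):
    def __init__(self, dim, eps=1e-5, device=None, dtype="float32"):
        super().__init__()
        self.dim = dim
        self.eps = eps
        self.weight = nn.Parameter(init.ones(dim, device=device, dtype=dtype))
        self.bias = nn.Parameter(init.zeros(dim, device=device, dtype=dtype))

    def forward(self, x):
        n, d = x.shape
        # Per-example mean and (divide-by-d) variance over the feature axis.
        mean = (x.sum(axes=(1,)) / d).reshape((n, 1)).broadcast_to(x.shape)
        var = (((x - mean) ** 2).sum(axes=(1,)) / d).reshape((n, 1)).broadcast_to(x.shape)
        norm = (x - mean) / ((var + self.eps) ** 0.5)
        # Broadcast the learnable weight and bias across the batch dimension.
        w = self.weight.reshape((1, d)).broadcast_to(x.shape)
        b = self.bias.reshape((1, d)).broadcast_to(x.shape)
        return w * norm + b
```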
@@ -419,13 +421,13 @@
"Applies batch normalization over a mini-batch of inputs as described in the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167).\n",
"\n",
"\\begin{equation}\n",
"y = w \\circ \\frac{z_i - \\textbf{E}[x]}{(\\textbf{Var}[x]+\\epsilon)^{1/2})} + b\n",
"y = w \\circ \\frac{z_i - \\textbf{E}[x]}{((\\textbf{Var}[x]+\\epsilon)^{1/2})} + b\n",
"\\end{equation}\n",
"\n",
"but where here the mean and variance refer to to the mean and variance over the _batch_dimensions. The function also computes a running average of mean/variance for all features at each layer $\\hat{\\mu}, \\hat{\\sigma}^2$, and at test time normalizes by these quantities:\n",
"\n",
"\\begin{equation}\n",
"y = \\frac{(x - \\hat{mu}}{((\\hat{\\sigma}^2_{i+1})_j+\\epsilon)^{1/2}}\n",
"y = \\frac{(x - \\hat{mu})}{((\\hat{\\sigma}^2_{i+1})_j+\\epsilon)^{1/2}}\n",
"\\end{equation}\n",
"\n",
"\n",
@@ -745,7 +747,7 @@
"source": [
"### Dataloader\n",
"\n",
"The Dataloader class provides an interface for assembling mini-batches of examples suitable for training using SGD-based approaches, backed by a Dataset object. In order to build the typical Dataloader interface (allowing users to iterate over all the mini-batches in the dataset), you will need the implement the `__iter__()` and `__next__()` calls in the class: `__iter__()` is called at the start of iteration, \n",
"The Dataloader class provides an interface for assembling mini-batches of examples suitable for training using SGD-based approaches, backed by a Dataset object. In order to build the typical Dataloader interface (allowing users to iterate over all the mini-batches in the dataset), you will need the implement the `__iter__()` and `__next__()` calls in the class: `__iter__()` is called at the start of iteration, while `__next__()` is called to grab the next mini-batch. Please note that subsequent calls to next will require you to return the following batches, so next is not a pure function.\n",
"___\n",
"\n",
"### Dataloader\n",
@@ -807,7 +809,7 @@
"___\n",
"\n",
"### MLPResNet\n",
"`ResidualBlock(dim, hidden_dim=100, num_blocks=3, num_classes=10, norm=nn.BatchNorm1d, drop_prob=0.1)`\n",
"`MLPResNet(dim, hidden_dim=100, num_blocks=3, num_classes=10, norm=nn.BatchNorm1d, drop_prob=0.1)`\n",
"\n",
"Implements an MLP ResNet as follows:\n",
"\n",
@@ -830,7 +832,7 @@
"\n",
"`epoch(dataloader, model, opt=None)`\n",
"\n",
"Executes one epoch of training or evaluation, iterating over the entire training dataset once (just like `nn_epoch` from previous homeworks). Returns the average accuracy (as a *float*) and the average loss over all samples (as a *float*). Set the model to `training` mode at the beginning of the function if `opt` is given; set the model to `eval` if `opt` is not given (i.e. `None`).\n",
"Executes one epoch of training or evaluation, iterating over the entire training dataset once (just like `nn_epoch` from previous homeworks). Returns the average error rate **(changed from accuracy)** (as a *float*) and the average loss over all samples (as a *float*). Set the model to `training` mode at the beginning of the function if `opt` is given; set the model to `eval` if `opt` is not given (i.e. `None`).\n",
"\n",
"##### Parameters\n",
"- `dataloader` (*`needle.data.DataLoader`*) - dataloader returning samples from the training dataset\n",
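A rough sketch of the epoch loop described above, returning (error rate, average loss), is shown here. The use of `nn.SoftmaxLoss`, `opt.reset_grad()`, and the `(X, y)` batch format are assumptions about the surrounding framework.

```python
import needle.nn as nn


def epoch(dataloader, model, opt=None):
    loss_fn = nn.SoftmaxLoss()
    # Training mode when an optimizer is supplied, eval mode otherwise.
    model.train() if opt is not None else model.eval()
    total_err, total_loss, n_samples = 0.0, 0.0, 0
    for X, y in dataloader:
        logits = model(X)
        loss = loss_fn(logits, y)
        if opt is not None:
            opt.reset_grad()
            loss.backward()
            opt.step()
        # Accumulate misclassifications and loss weighted by batch size.
        total_err += (logits.numpy().argmax(axis=1) != y.numpy()).sum()
        total_loss += float(loss.numpy()) * y.shape[0]
        n_samples += y.shape[0]
    # Average error rate (1 - accuracy) and per-sample average loss.
    return total_err / n_samples, total_loss / n_samples
```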
2 changes: 1 addition & 1 deletion tests/test_nn_and_optim.py
@@ -768,7 +768,7 @@ def test_nn_batchnorm_backward_affine_1():
[ 4.6386719e-03, -8.9883804e-05, -4.5776367e-05, 4.3869019e-05],
[-7.7133179e-03, 2.7418137e-05, 6.6757202e-05, 7.4386597e-05],
[ 6.1874390e-03, 5.2213669e-05, 2.8610229e-05, -1.9073486e-06]],
dtype=np.float32), rtol=1e-5, atol=1e-5)
dtype=np.float32), rtol=1e-5, atol=1e-4)


def test_nn_batchnorm_running_mean_1():
