From 02abdc1d81724993d8ff5475681837918817f62a Mon Sep 17 00:00:00 2001
From: Andrew White <white.d.andrew@gmail.com>
Date: Thu, 20 Oct 2022 15:36:06 -0400
Subject: [PATCH 01/15] Started on molnets chapter

---
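Notes:
    The chapter text added in this patch claims that message passing cannot tell decalin from bicyclopentyl. Below is a minimal sketch for checking that claim by hand, assuming plain 1-WL color refinement with every carbon starting from the same color. The bond lists are read off the two SMILES strings used in the plotting cell (hydrogens omitted), and the helper names `adjacency`, `wl_histograms`, and `girth` are hypothetical, not part of `dmol` or the notebook.

```python
from collections import Counter, deque


def adjacency(n_atoms, bonds):
    """Build an undirected adjacency map {atom: set(neighbors)}."""
    adj = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def wl_histograms(graphs, rounds=3):
    """Run 1-WL color refinement jointly; return one color histogram per graph."""
    colors = [{v: 0 for v in g} for g in graphs]  # identical starting color
    for _ in range(rounds):
        palette = {}  # shared across graphs so color ids are comparable
        refined = []
        for g, c in zip(graphs, colors):
            new_c = {}
            for v, nbrs in g.items():
                # signature = own color + multiset of neighbor colors
                sig = (c[v], tuple(sorted(c[u] for u in nbrs)))
                new_c[v] = palette.setdefault(sig, len(palette))
            refined.append(new_c)
        colors = refined
    return [Counter(c.values()) for c in colors]


def girth(adj):
    """Length of the shortest cycle, found by BFS from every atom."""
    best = float("inf")
    for start in adj:
        dist, parent = {start: 0}, {start: None}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u], parent[u] = dist[v] + 1, v
                    queue.append(u)
                elif parent[v] != u:  # non-tree edge closes a cycle
                    best = min(best, dist[v] + dist[u] + 1)
    return best


# Heavy-atom bonds for C1CCC2CCCCC2C1 and C1CCC(C1)C2CCCC2
decalin = adjacency(
    10,
    [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
     (6, 7), (7, 8), (8, 9), (9, 0), (3, 8)],
)
bicyclopentyl = adjacency(
    10,
    [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (3, 5),
     (5, 6), (6, 7), (7, 8), (8, 9), (9, 5)],
)

h1, h2 = wl_histograms([decalin, bicyclopentyl])
print(h1 == h2)  # True: 1-WL assigns the same color histogram to both
print(girth(decalin), girth(bicyclopentyl))  # 6 5: the graphs differ
```

    The shortest-ring check is only there to show the two graphs really are different; any invariant that 1-WL misses would serve the same purpose.
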
 _config.yml      |   5 +-
 dl/molnets.ipynb | 136 +++++++++++++++++++--
 references.bib   | 298 +++++++++++++++++++++++++----------------------
 3 files changed, 285 insertions(+), 154 deletions(-)

diff --git a/_config.yml b/_config.yml
index d5110f71..3cfc319f 100644
--- a/_config.yml
+++ b/_config.yml
@@ -54,8 +54,9 @@ sphinx:
       "twitter:site": "@andrewwhite01"
 
 execute:
-  timeout: -1
-  exclude_patterns          : ['dl/pretraining.ipynb']
+  timeout: 10
+  exclude_patterns          : ['livecomj/*', 'dl/pretraining.ipynb', 'applied/*', 'ml/*', 'dl/xai.ipynb', 'NLP.ipynb', 'flows.ipynb',
+  'dl/Equivariant.ipynb', 'dl/gnn.ipynb', 'dl/data.ipynb', 'VAE.ipynb', 'dl/introduction.ipynb', 'dl/Hyperparameter_tuning.ipynb', 'attention.ipynb', 'layers.ipynb']
 
 #jupyter_execute_notebooks: force
 
diff --git a/dl/molnets.ipynb b/dl/molnets.ipynb
index 71401610..ec3f22ce 100644
--- a/dl/molnets.ipynb
+++ b/dl/molnets.ipynb
@@ -7,14 +7,14 @@
    "source": [
     "# Modern Molecular NNs\n",
     "\n",
-    "We have seen two chapters about equivariances in {doc}`data` and {doc}`Equivariant`. We have seen one chapter on dealing with molecules as objects with permutation equivariance {doc}`gnn`. We will combine these ideas and create neural networks that can treat arbitrary molecules with point clouds and permutation equivariance. We already saw SchNet is able to do this by working with an invariant point cloud representation (distance to atoms), but modern networks mix in ideas from {doc}`Equivariant`. This is a highly-active research area, especially for predicting energies, forces, and relaxed structures of molecules. \n",
+    "We have seen two chapters about equivariances in {doc}`data` and {doc}`Equivariant`. We have seen one chapter on dealing with molecules as objects with permutation equivariance {doc}`gnn`. We will combine these ideas and create neural networks that can treat arbitrary molecules with point clouds and permutation equivariance. We already saw SchNet is able to do this by working with an invariant point cloud representation (distance to atoms), but modern networks mix in ideas from {doc}`Equivariant` along with graph neural networks (GNN). This is a highly-active research area, especially for predicting energies, forces, and relaxed structures of molecules.\n",
     "\n",
     "```{admonition} Audience & Objectives\n",
     "This chapter assumes you have read {doc}`data`, {doc}`Equivariant`, and {doc}`gnn`. You should be able to\n",
     "\n",
     "  * Categorize a task (features/labels) by equivariance  \n",
     "  * Understand body-ordered expansions\n",
-    "  * Differentiate models based on their message passing, message type, and body-ordering\n",
+    "  * Differentiate models based on their message passing, message type, and body-ordering  \n",
     "```"
    ]
   },
@@ -28,40 +28,103 @@
     "```"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8c695b48",
+   "metadata": {
+    "tags": [
+     "remove-cell"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "# This cell is for making plots, not part of examples\n",
+    "import rdkit, rdkit.Chem, rdkit.Chem.rdDepictor, rdkit.Chem.Draw\n",
+    "from myst_nb import glue\n",
+    "import networkx as nx\n",
+    "import dmol\n",
+    "\n",
+    "m1 = rdkit.Chem.MolFromSmiles(\"C1CCC2CCCCC2C1\")\n",
+    "m2 = rdkit.Chem.MolFromSmiles(\"C1CCC(C1)C2CCCC2\")\n",
+    "glue(\n",
+    "    \"lwtest\",\n",
+    "    rdkit.Chem.Draw.MolsToGridImage(\n",
+    "        [m1, m2],\n",
+    "        legends=[\"decaline\", \"bicylopentyl\"],\n",
+    "        useSVG=True,\n",
+    "        subImgSize=(400, 400),\n",
+    "    ),\n",
+    "    display=False,\n",
+    ")"
+   ]
+  },
   {
    "cell_type": "markdown",
-   "id": "3c224845",
+   "id": "00f4b61c",
    "metadata": {},
    "source": [
     "# Expressiveness\n",
     "\n",
     "The Equivariant SO(3) ideas from {doc}`Equivariant` will not work on variable sized molecules because the layers are not permutation equivariant. We also know that graph neural networks (GNNs) have permutation equivariance and, with the correct choice of edge features, rotation and translation invariance. So why go beyond GNNs?\n",
     "\n",
-    "One reason is that the standard GNNs cannot distinguish certain types of graphs relevant for chemistry [Wesifeiler-Lehman Test] like decaline and bicylopentyl, which indeed hvae different properties. These can be distinguished if we also have (and use) their Cartesian coordinates. \n",
+    "One reason is that standard GNNs cannot distinguish certain graphs relevant for chemistry, such as decalin and bicyclopentyl, which indeed have different properties. Look at {numref}`decaline-bicylopentyl` below and think about the degree and neighbors of the atoms where the rings join -- you'll see that with message passing the two molecules are identical. This limitation is characterized by the Weisfeiler-Lehman test {cite}`weisfeiler1968reduction`.\n",
     "\n",
-    "There is also a common example called the \"Picasso Test\", which is that rotationally invariant image neural networks cannot tell if a human eye is rotated [].\n",
+    "```{glue:figure} lwtest\n",
+    "----\n",
+    "name: decaline-bicylopentyl\n",
+    "----\n",
+    "Comparison of decalin and bicyclopentyl, which have identical output in most GNNs despite being different molecules.\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3c224845",
+   "metadata": {},
+   "source": [
+    "These can be distinguished if we also have (and use) their Cartesian coordinates. We cannot distinguish enantiomers with GNNs, except maybe with pre-computed node attributes. Even those start to break down when we have helical chirality that is not centered at any one atom.\n",
     "\n",
-    "In the end though, most work on molecular neural networks is for **neural potentials**. These are neural networks that predict energy and forces given atom positions and elements. We know that the force on each atom is given by\n",
+    "These are arguments for using Cartesian coordinates in addition to a GNN, but why use equivariant neural networks? Most molnet research is for **neural potentials**. These are neural networks that predict energy and forces given atom positions and elements. We know that the force on each atom is given by\n",
     "\n",
     "\\begin{equation}\n",
     "F\\left(\\vec{r}\\right) = -\\nabla U\\left(\\vec{r}\\right)\n",
     "\\end{equation}\n",
     "\n",
-    "where $U\\left(\\vec{x}\\right)$ is the rotation invariant potential given all atom positions $\\vec{r}$. So if we're predicting a translation, rotation, and permutation invariant potential, why use equivariance? Performance. Models like SchNet or ANI are invariant and are not as accurate as models like TensorNet or Comorant that have equivariances in their internal layers but an invariant readout."
+    "where $U\\left(\\vec{r}\\right)$ is the rotation-invariant potential given all atom positions $\\vec{r}$. So if we're predicting a translation, rotation, and permutation invariant potential, why use equivariance? Performance. Models like SchNet or ANI are invariant and are not as accurate as models like NequIP or TorchMD-NET that have equivariances in their internal layers."
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "18e473a6",
+   "id": "9c39709e",
    "metadata": {},
    "source": [
     "## The Elements of Modern Molecular NNs\n",
     "\n",
-    "Over the beginning of 2022, a categorization has emerged of the main elements of modern molecular NNs (molnets): atomic cluster expansions (ACE), the body-order of the messages, and the architecture of the message passing neural network (MPNN). This categorization might also be viewed within the GNN theory as node features (ACE), message creation and aggregation (body-order), and node update (MPNN details). We'll details these different elements below, with the most foucus on ACEs. See {doc}`gnn` for more details on MPNNs.\n",
+    "There has been a flurry of ideas about molnets in the last few years, especially with the advances in equivariant neural network layers. Batatia et al. {cite}`batatia2022design` have proposed a categorization of the main elements of molnets (which they call E(3)-equivariant NNs) that I will adopt here. They categorize the decisions to be made into three parts of the architecture: the atomic cluster expansions (ACE), the body-order of the messages, and the architecture of the message passing neural network (MPNN). This categorization might also be viewed within the GNN theory as node features (ACE), message creation and aggregation (body-order), and node update (MPNN details). See {doc}`gnn` for more details on MPNNs.\n",
     "\n",
+    "This is a relatively new categorization and it is certainly not necessary to use it. Most papers do not use this categorization and it takes some effort to put models into it. The benefit of thinking about models with this abstraction is that it helps us differentiate between the very large number of models now being pursued in the literature. There is also a bit of chaos in teasing out what *differentiates* the best models from others. For example, NequIP"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "41692e8d",
+   "metadata": {},
+   "source": [
+    "### Atom features\n",
+    "\n",
+    "Let's start with the general terminology for an atom. Of course, at the input to these networks an atom is just a Cartesian coordinate $\\vec{r}_i$ and the element $z_i$. Within the message passing framework though, atoms are nodes and their feature vectors need to be organized a bit differently than in usual GNNs. Namely, some of the features of an atom need to be treated in a special way to maintain equivariance, while others behave like scalars, so we can ignore equivariance for them. One way to organize these is. \n",
+    " "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "18e473a6",
+   "metadata": {},
+   "source": [
     "### Atomic Cluster Expansions\n",
     "\n",
-    "An ACE is a per-atom tensor. The main idea of ACE is to encode the local environment of an atom into a feaeture tensor that describes its neighborhood of nearby atoms. This is like distinguishing between an oxygen in an alcohol group vs an oxygen in an ether. Both are oxygens, but we expext them to behave differently. ACE is the same idea, but for nearby atoms in space instead of just on the molecular graph.\n",
+    "An ACE is a per-atom tensor. The main idea of ACE is to encode the local environment of an atom into a feature tensor that describes its neighborhood of nearby atoms. This is like distinguishing between an oxygen in an alcohol group vs an oxygen in an ether. Both are oxygens, but we expect them to behave differently. ACE is the same idea, but for nearby atoms in space instead of just on the molecular graph.\n",
     "\n",
     "The general equation for ACE (assuming O(3) equivariance) is [cite]:\n",
     "\n",
@@ -74,18 +137,67 @@
     "How is this different than a MPNN"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "39d7c753",
+   "metadata": {},
+   "source": [
+    "## Normalization\n",
+    "\n",
+    "* Physics-based energy/force normalization\n",
+    "* Pooling\n",
+    "* Layers"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "961139e7",
+   "metadata": {},
+   "source": [
+    "## Running This Notebook\n",
+    "\n",
+    "\n",
+    "Click the &nbsp;<i aria-label=\"Launch interactive content\" class=\"fas fa-rocket\"></i>&nbsp; above to launch this page as an interactive Google Colab. See details below on installing packages.\n",
+    "\n",
+    "````{tip} My title\n",
+    ":class: dropdown\n",
+    "To install packages, execute this code in a new cell. \n",
+    "\n",
+    "```\n",
+    "!pip install dmol-book\n",
+    "```\n",
+    "\n",
+    "If you find install problems, you can get the latest working versions of packages used in [this book here](https://github.com/whitead/dmol-book/blob/master/package/setup.py)\n",
+    "\n",
+    "````"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9a70e197",
+   "metadata": {},
+   "source": [
+    "## Cited References\n",
+    "\n",
+    "```{bibliography}\n",
+    ":style: unsrtalpha\n",
+    ":filter: docname in docnames\n",
+    "```"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "325352c7",
+   "id": "05dc69e8",
    "metadata": {},
    "outputs": [],
    "source": []
   }
  ],
  "metadata": {
+  "celltoolbar": "Tags",
   "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": "Python 3",
    "language": "python",
    "name": "python3"
   },
diff --git a/references.bib b/references.bib
index 4174758c..48cdd98b 100644
--- a/references.bib
+++ b/references.bib
@@ -1,13 +1,13 @@
 @article{martins2012bbb,
-author = {Martins, Ines Filipa and Teixeira, Ana L and Pinheiro, Luis and Falcao, Andre O},
-doi = {10.1021/ci300124c},
-journal = {Journal of Chemical Information and Modeling},
-number = {6},
-pages = {1686--1697},
-title = {{A Bayesian Approach to in Silico Blood-Brain Barrier Penetration Modeling}},
-url = {https://doi.org/10.1021/ci300124c},
-volume = {52},
-year = {2012}
+  author  = {Martins, Ines Filipa and Teixeira, Ana L and Pinheiro, Luis and Falcao, Andre O},
+  doi     = {10.1021/ci300124c},
+  journal = {Journal of Chemical Information and Modeling},
+  number  = {6},
+  pages   = {1686--1697},
+  title   = {{A Bayesian Approach to in Silico Blood-Brain Barrier Penetration Modeling}},
+  url     = {https://doi.org/10.1021/ci300124c},
+  volume  = {52},
+  year    = {2012}
 }
 
 @article{ramakrishnan2014quantum,
@@ -868,13 +868,13 @@ @article{wang2021molclr
   journal = {arXiv preprint arXiv:2102.10056}
 }
 @misc{geiger2022e3nn,
-  doi = {10.48550/ARXIV.2207.09453},
-  url = {https://arxiv.org/abs/2207.09453},
-  author = {Geiger, Mario and Smidt, Tess},
-  keywords = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Neural and Evolutionary Computing (cs.NE), FOS: Computer and information sciences, FOS: Computer and information sciences},
-  title = {e3nn: Euclidean Neural Networks},
+  doi       = {10.48550/ARXIV.2207.09453},
+  url       = {https://arxiv.org/abs/2207.09453},
+  author    = {Geiger, Mario and Smidt, Tess},
+  keywords  = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Neural and Evolutionary Computing (cs.NE), FOS: Computer and information sciences, FOS: Computer and information sciences},
+  title     = {e3nn: Euclidean Neural Networks},
   publisher = {arXiv},
-  year = {2022},
+  year      = {2022},
   copyright = {Creative Commons Attribution 4.0 International}
 }
 @article{heller2015inchi,
@@ -1452,10 +1452,10 @@ @inproceedings{bergstra2013making
   organization = {PMLR}
 }
 @inproceedings{louppe2017bayesian,
-  title={Bayesian optimisation with scikit-optimize},
-  author={Louppe, Gilles},
-  booktitle={PyData Amsterdam},
-  year={2017}
+  title     = {Bayesian optimisation with scikit-optimize},
+  author    = {Louppe, Gilles},
+  booktitle = {PyData Amsterdam},
+  year      = {2017}
 }
 
 @article{frostig2018compiling,
@@ -1565,183 +1565,201 @@ @article{bodnar2021weisfeiler
 }
 
 @article{timmons2020happenn,
-  title={HAPPENN is a novel tool for hemolytic activity prediction for therapeutic peptides which employs neural networks},
-  author={Timmons, P. Brendan and Hewage, Chandralal M.},
-  journal={Scientific reports},
-  volume={10},
-  number={1},
-  pages={1--18},
-  year={2020},
-  publisher={Nature Publishing Group}
+  title     = {HAPPENN is a novel tool for hemolytic activity prediction for therapeutic peptides which employs neural networks},
+  author    = {Timmons, P. Brendan and Hewage, Chandralal M.},
+  journal   = {Scientific reports},
+  volume    = {10},
+  number    = {1},
+  pages     = {1--18},
+  year      = {2020},
+  publisher = {Nature Publishing Group}
 }
 @inproceedings{khodamoradi2021aslr,
-  title={ASLR: An Adaptive Scheduler for Learning Rate},
-  author={Khodamoradi, Alireza and Denolf, Kristof and Vissers, Kees and Kastner, Ryan C},
-  booktitle={2021 International Joint Conference on Neural Networks (IJCNN)},
-  pages={1--8},
-  year={2021},
-  organization={IEEE}
+  title        = {ASLR: An Adaptive Scheduler for Learning Rate},
+  author       = {Khodamoradi, Alireza and Denolf, Kristof and Vissers, Kees and Kastner, Ryan C},
+  booktitle    = {2021 International Joint Conference on Neural Networks (IJCNN)},
+  pages        = {1--8},
+  year         = {2021},
+  organization = {IEEE}
 }
 @article{yang2020pacl,
-  title={PACL: piecewise arc cotangent decay learning rate for deep neural network training},
-  author={Yang, Haixu and Liu, Jihong and Sun, Hongwei and Zhang, Henggui},
-  journal={IEEE Access},
-  volume={8},
-  pages={112805--112813},
-  year={2020},
-  publisher={IEEE}
+  title     = {PACL: piecewise arc cotangent decay learning rate for deep neural network training},
+  author    = {Yang, Haixu and Liu, Jihong and Sun, Hongwei and Zhang, Henggui},
+  journal   = {IEEE Access},
+  volume    = {8},
+  pages     = {112805--112813},
+  year      = {2020},
+  publisher = {IEEE}
 }
 @article{vidyabharathi2021achieving,
-  title={Achieving generalization of deep learning models in a quick way by adapting T-HTR learning rate scheduler},
-  author={Vidyabharathi, D and Mohanraj, V and Kumar, J Senthil and Suresh, Y},
-  journal={Personal and Ubiquitous Computing},
-  pages={1--19},
-  year={2021},
-  publisher={Springer}
+  title     = {Achieving generalization of deep learning models in a quick way by adapting T-HTR learning rate scheduler},
+  author    = {Vidyabharathi, D and Mohanraj, V and Kumar, J Senthil and Suresh, Y},
+  journal   = {Personal and Ubiquitous Computing},
+  pages     = {1--19},
+  year      = {2021},
+  publisher = {Springer}
 }
 @article{goodfellow2017deep,
-  title={Deep learning (adaptive computation and machine learning series)},
-  author={Goodfellow, Ian and Bengio, Yoshua and Courville, Aaron},
-  journal={Cambridge Massachusetts},
-  pages={321--359},
-  year={2017}
+  title   = {Deep learning (adaptive computation and machine learning series)},
+  author  = {Goodfellow, Ian and Bengio, Yoshua and Courville, Aaron},
+  journal = {Cambridge Massachusetts},
+  pages   = {321--359},
+  year    = {2017}
 }
 @article{hoffer2017train,
-  title={Train longer, generalize better: closing the generalization gap in large batch training of neural networks},
-  author={Hoffer, Elad and Hubara, Itay and Soudry, Daniel},
-  journal={Advances in neural information processing systems},
-  volume={30},
-  year={2017}
+  title   = {Train longer, generalize better: closing the generalization gap in large batch training of neural networks},
+  author  = {Hoffer, Elad and Hubara, Itay and Soudry, Daniel},
+  journal = {Advances in neural information processing systems},
+  volume  = {30},
+  year    = {2017}
 }
 @article{keskar2016large,
-  title={On large-batch training for deep learning: Generalization gap and sharp minima},
-  author={Keskar, Nitish Shirish and Mudigere, Dheevatsa and Nocedal, Jorge and Smelyanskiy, Mikhail and Tang, Ping Tak Peter},
-  journal={arXiv preprint arXiv:1609.04836},
-  year={2016}
+  title   = {On large-batch training for deep learning: Generalization gap and sharp minima},
+  author  = {Keskar, Nitish Shirish and Mudigere, Dheevatsa and Nocedal, Jorge and Smelyanskiy, Mikhail and Tang, Ping Tak Peter},
+  journal = {arXiv preprint arXiv:1609.04836},
+  year    = {2016}
 }
 @inproceedings{lin2020extrapolation,
-  title={Extrapolation for large-batch training in deep learning},
-  author={Lin, Tao and Kong, Lingjing and Stich, Sebastian and Jaggi, Martin},
-  booktitle={International Conference on Machine Learning},
-  pages={6094--6104},
-  year={2020},
-  organization={PMLR}
+  title        = {Extrapolation for large-batch training in deep learning},
+  author       = {Lin, Tao and Kong, Lingjing and Stich, Sebastian and Jaggi, Martin},
+  booktitle    = {International Conference on Machine Learning},
+  pages        = {6094--6104},
+  year         = {2020},
+  organization = {PMLR}
 }
 
 @article{eger2019time,
-  title={Is it time to swish? Comparing deep learning activation functions across NLP tasks},
-  author={Eger, Steffen and Youssef, Paul and Gurevych, Iryna},
-  journal={arXiv preprint arXiv:1901.02671},
-  year={2019}
+  title   = {Is it time to swish? Comparing deep learning activation functions across NLP tasks},
+  author  = {Eger, Steffen and Youssef, Paul and Gurevych, Iryna},
+  journal = {arXiv preprint arXiv:1901.02671},
+  year    = {2019}
 }
 
 @article{hendrycks2016gaussian,
-  title={Gaussian error linear units (gelus)},
-  author={Hendrycks, Dan and Gimpel, Kevin},
-  journal={arXiv preprint arXiv:1606.08415},
-  year={2016}
+  title   = {Gaussian error linear units (gelus)},
+  author  = {Hendrycks, Dan and Gimpel, Kevin},
+  journal = {arXiv preprint arXiv:1606.08415},
+  year    = {2016}
 }
 
 @article{zhang2021motif,
-  title={Motif-based graph self-supervised learning for molecular property prediction},
-  author={Zhang, Zaixi and Liu, Qi and Wang, Hao and Lu, Chengqiang and Lee, Chee-Kong},
-  journal={Advances in Neural Information Processing Systems},
-  volume={34},
-  pages={15870--15882},
-  year={2021}
+  title   = {Motif-based graph self-supervised learning for molecular property prediction},
+  author  = {Zhang, Zaixi and Liu, Qi and Wang, Hao and Lu, Chengqiang and Lee, Chee-Kong},
+  journal = {Advances in Neural Information Processing Systems},
+  volume  = {34},
+  pages   = {15870--15882},
+  year    = {2021}
 }
 
 @article{you2020graph,
-  title={Graph contrastive learning with augmentations},
-  author={You, Yuning and Chen, Tianlong and Sui, Yongduo and Chen, Ting and Wang, Zhangyang and Shen, Yang},
-  journal={Advances in Neural Information Processing Systems},
-  volume={33},
-  pages={5812--5823},
-  year={2020}
+  title   = {Graph contrastive learning with augmentations},
+  author  = {You, Yuning and Chen, Tianlong and Sui, Yongduo and Chen, Ting and Wang, Zhangyang and Shen, Yang},
+  journal = {Advances in Neural Information Processing Systems},
+  volume  = {33},
+  pages   = {5812--5823},
+  year    = {2020}
 }
 
 @inproceedings{sun2021mocl,
-  title={MoCL: data-driven molecular fingerprint via knowledge-aware contrastive learning from molecular graph},
-  author={Sun, Mengying and Xing, Jing and Wang, Huijun and Chen, Bin and Zhou, Jiayu},
-  booktitle={Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery \& Data Mining},
-  pages={3585--3594},
-  year={2021}
+  title     = {MoCL: data-driven molecular fingerprint via knowledge-aware contrastive learning from molecular graph},
+  author    = {Sun, Mengying and Xing, Jing and Wang, Huijun and Chen, Bin and Zhou, Jiayu},
+  booktitle = {Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery \& Data Mining},
+  pages     = {3585--3594},
+  year      = {2021}
 }
 
 @article{liu2021pre,
-  title={Pre-training molecular graph representation with 3d geometry},
-  author={Liu, Shengchao and Wang, Hanchen and Liu, Weiyang and Lasenby, Joan and Guo, Hongyu and Tang, Jian},
-  journal={arXiv preprint arXiv:2110.07728},
-  year={2021}
+  title   = {Pre-training molecular graph representation with 3d geometry},
+  author  = {Liu, Shengchao and Wang, Hanchen and Liu, Weiyang and Lasenby, Joan and Guo, Hongyu and Tang, Jian},
+  journal = {arXiv preprint arXiv:2110.07728},
+  year    = {2021}
 }
 
 @inproceedings{erhan2010does,
-  title={Why does unsupervised pre-training help deep learning?},
-  author={Erhan, Dumitru and Courville, Aaron and Bengio, Yoshua and Vincent, Pascal},
-  booktitle={Proceedings of the thirteenth international conference on artificial intelligence and statistics},
-  pages={201--208},
-  year={2010},
-  organization={JMLR Workshop and Conference Proceedings}
+  title        = {Why does unsupervised pre-training help deep learning?},
+  author       = {Erhan, Dumitru and Courville, Aaron and Bengio, Yoshua and Vincent, Pascal},
+  booktitle    = {Proceedings of the thirteenth international conference on artificial intelligence and statistics},
+  pages        = {201--208},
+  year         = {2010},
+  organization = {JMLR Workshop and Conference Proceedings}
 }
 
 @article{xie2022self,
-  title={Self-supervised learning of graph neural networks: A unified review},
-  author={Xie, Yaochen and Xu, Zhao and Zhang, Jingtun and Wang, Zhengyang and Ji, Shuiwang},
-  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
-  year={2022},
-  publisher={IEEE}
+  title     = {Self-supervised learning of graph neural networks: A unified review},
+  author    = {Xie, Yaochen and Xu, Zhao and Zhang, Jingtun and Wang, Zhengyang and Ji, Shuiwang},
+  journal   = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
+  year      = {2022},
+  publisher = {IEEE}
 }
 
 @article{mao2020survey,
-  title={A survey on self-supervised pre-training for sequential transfer learning in neural networks},
-  author={Mao, Huanru Henry},
-  journal={arXiv preprint arXiv:2007.00800},
-  year={2020}
+  title   = {A survey on self-supervised pre-training for sequential transfer learning in neural networks},
+  author  = {Mao, Huanru Henry},
+  journal = {arXiv preprint arXiv:2007.00800},
+  year    = {2020}
 }
 
 @inproceedings{finn2017model,
-  title={Model-agnostic meta-learning for fast adaptation of deep networks},
-  author={Finn, Chelsea and Abbeel, Pieter and Levine, Sergey},
-  booktitle={International conference on machine learning},
-  pages={1126--1135},
-  year={2017},
-  organization={PMLR}
+  title        = {Model-agnostic meta-learning for fast adaptation of deep networks},
+  author       = {Finn, Chelsea and Abbeel, Pieter and Levine, Sergey},
+  booktitle    = {International conference on machine learning},
+  pages        = {1126--1135},
+  year         = {2017},
+  organization = {PMLR}
 }
 
 @inproceedings{wang2019smiles,
-  title={SMILES-BERT: large scale unsupervised pre-training for molecular property prediction},
-  author={Wang, Sheng and Guo, Yuzhi and Wang, Yuhong and Sun, Hongmao and Huang, Junzhou},
-  booktitle={Proceedings of the 10th ACM international conference on bioinformatics, computational biology and health informatics},
-  pages={429--436},
-  year={2019}
+  title     = {SMILES-BERT: large scale unsupervised pre-training for molecular property prediction},
+  author    = {Wang, Sheng and Guo, Yuzhi and Wang, Yuhong and Sun, Hongmao and Huang, Junzhou},
+  booktitle = {Proceedings of the 10th ACM international conference on bioinformatics, computational biology and health informatics},
+  pages     = {429--436},
+  year      = {2019}
 }
 @article{kingma2014adam,
-  title={Adam: A method for stochastic optimization},
-  author={Kingma, Diederik P and Ba, Jimmy},
-  journal={arXiv preprint arXiv:1412.6980},
-  year={2014}
+  title   = {Adam: A method for stochastic optimization},
+  author  = {Kingma, Diederik P and Ba, Jimmy},
+  journal = {arXiv preprint arXiv:1412.6980},
+  year    = {2014}
 }
 @article{omalley2019kerastuner,
-  title={Keras tuner},
-  author={O’Malley, Tom and Bursztein, Elie and Long, James and Chollet, Fran{\c{c}}ois and Jin, Haifeng and Invernizzi, Luca and others},
-  journal={Retrieved May},
-  volume={21},
-  pages={2020},
-  year={2019}
+  title   = {Keras tuner},
+  author  = {O’Malley, Tom and Bursztein, Elie and Long, James and Chollet, Fran{\c{c}}ois and Jin, Haifeng and Invernizzi, Luca and others},
+  journal = {Retrieved May},
+  volume  = {21},
+  pages   = {2020},
+  year    = {2019}
 }
 
 @article{anderson2019cormorant,
-  title={Cormorant: Covariant molecular neural networks},
-  author={Anderson, Brandon and Hy, Truong Son and Kondor, Risi},
-  journal={Advances in neural information processing systems},
-  volume={32},
-  year={2019}
+  title   = {Cormorant: Covariant molecular neural networks},
+  author  = {Anderson, Brandon and Hy, Truong Son and Kondor, Risi},
+  journal = {Advances in neural information processing systems},
+  volume  = {32},
+  year    = {2019}
 }
 
 
 @article{spellings2021geometric,
-  title={Geometric Algebra Attention Networks for Small Point Clouds},
-  author={Spellings, Matthew},
-  journal={arXiv preprint arXiv:2110.02393},
-  year={2021}
+  title   = {Geometric Algebra Attention Networks for Small Point Clouds},
+  author  = {Spellings, Matthew},
+  journal = {arXiv preprint arXiv:2110.02393},
+  year    = {2021}
+}
+
+
+@article{weisfeiler1968reduction,
+  title   = {The reduction of a graph to canonical form and the algebra which appears therein},
+  author  = {Weisfeiler, Boris and Leman, Andrei},
+  journal = {NTI, Series},
+  volume  = {2},
+  number  = {9},
+  pages   = {12--16},
+  year    = {1968}
+}
+
+@article{batatia2022design,
+  title   = {The Design Space of E(3)-Equivariant Atom-Centered Interatomic Potentials},
+  author  = {Batatia, Ilyes and Batzner, Simon and Kov{\'a}cs, D{\'a}vid P{\'e}ter and Musaelian, Albert and Simm, Gregor NC and Drautz, Ralf and Ortner, Christoph and Kozinsky, Boris and Cs{\'a}nyi, G{\'a}bor},
+  journal = {arXiv preprint arXiv:2205.06643},
+  year    = {2022}
 }

From 2b5070322f50ff5821033bfd7a2cc8b89f644722 Mon Sep 17 00:00:00 2001
From: Andrew White <white.d.andrew@gmail.com>
Date: Tue, 25 Oct 2022 17:28:38 -0400
Subject: [PATCH 02/15] Fixed inserted figure

---
 _static/custom.css |  4 +--
 dl/molnets.ipynb   | 71 ++++++++++++++++++++++++++++------------------
 package/setup.py   |  2 +-
 3 files changed, 47 insertions(+), 30 deletions(-)

diff --git a/_static/custom.css b/_static/custom.css
index 99c62276..ac09486c 100644
--- a/_static/custom.css
+++ b/_static/custom.css
@@ -134,7 +134,7 @@ td>p {
   display: inline !important;
 }
 
-div.sidebar,
+/* div.sidebar,
 aside.sidebar {
   border: none;
   clear: right;
@@ -146,7 +146,7 @@ aside.sidebar {
   vertical-align: baseline;
   position: relative;
   background-color: #FFF;
-}
+} */
 
 .wh-flex-center {
   display: flex;
diff --git a/dl/molnets.ipynb b/dl/molnets.ipynb
index ec3f22ce..4182e46d 100644
--- a/dl/molnets.ipynb
+++ b/dl/molnets.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "a135561d",
+   "id": "07239da2",
    "metadata": {},
    "source": [
     "# Modern Molecular NNs\n",
@@ -20,7 +20,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "65f4ca21",
+   "id": "32f2daac",
    "metadata": {},
    "source": [
     "```{warning}\n",
@@ -31,7 +31,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "8c695b48",
+   "id": "65c17fce",
    "metadata": {
     "tags": [
      "remove-cell"
@@ -45,23 +45,38 @@
     "import networkx as nx\n",
     "import dmol\n",
     "\n",
+    "# I hate to do this manually, but I cannot get the\n",
+    "# damn molecular fonts to be big enough\n",
+    "import skunk\n",
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "\n",
+    "def _mol2svg(m, size):\n",
+    "    d = rdkit.Chem.Draw.rdMolDraw2D.MolDraw2DSVG(*size)\n",
+    "    d.DrawMolecule(m)\n",
+    "    d.FinishDrawing()\n",
+    "    return d.GetDrawingText()\n",
+    "\n",
+    "\n",
     "m1 = rdkit.Chem.MolFromSmiles(\"C1CCC2CCCCC2C1\")\n",
     "m2 = rdkit.Chem.MolFromSmiles(\"C1CCC(C1)C2CCCC2\")\n",
-    "glue(\n",
-    "    \"lwtest\",\n",
-    "    rdkit.Chem.Draw.MolsToGridImage(\n",
-    "        [m1, m2],\n",
-    "        legends=[\"decaline\", \"bicylopentyl\"],\n",
-    "        useSVG=True,\n",
-    "        subImgSize=(400, 400),\n",
-    "    ),\n",
-    "    display=False,\n",
-    ")"
+    "s1 = _mol2svg(m1, (200, 200))\n",
+    "s2 = _mol2svg(m2, (200, 200))\n",
+    "_, axs = plt.subplots(1, 2, squeeze=True)\n",
+    "axs[0].set_title(\"decaline\")\n",
+    "axs[1].set_title(\"bicyclopentyl\")\n",
+    "axs[0].axis(\"off\")\n",
+    "axs[1].axis(\"off\")\n",
+    "skunk.connect(axs[0], \"m1\")\n",
+    "skunk.connect(axs[1], \"m2\")\n",
+    "svg = skunk.insert({\"m1\": s1, \"m2\": s2})\n",
+    "with open(\"lwtest.svg\", \"w\") as f:\n",
+    "    f.write(svg)"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "00f4b61c",
+   "id": "1ac69144",
    "metadata": {},
    "source": [
     "# Expressiveness\n",
@@ -70,17 +85,19 @@
     "\n",
     "One reason is that the standard GNNs cannot distinguish certain types of graphs relevant for chemistry is they cannot distinguish molecules like decaline and bicylopentyl, which indeed have different properties. Look at the {numref}`decaline-bicylopentyl` below and think about the degree and neighbors of the atoms near the mixing of the rings -- you'll see if you try to use message passing the two molecules are identical. This is known as the Wesifeiler-Lehman Test {cite}`weisfeiler1968reduction`.\n",
     "\n",
-    "```{glue:figure} lwtest\n",
-    "----\n",
-    "name: decaline-bicylopentyl\n",
-    "----\n",
+    "\n",
+    "```{figure} lwtest.svg\n",
+    "---\n",
+    "alt: \"decaline and bicyclopentyl structures drawn side-by-side, which visually are different.\"\n",
+    "name: \"decaline-bicylopentyl\"\n",
+    "---\n",
     "Comparison of decaline and bicylopentyl, which have identical output in most GNNs despite being different molecules.\n",
     "```"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "3c224845",
+   "id": "7741abfd",
    "metadata": {},
    "source": [
     "These can be distinguished if we also have (and use) their Cartesian coordinates. We cannot distinguish enantiomers with GNNs, except maybe with pre-computed node attributes. Even those start to breakdown when we have helical chirality that is not centered at any one molecule.\n",
@@ -96,7 +113,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "9c39709e",
+   "id": "9b465c7d",
    "metadata": {},
    "source": [
     "## The Elements of Modern Molecular NNs\n",
@@ -108,7 +125,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "41692e8d",
+   "id": "f76d2ecb",
    "metadata": {},
    "source": [
     "### Atom features\n",
@@ -119,7 +136,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "18e473a6",
+   "id": "2ea75a85",
    "metadata": {},
    "source": [
     "### Atomic Cluster Expansions\n",
@@ -139,7 +156,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "39d7c753",
+   "id": "a8ee8dba",
    "metadata": {},
    "source": [
     "## Normalization\n",
@@ -151,7 +168,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "961139e7",
+   "id": "013b2fdf",
    "metadata": {},
    "source": [
     "## Running This Notebook\n",
@@ -174,7 +191,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "9a70e197",
+   "id": "bdf38ddc",
    "metadata": {},
    "source": [
     "## Cited References\n",
@@ -188,7 +205,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "05dc69e8",
+   "id": "e926b47b",
    "metadata": {},
    "outputs": [],
    "source": []
@@ -211,7 +228,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.12"
+   "version": "3.8.5"
   }
  },
  "nbformat": 4,
diff --git a/package/setup.py b/package/setup.py
index 1f43c2a7..3d3e5ebc 100644
--- a/package/setup.py
+++ b/package/setup.py
@@ -12,7 +12,7 @@
     license="MIT",
     packages=["dmol"],
     install_requires=[
-        "jupyter-book==0.12.3",
+        "jupyter-book==0.13.1",
         "matplotlib",
         "numpy",
         "jax",

From ba4060ae2bf49b05f5df3cde5be36bc925dcd207 Mon Sep 17 00:00:00 2001
From: Andrew White <white.d.andrew@gmail.com>
Date: Wed, 26 Oct 2022 17:19:26 -0400
Subject: [PATCH 03/15] Fixed GNN equiv/invar issues

---
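Note for reviewers (not part of the commit): a minimal NumPy sketch of the
distinction this patch corrects in dl/gnn.ipynb. It is an illustration under
stated assumptions, not code from the book. The function names, the tanh
activation, the weight scale, and the array sizes are all hypothetical, and
the per-node-weight case also sums over the feature index j, which the second
equation leaves implicit.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_feat, n_out = 4, 3, 2  # hypothetical sizes
v = rng.normal(size=(n_nodes, n_feat))  # node features v_ij
w_shared = 0.1 * rng.normal(size=(n_feat, n_out))  # w_jk, shared by all nodes
w_per_node = 0.1 * rng.normal(size=(n_nodes, n_out))  # w_ik, one row per node


def pooled_shared(v):
    # f_k = sigma(sum_i sum_j v_ij w_jk): graph-level output, shared weights
    return np.tanh(np.einsum("ij,jk->k", v, w_shared))


def pooled_per_node(v):
    # f_k = sigma(sum_i sum_j v_ij w_ik): the weights are tied to node order
    return np.tanh(np.einsum("ij,ik->k", v, w_per_node))


def per_node_shared(v):
    # h_ik = sigma(sum_j v_ij w_jk): one output row per node, shared weights
    return np.tanh(v @ w_shared)


perm = np.array([2, 0, 3, 1])  # an arbitrary relabeling of the nodes

# Invariant: relabeling the nodes leaves the pooled output unchanged.
invariant = np.allclose(pooled_shared(v), pooled_shared(v[perm]))
# Not invariant: per-node weights no longer line up after relabeling.
broken = not np.allclose(pooled_per_node(v), pooled_per_node(v[perm]))
# Equivariant: relabeling the inputs relabels the output rows the same way.
equivariant = np.allclose(per_node_shared(v)[perm], per_node_shared(v[perm]))
print(invariant, broken, equivariant)
```

The pooled form is the one the corrected text calls invariant. The per-node
form is what the new closing sentence means by equivariance: swapping inputs
swaps outputs the same way.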
 dl/gnn.ipynb     |  6 +++---
 dl/molnets.ipynb | 29 ++++++++++++++---------------
 2 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/dl/gnn.ipynb b/dl/gnn.ipynb
index 80766e55..e76a0bbf 100644
--- a/dl/gnn.ipynb
+++ b/dl/gnn.ipynb
@@ -227,15 +227,15 @@
     "f_k = \\sigma\\left( \\sum_i \\sum_j v_{ij}w_{jk}  \\right)\n",
     "\\end{equation}\n",
     "\n",
-    "This equation shows that we first multiply every node ($v_{ij}$) feature by trainable weights $w_{jk}$, sum over all node features, and then apply an activation. This will yield a single feature vector for the graph. Is this equation permutation equivariant? Yes, because the node index in our expression is index $i$ which can be re-ordered without affecting the output.\n",
+    "This equation shows that we first multiply every node ($v_{ij}$) feature by trainable weights $w_{jk}$, sum over all node features, and then apply an activation. This will yield a single feature vector for the graph. Is this equation permutation invariant? Yes, because the node index in our expression is index $i$ which can be re-ordered without affecting the output.\n",
     "\n",
-    "Let's see an example that is similar, but not permutation equivariant:\n",
+    "Let's see an example that is similar, but not permutation invariant:\n",
     "\n",
     "\\begin{equation}\n",
     "f_k = \\sigma\\left( \\sum_i v_{ij}w_{ik}  \\right)\n",
     "\\end{equation}\n",
     "\n",
-    "This is a small change. We have one weight vector per node now. This makes the trainable weights depend on the ordering of the nodes. Then if we swap the node ordering, our weights will no longer align. So if we were to input two methanol molecules, which should have the same output, but we switched two atom numbers, we would get different answers. These simple examples differ from real GNNs in two important ways: (i) they give a single feature vector output, which throws away per-node information, and (ii) they do not use the adjacency matrix. Let's see a real GNN that has these properties while maintaining permutation equivariant."
+    "This is a small change. We have one weight vector per node now. This makes the trainable weights depend on the ordering of the nodes. Then if we swap the node ordering, our weights will no longer align. So if we were to input two methanol molecules, which should have the same output, but we switched two atom numbers, we would get different answers. These simple examples differ from real GNNs in two important ways: (i) they give a single feature vector output, which throws away per-node information, and (ii) they do not use the adjacency matrix. Let's see a real GNN that has these properties while maintaining permutation invariance --- or equivariance (swapping inputs swaps outputs the same way)."
    ]
   },
   {
diff --git a/dl/molnets.ipynb b/dl/molnets.ipynb
index 4182e46d..72781b62 100644
--- a/dl/molnets.ipynb
+++ b/dl/molnets.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "07239da2",
+   "id": "8025a9b6",
    "metadata": {},
    "source": [
     "# Modern Molecular NNs\n",
@@ -20,7 +20,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "32f2daac",
+   "id": "3922d8a4",
    "metadata": {},
    "source": [
     "```{warning}\n",
@@ -31,7 +31,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "65c17fce",
+   "id": "4e618055",
    "metadata": {
     "tags": [
      "remove-cell"
@@ -76,7 +76,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "1ac69144",
+   "id": "0f7062a1",
    "metadata": {},
    "source": [
     "# Expressiveness\n",
@@ -97,7 +97,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "7741abfd",
+   "id": "6cbb302c",
    "metadata": {},
    "source": [
     "These can be distinguished if we also have (and use) their Cartesian coordinates. We cannot distinguish enantiomers with GNNs, except maybe with pre-computed node attributes. Even those start to breakdown when we have helical chirality that is not centered at any one molecule.\n",
@@ -113,30 +113,29 @@
   },
   {
    "cell_type": "markdown",
-   "id": "9b465c7d",
+   "id": "8aab34c9",
    "metadata": {},
    "source": [
     "## The Elements of Modern Molecular NNs\n",
     "\n",
     "There has been a flurry of ideas about molents in the last few years, especially with the advances in equivariant neural network layers. Batatia et al.{cite}`batatia2022design` have proposed a categorization of the main elements of molnets (which they call E(3)-equivariant NNs) that I will adopt here. They categorize the decisions to be made into three parts of the architecture: the atomic cluster expansions (ACE), the body-order of the messages, and the architecture of the message passing neural network (MPNN). This categorization might also be viewed within the GNN theory as node features (ACE), message creation and aggregation (body-order), and node update (MPNN details). See {doc}`gnn` for more details on MPNNs.\n",
     "\n",
-    "This is a relatively new categorization and certainly is not necessary to use. Most papers do not use this categorization and it takes some effort to put models into it. The benefit of thinking about models with this abstractions is it helps us differentiate between the very large number of models now being pursued in the literature. There is also a bit of chaos in teasing out what *differentiates* the best models from others. For example, NequIP"
+    "This is a relatively new categorization and certainly is not necessary to use. Most papers do not use this categorization and it takes some effort to put models into it. The benefit of thinking about models with these abstractions is that it helps us differentiate between the very large number of models now being pursued in the literature. There is also a bit of chaos in teasing out what *differentiates* the best models from others. For example, it took a while to discover that the most important features in NequIP were data normalization and how atom embeddings are treated {cite}`batatia2022design`. This categorization is also improving how these models are designed."
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "f76d2ecb",
+   "id": "7732f356",
    "metadata": {},
    "source": [
     "### Atom features\n",
     "\n",
-    "Let's start with the general terminology for an atom. Of course, at input to these networks an atom is just a Cartesian coordinate $\\vec{r}_i$ and the element $z_i$. Within the message passing framework though, atoms are nodes and their feature vectors need to be organized a bit differently than usual GNNs. Namely, some of the features of an atom need to be treated in a special way to maintain equivariance and some of the features are like scalars and we can ignore the equivariance. One way to organize these is. \n",
-    " "
+    "Let's start with the general terminology for an atom. Of course, at the input to these networks an atom is just a Cartesian coordinate $\\vec{r}_i$ and the element $z_i$. As we pass through GNN layers the features will become larger. The atoms are the nodes. The atom features need to be organized a bit differently than previously because some of the features should be invariant with respect to the group --- SO(3) --- and some need to be equivariant. "
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "2ea75a85",
+   "id": "faaf213d",
    "metadata": {},
    "source": [
     "### Atomic Cluster Expansions\n",
@@ -156,7 +155,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "a8ee8dba",
+   "id": "470c3063",
    "metadata": {},
    "source": [
     "## Normalization\n",
@@ -168,7 +167,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "013b2fdf",
+   "id": "39849016",
    "metadata": {},
    "source": [
     "## Running This Notebook\n",
@@ -191,7 +190,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "bdf38ddc",
+   "id": "3c219ed7",
    "metadata": {},
    "source": [
     "## Cited References\n",
@@ -205,7 +204,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "e926b47b",
+   "id": "596a40bc",
    "metadata": {},
    "outputs": [],
    "source": []

From eb951d4645d7da8f0f8754201ba33420220cb04b Mon Sep 17 00:00:00 2001
From: Andrew White <white.d.andrew@gmail.com>
Date: Fri, 4 Nov 2022 18:48:24 -0400
Subject: [PATCH 04/15] doing this for ma boi

---
 dl/Equivariant.ipynb |  21 ++--
 dl/gnn.ipynb         |   6 +-
 dl/molnets.ipynb     | 246 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 261 insertions(+), 12 deletions(-)

diff --git a/dl/Equivariant.ipynb b/dl/Equivariant.ipynb
index fdf35d6e..622a163c 100644
--- a/dl/Equivariant.ipynb
+++ b/dl/Equivariant.ipynb
@@ -1323,15 +1323,13 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## G-Equivariant Convolutions on Compact Groups\n",
-    "\n",
     "Now we can represent scalar valued functions ($f(g): G \\rightarrow \\mathcal{R}$) on groups as by multiplying some set of constants by the irreducible group representations:\n",
     "\n",
     "$$\n",
     "f(g) =  f_0\\cdot\\rho_0(g) + \\vec{f}_1\\cdot\\rho_1(g)+\\ldots+\\vec{f}_k\\cdot\\rho_k(g)\n",
     "$$ (fft)\n",
     "\n",
-    "The individual $\\vec{f}_i$s are called **fragments** to distinguish them from the actual irreps (which are functions). You can sort of see this as analogous to doing a Fourier transform, or some other method of turning a function into a linear sum of basis set functions .We could even more compactly write this as $f(g) =  f_0\\oplus\\vec{f}_1\\oplus\\ldots\\oplus\\vec{f}_k$ where $\\oplus$ just means concatenating them together since we know what the irreps are for the group.\n",
+    "The individual $\\vec{f}_i$s are called **fragments** to distinguish them from the actual irreps (which are functions). You can sort of see this as analogous to doing a Fourier transform, or some other method of turning a function into a linear sum of basis set functions. We could even more compactly write this as $f(g) =  f_0\\oplus\\vec{f}_1\\oplus\\ldots\\oplus\\vec{f}_k$ where $\\oplus$ just means concatenating them together since we know what the irreps are for the group.\n",
     "\n",
     "\n",
     "What is unusual about Equation {eq}`fft` is that for non-abelian groups the individual irreps are matrices of increasing size. How then do we get a scalar out of a product of a vector and a matrix? It turns out we can just do an elementwise dot-product (flatten out the matrix). If you go to construct the fragments for SO(3) (we'll see how in just a moment), you'll notice that they are non-unique. That is, the right-hand side seems to be \"overpowered\" in its representation - it seems capable of representing more. \n",
@@ -1342,9 +1340,14 @@
     "h(g) = a\\cdot g = \\rho(a)\\rho(g) =  a_0\\cdot\\rho_0(g) \\oplus A_1\\rho_1(g)\\oplus\\ldots+A_k\\rho_k(g)\n",
     "$$ (fft)\n",
     "\n",
-    "where now the $A$'s a matrices. We'll see below that we can replace the irreps, which are for representing group elements, with spherical harmonics in SO(3). Spherical harmonics are vectors instead of matrices that act like irreps (you can multiply them by matrices $\\rho(g)$ and they rotate correctly) but are just big enough to represent scalar valued functions.\n",
-    "\n",
-    "----\n",
+    "where now the $A$'s a matrices. We'll see below that we can replace the irreps, which are for representing group elements, with spherical harmonics in SO(3). Spherical harmonics are vectors instead of matrices that act like irreps (you can multiply them by matrices $\\rho(g)$ and they rotate correctly) but are just big enough to represent scalar valued functions."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## G-Equivariant Convolutions on Compact Groups\n",
     "\n",
     "We'd like to revisit now the G-Equivariant convolution layer equation:\n",
     "\n",
@@ -1358,7 +1361,7 @@
     "\\psi(f) = f_0 w_0\\oplus \\vec{f}_1 w_1 \\oplus\\ldots\\oplus \\vec{f}_k w_k\n",
     "$$ (compact-gequiv)\n",
     "\n",
-    "This result says we just multiply the irreducible representations by weights, but do not mix across irreps. The weights become matrices if we start to allow multiple channels (multiple fragments). An important point then is how we actually can learn if there is no communication between irreps. That's where the nonlinearity comes in. It is discussed in more depth below, but the most common nonlinearity is to take a tensor product (all irreps times all irreps) and then reduce that by multiplying the larger rank tensor by a special tensor for the group called Clebsch-Gordan coefficients that reduces it equivariantly back down to the direct sum of irreps. This enables mixing between the irreps and is nonlinear."
+    "This result says we just multiply the irreducible representations by weights, but do not mix across irreps. The weights become matrices if we start to allow multiple channels (multiple fragments). How can we actually *learn* if there is no communication between irreps? That's where the nonlinearity comes in. It is discussed in more depth below, but the most common nonlinearity is to take a tensor product (all irreps times all irreps) and then reduce that by multiplying the larger rank tensor by a special tensor for the group called Clebsch-Gordan coefficients that reduces it equivariantly back down to the direct sum of irreps. This enables mixing between the irreps and is nonlinear."
    ]
   },
   {
@@ -2091,7 +2094,7 @@
    "source": [
     "## Exercises\n",
     "\n",
-    "1. Does the picture {glue:}`hex-6` represent a point in the space or function in the space? Justify your answer\n",
+    "1. Does the picture {glue:}`hex-1` represent a point in the space or function in the space? Justify your answer\n",
     "2. In the $Z_6$ examples, our stabilizer group is the identity -- $|G| = |\\mathcal{X}|$. Now consider including rotations up to $r^11$ but keep the space the same so that $|G| = 2|\\mathcal{X}|$. What would the stabilizer group be?\n",
     "3. Is the standard representation always faithful?\n",
     "4. Let's redefine our space for p4m to have $c$ channels like $\\mathcal{R}^{d\\times c}$. Can we construct a group action that makes this space homogeneous?\n",
@@ -2285,7 +2288,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.5"
+   "version": "3.8.12"
   }
  },
  "nbformat": 4,
diff --git a/dl/gnn.ipynb b/dl/gnn.ipynb
index 80766e55..adedec72 100644
--- a/dl/gnn.ipynb
+++ b/dl/gnn.ipynb
@@ -110,8 +110,8 @@
     "import numpy as np\n",
     "import tensorflow as tf\n",
     "import pandas as pd\n",
-    "import rdkit, rdkit.Chem, rdkit.Chem.rdDepictor, rdkit.Chem.Draw\n",
-    "import networkx as nx\n",
+    "import rdkit, rdkit.Chem, rdkit.Chem.rdDepictor, rdkit.Chem.Draw\n",
+    "import networkx as nx\n",
     "import dmol"
    ]
   },
@@ -1786,7 +1786,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.5"
+   "version": "3.8.12"
   }
  },
  "nbformat": 4,
diff --git a/dl/molnets.ipynb b/dl/molnets.ipynb
index ec3f22ce..66151d73 100644
--- a/dl/molnets.ipynb
+++ b/dl/molnets.ipynb
@@ -191,6 +191,252 @@
    "id": "05dc69e8",
    "metadata": {},
    "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt\n",
+    "import matplotlib as mpl\n",
+    "import numpy as np\n",
+    "import tensorflow as tf\n",
+    "import pandas as pd\n",
+    "import rdkit, rdkit.Chem, rdkit.Chem.rdDepictor, rdkit.Chem.Draw\n",
+    "import networkx as nx\n",
+    "import dmol\n",
+    "\n",
+    "my_elements = {6: \"C\", 8: \"O\", 1: \"H\"}"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "aa0b3498",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def smiles2graph(sml):\n",
+    "    \"\"\"Argument for the smiles2graph function should be a valid SMILES sequence\n",
+    "    returns: the graph as (nodes, adj) arrays\n",
+    "    \"\"\"\n",
+    "    m = rdkit.Chem.MolFromSmiles(sml)\n",
+    "    m = rdkit.Chem.AddHs(m)\n",
+    "    order_string = {\n",
+    "        rdkit.Chem.rdchem.BondType.SINGLE: 1,\n",
+    "        rdkit.Chem.rdchem.BondType.DOUBLE: 2,\n",
+    "        rdkit.Chem.rdchem.BondType.TRIPLE: 3,\n",
+    "        rdkit.Chem.rdchem.BondType.AROMATIC: 4,\n",
+    "    }\n",
+    "    N = len(list(m.GetAtoms()))\n",
+    "    nodes = np.zeros((N, len(my_elements)))\n",
+    "    lookup = list(my_elements.keys())\n",
+    "    for i in m.GetAtoms():\n",
+    "        nodes[i.GetIdx(), lookup.index(i.GetAtomicNum())] = 1\n",
+    "\n",
+    "    adj = np.zeros((N, N, 5))\n",
+    "    for j in m.GetBonds():\n",
+    "        u = min(j.GetBeginAtomIdx(), j.GetEndAtomIdx())\n",
+    "        v = max(j.GetBeginAtomIdx(), j.GetEndAtomIdx())\n",
+    "        order = j.GetBondType()\n",
+    "        if order in order_string:\n",
+    "            order = order_string[order]\n",
+    "        else:\n",
+    "            raise Warning(\"Ignoring bond order \" + str(order))\n",
+    "        adj[u, v, order] = 1\n",
+    "        adj[v, u, order] = 1\n",
+    "    return nodes, adj\n",
+    "\n",
+    "\n",
+    "# THIS CELL IS USED TO GENERATE A FIGURE\n",
+    "# AND NOT RELATED TO CHAPTER\n",
+    "# YOU CAN SKIP IT\n",
+    "from myst_nb import glue\n",
+    "from moviepy.editor import VideoClip\n",
+    "from moviepy.video.io.bindings import mplfig_to_npimage\n",
+    "\n",
+    "\n",
+    "def draw_vector(x, y, s, v, ax, cmap, **kwargs):\n",
+    "    x += s / 2\n",
+    "    y += s / 2\n",
+    "    for vi in v:\n",
+    "        if cmap is not None:\n",
+    "            ax.add_patch(\n",
+    "                mpl.patches.Rectangle((x, y), s * 1.5, s, facecolor=cmap(vi), **kwargs)\n",
+    "            )\n",
+    "        else:\n",
+    "            ax.add_patch(\n",
+    "                mpl.patches.Rectangle(\n",
+    "                    (x, y), s * 1.5, s, facecolor=\"#FFF\", edgecolor=\"#333\", **kwargs\n",
+    "                )\n",
+    "            )\n",
+    "        ax.text(\n",
+    "            x + s * 1.5 / 2,\n",
+    "            y + s / 2,\n",
+    "            \"{:.2f}\".format(vi),\n",
+    "            verticalalignment=\"center\",\n",
+    "            horizontalalignment=\"center\",\n",
+    "        )\n",
+    "        y += s\n",
+    "\n",
+    "\n",
+    "def draw_key(x, y, s, v, ax, cmap, **kwargs):\n",
+    "    x += s / 2\n",
+    "    y += s / 2\n",
+    "    for vi in v:\n",
+    "        ax.add_patch(\n",
+    "            mpl.patches.Rectangle((x, y), s * 1.5, s, facecolor=cmap(1.0), **kwargs)\n",
+    "        )\n",
+    "        ax.text(\n",
+    "            x + s * 1.5 / 2,\n",
+    "            y + s / 2,\n",
+    "            vi,\n",
+    "            verticalalignment=\"center\",\n",
+    "            horizontalalignment=\"center\",\n",
+    "        )\n",
+    "        y += s\n",
+    "    ax.text(\n",
+    "        x, y + s / 2, \"Key:\", verticalalignment=\"center\", horizontalalignment=\"left\"\n",
+    "    )\n",
+    "\n",
+    "\n",
+    "def draw(\n",
+    "    nodes, adj, ax, highlight=None, key=False, labels=None, mask=None, draw_nodes=None\n",
+    "):\n",
+    "    G = nx.Graph()\n",
+    "    for i in range(adj.shape[0]):\n",
+    "        for j in range(adj.shape[0]):\n",
+    "            if np.any(adj[i, j]):\n",
+    "                G.add_edge(i, j)\n",
+    "    if mask is None:\n",
+    "        mask = [True] * len(G)\n",
+    "    if draw_nodes is None:\n",
+    "        draw_nodes = nodes\n",
+    "    # go from atomic number to element\n",
+    "    elements = np.argmax(draw_nodes, axis=-1)\n",
+    "    el_labels = {i: list(my_elements.values())[e] for i, e in enumerate(elements)}\n",
+    "    try:\n",
+    "        pos = nx.nx_agraph.graphviz_layout(G, prog=\"sfdp\")\n",
+    "    except ImportError:\n",
+    "        pos = nx.spring_layout(G, iterations=100, seed=4, k=1)\n",
+    "    pos = nx.rescale_layout_dict(pos)\n",
+    "    c = [\"white\"] * len(G)\n",
+    "    all_h = []\n",
+    "    if highlight is not None:\n",
+    "        for i, h in enumerate(highlight):\n",
+    "            for hj in h:\n",
+    "                c[hj] = \"C{}\".format(i + 1)\n",
+    "                all_h.append(hj)\n",
+    "    nx.draw(G, ax=ax, pos=pos, labels=el_labels, node_size=700, node_color=c)\n",
+    "    cmap = plt.get_cmap(\"Wistia\")\n",
+    "    for i in range(len(G)):\n",
+    "        if not mask[i]:\n",
+    "            continue\n",
+    "        if i in all_h:\n",
+    "            draw_vector(*pos[i], 0.15, nodes[i], ax, cmap)\n",
+    "        else:\n",
+    "            draw_vector(*pos[i], 0.15, nodes[i], ax, None)\n",
+    "    if key:\n",
+    "        draw_key(-1, -1, 0.15, my_elements.values(), ax, cmap)\n",
+    "    if labels is not None:\n",
+    "        legend_elements = []\n",
+    "        for i, l in enumerate(labels):\n",
+    "            p = mpl.lines.Line2D(\n",
+    "                [0], [0], marker=\"o\", color=\"C{}\".format(i + 1), label=l, markersize=15\n",
+    "            )\n",
+    "            legend_elements.append(p)\n",
+    "        ax.legend(handles=legend_elements)\n",
+    "    ax.set_xlim(-1.2, 1.2)\n",
+    "    ax.set_ylim(-1.2, 1.2)\n",
+    "    ax.set_facecolor(\"#f5f4e9\")\n",
+    "\n",
+    "\n",
+    "nodes, adj = smiles2graph(\"CO\")\n",
+    "fig = plt.figure(figsize=(8, 5))\n",
+    "draw(nodes, adj, plt.gca(), highlight=[[1], [5, 0]], labels=[\"center\", \"neighbors\"])\n",
+    "fig.set_facecolor(\"#f5f4e9\")\n",
+    "glue(\"dframe\", plt.gcf(), display=False)\n",
+    "\n",
+    "# THIS CELL IS USED TO GENERATE A FIGURE\n",
+    "# AND NOT RELATED TO CHAPTER\n",
+    "# YOU CAN SKIP IT\n",
+    "fig, axs = plt.subplots(1, 2, squeeze=True, figsize=(14, 6), dpi=100)\n",
+    "order = [5, 1, 0, 2, 3, 4]\n",
+    "time_per_node = 2\n",
+    "last_layer = [0]\n",
+    "layers = 2\n",
+    "input_nodes = np.copy(nodes)\n",
+    "fig.set_facecolor(\"#f5f4e9\")\n",
+    "\n",
+    "\n",
+    "def make_frame(t):\n",
+    "    axs[0].clear()\n",
+    "    axs[1].clear()\n",
+    "\n",
+    "    layer_i = int(t / (time_per_node * len(order)))\n",
+    "    axs[0].set_title(f\"Layer {layer_i + 1} Input\")\n",
+    "    axs[1].set_title(f\"Layer {layer_i + 1} Output\")\n",
+    "\n",
+    "    flat_adj = np.sum(adj, axis=-1)\n",
+    "    out_nodes = np.einsum(\n",
+    "        \"i,ij,jk->ik\",\n",
+    "        1 / (np.sum(flat_adj, axis=1) + 1),\n",
+    "        flat_adj + np.eye(*flat_adj.shape),\n",
+    "        nodes,\n",
+    "    )\n",
+    "\n",
+    "    if last_layer[0] != layer_i:\n",
+    "        print(\"recomputing\")\n",
+    "        nodes[:] = out_nodes\n",
+    "        last_layer[0] = layer_i\n",
+    "\n",
+    "    t -= layer_i * time_per_node * len(order)\n",
+    "    i = order[int(t / time_per_node)]\n",
+    "    print(last_layer, layer_i, i, t)\n",
+    "    mask = [False] * nodes.shape[0]\n",
+    "    for j in order[: int(t / time_per_node) + 1]:\n",
+    "        mask[j] = True\n",
+    "    print(mask, i)\n",
+    "    neighs = list(np.where(adj[i])[0])\n",
+    "    if (t - int(t / time_per_node) * time_per_node) >= time_per_node / 4:\n",
+    "        draw(\n",
+    "            nodes,\n",
+    "            adj,\n",
+    "            axs[0],\n",
+    "            highlight=[[i], neighs],\n",
+    "            labels=[\"center\", \"neighbors\"],\n",
+    "            draw_nodes=input_nodes,\n",
+    "        )\n",
+    "    else:\n",
+    "        draw(\n",
+    "            nodes,\n",
+    "            adj,\n",
+    "            axs[0],\n",
+    "            highlight=[[i]],\n",
+    "            labels=[\"center\", \"neighbors\"],\n",
+    "            draw_nodes=input_nodes,\n",
+    "        )\n",
+    "    if (t - int(t / time_per_node) * time_per_node) < time_per_node / 2:\n",
+    "        mask[j] = False\n",
+    "    draw(\n",
+    "        out_nodes,\n",
+    "        adj,\n",
+    "        axs[1],\n",
+    "        highlight=[[i]],\n",
+    "        key=True,\n",
+    "        mask=mask,\n",
+    "        draw_nodes=input_nodes,\n",
+    "    )\n",
+    "    fig.set_facecolor(\"#f5f4e9\")\n",
+    "    return mplfig_to_npimage(fig)\n",
+    "\n",
+    "\n",
+    "animation = VideoClip(make_frame, duration=time_per_node * nodes.shape[0] * layers)\n",
+    "\n",
+    "# animation.write_gif(\"../_static/images/gcn.gif\", fps=2)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d91a3c41",
+   "metadata": {},
+   "outputs": [],
    "source": []
   }
  ],

From cfa88fbd9675d42bff6a95d55abde1f500488b77 Mon Sep 17 00:00:00 2001
From: Andrew White <white.d.andrew@gmail.com>
Date: Sat, 12 Nov 2022 20:04:38 -0500
Subject: [PATCH 05/15] Added citation for schnet critique

---
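Note for reviewers (not part of the commit): a small NumPy sketch of the point
made around the force equation in this hunk, that a rotation-invariant
potential U still yields rotation-equivariant forces F = -grad U. It is an
illustration under stated assumptions, not code from the chapter. The pairwise
harmonic potential, the function names, and the random coordinates are all
hypothetical.

```python
import numpy as np


def potential(r):
    # Toy invariant potential: U = sum_{i<j} (|r_i - r_j| - 1)^2
    diff = r[:, None, :] - r[None, :, :]
    d = np.linalg.norm(diff, axis=-1)
    iu = np.triu_indices(len(r), k=1)
    return np.sum((d[iu] - 1.0) ** 2)


def forces(r):
    # Analytic F_i = -dU/dr_i = -sum_{j != i} 2 (d_ij - 1) (r_i - r_j) / d_ij
    diff = r[:, None, :] - r[None, :, :]
    d = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(d, 1.0)  # avoids 0/0; the i == j terms become exactly zero
    coeff = 2.0 * (d - 1.0) / d
    return -np.sum(coeff[:, :, None] * diff, axis=1)


rng = np.random.default_rng(1)
r = rng.normal(size=(5, 3))  # five hypothetical atoms in 3D

theta = 0.7  # rotation about the z axis
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
r_rot = r @ R.T  # rotate every atom position

energy_invariant = np.isclose(potential(r), potential(r_rot))
forces_equivariant = np.allclose(forces(r_rot), forces(r) @ R.T)
print(energy_invariant, forces_equivariant)
```

This toy potential depends only on pairwise distances, so it belongs to the
same class of descriptions that the newly cited incompleteness result says
cannot separate certain 3D configurations. The sketch only shows the
invariant-energy and equivariant-force relationship, not that limitation.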
 dl/molnets.ipynb |  4 ++--
 references.bib   | 15 +++++++++++----
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/dl/molnets.ipynb b/dl/molnets.ipynb
index 664c572f..bb02bb57 100644
--- a/dl/molnets.ipynb
+++ b/dl/molnets.ipynb
@@ -108,7 +108,7 @@
     "F\\left(\\vec{r}\\right) = -\\nabla U\\left(\\vec{r}\\right)\n",
     "\\end{equation}\n",
     "\n",
-    "where $U\\left(\\vec{x}\\right)$ is the rotation invariant potential given all atom positions $\\vec{r}$. So if we're predicting a translation, rotation, and permutation invariant potential, why use equivariance? Performance. Models like SchNet or ANI are invariant and are not as accurate as models like NequiP or TorchMD-NET that have equivariances in their internal layers."
+    "where $U\\left(\\vec{r}\\right)$ is the rotation invariant potential given all atom positions $\\vec{r}$. So if we're predicting a translation, rotation, and permutation invariant potential, why use equivariance? Part of it is performance. Models like SchNet or ANI are invariant and are not as accurate as models like NequIP or TorchMD-NET that have equivariances in their internal layers. Another reason is that there are specific 3D configurations that should have different energies (according to quantum chemistry calculations), but are indistinguishable if treated with pairwise distances alone {cite}`pozdnyakov2022incompleteness`."
    ]
   },
   {
@@ -473,7 +473,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.5"
+   "version": "3.8.12"
   }
  },
  "nbformat": 4,
diff --git a/references.bib b/references.bib
index 48cdd98b..4d274eca 100644
--- a/references.bib
+++ b/references.bib
@@ -1758,8 +1758,15 @@ @article{weisfeiler1968reduction
 }
 
 @article{batatia2022design,
-  title={The Design Space of E (3)-Equivariant Atom-Centered Interatomic Potentials},
-  author={Batatia, Ilyes and Batzner, Simon and Kov{\'a}cs, D{\'a}vid P{\'e}ter and Musaelian, Albert and Simm, Gregor NC and Drautz, Ralf and Ortner, Christoph and Kozinsky, Boris and Cs{\'a}nyi, G{\'a}bor},
-  journal={arXiv preprint arXiv:2205.06643},
-  year={2022}
+  title   = {The Design Space of E (3)-Equivariant Atom-Centered Interatomic Potentials},
+  author  = {Batatia, Ilyes and Batzner, Simon and Kov{\'a}cs, D{\'a}vid P{\'e}ter and Musaelian, Albert and Simm, Gregor NC and Drautz, Ralf and Ortner, Christoph and Kozinsky, Boris and Cs{\'a}nyi, G{\'a}bor},
+  journal = {arXiv preprint arXiv:2205.06643},
+  year    = {2022}
+}
+
+@article{pozdnyakov2022incompleteness,
+  title   = {Incompleteness of graph convolutional neural networks for points clouds in three dimensions},
+  author  = {Pozdnyakov, Sergey N and Ceriotti, Michele},
+  journal = {arXiv preprint arXiv:2201.07136},
+  year    = {2022}
 }

From c92b58e52b2a87b843f6ef80de9af6e5aba0e6bf Mon Sep 17 00:00:00 2001
From: Andrew White <white.d.andrew@gmail.com>
Date: Wed, 23 Nov 2022 17:16:38 -0500
Subject: [PATCH 06/15] Some progress on drawing

---
 dl/molnets.ipynb | 109 ++++++++++++++++++++++++++++++++---------------
 1 file changed, 74 insertions(+), 35 deletions(-)

diff --git a/dl/molnets.ipynb b/dl/molnets.ipynb
index bb02bb57..20c15ddf 100644
--- a/dl/molnets.ipynb
+++ b/dl/molnets.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "8025a9b6",
+   "id": "a4389326",
    "metadata": {},
    "source": [
     "# Modern Molecular NNs\n",
@@ -20,7 +20,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "3922d8a4",
+   "id": "763e14e1",
    "metadata": {},
    "source": [
     "```{warning}\n",
@@ -31,7 +31,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "4e618055",
+   "id": "8d468e41",
    "metadata": {
     "tags": [
      "remove-cell"
@@ -76,7 +76,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "0f7062a1",
+   "id": "1bed2a9e",
    "metadata": {},
    "source": [
     "# Expressiveness\n",
@@ -97,7 +97,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "6cbb302c",
+   "id": "68b5d551",
    "metadata": {},
    "source": [
     "These can be distinguished if we also have (and use) their Cartesian coordinates. We cannot distinguish enantiomers with GNNs, except maybe with pre-computed node attributes. Even those start to breakdown when we have helical chirality that is not centered at any one molecule.\n",
@@ -113,7 +113,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "8aab34c9",
+   "id": "a18782f2",
    "metadata": {},
    "source": [
     "## The Elements of Modern Molecular NNs\n",
@@ -125,7 +125,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "7732f356",
+   "id": "242add02",
    "metadata": {},
    "source": [
     "### Atom features\n",
@@ -135,7 +135,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "faaf213d",
+   "id": "3e34c1b9",
    "metadata": {},
    "source": [
     "### Atomic Cluster Expansions\n",
@@ -155,7 +155,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "470c3063",
+   "id": "cef0d13f",
    "metadata": {},
    "source": [
     "## Normalization\n",
@@ -167,7 +167,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "39849016",
+   "id": "b083f55a",
    "metadata": {},
    "source": [
     "## Running This Notebook\n",
@@ -190,7 +190,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "3c219ed7",
+   "id": "6bb5fcb7",
    "metadata": {},
    "source": [
     "## Cited References\n",
@@ -204,7 +204,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "596a40bc",
+   "id": "7bf44568",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -223,7 +223,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "aa0b3498",
+   "id": "0f525d6f",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -256,9 +256,24 @@
     "            raise Warning(\"Ignoring bond order\" + order)\n",
     "        adj[u, v, order] = 1\n",
     "        adj[v, u, order] = 1\n",
-    "    return nodes, adj\n",
-    "\n",
-    "\n",
+    "    return nodes, adj"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0d8cb7c6",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6dc9231b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
     "# THIS CELL IS USED TO GENERATE A FIGURE\n",
     "# AND NOT RELATED TO CHAPTER\n",
     "# YOU CAN SKIP IT\n",
@@ -271,24 +286,37 @@
     "    x += s / 2\n",
     "    y += s / 2\n",
     "    for vi in v:\n",
-    "        if cmap is not None:\n",
-    "            ax.add_patch(\n",
-    "                mpl.patches.Rectangle((x, y), s * 1.5, s, facecolor=cmap(vi), **kwargs)\n",
-    "            )\n",
-    "        else:\n",
-    "            ax.add_patch(\n",
-    "                mpl.patches.Rectangle(\n",
-    "                    (x, y), s * 1.5, s, facecolor=\"#FFF\", edgecolor=\"#333\", **kwargs\n",
+    "        L = len(vi)\n",
+    "        for j, vij in enumerate(vi):\n",
+    "            if cmap is not None:\n",
+    "                ax.add_patch(\n",
+    "                    mpl.patches.Rectangle(\n",
+    "                        (x + j * s * 1.5 / L, y),\n",
+    "                        s * 1.5 / L,\n",
+    "                        s,\n",
+    "                        facecolor=cmap(vi),\n",
+    "                        **kwargs,\n",
+    "                    )\n",
+    "                )\n",
+    "            else:\n",
+    "                ax.add_patch(\n",
+    "                    mpl.patches.Rectangle(\n",
+    "                        (x + j * s * 1.5 / L, y),\n",
+    "                        s * 1.5 / L,\n",
+    "                        s,\n",
+    "                        facecolor=\"#FFF\",\n",
+    "                        edgecolor=\"#333\",\n",
+    "                        **kwargs,\n",
+    "                    )\n",
     "                )\n",
+    "            ax.text(\n",
+    "                x + j * s * 1.5 / L + s * 1.5 / L / 2,\n",
+    "                y + s / 2,\n",
+    "                \"{:.2f}\".format(vi),\n",
+    "                verticalalignment=\"center\",\n",
+    "                horizontalalignment=\"center\",\n",
     "            )\n",
-    "        ax.text(\n",
-    "            x + s * 1.5 / 2,\n",
-    "            y + s / 2,\n",
-    "            \"{:.2f}\".format(vi),\n",
-    "            verticalalignment=\"center\",\n",
-    "            horizontalalignment=\"center\",\n",
-    "        )\n",
-    "        y += s\n",
+    "            y += s\n",
     "\n",
     "\n",
     "def draw_key(x, y, s, v, ax, cmap, **kwargs):\n",
@@ -363,9 +391,20 @@
     "\n",
     "\n",
     "nodes, adj = smiles2graph(\"CO\")\n",
+    "print(nodes)\n",
+    "nodes_vectors = [[n, n] for n in nodes]\n",
     "fig = plt.figure(figsize=(8, 5))\n",
     "draw(nodes, adj, plt.gca(), highlight=[[1], [5, 0]], labels=[\"center\", \"neighbors\"])\n",
-    "fig.set_facecolor(\"#f5f4e9\")\n",
+    "fig.set_facecolor(\"#f5f4e9\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fd8e064f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
     "glue(\"dframe\", plt.gcf(), display=False)\n",
     "\n",
     "# THIS CELL IS USED TO GENERATE A FIGURE\n",
@@ -450,7 +489,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "d91a3c41",
+   "id": "2e7cf098",
    "metadata": {},
    "outputs": [],
    "source": []
@@ -473,7 +512,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.12"
+   "version": "3.8.5"
   }
  },
  "nbformat": 4,

From c9ca94cabbd3032e8ad2007282fe44dfdedcc783 Mon Sep 17 00:00:00 2001
From: Andrew White <white.d.andrew@gmail.com>
Date: Sun, 4 Dec 2022 21:48:48 -0500
Subject: [PATCH 07/15] Added more exercises

---
 dl/gnn.ipynb    | 74 +++++++++++++++++++++++++++++++++++++++----------
 dl/layers.ipynb | 13 ++++-----
 dl/xai.ipynb    | 21 ++++++++++++--
 references.bib  | 10 +++++++
 4 files changed, 93 insertions(+), 25 deletions(-)
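
Note on the batch normalization change in dl/layers.ipynb: the updated layer
equation can be sketched in NumPy as below. This is only an illustrative
sketch, not code from the book. The function name `batch_norm`, the `eps`
stabilizer, and the toy data are assumptions, and S is taken to be the
per-feature standard deviation over the batch axis.

```python
import numpy as np


def batch_norm(X, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch axis (axis 0), then rescale."""
    mean = X.mean(axis=0)  # per-feature batch mean
    std = np.sqrt(X.var(axis=0) + eps)  # per-feature batch std (eps avoids /0)
    return gamma * (X - mean) / std + beta


# Toy batch: 64 samples, 4 features, deliberately far from zero mean/unit scale
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(64, 4))
Y = batch_norm(X, gamma=2.0, beta=-1.0)
print(Y.mean(axis=0))  # close to beta for every feature
print(Y.std(axis=0))  # close to gamma for every feature
```

With gamma=1 and beta=0 this reduces to the plain normalization in the
previous version of the equation.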

diff --git a/dl/gnn.ipynb b/dl/gnn.ipynb
index 1c12b7d0..1b88800e 100644
--- a/dl/gnn.ipynb
+++ b/dl/gnn.ipynb
@@ -1724,21 +1724,6 @@
     "Adding node labels is not enough generally. Molecules can interconvert between stereoisomers at chiral centers through a process called tautomerization. There are also types of stereochemistry that are not at a specific atom, like rotamers that are around a bond. Then there is stereochemistry that involves multiple atoms like axial helecene. As shown in {numref}`helicene`, the molecule has no chiral centers but is \"optically active\" (experimentally measured to be chiral) because of its helix which can be left- or right-handed. "
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Relevant Videos\n",
-    "\n",
-    "### Intro to GNNs\n",
-    "\n",
-    "<iframe width=\"560\" height=\"315\" src=\"https://www.youtube-nocookie.com/embed/uF53xsT7mjc\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen></iframe>\n",
-    "\n",
-    "### Overview of GNN with Molecule, Compiler Examples\n",
-    "\n",
-    "<iframe width=\"560\" height=\"315\" src=\"https://www.youtube-nocookie.com/embed/zCEYiCxrL_0\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen></iframe>\n"
-   ]
-  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -1756,6 +1741,65 @@
     "* You can convert xyz coordinates into a graph and use a GNN like SchNet "
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Exercises\n",
+    "\n",
+    "1. Write ethanol as a graph with one-hot node features and an adjacency tensor.\n",
+    "\n",
+    "2. The GCN as presented is supposed to be permutation invariant. Recall that the key GCN equation is:\n",
+    "\n",
+    "$$\n",
+    "v_{il}' = \\sigma\\left(\\frac{1}{d_i}e_{ij}v_{jk}w_{lk}\\right)\n",
+    "$$\n",
+    "\n",
+    "where $i$ is the receiving node, $j$ is the sending node, $k$ indexes input features, and $l$ indexes output features. $v$ is the node features, $v'$ is the updated node features, $e$ is the edges, $d$ is the degree, and $w$ are trainable parameters. Determine and show whether the following variations of the GCN equation are permutation invariant:\n",
+    "\n",
+    "Variant 1:\n",
+    "$$\n",
+    "v_{ik}' = \\sigma\\left(\\frac{1}{d_i}e_{ij}v_{jk}w_{k}\\right)\n",
+    "$$\n",
+    "\n",
+    "Variant 2:\n",
+    "$$\n",
+    "v_{ik}' = \\sigma\\left(\\frac{1}{d_i}e_{ij}v_{jk}w_{jk}\\right)\n",
+    "$$\n",
+    "\n",
+    "\n",
+    "Variant 3:\n",
+    "$$\n",
+    "    v_{il}' = \\sigma\\left(\\frac{1}{d_i}e_{ij}v_{jk}w_{ilk}\\right)\n",
+    "$$\n",
+    "\n",
+    "\n",
+    "3. With the Battaglia equations, can you have an 'inverse' GCN that only updates edge features but not node features? Could it learn anything?\n",
+    "\n",
+    "4. We would like to identify a specific atom in a molecule, like a potential site of a reaction. We have training data consisting of graphs and specific nodes. Is this regression or classification? What would be an appropriate readout and what would be the correct loss? \n",
+    "\n",
+    "5. Propose a modification to the GCN that predicts edge properties. Contrast your network with SchNet - how is it different?\n",
+    "\n",
+    "6. What is the relationship between the number of GCN layers and the maximum number of bonds across which information can pass?\n",
+    "\n",
+    "7. How many layers would it take for a GNN to detect if a molecule has a 6-membered ring?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Relevant Videos\n",
+    "\n",
+    "### Intro to GNNs\n",
+    "\n",
+    "<iframe width=\"560\" height=\"315\" src=\"https://www.youtube-nocookie.com/embed/uF53xsT7mjc\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen></iframe>\n",
+    "\n",
+    "### Overview of GNN with Molecule, Compiler Examples\n",
+    "\n",
+    "<iframe width=\"560\" height=\"315\" src=\"https://www.youtube-nocookie.com/embed/zCEYiCxrL_0\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen></iframe>\n"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
diff --git a/dl/layers.ipynb b/dl/layers.ipynb
index 0645dcbe..4698acbb 100644
--- a/dl/layers.ipynb
+++ b/dl/layers.ipynb
@@ -331,14 +331,13 @@
     "\n",
     "### Batch Normalization\n",
     "\n",
-    "It is arguable if batch normalization is a regularization technique -- there have been probably 10,000 papers on why it works. Batch normalization is a layer that is added to a neural network with trainable weights, but its trainable weights are not updated via gradient descent of the loss. Batch normalization has a layer equation of:\n",
-    "\n",
+    "It is arguable whether batch normalization is a regularization technique -- there is ongoing debate about why it's effective. Batch normalization is a layer with optional trainable weights that is added to a neural network {cite}`ioffe2015batch`. Its layer equation is:\n",
     "\n",
     "\\begin{equation}\n",
-    "f(X) = \\frac{X - \\bar{X}}{S}\n",
+    "f(X) = \\gamma\\frac{X - \\bar{X}(B)}{S(B)} + \\beta\n",
     "\\end{equation}\n",
     "\n",
-    "where $\\bar{X}$ and $S$ are the sample mean and variance taken across the batch axis (zeroth axis of $X$). This has the effect of \"smoothing\" out the magnitudes of values seen between batches. Remember that activations like ReLU depend on values being near 0 (since the nonlinear part is at $x = 0$) and tanh has the most change in output around $x = 0$, so you typically want your intermediate layer outputs to be around $0$. At inference time you may not have batches or your batches may be a different size, so $\\bar{X}$ and $S$ are set to the average across all batches seen in training data. A common explanation of batch normalization is that it smooths out the optimization landscape by forcing layer outputs to be approximately normal{cite}`santurkar2018does`.\n",
+    "where $\\bar{X}$ and $S$ are the sample mean and standard deviation taken across the batch axis. This has the effect of \"smoothing\" out the magnitudes of values seen between batches. $\\gamma$ and $\\beta$ are optional trainable parameters that set the output standard deviation and mean to $\\gamma$ and $\\beta$, respectively. Remember that activations like ReLU depend on values being near 0 (since the nonlinear part is at $x = 0$) and tanh has the most change in output around $x = 0$, so you typically want your intermediate layer outputs to be around $0$. However, $\\gamma$ and $\\beta$ allow the optimal output scale and location to be learned. At inference time you may not have batches or your batches may be a different size, so $\\bar{X}$ and $S$ are set to the average across all batches seen in training data. A common explanation of batch normalization is that it smooths out the optimization landscape by forcing layer outputs to be approximately normal {cite}`santurkar2018does`.\n",
     "\n",
     "```{margin}\n",
     "**Inference** is the word for when you use your model to make predictions. Training is when you train the model and inference is when you use the model. \n",
@@ -346,7 +345,7 @@
     "\n",
     "#### Layer Normalization\n",
     "\n",
-    "Batch normalization depends on there being a constant batch size. Some kinds of data, like text or a graphs, have different sizes and so the batch mean/variance can change significantly. **Layer normalization** avoids this problem by normalizing across the *features* (the non-zero axes) instead of the batch. This has a similar effect of making the layer output features behave well-centered at 0 but without having highly variable means/variances because of batch to batch variation. You'll see these in graph neural networks and recurrent neural networks, with both take variable sized inputs. \n",
+    "Batch normalization depends on there being a constant batch size. Some kinds of data, like text or graphs, have different sizes and so the batch mean/variance can change significantly. **Layer normalization** avoids this problem by normalizing across the *features* (the non-batch axis/channel axis) instead of the batch. This has a similar effect of keeping the layer output features centered at 0, but without highly variable means/variances caused by batch-to-batch variation. You'll see these in graph neural networks and recurrent neural networks, which both take variable-sized inputs. \n",
     "\n",
     "### Dropout\n",
     "\n",
@@ -645,7 +644,7 @@
   "celltoolbar": "Tags",
   "hide_input": false,
   "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": "Python 3",
    "language": "python",
    "name": "python3"
   },
@@ -659,7 +658,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.13"
+   "version": "3.8.12"
   }
  },
  "nbformat": 4,
diff --git a/dl/xai.ipynb b/dl/xai.ipynb
index e9355f86..fae20c6a 100644
--- a/dl/xai.ipynb
+++ b/dl/xai.ipynb
@@ -1041,7 +1041,7 @@
     "To install packages, execute this code in a new cell\n",
     "\n",
     "```\n",
-    "!pip install exmol jupyter-book matplotlib numpy pandas seaborn sklearn mordred[full] rdkit-pypi\n",
+    "!pip install exmol jupyter-book matplotlib numpy pandas seaborn sklearn mordred[full] rdkit\n",
     "```\n",
     "\n",
     "````"
@@ -1183,7 +1183,7 @@
     "# Model training\n",
     "model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0)\n",
     "_, accuracy = model.evaluate(X_test, y_test)\n",
-    "print(f\"Model accuracy: {accuracy*100:.2f}%\")"
+    "print(f\"Model accuracy: {accuracy:.2%}\")"
    ]
   },
   {
@@ -1295,7 +1295,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Summary\n",
+    "## Chapter Summary \n",
     "\n",
     "* Interpretation of deep learning models is imperative for ensuring model correctness, making predictions useful to humans, and can be required for legal compliance.\n",
     "* Interpretability of neural networks is part of a broader topic of explainability in AI (XAI), a topic that is in its infancy\n",
@@ -1307,6 +1307,21 @@
     "* `exmol` is a software that generate model agnostic molecular counterfactual explanations."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Exercises\n",
+    "\n",
+    "1. Computing feature importance requires computing $\\nabla \\hat{f}(x)$ - the gradient of the output with respect to the input. Is this the same gradient we compute when training a neural network?\n",
+    "\n",
+    "2. Why might $\\nabla \\hat{f}(x)$ be more difficult when the input is a graph (molecule) instead of an image or dense vector?\n",
+    "\n",
+    "3. Some of the attributes of an explanation are if it's actionable, if it's faithful (agrees with NN), if it's sparse, and if it's complete. Make a table comparing these attributes of explanations generated by training data importance, feature importance, surrogate models, and counterfactual methods. \n",
+    "\n",
+    "4. Can we average feature importances across the whole training dataset to provide a global explanation?"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
diff --git a/references.bib b/references.bib
index 4d274eca..b5d47990 100644
--- a/references.bib
+++ b/references.bib
@@ -1770,3 +1770,13 @@ @article{pozdnyakov2022incompleteness
   journal = {arXiv preprint arXiv:2201.07136},
   year    = {2022}
 }
+
+
+@inproceedings{ioffe2015batch,
+  title={Batch normalization: Accelerating deep network training by reducing internal covariate shift},
+  author={Ioffe, Sergey and Szegedy, Christian},
+  booktitle={International conference on machine learning},
+  pages={448--456},
+  year={2015},
+  organization={PMLR}
+}

From c939fce4b31cf8b82c9c4daaca4af99f42e83777 Mon Sep 17 00:00:00 2001
From: Andrew White <white.d.andrew@gmail.com>
Date: Sun, 4 Dec 2022 23:47:09 -0500
Subject: [PATCH 08/15] More progress on molnet animation

---
 dl/gnn.ipynb     |   6 +--
 dl/molnets.ipynb | 111 ++++++++++++++++++++++++++++++++---------------
 2 files changed, 78 insertions(+), 39 deletions(-)
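
Note on the averaging step that this patch comments out in make_frame in
dl/molnets.ipynb: the einsum is a degree-normalized neighbor average with
self-loops. A minimal standalone sketch follows. The 3-node chain graph and
one-hot features are hypothetical stand-ins for the output of smiles2graph,
and `degree` is a name introduced here for readability.

```python
import numpy as np

# Hypothetical 3-node chain (0-1-2) with a single bond-order channel
adj = np.zeros((3, 3, 1))
adj[0, 1, 0] = adj[1, 0, 0] = 1
adj[1, 2, 0] = adj[2, 1, 0] = 1
nodes = np.eye(3)  # one-hot node features, one distinct "element" per node

flat_adj = np.sum(adj, axis=-1)  # collapse the bond-order axis
degree = np.sum(flat_adj, axis=1) + 1  # +1 accounts for the self-loop
out_nodes = np.einsum(
    "i,ij,jk->ik",
    1 / degree,  # per-node normalization
    flat_adj + np.eye(*flat_adj.shape),  # adjacency with self-loops
    nodes,
)
print(out_nodes)  # each row is the average over the node and its neighbors
```

Because each row is divided by the node's degree (including itself), every
output row sums to one when the input features are one-hot.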

diff --git a/dl/gnn.ipynb b/dl/gnn.ipynb
index 40e364b1..e546a89d 100644
--- a/dl/gnn.ipynb
+++ b/dl/gnn.ipynb
@@ -110,8 +110,8 @@
     "import numpy as np\n",
     "import tensorflow as tf\n",
     "import pandas as pd\n",
-    "import rdkit, rdkit.Chem, , rdkit.Chem.Draw\n",
-    "import networkx as nxrdkit.Chem.rdDepictor\n",
+    "import rdkit, rdkit.Chem, rdkit.Chem.rdDepictor, rdkit.Chem.Draw\n",
+    "import networkx as nx\n",
     "import dmol"
    ]
   },
@@ -1732,7 +1732,7 @@
     "\n",
     "* Molecules can be represented by graphs by using one-hot encoded feature vectors that show the elemental identity of each node (atom) and an adjacency matrix that show immediate neighbors (bonded atoms).\n",
     "* Graph neural networks are a category of deep neural networks that have graphs as inputs.\n",
-    "* One of the early GNNs is the Kipf & Welling GCN. The input to the GCN is the node feature vector and the adjacency matrix, and returns the updated node feature vector. The main reason a GCN is *permutation equivariant* is because it pools over each nodes' neighbors in a permutation *invariant* way (e.g., averaging). \n",
+    "* One of the early GNNs is the Kipf & Welling GCN. The GCN takes the node feature vector and the adjacency matrix as input and returns the updated node feature vector. The GCN is permutation invariant because it averages over the neighbors. \n",
     "* A GCN can be viewed as a message-passing layer, in which we have senders and receivers. Messages are computed from neighboring nodes, which when aggregated update that node. \n",
     "* A gated graph neural network is a variant of the message passing layer, for which the nodes are updated according to a gated recurrent unit function. \n",
     "* The aggregation of messages is sometimes called pooling, for which there are multiple reduction operations. \n",
diff --git a/dl/molnets.ipynb b/dl/molnets.ipynb
index 20c15ddf..85f6e8ea 100644
--- a/dl/molnets.ipynb
+++ b/dl/molnets.ipynb
@@ -224,7 +224,11 @@
    "cell_type": "code",
    "execution_count": null,
    "id": "0f525d6f",
-   "metadata": {},
+   "metadata": {
+    "tags": [
+     "hide-cell"
+    ]
+   },
    "outputs": [],
    "source": [
     "def smiles2graph(sml):\n",
@@ -259,14 +263,6 @@
     "    return nodes, adj"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "0d8cb7c6",
-   "metadata": {},
-   "outputs": [],
-   "source": []
-  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -285,23 +281,24 @@
     "def draw_vector(x, y, s, v, ax, cmap, **kwargs):\n",
     "    x += s / 2\n",
     "    y += s / 2\n",
-    "    for vi in v:\n",
-    "        L = len(vi)\n",
-    "        for j, vij in enumerate(vi):\n",
+    "    # iterate vertically\n",
+    "    L = max([len(vi) for vi in v])\n",
+    "    for dy, vi in enumerate(v):\n",
+    "        for dx, vij in enumerate(vi):\n",
     "            if cmap is not None:\n",
     "                ax.add_patch(\n",
     "                    mpl.patches.Rectangle(\n",
-    "                        (x + j * s * 1.5 / L, y),\n",
+    "                        (x + dx * s * 1.5 / L, y + dy * s),\n",
     "                        s * 1.5 / L,\n",
     "                        s,\n",
-    "                        facecolor=cmap(vi),\n",
+    "                        facecolor=cmap(vij),\n",
     "                        **kwargs,\n",
     "                    )\n",
     "                )\n",
     "            else:\n",
     "                ax.add_patch(\n",
     "                    mpl.patches.Rectangle(\n",
-    "                        (x + j * s * 1.5 / L, y),\n",
+    "                        (x + dx * s * 1.5 / L, y + dy * s),\n",
     "                        s * 1.5 / L,\n",
     "                        s,\n",
     "                        facecolor=\"#FFF\",\n",
@@ -310,13 +307,13 @@
     "                    )\n",
     "                )\n",
     "            ax.text(\n",
-    "                x + j * s * 1.5 / L + s * 1.5 / L / 2,\n",
-    "                y + s / 2,\n",
-    "                \"{:.2f}\".format(vi),\n",
+    "                x + dx * s * 1.5 / L + s * 1.5 / L / 2,\n",
+    "                y + s / 2 + dy * s,\n",
+    "                \"{:.2f}\".format(vij),\n",
     "                verticalalignment=\"center\",\n",
     "                horizontalalignment=\"center\",\n",
+    "                fontsize=5,\n",
     "            )\n",
-    "            y += s\n",
     "\n",
     "\n",
     "def draw_key(x, y, s, v, ax, cmap, **kwargs):\n",
@@ -340,7 +337,15 @@
     "\n",
     "\n",
     "def draw(\n",
-    "    nodes, adj, ax, highlight=None, key=False, labels=None, mask=None, draw_nodes=None\n",
+    "    nodes,\n",
+    "    node_features,\n",
+    "    adj,\n",
+    "    ax,\n",
+    "    highlight=None,\n",
+    "    key=False,\n",
+    "    labels=None,\n",
+    "    mask=None,\n",
+    "    draw_nodes=None,\n",
     "):\n",
     "    G = nx.Graph()\n",
     "    for i in range(adj.shape[0]):\n",
@@ -366,15 +371,37 @@
     "            for hj in h:\n",
     "                c[hj] = \"C{}\".format(i + 1)\n",
     "                all_h.append(hj)\n",
-    "    nx.draw(G, ax=ax, pos=pos, labels=el_labels, node_size=700, node_color=c)\n",
+    "    # now we add all edges to close (using pos) atoms to emphasize spatial locality\n",
+    "    for i in range(adj.shape[0]):\n",
+    "        for j in range(adj.shape[0]):\n",
+    "            if (\n",
+    "                i != j\n",
+    "                and not np.any(adj[i, j])\n",
+    "                and np.linalg.norm(np.array(pos[i]) - np.array(pos[j])) < 1\n",
+    "            ):\n",
+    "                G.add_edge(i, j, space=True)\n",
+    "    # set-up edge colors based on if they are space or not\n",
+    "    edge_colors = [\"#000\"] * len(G.edges)\n",
+    "    for i, (u, v, d) in enumerate(G.edges(data=True)):\n",
+    "        if d.get(\"space\", False):\n",
+    "            edge_colors[i] = \"#AAA\"\n",
+    "    nx.draw(\n",
+    "        G,\n",
+    "        ax=ax,\n",
+    "        pos=pos,\n",
+    "        labels=el_labels,\n",
+    "        node_size=700,\n",
+    "        node_color=c,\n",
+    "        edge_color=edge_colors,\n",
+    "    )\n",
     "    cmap = plt.get_cmap(\"Wistia\")\n",
     "    for i in range(len(G)):\n",
     "        if not mask[i]:\n",
     "            continue\n",
     "        if i in all_h:\n",
-    "            draw_vector(*pos[i], 0.15, nodes[i], ax, cmap)\n",
+    "            draw_vector(*pos[i], 0.15, node_features[i], ax, cmap)\n",
     "        else:\n",
-    "            draw_vector(*pos[i], 0.15, nodes[i], ax, None)\n",
+    "            draw_vector(*pos[i], 0.15, node_features[i], ax, None)\n",
     "    if key:\n",
     "        draw_key(-1, -1, 0.15, my_elements.values(), ax, cmap)\n",
     "    if labels is not None:\n",
@@ -392,10 +419,19 @@
     "\n",
     "nodes, adj = smiles2graph(\"CO\")\n",
     "print(nodes)\n",
-    "nodes_vectors = [[n, n] for n in nodes]\n",
+    "nodes_vectors = [[n, n[:2]] for i, n in enumerate(nodes)]\n",
+    "print(nodes_vectors[1])\n",
     "fig = plt.figure(figsize=(8, 5))\n",
-    "draw(nodes, adj, plt.gca(), highlight=[[1], [5, 0]], labels=[\"center\", \"neighbors\"])\n",
-    "fig.set_facecolor(\"#f5f4e9\")"
+    "draw(\n",
+    "    nodes,\n",
+    "    nodes_vectors,\n",
+    "    adj,\n",
+    "    plt.gca(),\n",
+    "    highlight=[[1], [5, 0]],\n",
+    "    labels=[\"center\", \"neighbors\"],\n",
+    ")\n",
+    "fig.set_facecolor(\"#f5f4e9\")\n",
+    "glue(\"dframe\", plt.gcf(), display=False)"
    ]
   },
   {
@@ -428,13 +464,14 @@
     "    axs[1].set_title(f\"Layer {layer_i + 1} Output\")\n",
     "\n",
     "    flat_adj = np.sum(adj, axis=-1)\n",
-    "    out_nodes = np.einsum(\n",
-    "        \"i,ij,jk->ik\",\n",
-    "        1 / (np.sum(flat_adj, axis=1) + 1),\n",
-    "        flat_adj + np.eye(*flat_adj.shape),\n",
-    "        nodes,\n",
-    "    )\n",
-    "\n",
+    "    #     out_nodes = np.einsum(\n",
+    "    #         \"i,ij,jk->ik\",\n",
+    "    #         1 / (np.sum(flat_adj, axis=1) + 1),\n",
+    "    #         flat_adj + np.eye(*flat_adj.shape),\n",
+    "    #         nodes,\n",
+    "    #     )\n",
+    "    out_nodes = nodes\n",
+    "    node_vectors = [[n, n[:2]] for i, n in enumerate(out_nodes)]\n",
     "    if last_layer[0] != layer_i:\n",
     "        print(\"recomputing\")\n",
     "        nodes[:] = out_nodes\n",
@@ -451,6 +488,7 @@
     "    if (t - int(t / time_per_node) * time_per_node) >= time_per_node / 4:\n",
     "        draw(\n",
     "            nodes,\n",
+    "            node_vectors,\n",
     "            adj,\n",
     "            axs[0],\n",
     "            highlight=[[i], neighs],\n",
@@ -460,6 +498,7 @@
     "    else:\n",
     "        draw(\n",
     "            nodes,\n",
+    "            node_vectors,\n",
     "            adj,\n",
     "            axs[0],\n",
     "            highlight=[[i]],\n",
@@ -470,6 +509,7 @@
     "        mask[j] = False\n",
     "    draw(\n",
     "        out_nodes,\n",
+    "        node_vectors,\n",
     "        adj,\n",
     "        axs[1],\n",
     "        highlight=[[i]],\n",
@@ -482,8 +522,7 @@
     "\n",
     "\n",
     "animation = VideoClip(make_frame, duration=time_per_node * nodes.shape[0] * layers)\n",
-    "\n",
-    "# animation.write_gif(\"../_static/images/gcn.gif\", fps=2)"
+    "animation.write_gif(\"../_static/images/molnet.gif\", fps=2)"
    ]
   },
   {
@@ -512,7 +551,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.5"
+   "version": "3.8.12"
   }
  },
  "nbformat": 4,

From c47401614c2be3f36a6475c50e6cd4bb1cd98dd2 Mon Sep 17 00:00:00 2001
From: Andrew White <white.d.andrew@gmail.com>
Date: Mon, 5 Dec 2022 00:09:34 -0500
Subject: [PATCH 09/15] Removed disables

---
 _config.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/_config.yml b/_config.yml
index 13125730..2b2a828c 100644
--- a/_config.yml
+++ b/_config.yml
@@ -54,8 +54,8 @@ sphinx:
       "twitter:site": "@andrewwhite01"
 
 execute:
-  exclude_patterns          : ['livecomj/*', 'dl/pretraining.ipynb', 'applied/*', 'ml/*', 'dl/xai.ipynb', 'NLP.ipynb', 'flows.ipynb',
-  'dl/Equivariant.ipynb', 'dl/gnn.ipynb', 'dl/data.ipynb', 'VAE.ipynb', 'dl/introduction.ipynb', 'dl/Hyperparameter_tuning.ipynb', 'attention.ipynb', 'layers.ipynb']
+  #exclude_patterns          : ['livecomj/*', 'dl/pretraining.ipynb', 'applied/*', 'ml/*', 'dl/xai.ipynb', 'NLP.ipynb', 'flows.ipynb',
+  #'dl/Equivariant.ipynb', 'dl/gnn.ipynb', 'dl/data.ipynb', 'VAE.ipynb', 'dl/introduction.ipynb', 'dl/Hyperparameter_tuning.ipynb', 'attention.ipynb', 'layers.ipynb']
   timeout: -1
 
 #jupyter_execute_notebooks: force

From 1c983de17fb73c4b3c9b9317cf2ecd968c0c6c07 Mon Sep 17 00:00:00 2001
From: Andrew White <white.d.andrew@gmail.com>
Date: Mon, 5 Dec 2022 06:47:09 -0500
Subject: [PATCH 10/15] Incremented version and added changelog notes

---
 changelog.md     | 8 ++++++++
 package/setup.py | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/changelog.md b/changelog.md
index c7ac2526..54cdc23d 100644
--- a/changelog.md
+++ b/changelog.md
@@ -1,5 +1,13 @@
 # Changelog
 
+## Version 1.3.0 (2022-12-05)
+
+* Fixed sklearn pip install name
+* Updated batch norm discussion
+* Added exercises to GNN, XAI chapters
+* Fixed discussion of pooling invariance
+* Progress on molnet chapter
+
 ## Version 1.2.0 (2022-10-04)
 
 * Added simpletransformers and e3nn dependencies
diff --git a/package/setup.py b/package/setup.py
index f5890548..db264b8f 100644
--- a/package/setup.py
+++ b/package/setup.py
@@ -4,7 +4,7 @@
 
 setup(
     name="dmol-book",
-    version="1.2.0",
+    version="1.3.0",
     description="Style and Imports for dmol Book",
     author="Andrew D White",
     author_email="andrew.white@rochester.edu",

From 0f7991d89ebeb2f5afa97aec92e83a193e8022db Mon Sep 17 00:00:00 2001
From: Andrew White <white.d.andrew@gmail.com>
Date: Wed, 14 Dec 2022 00:34:51 -0500
Subject: [PATCH 11/15] Fixed alt texts

---
 .github/workflows/check-book.yml              |  4 +-
 .../workflows/deploy-jupyter-book-preview.yml | 47 +++++++++++++++++++
 .github/workflows/deploy-jupyter-book.yml     |  9 ++--
 _static/custom.js                             |  7 +++
 dl/Hyperparameter_tuning.ipynb                |  2 +
 dl/NLP.ipynb                                  | 10 +++-
 dl/gnn.ipynb                                  |  3 ++
 dl/layers.ipynb                               | 11 ++++-
 dl/molnets.ipynb                              |  9 +++-
 dl/pretraining.ipynb                          |  5 +-
 index.md                                      |  3 +-
 11 files changed, 95 insertions(+), 15 deletions(-)
 create mode 100644 .github/workflows/deploy-jupyter-book-preview.yml

diff --git a/.github/workflows/check-book.yml b/.github/workflows/check-book.yml
index 6f4cd382..8d7ef527 100644
--- a/.github/workflows/check-book.yml
+++ b/.github/workflows/check-book.yml
@@ -2,9 +2,9 @@ name: check-book
 
 on:
   push:
-    branches: [ master ]
+    branches: [ main ]
   pull_request:
-    branches: [ master ]
+    branches: [ main ]
 
 jobs:
   check-build-book:
diff --git a/.github/workflows/deploy-jupyter-book-preview.yml b/.github/workflows/deploy-jupyter-book-preview.yml
new file mode 100644
index 00000000..f3849e8e
--- /dev/null
+++ b/.github/workflows/deploy-jupyter-book-preview.yml
@@ -0,0 +1,47 @@
+name: deploy-book
+
+# Only run this on PRs
+on:
+  pull_request:
+    branches: [ main ]
+
+  workflow_dispatch:
+
+# This job installs dependencies, builds the book, and pushes it to `gh-pages`
+jobs:
+
+  deploy-book:
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v2
+    - uses: s-weigand/setup-conda@v1
+      with:
+        python-version: 3.8
+
+    - name: Install dependencies
+      run: |
+        conda install -c conda-forge pygraphviz
+        cd package && python -m pip --use-deprecated=legacy-resolver --no-cache-dir install .
+        pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.12.0+cpu.html
+    - name: Build dmol-book package
+      run: |
+        pip install build
+        cd package && python -m build --sdist --wheel --outdir dist/ .
+    - name: Publish distribution 📦 to PyPI
+      uses: pypa/gh-action-pypi-publish@master
+      continue-on-error: true
+      with:
+        password: ${{ secrets.PYPI_API_TOKEN }}
+        packages_dir: package/dist/
+    - name: Build the book
+      run: env TF_CPP_MIN_LOG_LEVEL=3 jupyter-book build .
+    - name: Deploy Jupyter book to GitHub pages
+      uses: peaceiris/actions-gh-pages@v3.9.0
+      with:
+        github_token: ${{ secrets.GITHUB_TOKEN }}
+        publish_dir: _build/html
+        force_orphan: true
+        #keep_files: true
+        destination_dir: ${{ steps.myref.outputs.branch }}
+        cname: dmol.pub
diff --git a/.github/workflows/deploy-jupyter-book.yml b/.github/workflows/deploy-jupyter-book.yml
index da0cc159..038b888f 100644
--- a/.github/workflows/deploy-jupyter-book.yml
+++ b/.github/workflows/deploy-jupyter-book.yml
@@ -1,10 +1,9 @@
 name: deploy-book
 
-# Only run this when the master branch changes
+# Only run this when the main branch changes
 on:
   push:
     branches:
-    - master
     - main
 
   workflow_dispatch:
@@ -31,7 +30,7 @@ jobs:
         pip install build
         cd package && python -m build --sdist --wheel --outdir dist/ .
     - name: Publish distribution 📦 to PyPI
-      uses: pypa/gh-action-pypi-publish@master
+      uses: pypa/gh-action-pypi-publish@v1
       continue-on-error: true
       with:
         password: ${{ secrets.PYPI_API_TOKEN }}
@@ -39,9 +38,11 @@ jobs:
     - name: Build the book
       run: env TF_CPP_MIN_LOG_LEVEL=3 jupyter-book build .
     - name: Deploy Jupyter book to GitHub pages
-      uses: peaceiris/actions-gh-pages@v3.6.1
+      uses: peaceiris/actions-gh-pages@v3.9.0
       with:
         github_token: ${{ secrets.GITHUB_TOKEN }}
         publish_dir: _build/html
         force_orphan: true
+        #keep_files: true
         cname: dmol.pub
+        destination_dir: latest
diff --git a/_static/custom.js b/_static/custom.js
index 06a6cbb8..fa151edf 100644
--- a/_static/custom.js
+++ b/_static/custom.js
@@ -32,6 +32,12 @@ function insertAnchors(element) {
         newButtonContainer.classList.add('wh-flex-center')
     }
 }
+
+function addAlts(img) {
+    // we overwrite because some weird alts show-up
+    img.alt = 'Output image from code cell above'
+}
+
 function halfSize(img) {
     // we render at 200 dpi, so need to half the size of images
     // check if it's already modified
@@ -47,6 +53,7 @@ function addImgAnchors() {
     figs.forEach(insertAnchors)
     let cellOutputs = document.querySelectorAll('.cell_output img')
     cellOutputs.forEach(halfSize)
+    cellOutputs.forEach(addAlts)
     cellOutputs.forEach(insertAnchors)
 }
 
diff --git a/dl/Hyperparameter_tuning.ipynb b/dl/Hyperparameter_tuning.ipynb
index 633161a5..13a2184c 100644
--- a/dl/Hyperparameter_tuning.ipynb
+++ b/dl/Hyperparameter_tuning.ipynb
@@ -57,6 +57,7 @@
     "----\n",
     "name: loss_lr\n",
     "width: 1000px\n",
+    "alt: Effect of learning rate on loss.\n",
     "----\n",
     "Effect of learning rate on loss. \n",
     "```\n",
@@ -68,6 +69,7 @@
     "----\n",
     "name: decay_lr\n",
     "width: 750px\n",
+    "alt: Effect of decay schedules on training\n",
     "----\n",
     "Decay schedules on learning rate can possibly help escaping the local minima. \n",
     "```\n",
diff --git a/dl/NLP.ipynb b/dl/NLP.ipynb
index e3386139..f78862e7 100644
--- a/dl/NLP.ipynb
+++ b/dl/NLP.ipynb
@@ -114,6 +114,7 @@
     "----\n",
     "name: rnn\n",
     "width: 400px\n",
+    "alt: Unrolled picture of RNN\n",
     "----\n",
     "Unrolled picture of RNN. \n",
     "```\n",
@@ -472,7 +473,7 @@
  "metadata": {
   "celltoolbar": "Tags",
   "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "Python 3.7.8 64-bit",
    "language": "python",
    "name": "python3"
   },
@@ -486,7 +487,12 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.12"
+   "version": "3.7.8"
+  },
+  "vscode": {
+   "interpreter": {
+    "hash": "3e5a039a7a113538395a7d74f5574b0c5900118222149a18efb009bf03645fce"
+   }
   }
  },
  "nbformat": 4,
diff --git a/dl/gnn.ipynb b/dl/gnn.ipynb
index e546a89d..26749867 100644
--- a/dl/gnn.ipynb
+++ b/dl/gnn.ipynb
@@ -36,6 +36,7 @@
     "```{figure} ./methanol.jpg\n",
     "----\n",
     "name: methanol\n",
+    "alt: depiction of methanol as Lewis dot structure\n",
     "width: 400px\n",
     "----\n",
     "Methanol with atoms numbered so that we can convert it to a graph. \n",
@@ -485,6 +486,7 @@
     "```{figure} ../_static/images/gcn.gif\n",
     "----\n",
     "name: gcnanim\n",
+    "alt: animation of the graph convolution layer operation\n",
     "----\n",
     "Animation of the graph convolution layer operation. The left is input, right is output node features. Note that two layers are shown (see title change). As the animation plays out, you can see how the information about the atoms propagates through the molecule via the averaging over neigbhors. So the oxygen goes from being just an oxygen, to an oxygen bonded to C and H, to an oxygen bonded to an H and CH3. The colors just reflect the same information in the numerical values.\n",
     "```\n",
@@ -1717,6 +1719,7 @@
     "name: helicene\n",
     "width: 500px\n",
     "class: autoplay-video\n",
+    "alt: rotating video of helicene\n",
     "----\n",
     "This is a molecule with axial stereochemistry. Its small helix could be either left or right-handed. \n",
     "```\n",
diff --git a/dl/layers.ipynb b/dl/layers.ipynb
index 4698acbb..0d6a8983 100644
--- a/dl/layers.ipynb
+++ b/dl/layers.ipynb
@@ -29,6 +29,7 @@
     "---\n",
     "width: 600px\n",
     "name: fig-nn\n",
+    "alt: A neural network consisting of an input 3 x 64 x 64 image, a convolutional layer, a max pooling layer, a fully connected layer, and an output layer (128)\n",
     "---\n",
     "A typical neural network architecture is composed of multiple layers. This network is used to classify images.\n",
     "```\n",
@@ -355,6 +356,7 @@
     "----\n",
     "name: drop_out\n",
     "width: 250px\n",
+    "alt: A gif showing how dropout works.\n",
     "----\n",
     "Dropout. \n",
     "```\n",
@@ -644,7 +646,7 @@
   "celltoolbar": "Tags",
   "hide_input": false,
   "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "Python 3.7.8 64-bit",
    "language": "python",
    "name": "python3"
   },
@@ -658,7 +660,12 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.12"
+   "version": "3.7.8"
+  },
+  "vscode": {
+   "interpreter": {
+    "hash": "3e5a039a7a113538395a7d74f5574b0c5900118222149a18efb009bf03645fce"
+   }
   }
  },
  "nbformat": 4,
diff --git a/dl/molnets.ipynb b/dl/molnets.ipynb
index 85f6e8ea..625868a4 100644
--- a/dl/molnets.ipynb
+++ b/dl/molnets.ipynb
@@ -537,7 +537,7 @@
  "metadata": {
   "celltoolbar": "Tags",
   "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "Python 3.7.8 64-bit",
    "language": "python",
    "name": "python3"
   },
@@ -551,7 +551,12 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.12"
+   "version": "3.7.8"
+  },
+  "vscode": {
+   "interpreter": {
+    "hash": "3e5a039a7a113538395a7d74f5574b0c5900118222149a18efb009bf03645fce"
+   }
   }
  },
  "nbformat": 4,
diff --git a/dl/pretraining.ipynb b/dl/pretraining.ipynb
index 47ab00f5..657b8e65 100644
--- a/dl/pretraining.ipynb
+++ b/dl/pretraining.ipynb
@@ -44,6 +44,7 @@
     "```{figure} ../_static/images/test_error_pretraining.png\n",
     "----\n",
     "name: tept\n",
+    "alt: Test error comparison. Comparing test loss error on MNIST data, with 400 different iterations each. On the left, red and blue correspond to test error for one layer with and without pretraining, respectively. The right image has four layers instead of one.\n",
     "----\n",
     "Test error comparison. Comparing test loss error on MNIST data, with 400 different iterations each. On the left, red and blue correspond to test error for one layer with and without pretraining, respectively. The right image has four layers instead of one. \n",
     "```"
@@ -115,6 +116,7 @@
     "```{figure} ../_static/images/TL_FT.gif\n",
     "----\n",
     "name: tlft\n",
+    "alt: Comparison of fine-tuning and transfer learning with a general model architecture. Starting with the top middle block (original model), follow the flow chart for different situations.\n",
     "----\n",
     "Comparison of fine-tuning and transfer learning with a general model architecture. Starting with the top middle block (original model), follow the flow chart for different situations.\n",
     "```"
@@ -147,8 +149,9 @@
     "```{figure} ../_static/images/ssl_graphs.png\n",
     "----\n",
     "name: ptgnn\n",
+    "alt: Comparison of contrastive and predictive models in the context of self-supervised learning for GNNs. On the left, contrastive models require data pairs and discriminate between positive and negative examples, and an example architecture is provided. On the right, predictive models have data(self)-generated labels and predict outputs based on input properties. An example architecture is provided.\n",
     "----\n",
-    "Comparison of contrastive and predictive models in the context of self-supervised learning for GNNs. On the left, contrastive models require data pairs and descriminate between positive and negative examples, and an example architecture is provided. On the right, predictive models have data(self)-generated labels and predict outputs based on input properties. An example architecture is provided.\n",
+    "Comparison of contrastive and predictive models in the context of self-supervised learning for GNNs. On the left, contrastive models require data pairs and discriminate between positive and negative examples, and an example architecture is provided. On the right, predictive models have data(self)-generated labels and predict outputs based on input properties. An example architecture is provided.\n",
     "```"
    ]
   },
diff --git a/index.md b/index.md
index 5cfd69bd..1876cb7a 100644
--- a/index.md
+++ b/index.md
@@ -8,7 +8,7 @@ html_meta:
   "twitter:image": "https://dmol.pub/_static/robot-chem.png"
   "twitter:site": "@andrewwhite01"
 ---
-![Picture of art installation of networked cables](_static/images/header.png)
+![Header image showing molecules plotted in two different ways](_static/images/header.png)
 
 # Overview
 
@@ -101,7 +101,6 @@ Thank you to contributors for offering suggestions, identifying errors, and help
 18. Santanu Poddar
 19. Robert Bridges
 
-
 ## Citation
 
 Please cite the [livecommsj overview article](https://doi.org/10.33011/livecoms.3.1.1499):

From be221805d7ebbb2f09c405602ef7df9e563fd75d Mon Sep 17 00:00:00 2001
From: Andrew White <white.d.andrew@gmail.com>
Date: Mon, 19 Dec 2022 22:07:06 -0500
Subject: [PATCH 12/15] Completed first draft of animation

---
 dl/gnn.ipynb     |   8 +-
 dl/molnets.ipynb | 269 ++++++++++++++++++++++++++++++-----------------
 2 files changed, 175 insertions(+), 102 deletions(-)

diff --git a/dl/gnn.ipynb b/dl/gnn.ipynb
index c8021e47..4a2d0308 100644
--- a/dl/gnn.ipynb
+++ b/dl/gnn.ipynb
@@ -488,14 +488,14 @@
     "name: gcnanim\n",
     "alt: animation of the graph convolution layer operation\n",
     "----\n",
-    "Animation of the graph convolution layer operation. The left is input, right is output node features. Note that two layers are shown (see title change). As the animation plays out, you can see how the information about the atoms propagates through the molecule via the averaging over neigbhors. So the oxygen goes from being just an oxygen, to an oxygen bonded to C and H, to an oxygen bonded to an H and CH3. The colors just reflect the same information in the numerical values.\n",
+    "Animation of the graph convolution layer operation. The left is input, right is output node features. Note that two layers are shown (see title change). As the animation plays out, you can see how the information about the atoms propagates through the molecule via the averaging over neighbors. So the oxygen goes from being just an oxygen, to an oxygen bonded to C and H, to an oxygen bonded to an H and CH3. The colors just reflect the same information in the numerical values.\n",
     "```\n",
     "\n",
     "\n",
     "### GCN Implementation\n",
     "\n",
     "Let's now create a tensor implementation of the GCN. We'll skip the activation and trainable weights for now.\n",
-    "We must first compute our rank 2 adjacency matrix. The `smiles2graph` code above computes an adjacency tensor with feature vectors. We can fix that with a simple reduction and add the identity at the same time\n"
+    "We must first compute our rank 2 adjacency matrix. The `smiles2graph` code above computes an adjacency tensor with feature vectors. We can fix that with a simple reduction and add the identity at the same time"
    ]
   },
   {
@@ -1819,7 +1819,7 @@
  "metadata": {
   "celltoolbar": "Tags",
   "kernelspec": {
-   "display_name": "Python 3.7.8 64-bit",
+   "display_name": "Python 3",
    "language": "python",
    "name": "python3"
   },
@@ -1833,7 +1833,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.8"
+   "version": "3.8.12"
   },
   "vscode": {
    "interpreter": {
diff --git a/dl/molnets.ipynb b/dl/molnets.ipynb
index e363b899..8e0917a1 100644
--- a/dl/molnets.ipynb
+++ b/dl/molnets.ipynb
@@ -153,6 +153,32 @@
     "How is this different than a MPNN"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "1d86d7b1",
+   "metadata": {},
+   "source": [
+    "```{glue:figure} dframe\n",
+    "----\n",
+    "name: dframe\n",
+    "----\n",
+    "Intermediate step of the graph convolution layer. The 3D vectors are the node features and start as one-hot, so a `[1.00, 0.00, 0.00]` means hydrogen. The center node will be updated by averaging its neighbors' features.\n",
+    "```\n",
+    "\n",
+    "\n",
+    "To help understand the GCN layer, look at {numref}`dframe`. It shows an intermediate step of the GCN layer. Each node feature is represented here as a one-hot encoded vector at input. The animation in {numref}`gcnanim` shows the averaging process over neighbor features.  To make this animation easy to follow, the trainable weights and activation functions are not considered. Note that the animation repeats for a second layer. Watch how the \"information\" about there being an oxygen atom in the molecule is propagated only after two layers to each atom. All GNNs operate with similar approaches, so try to understand how this animation works. \n",
+    "\n",
+    "\n",
+    "\n",
+    "```{figure} ../_static/images/molnet.gif\n",
+    "----\n",
+    "name: gcnanim\n",
+    "alt: animation of the graph convolution layer operation\n",
+    "----\n",
+    "Animation of the graph convolution layer operation. The left is input, right is output node features. Note that two layers are shown (see title change). As the animation plays out, you can see how the information about the atoms propagates through the molecule via the averaging over neighbors. So the oxygen goes from being just an oxygen, to an oxygen bonded to C and H, to an oxygen bonded to an H and CH3. The colors just reflect the same information in the numerical values.\n",
+    "```"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "cef0d13f",
@@ -231,6 +257,17 @@
    },
    "outputs": [],
    "source": [
+    "def mol2coords(mol):\n",
+    "    drawer = rdkit.Chem.Draw.rdMolDraw2D.MolDraw2DCairo(-1, -1)\n",
+    "    drawer.drawOptions().scalingFactor = 1.0  # 1 pixel per angstrom\n",
+    "    drawer.DrawMolecule(mol)\n",
+    "    drawer.FinishDrawing()\n",
+    "    return {\n",
+    "        i: [drawer.GetDrawCoords(i).x, drawer.GetDrawCoords(i).y]\n",
+    "        for i in range(mol.GetNumAtoms())\n",
+    "    }\n",
+    "\n",
+    "\n",
     "def smiles2graph(sml):\n",
     "    \"\"\"Argument for the RD2NX function should be a valid SMILES sequence\n",
     "    returns: the graph\n",
@@ -260,14 +297,26 @@
     "            raise Warning(\"Ignoring bond order\" + order)\n",
     "        adj[u, v, order] = 1\n",
     "        adj[v, u, order] = 1\n",
-    "    return nodes, adj"
+    "    pos = mol2coords(m)\n",
+    "    aug_adj = np.zeros((N, N))\n",
+    "    for i in range(adj.shape[0]):\n",
+    "        for j in range(adj.shape[0]):\n",
+    "            if i != j and np.linalg.norm(np.array(pos[i]) - np.array(pos[j])) < 2:\n",
+    "                aug_adj[i, j] = 1\n",
+    "                aug_adj[j, i] = 1\n",
+    "    aug_adj[np.diag_indices(N)] = 0\n",
+    "    return nodes, adj, aug_adj, pos"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "6dc9231b",
-   "metadata": {},
+   "metadata": {
+    "tags": [
+     "remove-cell"
+    ]
+   },
    "outputs": [],
    "source": [
     "# THIS CELL IS USED TO GENERATE A FIGURE\n",
@@ -278,19 +327,37 @@
     "from moviepy.video.io.bindings import mplfig_to_npimage\n",
     "\n",
     "\n",
-    "def draw_vector(x, y, s, v, ax, cmap, **kwargs):\n",
-    "    x += s / 2\n",
-    "    y += s / 2\n",
+    "def draw_vector(x, y, s, v, color_i, draw_i, ax, cmap, **kwargs):\n",
+    "    x -= s / 2\n",
     "    # iterate vertically\n",
     "    L = max([len(vi) for vi in v])\n",
+    "    y += L * s - s * 2 / 4\n",
     "    for dy, vi in enumerate(v):\n",
+    "        ax.text(\n",
+    "            x,\n",
+    "            y - s / 2 - dy * s,\n",
+    "            \"l={}\".format(dy),\n",
+    "            verticalalignment=\"center\",\n",
+    "            horizontalalignment=\"right\",\n",
+    "            fontsize=7,\n",
+    "        )\n",
+    "        if dy == 0:\n",
+    "            for dx, vij in enumerate(v[-1]):\n",
+    "                ax.text(\n",
+    "                    x + dx * s * 1.5 / L + s * 1.5 / L / 2,\n",
+    "                    y - dy * s,\n",
+    "                    \"k={}\".format(dx),\n",
+    "                    verticalalignment=\"bottom\",\n",
+    "                    horizontalalignment=\"center\",\n",
+    "                    fontsize=7,\n",
+    "                )\n",
     "        for dx, vij in enumerate(vi):\n",
-    "            if cmap is not None:\n",
+    "            if cmap is not None and (color_i is None or dy in color_i):\n",
     "                ax.add_patch(\n",
     "                    mpl.patches.Rectangle(\n",
-    "                        (x + dx * s * 1.5 / L, y + dy * s),\n",
+    "                        (x + dx * s * 1.5 / L, y - dy * s),\n",
     "                        s * 1.5 / L,\n",
-    "                        s,\n",
+    "                        -s,\n",
     "                        facecolor=cmap(vij),\n",
     "                        **kwargs,\n",
     "                    )\n",
@@ -298,60 +365,50 @@
     "            else:\n",
     "                ax.add_patch(\n",
     "                    mpl.patches.Rectangle(\n",
-    "                        (x + dx * s * 1.5 / L, y + dy * s),\n",
+    "                        (x + dx * s * 1.5 / L, y - dy * s),\n",
     "                        s * 1.5 / L,\n",
-    "                        s,\n",
+    "                        -s,\n",
     "                        facecolor=\"#FFF\",\n",
     "                        edgecolor=\"#333\",\n",
     "                        **kwargs,\n",
     "                    )\n",
     "                )\n",
-    "            ax.text(\n",
-    "                x + dx * s * 1.5 / L + s * 1.5 / L / 2,\n",
-    "                y + s / 2 + dy * s,\n",
-    "                \"{:.2f}\".format(vij),\n",
-    "                verticalalignment=\"center\",\n",
-    "                horizontalalignment=\"center\",\n",
-    "                fontsize=5,\n",
-    "            )\n",
-    "\n",
-    "\n",
-    "def draw_key(x, y, s, v, ax, cmap, **kwargs):\n",
-    "    x += s / 2\n",
-    "    y += s / 2\n",
-    "    for vi in v:\n",
-    "        ax.add_patch(\n",
-    "            mpl.patches.Rectangle((x, y), s * 1.5, s, facecolor=cmap(1.0), **kwargs)\n",
-    "        )\n",
-    "        ax.text(\n",
-    "            x + s * 1.5 / 2,\n",
-    "            y + s / 2,\n",
-    "            vi,\n",
-    "            verticalalignment=\"center\",\n",
-    "            horizontalalignment=\"center\",\n",
-    "        )\n",
-    "        y += s\n",
-    "    ax.text(\n",
-    "        x, y + s / 2, \"Key:\", verticalalignment=\"center\", horizontalalignment=\"left\"\n",
-    "    )\n",
+    "            if draw_i is None or dy in draw_i:\n",
+    "                ax.text(\n",
+    "                    x + dx * s * 1.5 / L + s * 1.5 / L / 2,\n",
+    "                    y - s / 2 - dy * s,\n",
+    "                    \"{:.2}\".format(vij),\n",
+    "                    verticalalignment=\"center\",\n",
+    "                    horizontalalignment=\"center\",\n",
+    "                    fontsize=8,\n",
+    "                )\n",
     "\n",
     "\n",
     "def draw(\n",
     "    nodes,\n",
     "    node_features,\n",
     "    adj,\n",
+    "    aug_adj,\n",
+    "    pos,\n",
     "    ax,\n",
     "    highlight=None,\n",
     "    key=False,\n",
     "    labels=None,\n",
     "    mask=None,\n",
+    "    draw_vector_mask=None,\n",
+    "    color_vector_mask=None,\n",
     "    draw_nodes=None,\n",
     "):\n",
     "    G = nx.Graph()\n",
+    "    # now we add all edges to close (using pos) atoms to emphasize spatial locality\n",
     "    for i in range(adj.shape[0]):\n",
     "        for j in range(adj.shape[0]):\n",
-    "            if np.any(adj[i, j]):\n",
+    "            if i != j and aug_adj[i, j]:\n",
     "                G.add_edge(i, j)\n",
+    "    for i in range(adj.shape[0]):\n",
+    "        for j in range(adj.shape[0]):\n",
+    "            if np.any(adj[i, j]):\n",
+    "                G.add_edge(i, j, bond=True)\n",
     "    if mask is None:\n",
     "        mask = [True] * len(G)\n",
     "    if draw_nodes is None:\n",
@@ -359,10 +416,6 @@
     "    # go from atomic number to element\n",
     "    elements = np.argmax(draw_nodes, axis=-1)\n",
     "    el_labels = {i: list(my_elements.values())[e] for i, e in enumerate(elements)}\n",
-    "    try:\n",
-    "        pos = nx.nx_agraph.graphviz_layout(G, prog=\"sfdp\")\n",
-    "    except ImportError:\n",
-    "        pos = nx.spring_layout(G, iterations=100, seed=4, k=1)\n",
     "    pos = nx.rescale_layout_dict(pos)\n",
     "    c = [\"white\"] * len(G)\n",
     "    all_h = []\n",
@@ -371,20 +424,11 @@
     "            for hj in h:\n",
     "                c[hj] = \"C{}\".format(i + 1)\n",
     "                all_h.append(hj)\n",
-    "    # now we add all edges to close (using pos) atoms to emphasize spatial locality\n",
-    "    for i in range(adj.shape[0]):\n",
-    "        for j in range(adj.shape[0]):\n",
-    "            if (\n",
-    "                i != j\n",
-    "                and not np.any(adj[i, j])\n",
-    "                and np.linalg.norm(np.array(pos[i]) - np.array(pos[j])) < 1\n",
-    "            ):\n",
-    "                G.add_edge(i, j, space=True)\n",
     "    # set-up edge colors based on if they are space or not\n",
-    "    edge_colors = [\"#000\"] * len(G.edges)\n",
+    "    edge_colors = [\"#00000033\"] * len(G.edges)\n",
     "    for i, (u, v, d) in enumerate(G.edges(data=True)):\n",
-    "        if d.get(\"space\", False):\n",
-    "            edge_colors[i] = \"#AAA\"\n",
+    "        if d.get(\"bond\"):\n",
+    "            edge_colors[i] = \"#000\"\n",
     "    nx.draw(\n",
     "        G,\n",
     "        ax=ax,\n",
@@ -399,33 +443,37 @@
     "        if not mask[i]:\n",
     "            continue\n",
     "        if i in all_h:\n",
-    "            draw_vector(*pos[i], 0.15, node_features[i], ax, cmap)\n",
+    "            draw_vector(\n",
+    "                *pos[i],\n",
+    "                0.25,\n",
+    "                node_features[i],\n",
+    "                color_vector_mask,\n",
+    "                draw_vector_mask,\n",
+    "                ax,\n",
+    "                cmap,\n",
+    "            )\n",
     "        else:\n",
-    "            draw_vector(*pos[i], 0.15, node_features[i], ax, None)\n",
-    "    if key:\n",
-    "        draw_key(-1, -1, 0.15, my_elements.values(), ax, cmap)\n",
-    "    if labels is not None:\n",
-    "        legend_elements = []\n",
-    "        for i, l in enumerate(labels):\n",
-    "            p = mpl.lines.Line2D(\n",
-    "                [0], [0], marker=\"o\", color=\"C{}\".format(i + 1), label=l, markersize=15\n",
+    "            draw_vector(\n",
+    "                *pos[i], 0.25, node_features[i], color_vector_mask, None, ax, None\n",
     "            )\n",
-    "            legend_elements.append(p)\n",
-    "        ax.legend(handles=legend_elements)\n",
     "    ax.set_xlim(-1.2, 1.2)\n",
     "    ax.set_ylim(-1.2, 1.2)\n",
     "    ax.set_facecolor(\"#f5f4e9\")\n",
     "\n",
     "\n",
-    "nodes, adj = smiles2graph(\"CO\")\n",
-    "print(nodes)\n",
-    "nodes_vectors = [[n, n[:2]] for i, n in enumerate(nodes)]\n",
-    "print(nodes_vectors[1])\n",
+    "nodes, adj, aug_adj, pos = smiles2graph(\"CO\")\n",
+    "# create plausible starting vectors\n",
+    "node_vectors = [[[n[1] @ [-1, 0.5, 0.3]]] for n in enumerate(nodes)]\n",
+    "for i in pos:\n",
+    "    p = pos[i]\n",
+    "    node_vectors[i].append(p + [p[0] - p[1]])\n",
     "fig = plt.figure(figsize=(8, 5))\n",
     "draw(\n",
     "    nodes,\n",
-    "    nodes_vectors,\n",
+    "    node_vectors,\n",
     "    adj,\n",
+    "    aug_adj,\n",
+    "    pos,\n",
     "    plt.gca(),\n",
     "    highlight=[[1], [5, 0]],\n",
     "    labels=[\"center\", \"neighbors\"],\n",
@@ -438,7 +486,11 @@
    "cell_type": "code",
    "execution_count": null,
    "id": "fd8e064f",
-   "metadata": {},
+   "metadata": {
+    "tags": [
+     "remove-cell"
+    ]
+   },
    "outputs": [],
    "source": [
     "glue(\"dframe\", plt.gcf(), display=False)\n",
@@ -446,52 +498,64 @@
     "# THIS CELL IS USED TO GENERATE A FIGURE\n",
     "# AND NOT RELATED TO CHAPTER\n",
     "# YOU CAN SKIP IT\n",
-    "fig, axs = plt.subplots(1, 2, squeeze=True, figsize=(14, 6), dpi=100)\n",
+    "fig, axs = plt.subplots(1, 2, squeeze=True, figsize=(14, 7.5), dpi=100)\n",
     "order = [5, 1, 0, 2, 3, 4]\n",
-    "time_per_node = 2\n",
+    "time_per_node = 4\n",
     "last_layer = [0]\n",
     "layers = 2\n",
     "input_nodes = np.copy(nodes)\n",
+    "\n",
     "fig.set_facecolor(\"#f5f4e9\")\n",
     "\n",
     "\n",
     "def make_frame(t):\n",
     "    axs[0].clear()\n",
     "    axs[1].clear()\n",
-    "\n",
     "    layer_i = int(t / (time_per_node * len(order)))\n",
-    "    axs[0].set_title(f\"Layer {layer_i + 1} Input\")\n",
-    "    axs[1].set_title(f\"Layer {layer_i + 1} Output\")\n",
-    "\n",
-    "    flat_adj = np.sum(adj, axis=-1)\n",
-    "    #     out_nodes = np.einsum(\n",
-    "    #         \"i,ij,jk->ik\",\n",
-    "    #         1 / (np.sum(flat_adj, axis=1) + 1),\n",
-    "    #         flat_adj + np.eye(*flat_adj.shape),\n",
-    "    #         nodes,\n",
-    "    #     )\n",
-    "    out_nodes = nodes\n",
-    "    node_vectors = [[n, n[:2]] for i, n in enumerate(out_nodes)]\n",
+    "\n",
+    "    # I think I need to loop over k/l, not node\n",
+    "    out_node_vectors = np.zeros_like(node_vectors)\n",
+    "    for i in range(len(node_vectors[0])):\n",
+    "        M = len(node_vectors[0][i])\n",
+    "        vin = np.array([n[i] + [0] * (M - i - 2) for n in node_vectors])\n",
+    "        vout = np.einsum(\n",
+    "            \"i,ij,jk->ik\",\n",
+    "            1 / (np.sum(aug_adj, axis=1) + 1),\n",
+    "            aug_adj + np.eye(*aug_adj.shape),\n",
+    "            vin,\n",
+    "        )\n",
+    "        for j, vj in enumerate(out_node_vectors):\n",
+    "            vj[i] = vout[j]\n",
     "    if last_layer[0] != layer_i:\n",
     "        print(\"recomputing\")\n",
-    "        nodes[:] = out_nodes\n",
+    "        for nv, onv in zip(node_vectors, out_node_vectors):\n",
+    "            nv[:] = onv\n",
     "        last_layer[0] = layer_i\n",
     "\n",
     "    t -= layer_i * time_per_node * len(order)\n",
     "    i = order[int(t / time_per_node)]\n",
-    "    print(last_layer, layer_i, i, t)\n",
     "    mask = [False] * nodes.shape[0]\n",
     "    for j in order[: int(t / time_per_node) + 1]:\n",
     "        mask[j] = True\n",
-    "    print(mask, i)\n",
-    "    neighs = list(np.where(adj[i])[0])\n",
-    "    if (t - int(t / time_per_node) * time_per_node) >= time_per_node / 4:\n",
+    "    neighs = list(np.where(aug_adj[i])[0])\n",
+    "    # fraction of how far along we are\n",
+    "    dt = (t - int(t / time_per_node) * time_per_node) / time_per_node\n",
+    "    color_vector_mask = list(range(0, max(0, int((dt - 0.25) / 0.25))))\n",
+    "    stage = \"\"\n",
+    "    if color_vector_mask:\n",
+    "        stage = f\"Irreps {color_vector_mask[-1]}\"\n",
+    "    if dt >= 0.25:\n",
+    "        if not stage:\n",
+    "            stage = \"Find Neighbors\"\n",
     "        draw(\n",
     "            nodes,\n",
     "            node_vectors,\n",
     "            adj,\n",
+    "            aug_adj,\n",
+    "            pos,\n",
     "            axs[0],\n",
     "            highlight=[[i], neighs],\n",
+    "            color_vector_mask=color_vector_mask,\n",
     "            labels=[\"center\", \"neighbors\"],\n",
     "            draw_nodes=input_nodes,\n",
     "        )\n",
@@ -500,24 +564,33 @@
     "            nodes,\n",
     "            node_vectors,\n",
     "            adj,\n",
+    "            aug_adj,\n",
+    "            pos,\n",
     "            axs[0],\n",
     "            highlight=[[i]],\n",
     "            labels=[\"center\", \"neighbors\"],\n",
     "            draw_nodes=input_nodes,\n",
     "        )\n",
-    "    if (t - int(t / time_per_node) * time_per_node) < time_per_node / 2:\n",
+    "    if dt < 0.5:\n",
     "        mask[j] = False\n",
     "    draw(\n",
-    "        out_nodes,\n",
-    "        node_vectors,\n",
+    "        nodes,\n",
+    "        out_node_vectors,\n",
     "        adj,\n",
+    "        aug_adj,\n",
+    "        pos,\n",
     "        axs[1],\n",
     "        highlight=[[i]],\n",
-    "        key=True,\n",
     "        mask=mask,\n",
+    "        color_vector_mask=color_vector_mask,\n",
+    "        draw_vector_mask=color_vector_mask,\n",
     "        draw_nodes=input_nodes,\n",
     "    )\n",
+    "    axs[0].set_title(f\"Layer {layer_i + 1} Input\\n{stage}\")\n",
+    "    axs[1].set_title(f\"Layer {layer_i + 1} Output\\n{stage}\")\n",
     "    fig.set_facecolor(\"#f5f4e9\")\n",
+    "    if t == 0:\n",
+    "        plt.tight_layout()\n",
     "    return mplfig_to_npimage(fig)\n",
     "\n",
     "\n",
@@ -537,7 +610,7 @@
  "metadata": {
   "celltoolbar": "Tags",
   "kernelspec": {
-   "display_name": "Python 3.7.8 64-bit",
+   "display_name": "Python 3",
    "language": "python",
    "name": "python3"
   },
@@ -551,7 +624,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.8"
+   "version": "3.8.12"
   },
   "vscode": {
    "interpreter": {

From 814dc8cdf1c065739f564c8d49c8acd05ddb7226 Mon Sep 17 00:00:00 2001
From: Andrew White <white.d.andrew@gmail.com>
Date: Tue, 20 Dec 2022 07:11:02 -0500
Subject: [PATCH 13/15] Fixed np type error

---
 dl/molnets.ipynb    | 2 +-
 ml/regression.ipynb | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dl/molnets.ipynb b/dl/molnets.ipynb
index 8e0917a1..e5599cea 100644
--- a/dl/molnets.ipynb
+++ b/dl/molnets.ipynb
@@ -155,7 +155,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "1d86d7b1",
+   "id": "e6890800",
    "metadata": {},
    "source": [
     "```{glue:figure} dframe\n",
diff --git a/ml/regression.ipynb b/ml/regression.ipynb
index e0627cc5..90025fde 100644
--- a/ml/regression.ipynb
+++ b/ml/regression.ipynb
@@ -966,7 +966,7 @@
     "    indices = np.array(\n",
     "        [np.random.choice(range(N), size=N // 2, replace=False) for _ in range(L)]\n",
     "    )\n",
-    "    test_indices = np.empty((L, N // 2), dtype=np.int)\n",
+    "    test_indices = np.empty((L, N // 2), dtype=int)\n",
     "    for i in range(L):\n",
     "        test_indices[i, :] = list(set(range(N)) - set(indices[i]))\n",
     "        nan_test_x[i, test_indices[i]] = features[test_indices[i]]\n",
@@ -1702,7 +1702,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.5"
+   "version": "3.8.12"
   }
  },
  "nbformat": 4,

From d9c53980ad0d093d42e54cd698494ae34ba12bd6 Mon Sep 17 00:00:00 2001
From: Andrew White <white.d.andrew@gmail.com>
Date: Tue, 20 Dec 2022 07:18:41 -0500
Subject: [PATCH 14/15] Will keep_files now

---
 .github/workflows/deploy-jupyter-book-preview.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/deploy-jupyter-book-preview.yml b/.github/workflows/deploy-jupyter-book-preview.yml
index f3849e8e..53bf5a0d 100644
--- a/.github/workflows/deploy-jupyter-book-preview.yml
+++ b/.github/workflows/deploy-jupyter-book-preview.yml
@@ -41,7 +41,7 @@ jobs:
       with:
         github_token: ${{ secrets.GITHUB_TOKEN }}
         publish_dir: _build/html
-        force_orphan: true
-        #keep_files: true
+        #force_orphan: true
+        keep_files: true
         destination_dir: ${{ steps.myref.outputs.branch }}
         cname: dmol.pub

From a929dc73ab7854a79e7df7cae34bc305dd4e5fca Mon Sep 17 00:00:00 2001
From: Andrew White <white.d.andrew@gmail.com>
Date: Fri, 23 Dec 2022 13:02:48 -0500
Subject: [PATCH 15/15] Added notes to animation and slowed it down

---
 dl/molnets.ipynb | 139 +++++++++++++++++++++++------------------------
 1 file changed, 67 insertions(+), 72 deletions(-)

diff --git a/dl/molnets.ipynb b/dl/molnets.ipynb
index e5599cea..aa215749 100644
--- a/dl/molnets.ipynb
+++ b/dl/molnets.ipynb
@@ -28,6 +28,48 @@
     "```"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "b083f55a",
+   "metadata": {},
+   "source": [
+    "## Running This Notebook\n",
+    "\n",
+    "\n",
+    "Click the &nbsp;<i aria-label=\"Launch interactive content\" class=\"fas fa-rocket\"></i>&nbsp; above to launch this page as an interactive Google Colab. See details below on installing packages.\n",
+    "\n",
+    "````{tip} My title\n",
+    ":class: dropdown\n",
+    "To install packages, execute this code in a new cell. \n",
+    "\n",
+    "```\n",
+    "!pip install dmol-book\n",
+    "```\n",
+    "\n",
+    "If you run into install problems, you can get the latest working versions of the packages used in [this book here](https://github.com/whitead/dmol-book/blob/main/package/setup.py).\n",
+    "\n",
+    "````"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7bf44568",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt\n",
+    "import matplotlib as mpl\n",
+    "import numpy as np\n",
+    "import tensorflow as tf\n",
+    "import pandas as pd\n",
+    "import rdkit, rdkit.Chem, rdkit.Chem.rdDepictor, rdkit.Chem.Draw\n",
+    "import networkx as nx\n",
+    "import dmol\n",
+    "\n",
+    "my_elements = {6: \"C\", 8: \"O\", 1: \"H\"}"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -100,15 +142,15 @@
    "id": "68b5d551",
    "metadata": {},
    "source": [
-    "These can be distinguished if we also have (and use) their Cartesian coordinates. We cannot distinguish enantiomers with GNNs, except maybe with pre-computed node attributes. Even those start to breakdown when we have helical chirality that is not centered at any one molecule.\n",
+    "These can be distinguished if we also have (and use) their Cartesian coordinates. We cannot distinguish enantiomers with GNNs, except maybe with pre-computed node attributes. Even those start to break down when we have helical chirality that is not centered at any one atom.\n",
     "\n",
-    "These are arguments for using Cartesian coordinates in addition to a GNN, but why use equivariant neural networks? Most molnet research is for **neural potentials**. These are neural networks that predict energy and forces given atom positions and elements. We know that the force on each atom is given by\n",
+    "These are arguments for using Cartesian coordinates in addition to a GNN, but why use equivariant neural networks as opposed to the simpler invariant networks discussed in {doc}`data`? Most molnet research is for **neural potentials**. These are neural networks that predict energy and forces given atom positions and elements. We know that the force on each atom is given by\n",
     "\n",
     "\\begin{equation}\n",
     "F\\left(\\vec{r}\\right) = -\\nabla U\\left(\\vec{r}\\right)\n",
     "\\end{equation}\n",
     "\n",
-    "where $U\\left(\\vec{x}\\right)$ is the rotation invariant potential given all atom positions $\\vec{r}$. So if we're predicting a translation, rotation, and permutation invariant potential, why use equivariance? Part of it is performance. Models like SchNet or ANI are invariant and are not as accurate as models like NequiP or TorchMD-NET that have equivariances in their internal layers. Another reason is that there are indeed specific 3D configurations that should have different energies (according to quantum chemistry calculations), but are invariant if treatd with pairwise distance alone {cite}`pozdnyakov2022incompleteness`."
+    "where $U\\left(\\vec{r}\\right)$ is the rotation-invariant potential given all atom positions $\\vec{r}$. So if we're predicting a translation, rotation, and permutation invariant potential, why use equivariance? Part of it is performance. Models like SchNet or ANI are invariant and are not as accurate as models like NequIP or TorchMD-NET that have equivariances in their internal layers. Another reason is that there are indeed specific 3D configurations that should have different energies (according to quantum chemistry calculations), but are indistinguishable if treated with pairwise distances alone {cite}`pozdnyakov2022incompleteness`."
    ]
   },
   {
@@ -162,11 +204,11 @@
     "----\n",
     "name: dframe\n",
     "----\n",
-    "Intermediate step of the graph convolution layer. The 3D vectors are the node features and start as one-hot, so a `[1.00, 0.00, 0.00]` means hydrogen. The center node will be updated by averaging its neighbors features.\n",
+    "Intermediate step of a typical modern molecular neural network. The atom features are encoded as spherical harmonics/fragments, and they grow in dimension as we increase the number of them. The network updates just like a message passing graph neural network, except that we only mix among the same fragments, following the SO(3) equivariant network approach described in {doc}`Equivariant`. There is often a Clebsch-Gordan non-linearity on the aggregated messages to mix between channels, although that is not shown here.\n",
     "```\n",
     "\n",
     "\n",
-    "To help understand the GCN layer, look at {numref}`dframe`. It shows an intermediate step of the GCN layer. Each node feature is represented here as a one-hot encoded vector at input. The animation in {numref}`gcnanim` shows the averaging process over neighbor features.  To make this animation easy to follow, the trainable weights and activation functions are not considered. Note that the animation repeats for a second layer. Watch how the \"information\" about there being an oxygen atom in the molecule is propagated only after two layers to each atom. All GNNs operate with similar approaches, so try to understand how this animation works. \n",
+    "To help understand the GCN layer, look at {numref}`dframe`. It shows an intermediate step of the GCN layer. Each node feature is represented here as a one-hot encoded vector at input. The animation in {numref}`gcnanim` shows the averaging process over neighbor features.\n",
     "\n",
     "\n",
     "\n",
@@ -175,7 +217,7 @@
     "name: gcnanim\n",
     "alt: animation of the graph convolution layer operation\n",
     "----\n",
-    "Animation of the graph convolution layer operation. The left is input, right is output node features. Note that two layers are shown (see title change). As the animation plays out, you can see how the information about the atoms propagates through the molecule via the averaging over neighbors. So the oxygen goes from being just an oxygen, to an oxygen bonded to C and H, to an oxygen bonded to an H and CH3. The colors just reflect the same information in the numerical values.\n",
+    "Animation of the graph convolution layer operation. The left panel shows the input node features and the right panel shows the output node features. Note that two layers are shown (see title change). As the animation plays out, you can see how the information about the atoms propagates through the molecule via the averaging over neighbors. Note how each fragment set is updated independently to maintain equivariance.\n",
     "```"
    ]
   },
@@ -191,72 +233,22 @@
     "* Layers"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "id": "b083f55a",
-   "metadata": {},
-   "source": [
-    "## Running This Notebook\n",
-    "\n",
-    "\n",
-    "Click the &nbsp;<i aria-label=\"Launch interactive content\" class=\"fas fa-rocket\"></i>&nbsp; above to launch this page as an interactive Google Colab. See details below on installing packages.\n",
-    "\n",
-    "````{tip} My title\n",
-    ":class: dropdown\n",
-    "To install packages, execute this code in a new cell. \n",
-    "\n",
-    "```\n",
-    "!pip install dmol-book\n",
-    "```\n",
-    "\n",
-    "If you find install problems, you can get the latest working versions of packages used in [this book here](https://github.com/whitead/dmol-book/blob/main/package/setup.py)\n",
-    "\n",
-    "````"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "6bb5fcb7",
-   "metadata": {},
-   "source": [
-    "## Cited References\n",
-    "\n",
-    "```{bibliography}\n",
-    ":style: unsrtalpha\n",
-    ":filter: docname in docnames\n",
-    "```"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "7bf44568",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import matplotlib.pyplot as plt\n",
-    "import matplotlib as mpl\n",
-    "import numpy as np\n",
-    "import tensorflow as tf\n",
-    "import pandas as pd\n",
-    "import rdkit, rdkit.Chem, rdkit.Chem.rdDepictor, rdkit.Chem.Draw\n",
-    "import networkx as nx\n",
-    "import dmol\n",
-    "\n",
-    "my_elements = {6: \"C\", 8: \"O\", 1: \"H\"}"
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": null,
    "id": "0f525d6f",
    "metadata": {
     "tags": [
-     "hide-cell"
+     "remove-cell"
     ]
    },
    "outputs": [],
    "source": [
+    "# THIS CELL IS USED TO GENERATE A FIGURE\n",
+    "# AND NOT RELATED TO CHAPTER\n",
+    "# YOU CAN SKIP IT\n",
+    "\n",
+    "\n",
     "def mol2coords(mol):\n",
     "    drawer = rdkit.Chem.Draw.rdMolDraw2D.MolDraw2DCairo(-1, -1)\n",
     "    drawer.drawOptions().scalingFactor = 1.0  # 1 pixel per angstrom\n",
@@ -493,14 +485,12 @@
    },
    "outputs": [],
    "source": [
-    "glue(\"dframe\", plt.gcf(), display=False)\n",
-    "\n",
     "# THIS CELL IS USED TO GENERATE A FIGURE\n",
     "# AND NOT RELATED TO CHAPTER\n",
     "# YOU CAN SKIP IT\n",
     "fig, axs = plt.subplots(1, 2, squeeze=True, figsize=(14, 7.5), dpi=100)\n",
     "order = [5, 1, 0, 2, 3, 4]\n",
-    "time_per_node = 4\n",
+    "time_per_node = 8\n",
     "last_layer = [0]\n",
     "layers = 2\n",
     "input_nodes = np.copy(nodes)\n",
@@ -543,7 +533,7 @@
     "    color_vector_mask = list(range(0, max(0, int((dt - 0.25) / 0.25))))\n",
     "    stage = \"\"\n",
     "    if color_vector_mask:\n",
-    "        stage = f\"Irreps {color_vector_mask[-1]}\"\n",
+    "        stage = f\"Irreps Fragments {color_vector_mask[-1]}\"\n",
     "    if dt >= 0.25:\n",
     "        if not stage:\n",
     "            stage = \"Find Neighbors\"\n",
@@ -595,16 +585,21 @@
     "\n",
     "\n",
     "animation = VideoClip(make_frame, duration=time_per_node * nodes.shape[0] * layers)\n",
-    "animation.write_gif(\"../_static/images/molnet.gif\", fps=2)"
+    "animation.write_gif(\"../_static/images/molnet.gif\", fps=1)"
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "2e7cf098",
+   "cell_type": "markdown",
+   "id": "6bb5fcb7",
    "metadata": {},
-   "outputs": [],
-   "source": []
+   "source": [
+    "## Cited References\n",
+    "\n",
+    "```{bibliography}\n",
+    ":style: unsrtalpha\n",
+    ":filter: docname in docnames\n",
+    "```"
+   ]
   }
  ],
  "metadata": {