diff --git a/index.html b/index.html
index 5962a1fd..8f7570cf 100644
--- a/index.html
+++ b/index.html
@@ -616,7 +616,10 @@ <h1>Table of Contents</h1>
     <li><a href=sysid.html#lumped>Parameter Identification for Mechanical Systems</a></li>
     <ul>
       <li>Kinematic parameters and calibration</li>
-      <li>Least-squares formulation (of the inverse dynamics).</li>
+      <li>Estimating inertial parameters (and
+    friction)</li>
+      <li>Simultaneous kinematic and inertial identification via
+    lumped parameters.</li>
       <li>Identification using energy instead of inverse
     dynamics.</li>
       <li>Residual physics models with linear function
diff --git a/sysid.html b/sysid.html
index e7c2fcef..d613fce0 100644
--- a/sysid.html
+++ b/sysid.html
@@ -112,10 +112,10 @@ <h1><a href="index.html" style="text-decoration:none;">Underactuated Robotics</a
     least-squares objective to capture the errors; more generally we will look
     at fitting stochastic models to the data.</p>
 
-    <p>We often separate the identification procedure into two parts, were we
-    first estimate the state $\hat{\bx}_n$ given the input-output data $\bu_n,
-    \by_n$, and then focus on estimating the state-evolution dynamics in a
-    second step.  The dynamics estimation algorithms fall into two main
+    <p>We often separate the identification procedure into two parts, in which
+    we first estimate the state $\hat{\bx}_n$ given the input-output data
+    $\bu_n, \by_n$, and then focus on estimating the state-evolution dynamics
+    in a second step.  The dynamics estimation algorithms fall into two main
     categories: <ul><li><i>Equation error</i> minimizes only the one-step
     prediction error: $$\min_{\alpha} \sum_{n=0}^{N-2} \| f_\balpha(\hat\bx_n,
     \bu_n) - \hat{\bx}_{n+1} \|^2_2.$$  </li><li><i>Simulation error</i>
@@ -257,13 +257,12 @@ <h1><a href="index.html" style="text-decoration:none;">Underactuated Robotics</a
       offsets.  This one can be a real nuisance in practice.  Joint sensors can
       slip, and some robots even use relative rotary encoders, and rely on
       driving the joint to some known hard joint limit each time the robot is
-      powered on in order to obtain the offset.  I've worked on one humanoid
+      powered on in order to obtain the offset.  I've worked on one advanced
       robot that had a quite elaborate and painful kinematic calibration
       procedure which involve fitting additional hardware over the joints to
       ensure they were in a known location and then running a script.  Having a
       an expensive and/or unreliable calibration procedure can put a damper on
-      any robotics project.  For underactuated systems, in particular, it can
-      have a dramatic effect on performance.</p>
+      any robotics project.</p>
 
       <example><h1>Acrobot balancing with calibration error</h1>
 
@@ -303,20 +302,160 @@ <h1><a href="index.html" style="text-decoration:none;">Underactuated Robotics</a
 
     </subsection>
 
-    <subsection><h1>Least-squares formulation (of the inverse dynamics).</h1>
+    <subsection><h1 id="inertial">Estimating inertial parameters (and
+    friction)</h1>
 
-      <p>Now let's thinking about estimating the dynamic parameters of
+      <p>Now let's thinking about estimating the inertial parameters of
       multibody system.  We've been writing the manipulation equations in the
       form: \begin{equation}\bM({\bq})\ddot{\bq} + \bC(\bq,\dot{\bq})\dot\bq =
       \btau_g(\bq) + \bB\bu + \text{friction, etc.}\end{equation}  Each of the
       terms in this equation can depend on the parameters $\balpha$ that we're
       trying to estimate.  But the parameters enter the multibody equations in
-      a particular structured way: the equations are
-      <i>affine in the <b>lumped parameters</b></i>.  More precisely, the
-      manipulator equations above can be factored into the form $${\bf
-      W}(\bq,\dot{\bq}, \ddot{\bq}, \bu) \balpha_l(\balpha) +
-      \bw_0(\bq,\dot{\bq}, \ddot{\bq}, \bu) = 0,$$ where $\balpha_l$ are the
-      "lumped parameters".  We sometimes refer to ${\bf W}$ as the "data matrix".</p>
+      a particular structured way: <i>the <b>inverse</b> dynamics equations are
+      affine in the inertial parameters</i> <elib>Atkeson86</elib>. Notably,
+      the forward dynamics (solving for $\ddot{\bq}$) involves a ${\bf
+      M}^{-1}(q)$; these dynamics are <i>not</i> affine in the parameters.
+      Generic system identification approaches which treat the multibody
+      dynamics as $\bx[n+1] = f(\bx[n], \bu[n])$ will not be leveraging this
+      convexity directly.</p>
+
+      <p>To proceed with the inverse dynamics formulation, let us first choose
+      a parameterization for the inertias. <elib>Wensing17a</elib> shows that
+      the natural parameterization of the inertial terms for a particular rigid
+      body, $B$, are in terms of the <i>density-weighted moment matrices</i>:
+      \begin{gather*} m_B &=& \iiint_{V_B} \rho(\bx)dV \in \Re \\ {\bf h}_B &=&
+      \iiint_{V_B} \bx \rho(\bx) dV \in \Re^3 \\ {\bf \Sigma}_B &=&
+      \iiint_{V_B} \bx \bx^T \rho(\bx)dV \in S_3, \end{gather*} where $V_B$ is
+      the volume of the geometry in body $B$, the integral notation is for the
+      3D volume integral, and $S_3$ is the group of 3x3 (real) symmetric
+      matrices. $m_B$ is the total mass of the body, ${\bf h}_B$ is simply the
+      mass times the position of the center of mass, and ${\bf \Sigma}_B$ is
+      closely related to the more standard form of the inertia matrix. With the
+      integral using $x$ represented in the body frame, $B$, we can extract
+      ${}^B{\bf p}^{B_{com}} = \frac{1}{m}{\bf h}_B$ and ${\bf I}^{B/Bo} =
+      \tr({\bf \Sigma}_B)\mathbb{I}_{3 \times 3} - {\bf \Sigma}_B.$ Since ${\bf
+      \Sigma}_B$ is symmetric, we can summarize all of the inertial parameters
+      for body $B$ in 10 real numbers: $$\balpha_B = \begin{bmatrix} m_B, {\bf
+      h}_B^T, \mathrm{vec}_{\mathrm{sym}}({\bf \Sigma}_B) \end{bmatrix} \in
+      \Re^{10},$$ where $\mathrm{vec}_{\mathrm{sym}}$ takes the 6 upper
+      triangular elements of the 3x3 symmetric matrices into a vector.
+      Naturally, if we aim to estimate the parameters of many bodies
+      simultaneously, we can compose these into a single parameter vector
+      $\balpha.$</p>
+
+      <p>Remarkably, the inverse dynamics are affine in these parameters. This
+      means that the manipulator equations above can be refactored into the
+      form \begin{equation}{\bf W}(\bq_n,\dot{\bq}_n, \ddot{\bq}_n, \bu_n)
+      \balpha_B + \bw_0(\bq_n,\dot{\bq}_n, \ddot{\bq}_n, \bu_n) = 0.
+      \label{eq:least_squares}\end{equation}  We sometimes refer to the ${\bf
+      W}$ which is assembled by vertically stacking these equations evaluated
+      at all time samples, $n$, as the "data matrix". Given data trajectories
+      of positions, $\bq_n,$ and torque measurements, $\bu_n,$ we can estimate
+      $\dot{\bq}_n$ and $\ddot{\bq}_n$ with (acausal) filtering. Then we can
+      estimate the parameters using least-squares estimation. As such, we can
+      leverage all of the strong results and variants from linear estimation.
+      For instance, we can add terms to regularize the estimate (e.g. to stay
+      close to an initial guess), and we can write efficient recursive
+      estimators for optimal online estimation of the parameters using
+      recursive least-squares.  My favorite recursive least-squares algorithm
+      uses incremental QR factorization<elib>Kaess08</elib>.</p>
+        
+      <p>Not all values of $\alpha_B$ describe physically-valid inertias. For
+      instance, we know that $m_B$ has to be positive. There are constraints on
+      the inertia matrix itself. <elib>Wensing17a</elib> showed that these
+      constraints are exactly transcribed as positive-definiteness of the
+      "pseudo-inertia": $${\bf J}(\balpha_B) := \begin{bmatrix} {\bf \Sigma}_B
+      & {\bf h}_B \\ {\bf h}_B^T & m_B \end{bmatrix} \succ 0.$$ If we know the
+      geometry of the body, $B$, that we are estimating, then we can optionally
+      write additional convex constraints on these parameters. For instance, we
+      know that the center of mass must lie inside the convex hull of the
+      geometry: $${}^B{\bf p}^{B_{com}} \in \mathrm{conv}(V_B) \Rightarrow {\bf
+      h}_B \in m_B \, \mathrm{conv}(V_B).$$ This non-negative scaling of a
+      convex set is still a convex set (known as the "perspective" of the set).
+      <elib>Wensing17a</elib> also gives a convex constraint on the
+      second-moment matrix based on e.g. the minimum-volume Loewner-John
+      ellipsoid containing the known geometry.</p>
+
+      <p>Do these additional constraints matter?  In the limit of infinite
+      data, then one might hope that the least-squares estimator would obtain
+      the correct values without the constraints. But their are two foils to
+      this ideal. First, we often wish to operate in a relatively low-data
+      regime. Second, it is important to understand that <i>not all of the
+      inertial parameters will be identifiable.</i> For instance, if you
+      imagine a robotic arm bolted to a table where the first link is a
+      rotation about the $z$ axis. Then no amount of data obtained from this
+      robot will given any information about the moments of inertia of the
+      first link about the $x$ and $y$ axes. Naively including these parameters
+      in the estimation problem could not only lead to poor estimates of the
+      unidentifiable parameters, but can even render the physical-validity
+      constraints on the inertial parameters vacuous.</p>
+
+      <todo>example of non-identifiable parameters making the physical-validity
+      constraints vacuous.</todo>
+
+      <p>There are two general approaches to dealing with the unidentifiable
+      parameters. First, we can exclude them from the optimization. There are a
+      number of algorithms for symbolically extracting the identifiable
+      parameters, but it is more common (and likely just as effective) to
+      simply evaluate the null-space of a sufficiently rich data matrix, $W$,
+      to find the unidentifiable parameters by using, e.g., $\bR_1\alpha_l$
+      from the QR factorization: $${\bf W} = \begin{bmatrix} \bQ_1 & \bQ_2
+      \end{bmatrix} \begin{bmatrix} \bR_1 \\ 0 \end{bmatrix}, \quad \Rightarrow
+      \quad {\bf W}\balpha_B = \bQ_1 (\bR_1 \balpha_B) = \bQ_1
+      \bar{\balpha}_{B},$$ where $\bar{\balpha}_B$ are the identifiable
+      parameters. To implement the physical-validity constraints, we would need
+      to use estimates of the unidentifiable parameters.  Alternatively, one
+      can formulate the optimization directly on the full parameters and add a
+      regularization term which (ideally) operates in the nullspace of the
+      primary objective to drive the unidentifiable parameters towards a prior
+      estimate. <elib>Lee19</elib> gives a thorough treatment of this approach
+      and provides a convex formulation of a geometrically consistent
+      regularizer, which they call the <i>entropic divergence.</i></p>
+
+      <todo>I sent this note to Pat Wensing: "I feel that there is some
+      subtlety in combining the regularizer with the constrained optimization
+      formulation. Apologies if you addressed this and I missed it!  The
+      intuitive hope for the regularizer is roughly that it's operating in the
+      null space of the primary objective to handle the unidentifiable
+      parameters.  But having the constraints complicates that story quite a
+      bit, I think?  For instance, what's to stop the optimization from making
+      the triangle inequalities almost vacuous in the non-identifiable
+      parameters by paying only the small cost of the regularization
+      term?"</todo>
+
+      <p>The least-squares inverse-dynamics formulation also allows one to
+      simultaneously fit a number of other relevant dynamic parameters which
+      enter linearly into the equations. These include reflected inertial (as a
+      diagonal matrix times $\ddot{\bq}$), joint friction (as a constant torque
+      multiplied by the sign of $\dot{\bq}$), and damping (as a linear multiple
+      of $\dot{\bq}$).</p>
+
+      <p>Allow me to mention one subtle point about numerical scaling. All of
+      the equations in the least-squares inverse-dynamics formulation
+      (\ref{eq:least_squares}) have units of torque. But the torques used on
+      the base link of a robotic arm might be numerically much larger than the
+      torques corresponding to the more distal joints. <elib>Lee19</elib>
+      addresses this by solving the unconstrained least-squares problem once,
+      then taking the error covariance of this solution into the objective
+      function for the constrained least-squares optimization. This serves to
+      make the least-squares objective coordinate independent.</p>
+
+      <example><h1>Inertial identification for a Kuka iiwa</h1>
+      
+      </example>
+
+    </subsection>
+
+    <subsection><h1>Simultaneous kinematic and inertial identification via
+    lumped parameters.</h1>
+
+      <p>Although it is highly recommended to separate kinematic an inertial
+      estimation, some amount of joint estimation is possible when needed. We
+      know that the inverse dynamics are affine in the inertial parameters. But
+      it turns out the can also be written as affine in the "lumped
+      parameters": nonlinear combinations of inertial and kinematic which do
+      not depend on the data. I feel this is best explained with a few simple
+      examples.</p>
 
       <example><h1>Lumped parameters for the simple pendulum</h1>
 
@@ -335,18 +474,12 @@ <h1><a href="index.html" style="text-decoration:none;">Underactuated Robotics</a
       For instance, if I had terms in the equations of the form, $\sin(m
       \theta)$, then I would <i>not</i> be able to produce an affine
       decomposition separating $m$ from $\theta$.  Fortunately, that doesn't
-      happen in our mechanical systems <elib>Khalil04</elib>.  Furthermore,
-      this structure is particular to the <i>inverse dynamics</i>, as we have
+      happen in our mechanical systems <elib>Khalil04</elib>.  Again, this
+      structure is particular to the <i>inverse dynamics</i>, as we have
       written here.  If you were to write the forward dynamics, multiplying by
       $\bM^{-1}$ in order to solve for $\ddot{\bq}$, then once again you would
       destroy this affine structure.</p>
 
-      <p>This is super interesting!  It is tempting to thing about parameter
-      estimation for general dynamical systems in our standard state-space
-      form: $\bx[n+1] = f_\balpha(\bx[n], \bu[n]).$  But for multibody systems,
-      it seems that this would be the wrong thing to do, as it destroys this
-      beautiful affine structure.</p>
-
       <example><h1>Multibody parameters in <drake></drake></h1>
 
         <p>Very few robotics simulators have any way for you to access the
@@ -383,29 +516,12 @@ <h1><a href="index.html" style="text-decoration:none;">Underactuated Robotics</a
 
       <p>The existence of the lumped-parameter decomposition reveals that the
       <i>equation error</i> for lumped-parameter estimation, with the error
-      taken in the torque coordinates, can be solved using least squares.  As
-      such, we can leverage all of the strong results and variants from linear
-      estimation.  For instance, we can add terms to regularize the estimate
-      (e.g. to stay close to an initial guess), and we can write efficient
-      recursive estimators for optimal online estimation of the parameters
-      using recursive least-squares.  My favorite recursive least-squares
-      algorithm uses incremental QR factorization<elib>Kaess08</elib>.</p>
-
-      <p>Importantly, because we have reduced this to a least-squares problem,
-      we can also understand when it will <i>not</i> work.  In particular, it
-      is quite possible that some parameters cannot be estimated from any
-      amount of joint data taken on the robot.  As a simple example, consider a
-      robotic arm bolted to a table; the inertial parameters of the first link
-      of the robot will not be identifiable from any amount of joint data. Even
-      on the second link, only the inertia relative to the first joint axis
-      will be identifiable; the inertial parameters corresponding to the other
-      dimensions will not.  In our least-squares formulation, this is quite
-      easy to understand: we simply check the (column) rank of the data matrix,
-      ${\bf W}$.  In particular, we can extract the <b>identifiable lumped
-      parameters</b> by using, e.g., $\bR_1\alpha_l$ from the QR factorization:
-      $${\bf W} = \begin{bmatrix} \bQ_1 & \bQ_2 \end{bmatrix} \begin{bmatrix}
-      \bR_1 \\ 0 \end{bmatrix}, \quad \Rightarrow \quad {\bf W}\balpha_l =
-      \bQ_1 (\bR_1 \alpha_l).$$</p>
+      taken in the torque coordinates, can again be solved using least squares.
+      Importantly, because we have reduced this to a least-squares problem, we
+      can also understand when it will <i>not</i> work.  Once again, not all of
+      the lumped-parameters are identifiable, so we will typically need a
+      second step to operate on the identifiable lumped parameters or to add a
+      regularization term.</p>
 
       <example><h1>Parameter estimation for the Cart-Pole</h1>
 
@@ -438,25 +554,9 @@ <h1><a href="index.html" style="text-decoration:none;">Underactuated Robotics</a
       parameters are precisely the subset of the lumped parameters that we
       need.</p>
 
-      <p>For practical reasons, it might be convenient to take your estimates
-      of the lumped parameters, and try to back out the original parameters
-      (for instance, if you want to write them back out into a URDF file).  For
-      this, I would recommend a final post-processing step that e.g. finds the
-      parameters $\hat{\balpha}$ that are as close as possible (e.g. in the
-      least-squares sense) to your original guess for the parameters, subject
-      to the nonlinear constraint that $\bR_1 \balpha_l(\hat{\balpha})$ matches
-      the estimated identifiable lumped parameters.
-      </p>
-
-      <p>There are still a few subtleties worth considering, such as how we
-      parameterize the inertial matrices.  Direct estimation of the naive
-      parameterization, the six entries of a symmetric 3x3 matrix, can lead to
-      non-physical inertial matrices.  <elib>Wensing17a</elib>
-      describes a parameter estimation formulation that includes a convex
-      formulation of the physicality constraints between these parameters.</p>
-
     </subsection>
 
+
     <subsection><h1>Identification using energy instead of inverse
     dynamics.</h1>
 
@@ -481,13 +581,17 @@ <h1><a href="index.html" style="text-decoration:none;">Underactuated Robotics</a
       velocities, but that joint accelerations, often obtained by numerically
       differentiating <i>twice</i>, tend to be much more noisy. In some cases,
       it might be numerically better to apply finite differences to the total
-      energy of the system rather than to the individual joints <sup>&dagger;</sup>.
-      <sidenote><sup>&dagger;</sup> Sometimes these methods are written as the numerical
-      integration of the power input, rather than the differentiation of the
-      total energy, but the numerics should be no different.</sidenote>
+      energy of the system rather than to the individual joints
+      <sup>&dagger;</sup>.
+      <sidenote><sup>&dagger;</sup> Sometimes these methods are written as the
+      numerical integration of the power input, rather than the differentiation
+      of the total energy, but the numerics should be no different.</sidenote>
       <elib>Gautier96</elib> provides a study of various filtering formulations
       which leads to a recommendation in <elib>Gautier97</elib>
-      that the energy formulation tends to be numerically superior.</p>
+      that the energy formulation tends to be numerically superior. This
+      distinction is probably more important in online parameter estimation,
+      where the filters must be causal, as opposed to offline parameter
+      estimation where acausal filters can be used.</p>
 
     </subsection>
 
@@ -1402,10 +1506,10 @@ <h1><a href="index.html" style="text-decoration:none;">Underactuated Robotics</a
 , <span class="year">2004</span>.
 
 </li><br>
-<li id=Kaess08>
-<span class="author">M. Kaess and A. Ranganathan and F. Dellaert</span>, 
-<span class="title">"i{SAM}: Incremental Smoothing and Mapping"</span>, 
-<span class="publisher">IEEE Transactions on Robotics</span>, vol. 24, no. 6, pp. 1365-1378, <span class="year">2008</span>.
+<li id=Atkeson86>
+<span class="author">Christopher G Atkeson and Chae H An and John M Hollerbach</span>, 
+<span class="title">"Estimation of inertial parameters of manipulator loads and links"</span>, 
+<span class="publisher">The International Journal of Robotics Research</span>, vol. 5, no. 3, pp. 101--119, <span class="year">1986</span>.
 
 </li><br>
 <li id=Wensing17a>
@@ -1413,6 +1517,18 @@ <h1><a href="index.html" style="text-decoration:none;">Underactuated Robotics</a
 <span class="title">"Linear matrix inequalities for physically consistent inertial parameter identification: A statistical perspective on the mass distribution"</span>, 
 <span class="publisher">IEEE Robotics and Automation Letters</span>, vol. 3, no. 1, pp. 60--67, <span class="year">2017</span>.
 
+</li><br>
+<li id=Kaess08>
+<span class="author">M. Kaess and A. Ranganathan and F. Dellaert</span>, 
+<span class="title">"i{SAM}: Incremental Smoothing and Mapping"</span>, 
+<span class="publisher">IEEE Transactions on Robotics</span>, vol. 24, no. 6, pp. 1365-1378, <span class="year">2008</span>.
+
+</li><br>
+<li id=Lee19>
+<span class="author">Taeyoon Lee and Patrick M Wensing and Frank C Park</span>, 
+<span class="title">"Geometric robot dynamic identification: A convex programming approach"</span>, 
+<span class="publisher">IEEE Transactions on Robotics</span>, vol. 36, no. 2, pp. 348--365, <span class="year">2019</span>.
+
 </li><br>
 <li id=Gautier97>
 <span class="author">Maxime Gautier</span>,