#LyX 2.3 created this file. For more info see http://www.lyx.org/
\lyxformat 544
\begin_document
\begin_header
\save_transient_properties true
\origin unavailable
\textclass article
\begin_preamble
\usepackage{url}
\usepackage{slashed}
\end_preamble
\use_default_options false
\maintain_unincluded_children false
\language english
\language_package default
\inputencoding utf8
\fontencoding global
\font_roman "times" "default"
\font_sans "helvet" "default"
\font_typewriter "cmtt" "default"
\font_math "auto" "auto"
\font_default_family default
\use_non_tex_fonts false
\font_sc false
\font_osf false
\font_sf_scale 100 100
\font_tt_scale 100 100
\use_microtype false
\use_dash_ligatures false
\graphics default
\default_output_format default
\output_sync 0
\bibtex_command default
\index_command default
\paperfontsize default
\spacing single
\use_hyperref true
\pdf_bookmarks true
\pdf_bookmarksnumbered false
\pdf_bookmarksopen false
\pdf_bookmarksopenlevel 1
\pdf_breaklinks true
\pdf_pdfborder true
\pdf_colorlinks true
\pdf_backref false
\pdf_pdfusetitle true
\papersize default
\use_geometry false
\use_package amsmath 2
\use_package amssymb 2
\use_package cancel 1
\use_package esint 0
\use_package mathdots 1
\use_package mathtools 1
\use_package mhchem 0
\use_package stackrel 1
\use_package stmaryrd 1
\use_package undertilde 1
\cite_engine basic
\cite_engine_type default
\biblio_style plain
\use_bibtopic false
\use_indices false
\paperorientation portrait
\suppress_date false
\justification true
\use_refstyle 0
\use_minted 0
\index Index
\shortcut idx
\color #008000
\end_index
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\paragraph_indentation default
\is_math_indent 0
\math_numbering_side default
\quotes_style english
\dynamic_quotes 0
\papercolumns 1
\papersides 1
\paperpagestyle default
\listings_params "basicstyle={\ttfamily},basewidth={0.45em}"
\tracking_changes false
\output_changes false
\html_math_output 0
\html_css_as_file 0
\html_be_strict false
\end_header
\begin_body
\begin_layout Title
Language Learning Diary - Part Nine
\end_layout
\begin_layout Date
Oct 2022 – Present
\end_layout
\begin_layout Author
Linas Vepštas
\end_layout
\begin_layout Abstract
The language-learning effort involves research and software development
to implement the ideas concerning unsupervised learning of grammar, syntax
and semantics from corpora.
This document contains supplementary notes and a loosely-organized
semi-chronological diary of results.
The notes here might not always make sense; they are a shorthand for
my own benefit, rather than aimed at you, dear reader!
\end_layout
\begin_layout Section*
Introduction
\end_layout
\begin_layout Standard
Part Nine of the diary explores continuous learning.
\end_layout
\begin_layout Section*
Summary Conclusions
\end_layout
\begin_layout Standard
A summary of what is found in this part of the diary:
\end_layout
\begin_layout Itemize
None yet.
\end_layout
\begin_layout Section*
Hard lessons learned
\end_layout
\begin_layout Standard
Experiment-17 is the teacher.
Here is what we learned:
\end_layout
\begin_layout Itemize
The disjuncts in `r16-merge.rdb` and `r13-all-in-one.rdb` are insufficient
to generate interesting sentences.
There are too few of them.
\end_layout
\begin_layout Itemize
Apparently, trimming has depleted the ranks.
Thus, although they "look good" when examined individually, they're not
rich enough to be used.
\end_layout
\begin_layout Standard
Here is what we can do differently, going forwards:
\end_layout
\begin_layout Itemize
This suggests clustering should be more aggressive.
Clustering enriches the number of available disjuncts on any given word.
\end_layout
\begin_layout Itemize
A solution to not having enough disjuncts of the right shape is to supplement
existing disjuncts with optional single links taken from word-pairs.
This explodes the RAM usage in LG, up to 10 GB or 20 GB or maybe more, depending.
\end_layout
\begin_layout Itemize
The LG `dict-atomese` backend was extended to use word-pairs and also ANY
links.
Disjuncts can now have optional word-pair connectors on them.
(Done, Nov 2022)
\end_layout
\begin_layout Itemize
Since the LG atomese dict can now use single word-pairs, that means it can
do MST/MPG parsing.
Thus, we can get rid of the atomspace MST parser.
(Done, Jan 2023)
\end_layout
\begin_layout Itemize
The LG backend can also supplement disjuncts with ANY links.
(Done, Nov 2022)
\end_layout
\begin_layout Itemize
The MST/MPG mode can also use ANY links.
(Done, Nov 2022)
\end_layout
\begin_layout Itemize
As a result, the LG parser can do all of it -- random-tree ANY parsing,
MST/MPG parsing, and Section/disjunct parsing.
\end_layout
\begin_layout Itemize
This creates a possibility of doing "continuous learning": learning word
pairs and disjuncts at the same time.
\end_layout
\begin_layout Itemize
However, the more complex portions cannot run until the marginals are recomputed.
This suggests a natural awake/asleep cycle.
During the awake cycle, data is ingested.
During the asleep cycle, marginals are (re-)computed, MI is (re-)computed,
and similarities are updated.
This is very nice: it gets rid of the pipeline.
(A minimal sketch of this cycle follows this list.)
\end_layout
\begin_layout Itemize
So it seems like it's time to abolish the pipeline.
\end_layout
\begin_layout Itemize
Easier said than done (Dec 2022). Computing MI on the fly raises issues with
caching, stale data, write-back to the DB, reads from the DB, and general
data flow.
It's a bit messy.
So this has to be a back-burner project.
\end_layout
\begin_layout Itemize
For example, a caching ProxyNode can be created.
This would effectively be the old ECAN idea, this time done right.
\end_layout
\begin_layout Itemize
BTW, we can do GOE similarity with just word-pair MI.
So clustering can begin before disjuncts have been created.
\end_layout
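\begin_layout Standard
To make the awake/asleep idea concrete, here is a minimal control-loop sketch,
written in Python.
Every name in it is hypothetical (the real work would be done by the AtomSpace
pipeline); it only illustrates the intended data flow, not an actual
implementation.
\end_layout
\begin_layout Standard
\begin_inset listings
inline false
status open
\begin_layout Plain Layout
# Hypothetical sketch of the awake/asleep continuous-learning cycle.
\end_layout
\begin_layout Plain Layout
# The callables passed in stand for the real AtomSpace operations.
\end_layout
\begin_layout Plain Layout
def learn_forever(next_batch, ingest, recompute_marginals,
\end_layout
\begin_layout Plain Layout
                  recompute_mi, update_similarities):
\end_layout
\begin_layout Plain Layout
    while True:
\end_layout
\begin_layout Plain Layout
        # Awake: ingest a batch of text, accumulating raw counts.
\end_layout
\begin_layout Plain Layout
        for sentence in next_batch():
\end_layout
\begin_layout Plain Layout
            ingest(sentence)
\end_layout
\begin_layout Plain Layout
        # Asleep: recompute marginals and MI, refresh similarities,
\end_layout
\begin_layout Plain Layout
        # so the next awake cycle parses with fresh statistics.
\end_layout
\begin_layout Plain Layout
        recompute_marginals()
\end_layout
\begin_layout Plain Layout
        recompute_mi()
\end_layout
\begin_layout Plain Layout
        update_similarities()
\end_layout
\end_inset
\end_layout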
\begin_layout Standard
\end_layout
\begin_layout Section*
The Plan
\end_layout
\begin_layout Itemize
Start with pair counting.
Do NOT trim until after MST.
\end_layout
\begin_layout Itemize
Use uniform sentence lengths.
\end_layout
\begin_layout Itemize
During MST, count the pairs that contributed to the MST.
Let's call this
\begin_inset Quotes eld
\end_inset
second counting
\begin_inset Quotes erd
\end_inset
.
Never trim second-counted pairs (at least, not for the next few steps).
A sketch of second counting follows this list.
\end_layout
\begin_layout Itemize
Perform tentative GOE clustering before MST, perform MST with and without
clusters, try to see which is betters!? How to tell which is better? I
guess higher totla MI.
But how to count/trakc which contributed the most, and still maintain detailed
balance? I.e.
how to
\begin_inset Quotes eld
\end_inset
undo
\begin_inset Quotes erd
\end_inset
clustering? Or will second-counting be sufficient to track this?
\end_layout
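\begin_layout Standard
To make second counting concrete, here is a minimal sketch in Python.
It assumes a plain dictionary pair_mi of word-pair MI scores and builds a
greedy (Prim-style) maximum spanning tree over one sentence, ignoring the
planarity constraints of MPG parsing; the actual pipeline does this with the
LG parser and the AtomSpace, so the names here are purely illustrative.
\end_layout
\begin_layout Standard
\begin_inset listings
inline false
status open
\begin_layout Plain Layout
def mst_second_count(words, pair_mi, second_count):
\end_layout
\begin_layout Plain Layout
    # Greedy (Prim-style) maximum spanning tree over the words of one
\end_layout
\begin_layout Plain Layout
    # sentence, scored by word-pair MI.  Every pair that lands in the
\end_layout
\begin_layout Plain Layout
    # tree gets its second count bumped; such pairs are never trimmed.
\end_layout
\begin_layout Plain Layout
    connected = [words[0]]
\end_layout
\begin_layout Plain Layout
    remaining = list(words[1:])
\end_layout
\begin_layout Plain Layout
    while remaining:
\end_layout
\begin_layout Plain Layout
        # Pick the highest-MI link joining the tree to a new word.
\end_layout
\begin_layout Plain Layout
        best = max(((a, b) for a in connected for b in remaining),
\end_layout
\begin_layout Plain Layout
                   key=lambda ab: pair_mi.get(ab, float('-inf')))
\end_layout
\begin_layout Plain Layout
        second_count[best] = second_count.get(best, 0) + 1
\end_layout
\begin_layout Plain Layout
        connected.append(best[1])
\end_layout
\begin_layout Plain Layout
        remaining.remove(best[1])
\end_layout
\begin_layout Plain Layout
    return second_count
\end_layout
\end_inset
\end_layout
\begin_layout Standard
The trimming step would then consult the accumulated second_count tallies,
and skip any pair with a nonzero tally.
\end_layout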
\begin_layout Section*
TODO List
\end_layout
\begin_layout Standard
To-do items that are NOT covered in this chapter, but should be done someday,
somehow:
\end_layout
\begin_layout Itemize
How does the number of word-pairs scale as a function of vocabulary size?
\end_layout
\begin_layout Itemize
How does vocabulary size scale as a function of corpus size? We've monitored
the above quantities repeatedly, but never really worked out the scaling
relationships.
\end_layout
\begin_layout Section*
Notes
\end_layout
\begin_layout Standard
Nov 2022 – Tried restarting with `run-1-marg-tranche-123.rdb`, which is not
trimmed.
But it's huge: 300K x 300K, and *lots* of the words have backslashes in them.
Yuck! 52 GB to load...
Need to start over.
\end_layout
\begin_layout Standard
Bringup of the above ideas is in Expt-18.
\end_layout
\begin_layout Section*
Hypervector ruminations
\end_layout
\begin_layout Standard
Some questions:
\end_layout
\begin_layout Itemize
Given a vector in the GOE (say, a particular word, with coordinates measured
via MI, offset and normalized to unit length), which is the nearest cube-corner
(bipolar hypervector)? I assume it's the one with all coordinates
rounded to either +1 or -1.
But this needs proof.
(A sketch addressing the first few of these questions follows this list.)
\end_layout
\begin_layout Itemize
What is the angular distance to that corner?
\end_layout
\begin_layout Itemize
Given a corner in the cube, what is the nearest actual vector?
\end_layout
\begin_layout Itemize
What is the distribution of Hamming distance vs.
actual distance? That is, pick two actual vectors from the dataset.
Their dot product is the
\begin_inset Quotes eld
\end_inset
actual
\begin_inset Quotes erd
\end_inset
distance.
The Hamming distance between them is the Hamming distance between their
nearest cube corners.
\end_layout
\begin_layout Itemize
Given a random cube corner, what is the most efficient way of finding the
nearest element in the dataset?
\end_layout
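\begin_layout Standard
A partial answer to the first few questions, as a sketch: the sign-rounding
guess does appear to be correct, since all corners of the cube have the same
length (the square root of the dimension d), so the nearest corner is the one
maximizing the dot product, and that sum is maximized coordinate-by-coordinate
by taking signs.
The Python/numpy sketch below computes the nearest corner, the angular
distance to it, and the corner-based Hamming distance between two vectors;
it assumes plain numpy arrays, nothing AtomSpace-specific.
\end_layout
\begin_layout Standard
\begin_inset listings
inline false
status open
\begin_layout Plain Layout
import numpy as np
\end_layout
\begin_layout Plain Layout
def nearest_corner(v):
\end_layout
\begin_layout Plain Layout
    # Nearest bipolar hypervector: round each coordinate to +1 or -1.
\end_layout
\begin_layout Plain Layout
    return np.where(v >= 0.0, 1.0, -1.0)
\end_layout
\begin_layout Plain Layout
def angle_to_corner(v):
\end_layout
\begin_layout Plain Layout
    # Angular distance from a (unit-normalized) vector to its corner.
\end_layout
\begin_layout Plain Layout
    v = v / np.linalg.norm(v)
\end_layout
\begin_layout Plain Layout
    c = nearest_corner(v) / np.sqrt(v.size)
\end_layout
\begin_layout Plain Layout
    return np.arccos(np.clip(np.dot(v, c), -1.0, 1.0))
\end_layout
\begin_layout Plain Layout
def hamming(v, w):
\end_layout
\begin_layout Plain Layout
    # Hamming distance between the nearest corners of two vectors.
\end_layout
\begin_layout Plain Layout
    return int(np.sum(nearest_corner(v) != nearest_corner(w)))
\end_layout
\end_inset
\end_layout
\begin_layout Standard
For a unit vector with all coordinates equal, the angle is zero; for a basis
vector it is arccos of one over the square root of d, which is the worst case.
\end_layout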
\begin_layout Standard
Other unrelated questions:
\end_layout
\begin_layout Itemize
What is the QR decomposition of word-pair MI? viz
\begin_inset Formula $M=QR$
\end_inset
and
\begin_inset Formula $Q$
\end_inset
is orthogonal and
\begin_inset Formula $R$
\end_inset
is upper triangular...
is there significance to this decomposition? Is there a past (left) vs.
future (right) light-cone thing going on?
\end_layout
\begin_layout Itemize
Can the relationship between free groups and hypervectors be exploited?
For example, consider the free group
\begin_inset Formula $F$
\end_inset
in two generators
\begin_inset Formula $A,B$
\end_inset
.
Dividing by the commutator
\begin_inset Formula $AB-BA=0$
\end_inset
just abelianizes this, reducing it to 2D Cartesian space (well, I'm skipping
details; it goes to
\begin_inset Formula $\mathbb{Z}\times\mathbb{Z}$
\end_inset
, and the module over the reals with more constraints gives Cartesian space).
We can do the same tricks in very high dimensions, except this time leaving
some of the generators free, or perhaps applying other constraints, to
create presentations of more complex groups, and then working with modules
over them.
This then has the flavor of a history monoid or trace monoid, but now as
groups/modules.
The
\begin_inset Quotes eld
\end_inset
central
\begin_inset Quotes erd
\end_inset
part is fully abelianized and high-dimensional, but additional
\begin_inset Quotes eld
\end_inset
dimensions
\begin_inset Quotes erd
\end_inset
might not be, retaining a free structure, or being partially constrained.
The question: can this lead to any useful insights or tools?
\end_layout
\begin_layout Itemize
BTW, the above provides an explanation for the perception of fractals and
hyperbolic spaces in hallucinations (DMT, LSD, etc.): the hypothesis is
this.
Neural (cortical) columns in the visual cortex provide basic structures
for parallel processing, but, without cross connections, the processing
is that of a free monoid.
To perceive 3D spaces, there are cross-column connections that abelianize
the free module down to Cartesian 3D space, which we perceive
\begin_inset Quotes eld
\end_inset
directly
\begin_inset Quotes erd
\end_inset
.
The hallucinogens disrupt the communications between columns, exposing
the basic parallel, free structure: i.e.
the fractals and hyperbolic spaces.
\end_layout
\begin_layout Itemize
The above also suggests that DMT might allow the brain to achieve 4D or 5D
perception more quickly.
That is, we can project 4D, 5D shapes down to a 2D computer screen/visual
cortex, and the task is to learn, in a
\begin_inset Quotes eld
\end_inset
natural
\begin_inset Quotes erd
\end_inset
way, to perceive the 4D space.
This would seem to require rewiring the cortical columns from their 3D module
presentation to a 4D presentation, and perhaps the hallucinogens would
ease the required disruption and rewiring.
See web page,
\begin_inset CommandInset href
LatexCommand href
name "Hallucination of Fractals"
target "https://linas.org/math/hallucination.html"
literal "false"
\end_inset
.
\end_layout
\begin_layout Standard
Then the oldie but goodie:
\end_layout
\begin_layout Itemize
The GOE vectors form a de facto vierbein.
What happens when I move from word to word? Can I define a connection?
a curvature? a torsion? Even if it's flat, when I travel around a loop,
is there a holonomy?
\end_layout
\begin_layout Section*
Counting vs.
Bayesian Probability
\end_layout
\begin_layout Standard
Open question: How can the counting methods that we've been employing so
far be bridged back to Bayesian theory? For the simplest possible case,
there is a clear bridge between counting and Bayesian probability.
It is summarized by the vapid catch-phrase
\begin_inset Quotes eld
\end_inset
update your priors
\begin_inset Quotes erd
\end_inset
.
But how can this be done for structure learning?
\end_layout
\begin_layout Subsection*
The Bernoulli Process
\end_layout
\begin_layout Standard
Below follows a laborious analysis of the simplest possible case: the Bernoulli
process of a coin toss.
In this case, there is a very direct relation between counting and Bayesian
theory: The Bayesian prior is given by the beta function, which increments
by one with each coin-toss result.
One performs counting on coin-toss results, and the beta function provides
the correct way to
\begin_inset Quotes eld
\end_inset
update your priors
\begin_inset Quotes erd
\end_inset
with each new toss.
Let's proceed with this laborious review.
It tries to hit on all the key assumptions going into the analysis.
\end_layout
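\begin_layout Standard
As a numerical sanity check on the counting-equals-prior-update claim, the
sketch below (Python, no libraries) just keeps the two counts; the posterior
after the observed tosses is the Beta distribution with exponents incremented
by the counts, and the predictive probability of heads is its mean.
This anticipates, in closed form, what the derivation below arrives at in
terms of Beta functions.
\end_layout
\begin_layout Standard
\begin_inset listings
inline false
status open
\begin_layout Plain Layout
def predictive_heads(tosses):
\end_layout
\begin_layout Plain Layout
    # Start from the uniform prior Beta(1,1); after the observed
\end_layout
\begin_layout Plain Layout
    # tosses the posterior is Beta(n_heads+1, n_tails+1).  Updating
\end_layout
\begin_layout Plain Layout
    # the prior is literally just counting.
\end_layout
\begin_layout Plain Layout
    n_heads = sum(1 for t in tosses if t == 'H')
\end_layout
\begin_layout Plain Layout
    n_tails = len(tosses) - n_heads
\end_layout
\begin_layout Plain Layout
    # Predictive probability that the next toss is heads: the
\end_layout
\begin_layout Plain Layout
    # posterior mean, i.e. Laplace's rule of succession.
\end_layout
\begin_layout Plain Layout
    return (n_heads + 1) / (n_heads + n_tails + 2)
\end_layout
\begin_layout Plain Layout
print(predictive_heads([]))                    # 0.5, the uniform prior
\end_layout
\begin_layout Plain Layout
print(predictive_heads(['H', 'H', 'H', 'T']))  # 4/6 = 0.666...
\end_layout
\end_inset
\end_layout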
\begin_layout Standard
If we know (a priori) that we are dealing with a coin toss (only two outcomes:
heads or tails) and if we know (a priori) that the same coin is being used
(it's a stationary process) and finally, if we know (a priori) that there
is no interference with the coin, then, by careful analysis, we can conclude
that the Bernoulli process can be described by a single real number
\begin_inset Formula $0\le p\le1$
\end_inset
giving the probability that the next coin toss is
\begin_inset Formula
\begin{align*}
P\left(\mbox{heads}\right) & =p\\
P\left(\mbox{tails}\right) & =1-p
\end{align*}
\end_inset
But we do not have any a priori information about what
\begin_inset Formula $p$
\end_inset
might be.
The task is to try to discover this number, by observation of an actual
sequence of coin tosses.
How does one do this? Well, the theory of Bayesian probability provides
a strict methodology.
Let's review this.
\end_layout
\begin_layout Standard
Since one does not know what
\begin_inset Formula $p$
\end_inset
might be, one creates a
\begin_inset Quotes eld
\end_inset
prior
\begin_inset Quotes erd
\end_inset
, a reasonable assumption about it.
Here, the reasonable assumption would be that the likelihood of
\begin_inset Formula $p=\theta$
\end_inset
is uniformly distributed over the interval
\begin_inset Formula $0\le\theta\le1$
\end_inset
.
That is, a hypothesis
\begin_inset Formula $h_{\theta}$
\end_inset
is made that
\begin_inset Formula $p=\theta$
\end_inset
and the letter
\begin_inset Formula $\theta$
\end_inset
is acting as a label indicating that we are currently working with hypothesis
\begin_inset Formula $h_{\theta}$
\end_inset
.
In the present case, we have uncountably many hypotheses.
We express the idea of zero knowledge by assigning a uniform distribution
\begin_inset Formula $\mu\left(h_{\theta}\right)=1$
\end_inset
for the hypothesis
\begin_inset Formula $h_{\theta}$
\end_inset
.
I write
\begin_inset Formula $\mu\left(h_{\theta}\right)$
\end_inset
instead of
\begin_inset Formula $P\left(h_{\theta}\right)$
\end_inset
for the prior, to emphasize that
\begin_inset Formula $\mu$
\end_inset
can be, and should be thought of as a measure-theoretic measure on the
unit interval.
\end_layout
\begin_layout Standard
Lets assume that
\begin_inset Formula $N\ge0$
\end_inset
coin tosses have been observed, giving a sequence
\begin_inset Formula $x=\left[x_{1},x_{2},\cdots,x_{N}\right]$
\end_inset
with each
\begin_inset Formula $x_{k}\in\left\{ \mbox{heads},\mbox{tails}\right\} $
\end_inset
.
The probability of observing the next toss
\begin_inset Formula $y=x_{N+1}\in\left\{ \mbox{heads},\mbox{tails}\right\} $
\end_inset
is factored through the likelihoods of the hypotheses:
\begin_inset Formula
\begin{align*}
P\left(y\vert x\right)= & \int_{\theta}P\left(y\left|h_{\theta}\right.\right)P\left(\left.h_{\theta}\right|x\right)\\
= & \int_{\theta}P\left(y\left|h_{\theta}\right.\right)\mu\left(\left.h_{\theta}\right|x\right)
\end{align*}
\end_inset
As before, I use the non-standard notation
\begin_inset Formula $\mu\left(\left.h_{\theta}\right|x\right)=P\left(\left.h_{\theta}\right|x\right)$
\end_inset
to indicate that
\begin_inset Formula $\mu$
\end_inset
is a measure on the space of hypotheses.
This is useful for several reasons: first, it avoids a confusing proliferation
of
\begin_inset Formula $P$
\end_inset
's, twisty little passages all alike.
Best if different actors wore different clothes, so that we can tell them
apart more easily.
Secondly, it helps distinguish likelihood from probability, and makes clear
that distributions on hypotheses are very different from probabilities
of future, unknown events.
The space of hypotheses belongs to the
\begin_inset Quotes eld
\end_inset
world model
\begin_inset Quotes erd
\end_inset
: the model that we keep in our heads, to represent the universe outside
of ourselves.
The integral over hypotheses makes it clear that the natural interpretation
of Bayesian theory is the Many-Worlds Interpretation: we assign each possible
world
\begin_inset Formula $h_{\theta}$
\end_inset
a measure
\begin_inset Formula $\mu\left(\left.h_{\theta}\right|x\right)$
\end_inset
.
In statistics, this is called the
\begin_inset Quotes eld
\end_inset
belief
\begin_inset Quotes erd
\end_inset
that the possible world
\begin_inset Formula $h_{\theta}$
\end_inset
is the true world in which we live.
This talk of
\begin_inset Quotes eld
\end_inset
beliefs
\begin_inset Quotes erd
\end_inset
indicates that we must apply modal logic, and perhaps Kripke semantics
or at least some kind of possible-world semantics, when reasoning about
Bayesian priors.
Last but not least, the factorization across possible worlds can now be
written as
\begin_inset Formula
\begin{align*}
P\left(y\vert x\right)= & \int_{\theta}P\left(y\left|h_{\theta}\right.\right)\mu\left(\left.h_{\theta}\right|x\right)\\
= & \int_{\mu}P\left(y\left|h_{\theta}\right.\right)
\end{align*}
\end_inset
where
\begin_inset Formula $\int_{\mu}$
\end_inset
is the conventional notation for an integral performed over a measurable
space; viz.
a Borel set endowed with a topology; presumably the weak topology that
allows additive measures and all the other goodness implied by the Kolmogorov
equivalence between probability and measure theory.
To retain measurability, we require that the set of all possible worlds
have a probability of one:
\begin_inset Formula
\[
1=\int_{\mu}=\int_{\theta}\mu\left(\left.h_{\theta}\right|x\right)
\]
\end_inset
In the language of statistical mechanics, the set
\begin_inset Formula $\left\{ \mu\left(\left.h_{\theta}\right|x\right)\mbox{ s.t. }0\le\theta\le1\right\} $
\end_inset
is called a
\begin_inset Quotes eld
\end_inset
canonical ensemble
\begin_inset Quotes erd
\end_inset
.
\end_layout
\begin_layout Standard
The hypothesis space includes a number of priors that are not updated.
These are invariants; they don't change because they are, in a sense, outside
of the scope of the problem being observed.
The framework being developed here does not provide a mechanism for updating
these beliefs:
\end_layout
\begin_layout Itemize
Assumptions about the physics of coin tosses.
\end_layout
\begin_layout Itemize
Assumption that only two outcomes are possible.
\end_layout
\begin_layout Itemize
Assumption that the same coin is tossed, each time; that the coin does not
change with time.
\end_layout
\begin_layout Itemize
Assumption that the outcome of a coin toss is correctly observed, and that
there is no observational noise (stochastic or systematic).
\end_layout
\begin_layout Itemize
Assumption that the tools of Bayesian theory are accessible for use, and
that the theory itself is correct.
\end_layout
\begin_layout Itemize
Assumption that logical inference can be performed and that the outcome
of using logic is trustworthy.
\end_layout
\begin_layout Itemize
Assumption that a toolset for algebraic manipulations is accessible and
employable.
\end_layout
\begin_layout Standard
The first four bullets above seem reasonable, as they can be obviously varied:
a wind may be blowing; the coin may be pyramidal; the coin might bend or
change shape with each toss.
Poor eyesight might result in some of the tails being perceived as heads
(a systematic bias).
The last bullets are weirder: we normally accept logic and algebra as foundatio
nal, fixed and always true.
In practice, however, it is not so clear-cut: algebraic calculations may
be intractable; logical reasoning chains too deep.
Mistakes (bugs) in the software and problem setup may occur.
This text that you are reading right now, that explains things, may itself
explain things incorrectly.
Perhaps superior techniques remain undiscovered.
All these issues are set aside for this example, although they lurk for
the general case.
\end_layout
\begin_layout Standard
Using the a priori assumptions about the physics of coin tosses and the
mathematical nature of stochastic processes, then applying logical inference
to the coin-toss problem, we may conclude, in an a priori fashion, that,
given hypothesis
\begin_inset Formula $h_{\theta}$
\end_inset
, the probability of the coin-toss result is
\begin_inset Formula
\[
P\left(y\left|h_{\theta}\right.\right)=\begin{cases}
\theta & \mbox{if }y=\mbox{heads}\\
1-\theta & \mbox{if }y=\mbox{tails}
\end{cases}
\]
\end_inset
To obtain an expression for
\begin_inset Formula $\mu\left(\left.h_{\theta}\right|x\right)$
\end_inset
, we apply Bayes rule, which relates the likelihood of a hypothesis to the
probability of prior outcomes:
\begin_inset Formula
\[
\mu\left(\left.h_{\theta}\right|x\right)=P\left(\left.h_{\theta}\right|x\right)=\alpha P\left(x\left|h_{\theta}\right.\right)\mu\left(h_{\theta}\right)
\]
\end_inset
We already have that the prior, before any observations at all have been
made, is
\begin_inset Formula
\[
\mu\left(h_{\theta}\right)=\mu\left(\left.h_{\theta}\right|\varnothing\right)=1
\]
\end_inset
The coefficient
\begin_inset Formula $\alpha$
\end_inset
is a normalization constant, forced by
\begin_inset Formula $1=\int_{\theta}\mu\left(\left.h_{\theta}\right|x\right)$
\end_inset
as noted above.
To obtain the
\begin_inset Quotes eld
\end_inset
a posteriori prior
\begin_inset Quotes erd
\end_inset
\begin_inset Formula $\mu\left(\left.h_{\theta}\right|x\right)$
\end_inset
that holds after measurements
\begin_inset Formula $x$
\end_inset
, we need an expression for
\begin_inset Formula $P\left(x\left|h_{\theta}\right.\right)$
\end_inset
.
\end_layout
\begin_layout Standard
The assumption of independence implies that the Cartesian product can be
used on the event space.
That is, we are working with a Bernoulli process, not a Markov process,
nor something more complex.
Thus, a priori logical inference allows us to conclude that we can work
with the cylinder set measure on the weak topology on a Cartesian product
space.
Cylinder set measures factorize in a
\begin_inset Quotes eld
\end_inset
trivial
\begin_inset Quotes erd
\end_inset
way, which allows the expression
\begin_inset Formula
\[
P\left(x\left|h_{\theta}\right.\right)=\prod_{k=1}^{N}P\left(x_{k}\left|h_{\theta}\right.\right)
\]
\end_inset
This reduces the problem to simply counting the number of times
\begin_inset Formula $N_{H}$
\end_inset
that heads has been observed, and
\begin_inset Formula $N_{T}=N-N_{H}$
\end_inset
that tails has been observed.
Plugging in,
\begin_inset Formula
\[
P\left(x\left|h_{\theta}\right.\right)=\theta^{N_{H}}\left(1-\theta\right)^{N_{T}}
\]
\end_inset
This provides a closed-form expression for the probability of observing
a specific result of a coin toss, given a prior observational history:
\begin_inset Formula
\begin{align*}
P\left(y\vert x\right)= & \int_{\mu}P\left(y\left|h_{\theta}\right.\right)\\
= & \int_{\theta}P\left(y\left|h_{\theta}\right.\right)\mu\left(\left.h_{\theta}\right|x\right)\\
= & \alpha\int_{\theta}P\left(y\left|h_{\theta}\right.\right)P\left(x\left|h_{\theta}\right.\right)\\
= & \begin{cases}
\alpha\beta\left(N_{H}+1,N_{T}\right) & \mbox{if }y=\mbox{heads}\\
\alpha\beta\left(N_{H},N_{T}+1\right) & \mbox{if }y=\mbox{tails}
\end{cases}
\end{align*}
\end_inset
where
\begin_inset Formula
\[
\beta\left(a,b\right)=\int d\theta\;\theta^{a}\left(1-\theta\right)^{b}=B\left(a+1,b+1\right)
\]
\end_inset
and
\begin_inset Formula
\[
B\left(z_{1},z_{2}\right)=\frac{\Gamma\left(z_{1}\right)\Gamma\left(z_{2}\right)}{\Gamma\left(z_{1}+z_{2}\right)}
\]
\end_inset
is the conventional Beta function (essentially the reciprocal of the binomial coefficient).
\end_layout
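\begin_layout Standard
One further step, not spelled out above but following directly from it: the
normalization forces
\begin_inset Formula $\alpha=1/\beta\left(N_{H},N_{T}\right)$
\end_inset
, and so the predictive probability collapses to Laplace's rule of succession,
\begin_inset Formula 
\[
P\left(\mbox{heads}\vert x\right)=\frac{\beta\left(N_{H}+1,N_{T}\right)}{\beta\left(N_{H},N_{T}\right)}=\frac{B\left(N_{H}+2,N_{T}+1\right)}{B\left(N_{H}+1,N_{T}+1\right)}=\frac{N_{H}+1}{N+2}
\]
\end_inset
so that, for example, after observing three heads and one tail, the probability
that the next toss is heads is 4/6.
\end_layout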
\begin_layout Standard
At last, we have arrived.
We've reduced the problem of observing coin tosses into two parts:
\end_layout
\begin_layout Itemize
A large, complicated set of a priori assumptions and logical, mathematical,
algebraic manipulations, all of which can be done before observing any
coin tosses at all.
\end_layout