Rewrote state iterator; added documentation and tests #23

cogumbreiro · 2018-06-08T19:43:10Z

- Added support for ultrajson to improve loading speed. On loading, Salento tries to use ultrajson and emits a warning when the module is not installed. No changes are required.
- Rewrote the algorithm for iterating over a call sequence, allowing the user to range over the probability distribution of states more easily. The new algorithm should be a bit faster than the previous version because it minimizes the number of calls it makes to TensorFlow.
- Added documentation, usage examples, and tests to better serve users.
- Added type-checking information (currently only to `infer.py`) to document the code and improve testing.
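
The optional ultrajson loading described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the alias `json_impl` and the helper `load_dataset` are hypothetical names, and the warning text is made up for the example.

```python
import warnings

try:
    # ujson is optional; it exposes a loads() function compatible with json.loads
    import ujson as json_impl
except ImportError:
    import json as json_impl
    warnings.warn("ujson is not installed; falling back to the standard json module")


def load_dataset(text):
    """Parse a JSON document with whichever backend was imported (hypothetical helper)."""
    return json_impl.loads(text)
```

Because the fallback keeps the same `loads` interface, callers do not need to know which backend is in use.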

`BayesianPredictor` documentation

The `BayesianPredictor` offers two main capabilities, both of which
take a call sequence as input and return an iterator over the
probability distributions of the given call sequence.

Method `Model.infer_call_iter` offers the simplest capability: it lets the
user iterate over the probability distributions for a sequence of calls,
ignoring the states of each term. The return value of `Model.infer_call_iter`
is an iterator of pairs, each containing the term name (a string) and the
probability distribution.

For instance, given an instance `pred` of `BayesianPredictor`, we can yield
the probability of each term in a sequence of calls `calls` with the
following code:

for (call, dist) in pred.distribution_call_iter(spec, calls, sentinel=self.END_MARKER):
    yield dist[call]
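
To make the shape of the pairs concrete, here is a self-contained sketch that does not touch the real model. `fake_call_iter` is a hypothetical stand-in for the call iterator, and the sketch assumes each distribution can be indexed by term name, as `dist[call]` above suggests; the probabilities are invented.

```python
def fake_call_iter(calls):
    """Hypothetical stand-in for the call iterator: yields (name, dist) pairs."""
    # Assumption: a distribution maps each term name to its probability.
    dist = {"open": 0.5, "read": 0.25, "close": 0.25}
    for name in calls:
        yield name, dist


def call_probabilities(calls):
    """Collect the probability assigned to each observed call, in order."""
    return [dist[name] for name, dist in fake_call_iter(calls)]
```

Calling `call_probabilities(["open", "read", "close"])` walks the pairs in the same way as the loop above.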

Method `Model.infer_state_iter` lets the user iterate over the
probability distributions for a sequence of calls, including the states
of each term. The return value of `Model.infer_state_iter`
is an iterator of lists; each list pairs a term name (a string) with a
probability distribution. The first pair of each list holds the call name
and the distribution of the call name; the subsequent pairs hold the
probability distribution for each state of that call.

For instance, say that we want to "flatten" the output of
`Model.infer_state_iter`. In the following code we yield the probability
of each call name and of each state in a single iterator.

for row in pred.distribution_state_iter(spec, calls, sentinel=self.END_MARKER):
    for (term, dist) in row:
        yield dist[term]
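
The row layout can be illustrated without the real model. In this sketch `fake_state_iter` is a hypothetical stand-in that builds rows in the documented order (call name first, then one pair per state); the input format and the probabilities are invented for the example.

```python
def fake_state_iter(calls):
    """Hypothetical stand-in for the state iterator: yields one row per call."""
    for name, states in calls:
        # The first pair holds the call name and its distribution.
        row = [(name, {name: 0.5})]
        # The remaining pairs hold one distribution per state of that call.
        for state in states:
            row.append((state, {state: 0.25}))
        yield row


def flatten(calls):
    """Flatten rows into a single list of probabilities, calls and states alike."""
    return [dist[term] for row in fake_state_iter(calls) for term, dist in row]
```

Each row contributes one probability for the call name followed by one per state, so the flattened list has as many entries as there are terms.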

Method `infer_state_iter_ex` replaces `infer_state_iter` with a simpler interface and improved performance:

1. The output becomes a sequence of rows; each row pairs call names with probabilities.
2. It makes half as many calls to TensorFlow via `infer_seq_iter`. This matters less when a cache is in use, but it is still an improvement.
3. Added a preliminary testing module that defines a basic notion of correctness. Currently, we only check that we make the same calls to the TensorFlow model. We are currently *not* checking what probability distributions are given to the user.
4. Rewrote `kld` to use the new interface, which becomes simpler thanks to the streamlined API.
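
The notion of correctness in item 3 (two iteration strategies must issue the same requests to the model) could be checked along these lines. This is only a sketch under assumptions: `RecordingModel`, `reference_walk`, `candidate_walk`, and `same_requests` are hypothetical names, and the stub records requests instead of running TensorFlow.

```python
class RecordingModel:
    """Hypothetical stub that logs every request instead of running TensorFlow."""

    def __init__(self):
        self.requests = []

    def infer(self, prefix):
        # Store an immutable copy so later mutation of prefix does not alter the log.
        self.requests.append(tuple(prefix))
        return {}


def reference_walk(model, calls):
    """Baseline strategy: query the model with each prefix, built by slicing."""
    for i in range(len(calls)):
        model.infer(calls[: i + 1])


def candidate_walk(model, calls):
    """Alternative strategy: grow one prefix list incrementally."""
    prefix = []
    for call in calls:
        prefix.append(call)
        model.infer(prefix)


def same_requests(calls):
    """Return True when both strategies issue identical requests in the same order."""
    ref, cand = RecordingModel(), RecordingModel()
    reference_walk(ref, calls)
    candidate_walk(cand, calls)
    return ref.requests == cand.requests
```

As in the PR, a check like this says nothing about the probabilities returned to the user; it only compares the traffic sent to the model.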

cogumbreiro · 2018-06-08T19:43:38Z

Pinging @vineethk @asingh-gt

cogumbreiro added 19 commits June 8, 2018 15:34

Added support for ujson.

ab8b22a

Making the BayesianPredictor more testable.

5152852

_sequence_to_graph is static; refactored it outside of the class.

653860c

Simplify sequence_to_graph.

e9d416e

Fixed a bug in how we compute the states of sequences.

7b624fa

Added call_iter. Updated kld/seq aggregators.

706ba81

infer_step_iter was deprecated by infer_state_iter and infer_call_iter.

bf2380b

Added some documentation.

6d9d02e

Added some minimal docs.

fb23997

Added a test for data_reader.py.

b38171f

Added licensing info and documentation.

2fd666f

Fixed the dates of copyright.

ae6fd4f

Improved the performance of items.

721dfc9

Renamed to match the method being tested.

f29c48b

Added a bit of typing.

b8b108c

Improved the documentation of state_iter.

ccf8e4e

Improved the documentation.

72a6833

Augment typing coverage.

88f3ad9

cogumbreiro added 5 commits June 11, 2018 20:09

Small typing fix.

dbf7541

Added typing information in stub file.

d82cb45

Reverted adding typing information (using stub files instead).

9999426

Updated typing information.

aadfc2d

Added a fast algorithm to compute the max element (NumPy powered).

161c942
