Rewrote state iterator; added documentation and tests #23

Open: wants to merge 24 commits into master
cogumbreiro (Collaborator)

  1. Added support for ultrajson to speed up loading. On load, Salento tries to use ultrajson and emits a warning when the module is not installed. No user changes are required.
  2. Rewrote the algorithm for iterating over a call sequence, making it easier to range over the probability distributions of states. The new algorithm should be slightly faster than the previous version because it keeps the number of calls to TensorFlow to a minimum.
  3. Added documentation, usage examples, and tests.
  4. Added type annotations (currently only in infer.py) to document the code and improve testing.
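The ultrajson fallback in item 1 can be sketched with a guarded import. This is a minimal sketch assuming the module is imported under its usual name `ujson`; `load_json` is a hypothetical helper, and Salento's actual loader may be organized differently.

```python
import warnings

try:
    # ultrajson provides load/loads/dump/dumps like the stdlib json module
    import ujson as json
except ImportError:
    warnings.warn("ultrajson (ujson) is not installed; falling back to the standard json module")
    import json


def load_json(path):
    """Hypothetical helper: read a JSON file with whichever module was imported above."""
    with open(path) as fp:
        return json.load(fp)
```

Binding either module to the same name keeps the rest of the code unaware of which implementation is in use; the only visible difference is the warning.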

BayesianPredictor documentation

`BayesianPredictor` offers two main capabilities. Both take a call sequence as
input and return an iterator over the probability distributions of that call
sequence.

Method `Model.infer_call_iter` offers the simpler capability: given a sequence
of calls, it iterates over the probability distributions and ignores the states
of each term. `Model.infer_call_iter` returns an iterator of pairs, each
containing a term name (a string) and its probability distribution.

For instance, given an instance `pred` of `BayesianPredictor`, the following
code yields the probability of each term in a sequence of calls `calls`:

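```python
def call_probabilities(pred, spec, calls, end_marker):
    # Yield the probability the model assigns to each term of the sequence
    for call, dist in pred.distribution_call_iter(spec, calls, sentinel=end_marker):
        yield dist[call]
```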
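To make the shape of these pairs concrete, the sketch below uses a hypothetical `StubPredictor` in place of a trained `BayesianPredictor`. It assumes the iterator yields one `(term, distribution)` pair per call followed by one for the sentinel, and that each distribution is conditioned on the preceding calls; the distribution values and the `"END"` sentinel are made up. Under those assumptions, summing the log-probabilities gives the log-likelihood of the whole sequence.

```python
import math
from typing import Dict, Iterator, List, Tuple

Dist = Dict[str, float]


class StubPredictor:
    """Hypothetical stand-in for BayesianPredictor that predicts one fixed distribution everywhere."""

    def __init__(self, dist: Dist):
        self.dist = dist

    def distribution_call_iter(self, spec, calls: List[str], sentinel: str) -> Iterator[Tuple[str, Dist]]:
        # Assumed shape: one (term, distribution) pair per call, then one for the sentinel
        for term in list(calls) + [sentinel]:
            yield term, self.dist


def sequence_log_prob(pred, spec, calls: List[str], sentinel: str) -> float:
    """Sum the log-probability of every term, including the sentinel."""
    return sum(
        math.log(dist[term])
        for term, dist in pred.distribution_call_iter(spec, calls, sentinel=sentinel)
    )


pred = StubPredictor({"open": 0.5, "read": 0.25, "END": 0.25})
# 0.5 * 0.25 * 0.25, up to floating-point rounding
likelihood = math.exp(sequence_log_prob(pred, None, ["open", "read"], "END"))
```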

Method `Model.infer_state_iter` iterates over the probability distributions of
a sequence of calls, including the states of each term. `Model.infer_state_iter`
returns an iterator of lists; each list contains pairs of a term name (a string)
and a probability distribution. The first pair holds the call name and the
distribution of the call name; each subsequent pair holds a state of that call
and the probability distribution of that state.

For instance, say that we want to "flatten" the output of
`Model.infer_state_iter`. The following code yields the probability of each
call name and of each state as a single iterator:

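```python
def flat_probabilities(pred, spec, calls, end_marker):
    # Yield the probability of each call name, followed by that of each of its states
    for row in pred.distribution_state_iter(spec, calls, sentinel=end_marker):
        for term, dist in row:
            yield dist[term]
```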
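Rather than flattening, a row can also be split into its call part and its state part. The sketch below assumes the row layout described above, `[(call, call_dist), (state_1, dist_1), ...]`; `summarize_rows` and the toy rows (made-up call names, states, and numbers) are hypothetical and only illustrate the structure.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Dist = Dict[str, float]
# Assumed row layout: [(call, call_dist), (state_1, state_dist_1), (state_2, state_dist_2), ...]
Row = List[Tuple[str, Dist]]


def summarize_rows(rows: Iterable[Row]) -> Iterator[Tuple[str, float, List[float]]]:
    """Split each row into (call name, call probability, probabilities of its states)."""
    for row in rows:
        (call, call_dist), *states = row
        yield call, call_dist[call], [dist[state] for state, dist in states]


# Toy rows standing in for the output of distribution_state_iter (values are made up)
rows = [
    [("open", {"open": 0.9, "close": 0.1}), ("0", {"0": 0.7, "1": 0.3})],
    [("close", {"open": 0.2, "close": 0.8}), ("1", {"0": 0.4, "1": 0.6})],
]
summary = list(summarize_rows(rows))
# summary == [("open", 0.9, [0.7]), ("close", 0.8, [0.6])]
```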

Method `infer_state_iter_ex` replaces `infer_state_iter`, offering a simpler
interface and better performance.

1. The output is a sequence of rows; each row pairs call names with
   probabilities.

2. It makes half as many calls to TensorFlow through `infer_seq_iter`.
   A cache already softens this cost, but it is still an improvement.

3. Added a preliminary testing module that defines a basic notion of
   correctness. Currently, it only checks whether we make the same calls
   to the TensorFlow model; it does *not* yet check which probability
   distributions are returned to the user.

4. Rewrote `kld` on top of the new interface, which makes it simpler
   thanks to the streamlined API.
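One possible way to check "the same calls to the TensorFlow model" while still letting the new iterator make fewer calls is to record the queries each version sends and compare the distinct ones. This is a sketch under the assumption that queries are hashable call prefixes; `RecordingModel`, `same_model_calls`, and the toy query logs are hypothetical and not part of Salento's testing module.

```python
from typing import Callable, Dict, Hashable, List

Dist = Dict[str, float]


class RecordingModel:
    """Hypothetical wrapper that records every query forwarded to a model function."""

    def __init__(self, fn: Callable[[Hashable], Dist]):
        self.fn = fn
        self.queries: List[Hashable] = []

    def __call__(self, query: Hashable) -> Dist:
        self.queries.append(query)
        return self.fn(query)


def same_model_calls(old: RecordingModel, new: RecordingModel) -> bool:
    """True when both runs sent the same distinct queries, in order of first appearance."""
    # dict.fromkeys drops repeats while preserving insertion order
    return list(dict.fromkeys(old.queries)) == list(dict.fromkeys(new.queries))


def uniform(prefix: Hashable) -> Dist:
    # Toy model: ignores the prefix and always returns the same distribution
    return {"open": 0.5, "close": 0.5}


# Toy usage: the "old" iterator asks for every prefix twice, the "new" one once
old, new = RecordingModel(uniform), RecordingModel(uniform)
for prefix in [(), (), ("open",), ("open",)]:
    old(prefix)
for prefix in [(), ("open",)]:
    new(prefix)
```

Comparing distinct queries rather than raw logs lets the test accept the halved call count while still catching a query that one version sends and the other does not.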

@cogumbreiro (Collaborator, Author)

Pinging @vineethk @asingh-gt
