# Task Planning with Pretrained Prior

Consider the problem of determining the next action $a_t$ of an agent trying to accomplish a goal $g$, given the current and previous observations $o_{1:t}$ as well as the previous actions $a_{1:t-1}$:

$$p(a_t|g, o_{1:t}, a_{1:t-1})$$

We can factor out a prior by applying Bayes' rule (dropping the normalizing constant, which does not depend on $a_t$):

$$p(a_t|g, o_{1:t}, a_{1:t-1}) \propto p(g, o_{1:t-1}|a_{1:t-1}, a_t, o_t)\,p(a_t|a_{1:t-1}, o_t)$$

where the first term is the likelihood and the second is the prior, which can be a pretrained language model if $a_{1:t-1}$ and $o_t$ are encoded as text. Observe that the likelihood is discriminative, i.e. it is trained on binary labels and outputs the probability that $a_t$ is the correct action. Assume that all goals can be unified relatively well under the same prior, so that training data for fine-tuning the prior can be aggregated from multiple goals. We can then sample actions as follows: take the top $k$ choices from the prior and reweight them using the likelihood, as sketched below.
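A minimal sketch of this sampling scheme in PyTorch, assuming hypothetical `prior_model` and `likelihood_model` wrappers (a prior that scores a discrete action vocabulary given the text-encoded history, and a classifier that returns the probability an action is correct):

```python
import torch

def sample_action(prior_model, likelihood_model, context, k=5):
    """Sample the next action: take the top-k actions under the prior,
    then reweight each candidate by the likelihood."""
    # Prior distribution over the action vocabulary given the history.
    prior_logits = prior_model(context)               # shape: (num_actions,)
    prior_probs = torch.softmax(prior_logits, dim=-1)

    # Keep only the top-k candidate actions from the prior.
    topk_probs, topk_actions = prior_probs.topk(k)

    # Likelihood: probability that each candidate is the correct action
    # (each call is assumed to return a scalar tensor).
    lik = torch.stack([likelihood_model(context, a) for a in topk_actions])

    # Reweight: posterior over candidates is proportional to likelihood * prior.
    posterior = lik * topk_probs
    posterior = posterior / posterior.sum()

    # Sample an action from the reweighted distribution.
    idx = torch.multinomial(posterior, num_samples=1).item()
    return topk_actions[idx].item()
```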

In our case, we use BabyAI as the environment. The prior is GPT2 fine-tuned on many goals, with $a_{1:t-1}$ and $o_t$ represented as text. The likelihood is a classifier that takes BabyAI's 7x7 grid, the aggregated actions and observations, and the goal as input. We generate negative samples by changing the last action of a goal's action sequence, which can be viewed as analogous to teacher forcing; see the sketch below.
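A hypothetical sketch of this negative-sampling step, assuming trajectories are stored as dicts with an `actions` list of integer action IDs (BabyAI's standard action space has 7 discrete actions):

```python
import random

NUM_ACTIONS = 7  # BabyAI's discrete action space

def make_negative(trajectory):
    """Build a negative example from a positive trajectory by replacing
    the final action with a different, incorrect one."""
    neg = dict(trajectory)                      # shallow copy; goal/observations unchanged
    neg['actions'] = list(trajectory['actions'])
    last = neg['actions'][-1]
    # Sample a wrong action uniformly from the remaining choices.
    neg['actions'][-1] = random.choice(
        [a for a in range(NUM_ACTIONS) if a != last])
    return neg
```

The classifier is then trained with label 1 on the original trajectories and label 0 on these perturbed copies.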

The prior achieves decent accuracy (~70%), and each likelihood achieves very high accuracy (~93%) with little data (a few hundred examples) trained only on its given goal.

View detailed experiment logs here.
