Temporal Motion and Communication Planning in Wirelessly Connected Environments via Actor-Critic Reinforcement Learning
@INPROCEEDINGS{8264271,
author={M. {Guo} and M. M. {Zavlanos}},
booktitle={2017 IEEE 56th Annual Conference on Decision and Control (CDC)},
title={Temporal task planning in wirelessly connected environments with unknown channel quality},
year={2017},
volume={},
number={},
pages={4161-4168},
doi={10.1109/CDC.2017.8264271}}
This package contains the implementation for motion and communication control of a mobile robot that is tasked with gathering data in an environment of interest and transmitting these data to a data center. The task is specified as a high-level Linear Temporal Logic (LTL) formula that captures the data to be gathered at various regions in the workspace. The robot has a limited buffer to store the data, which need to be transmitted to the data center before the buffer overflows. Communication between the robot and the data center takes place over a dedicated wireless network, to which the robot can upload data at rates that are uncertain and unknown beforehand.
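For illustration only, such a task might ask the robot to gather data at two regions infinitely often while never overflowing its on-board buffer. Written with the usual "always" ([]) and "eventually" (<>) operators, and with made-up atomic propositions that are not part of this package, the formula could read:

# hypothetical task: infinitely often gather data at regions r1 and r2,
# and never let the on-board buffer overflow (proposition names are examples only)
task_formula = '([] <> gather_r1) && ([] <> gather_r2) && ([] ! buffer_overflow)'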
- Sampling-based roadmap construction, see [build_roadmap] folder
- Wireless routing based on Linear Programming (LP), see [wsn_routing] folder; an illustrative routing-LP sketch is given at the end of this README
- Policy synthesis given a fully-known system model as a Markov Decision Process (MDP)
- Product automaton between the MDP and a Deterministic Rabin Automaton (DRA), based on [MDP_TG]; a simplified construction sketch is given below the usage example
- Policy generated via LP, see [lp_policy.p]
- Policy generated via actor-critic RL in the product automaton, see [ac_policy.p]
- Example log, see [log.txt]
- Implementation of the least-squares temporal difference (LSTD) method of the actor-critic type [ref1] [ref2]; a minimal sketch of one learning episode is given below the usage example
- Task execution, workspace exploration, and parameterized-policy learning are all performed online and simultaneously, see [ac.py]
- For a single critical segment, see [one_cri_seg_ac_learn.py]
- For a given high-level discrete plan, see [ltl_ac_learn.py]
- Indirect learning mode via simulated experience, and direct learning mode via real experience, as in the usage example below:
from crm import build_crm
from ac import actor_critic
# load the roadmap with WSN rate info, as the combined roadmap (crm)
crm = build_crm()
# set up actor_critic learner
actor_critic_learner = actor_critic(crm, data_bound, quant_size,
                                    Ts, uncertainty_prob, clambda,
                                    Gamma, Beta, D)
actor_critic_learner.set_init_goal(new_init, new_goal)
actor_critic_learner.set_theta(theta)
# indirect learning via simulated experience
print '|||||||Indirect learning for %d episodes|||||||' % static_learn_episodes
indirect_learn_log = actor_critic_learner.complete_learn(static_learn_episodes,
                                                         mode='model')
# direct learning via the real robot moving
print '|||||||Direct learning for 1 episode|||||||'
direct_learn_log = actor_critic_learner.one_episode_learn(gamma,
                                                          beta, mode='experiment')
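For reference, the product automaton used for policy synthesis follows the standard MDP x DRA construction: a product state pairs an MDP state with a DRA state, and the DRA component is advanced by the label of the successor MDP state. The sketch below illustrates the idea on a simplified MDP encoded as a networkx DiGraph with a single (action, probability) pair per edge; the function names and graph attributes are assumptions made for this sketch, not the interface of [MDP_TG].

import networkx as nx

def build_product(mdp, init_states, dra_init, dra_delta, label_of):
    # mdp: networkx.DiGraph, each edge (s, t) carries attributes 'act' and 'prob'
    #      (a simplification of a real MDP, which maps actions to distributions)
    # dra_delta(q, label): DRA transition function, returns the successor DRA state
    # label_of(s): set of atomic propositions that hold at MDP state s
    prod = nx.DiGraph()
    stack = [(s, dra_delta(dra_init, label_of(s))) for s in init_states]
    visited = set(stack)
    while stack:
        (s, q) = stack.pop()
        for t in mdp.successors(s):
            q_next = dra_delta(q, label_of(t))
            prod.add_edge((s, q), (t, q_next),
                          act=mdp[s][t]['act'], prob=mdp[s][t]['prob'])
            if (t, q_next) not in visited:
                visited.add((t, q_next))
                stack.append((t, q_next))
    # the DRA acceptance pairs are lifted to product states separately
    return prod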
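The actor-critic learner alternates between a critic, which estimates a value function from sampled transitions using least-squares temporal differences, and an actor, which takes a gradient step on the policy parameter theta guided by the critic. Below is a minimal, self-contained sketch of one learning episode for a generic softmax-parameterized policy; the environment interface, the feature functions, and the exact update form are simplifying assumptions for illustration, not the package's implementation nor the precise algorithm of [ref1] [ref2].

import numpy as np

def softmax_policy(theta, psi, s, actions):
    # Boltzmann (softmax) policy with action features psi(s, a)
    prefs = np.array([theta.dot(psi(s, a)) for a in actions])
    prefs -= prefs.max()                              # numerical stability
    probs = np.exp(prefs)
    return probs / probs.sum()

def one_episode_lstd_ac(env, theta, psi, phi, gamma, beta, lam=0.9):
    # Assumed interface (not the package's API):
    #   env.reset() -> s;  env.actions(s) -> list of actions
    #   env.step(a) -> (s_next, reward, done)
    #   psi(s, a): policy features;  phi(s): critic (value function) features
    s = env.reset()
    d = len(phi(s))
    A = np.zeros((d, d)); b = np.zeros(d); z = np.zeros(d)   # LSTD(lambda) statistics
    trajectory = []
    done = False
    while not done:
        acts = env.actions(s)
        probs = softmax_policy(theta, psi, s, acts)
        a = acts[np.random.choice(len(acts), p=probs)]
        s2, r, done = env.step(a)
        # score function of the softmax policy: grad_theta log pi(a | s)
        score = psi(s, a) - sum(p * psi(s, ai) for p, ai in zip(probs, acts))
        trajectory.append((s, r, s2, score, done))
        # accumulate LSTD(lambda) statistics for the critic V(s) ~ w . phi(s)
        z = lam * gamma * z + phi(s)
        A += np.outer(z, phi(s) - (0.0 if done else gamma) * phi(s2))
        b += z * r
        s = s2
    w = np.linalg.pinv(A).dot(b)                      # critic weights via least squares
    # actor: policy-gradient step with the critic's TD error as advantage estimate
    grad = np.zeros_like(theta)
    for (s, r, s2, score, end) in trajectory:
        td = r + (0.0 if end else gamma) * w.dot(phi(s2)) - w.dot(phi(s))
        grad += td * score
    return theta + beta * grad

In the package, analogous per-episode updates are presumably what complete_learn (looping over many episodes) and one_episode_learn perform, against the simulated model ('model') or on the real robot ('experiment'), as shown in the usage example above.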
- Install Python packages such as NetworkX and ply
- Compile the ltl2ba executable for your OS
- Compile the ltl2dstar executable for your OS
- Install the Gurobi solver for the linear programs (free for academic use)
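The routing problem in [wsn_routing] is formulated as an LP and solved with Gurobi. As a hint of how such a program is set up in gurobipy, the following is a minimal max-throughput flow LP from the robot to the data center over a tiny relay network; the topology, capacities, and variable names are purely illustrative and not taken from the package.

from gurobipy import Model, GRB, quicksum

# illustrative relay network: (sender, receiver) -> link capacity (upload rate)
capacity = {('robot', 'relay1'): 2.0, ('robot', 'relay2'): 1.5,
            ('relay1', 'sink'): 1.8, ('relay2', 'sink'): 1.2,
            ('relay1', 'relay2'): 0.5}

m = Model('wsn_routing')
# one flow variable per directed link, bounded by the link capacity
f = {e: m.addVar(lb=0.0, ub=capacity[e], name='f_%s_%s' % e) for e in capacity}

# flow conservation at the relays (everything received is forwarded)
for v in ['relay1', 'relay2']:
    m.addConstr(quicksum(f[e] for e in capacity if e[1] == v) ==
                quicksum(f[e] for e in capacity if e[0] == v))

# maximize the total rate delivered to the data center (sink)
m.setObjective(quicksum(f[e] for e in capacity if e[1] == 'sink'), GRB.MAXIMIZE)
m.optimize()

print('max upload rate: %.2f' % m.objVal)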