A package for synthesizing permissive shields for controllers of various RL environments. It is based on *Shielded Reinforcement Learning for Hybrid Systems*, which has a Julia implementation here.
First, instantiate an environment (some pre-built environments are supplied in `pyshield/envs.py`). Then instantiate a shield with the environment, the granularity of the discretization, and the number of supporting points to sample per axis.
Synthesizing the shield is done via a call to `shield.make_shield(verbosity=1)`. You can set `verbosity` to 0 if you don't want any status output.
```python
from pyshield.models import Shield
from pyshield.envs import RandomWalk

# The parameter `unlucky` enforces a worst-case stochasticity on the environment
env = RandomWalk(obs_low=[0, 0], obs_high=[1.2, 1.2], unlucky=True)
shield = Shield(env, 0.005, samples_per_axis=4)
shield.make_shield(verbosity=1)
```
After this is done, `shield.safe_actions` stores information on which actions are safe in which partitions. If the environment is 2-dimensional, you can draw the shield by providing the names of the actions, the axis labels, and a colormap.
```python
# the names of action 0 and action 1
action_names = ['slow', 'fast']
# the colors of the partitions depending on the allowed actions
cmap = { '()': 'r', '(slow)': 'y', '(fast)': 'g', '(slow, fast)': 'w' }
# labels of the x and y axis
labels = ('x', 't')
shield.draw(cmap, axis_labels=labels, actions=action_names)
```
For the example here, the output should look like this:
Currently, two environments are supplied in `pyshield/envs.py`. You can use your own environments if they implement the following API (largely inherited from Gymnasium):
- They have an attribute `observation_space`, which should be a Box space with non-infinite bounds.
- They have an attribute `action_space`, which should be a Discrete space.
- They have a function called `is_safe(s)`, which takes a state `s` and returns `True` if `s` is a safe state and `False` otherwise.
- They have a function called `allowed_actions(s)`, which takes a state `s` and returns a list of the actions allowed in this state (this can just be all actions for any value of `s`).
- They have a function `step_from(s, a)`, which takes a state `s` and an action `a` and returns a tuple `(next_s, reward, terminated)` resulting from performing a single step from `s` by taking action `a`.