# Synthesize shields for safe control

A package for synthesizing permissive shields for controllers of various RL environments. Based on *Shielded Reinforcement Learning for Hybrid Systems*, which has a Julia implementation here.

## Usage

Instantiate an environment (some pre-built environments are supplied in `pyshield/envs.py`). Then instantiate a shield with the environment, the granularity of the discretization, and the number of supporting points to sample per axis.

Synthesizing the shield is done via a call to `shield.make_shield(verbosity=1)`. You can set `verbosity=0` if you don't want any status output.

```python
from pyshield.models import Shield
from pyshield.envs import RandomWalk

# The parameter `unlucky` enforces a worst-case stochasticity on the environment
env = RandomWalk(obs_low=[0, 0], obs_high=[1.2, 1.2], unlucky=True)

shield = Shield(env, 0.005, samples_per_axis=4)
shield.make_shield(verbosity=1)
```

After this is done, `shield.safe_actions` stores information on which actions are safe in which partitions.
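For example, you could summarize the result like this (a hedged sketch: the exact layout of `safe_actions` is not documented here, so this assumes one entry per partition holding the safe action indices):

```python
# Assumption: `safe_actions` is iterable with one entry per partition,
# where each entry is the collection of actions that are safe there.
unsafe = sum(1 for actions in shield.safe_actions if len(actions) == 0)
print(f'{unsafe} partitions have no safe action')
```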

If the environment is 2-dimensional, you can draw the shield by providing the names of the actions, the axis labels and a colormap:

```python
# the names of action 0 and action 1
action_names = ['slow', 'fast']

# the colors of the partitions depending on the allowed actions
cmap = {'()': 'r', '(slow)': 'y', '(fast)': 'g', '(slow, fast)': 'w'}

# labels of the x and y axes
labels = ('x', 't')

shield.draw(cmap, axis_labels=labels, actions=action_names)
```

For the example here, the output should look like this:

(Figure: rw-shield, the synthesized shield for the random walk example.)

## Custom environments

Currently, two environments are supplied in `pyshield/envs.py`. You can use your own environments if they implement the following API (largely inherited from Gymnasium); a minimal sketch follows the list:

- They have an attribute `observation_space`, which should be a `Box` space with non-infinite bounds.
- They have an attribute `action_space`, which should be a `Discrete` space.
- They have a function `is_safe(s)`, which takes a state `s` and returns `True` if `s` is a safe state and `False` otherwise.
- They have a function `allowed_actions(s)`, which takes a state `s` and returns a list of the actions allowed in this state (this can simply be all actions for any value of `s`).
- They have a function `step_from(s, a)`, which takes a state `s` and an action `a` and returns a tuple `(next_s, reward, terminated)` resulting from performing a single step from `s` by taking action `a`.
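Putting this together, here is a minimal sketch of a custom environment (the class, its dynamics and the `Cruise` name are purely illustrative; only the attributes and methods listed above are part of the expected API):

```python
import numpy as np
from gymnasium.spaces import Box, Discrete


class Cruise:
    """Hypothetical 1D environment: keep a velocity below a safety threshold."""

    def __init__(self):
        # Bounded (non-infinite) observation space, as required.
        self.observation_space = Box(low=np.array([0.0]), high=np.array([10.0]))
        # Two discrete actions: 0 = brake, 1 = accelerate.
        self.action_space = Discrete(2)

    def is_safe(self, s):
        # A state is safe as long as the velocity stays below 8.0.
        return s[0] < 8.0

    def allowed_actions(self, s):
        # Every action is allowed in every state.
        return [0, 1]

    def step_from(self, s, a):
        # Illustrative deterministic dynamics: braking slows down,
        # accelerating speeds up.
        delta = -0.5 if a == 0 else 0.5
        next_s = np.clip(np.asarray(s) + delta,
                         self.observation_space.low,
                         self.observation_space.high)
        reward = -abs(next_s[0] - 5.0)  # illustrative reward: stay near 5.0
        terminated = not self.is_safe(next_s)
        return next_s, reward, terminated
```

Such a class can then be passed to `Shield` exactly like the built-in environments, e.g. `Shield(Cruise(), 0.1, samples_per_axis=4)`.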