Examples in Python

For Python users we provide additional source code that makes it possible to solve constrained planning problems through our Java toolbox. This requires a running Java server and a Python client that sends requests to it. Planning domains and problem instances can be constructed entirely in the Python client; only the planning step is performed by the Java server.

Starting the server

The Python client and the Java server communicate using a client-server architecture. A full description of this architecture is provided here. Starting the server is straightforward: run the class executables.Server or the Server.jar file that is included in the archive. Once started, the server waits until a client connects, after which it can process requests. More information is available in the installation instructions.
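As a rough sketch, the server could also be launched from a Python script before the client connects. The snippet assumes that a java executable is on the PATH and that Server.jar is in the working directory. The helper names build_server_command and start_server are hypothetical and not part of the toolbox.

```python
import subprocess


def build_server_command(jar_path="Server.jar", java="java"):
    """Return the command line that runs the server jar via `java -jar`."""
    return [java, "-jar", jar_path]


def start_server(jar_path="Server.jar"):
    """Launch the Java server as a child process and return its handle.

    Assumes `java` is on the PATH. The caller is responsible for
    terminating the process once the client has disconnected.
    """
    return subprocess.Popen(build_server_command(jar_path))
```

A script would call server = start_server() before connecting and server.terminate() after disconnecting. The server may need a moment before it accepts a connection, which this sketch does not handle.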

Solving MDPs with constraints

We illustrate how an MDP planning problem with constraints can be solved in Python. We follow the same example as the corresponding Java example. The Python code can be found in TestCMDP.py. The first step is to connect to the Java server:

ToolboxServer.connect()

Once the connection is established, we can obtain a problem instance for the advertising domain with 2 agents and 10 sequential decisions.

num_agents = 2
num_decisions = 10
instance = InstanceGenerator.get_advertising_instance(num_agents, num_decisions)

After generating the instance, we solve it using one of the algorithms in the toolbox. The code fragment below uses the algorithm based on the linear program for CMDPs. It obtains a solution by calling the solve method of the algorithm and then prints the expected reward of the solution.

expected_reward = ConstrainedMDPFiniteHorizon.solve(instance)
print("Expected reward:", expected_reward)

Finally, the computed solution can be evaluated through simulation. The code fragment below creates a simulation environment using the CMDPSimulator class, executes 1000 simulation runs, and prints the mean reward obtained in these runs. In addition to the mean reward, the simulator provides methods to obtain the expected resource consumption (cost) and estimates of the constraint violation probabilities.

sim = CMDPSimulator(instance)
mean_reward = sim.run(1000)
print("Mean reward:", mean_reward)
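To make the reported quantities concrete, the sketch below shows how a mean reward, its standard error, and a constraint violation probability are commonly estimated from per-run results. It does not use the toolbox and is not necessarily how CMDPSimulator computes them. The per-run lists, the limit, and the function names are hypothetical values chosen for illustration.

```python
import math
import statistics


def summarize_runs(rewards):
    """Return (mean, standard error of the mean) for per-run rewards."""
    mean = statistics.fmean(rewards)
    # The sample standard deviation requires at least two runs.
    std_err = statistics.stdev(rewards) / math.sqrt(len(rewards))
    return mean, std_err


def violation_probability(costs, limit):
    """Return the fraction of runs whose total cost exceeds the limit."""
    return sum(1 for c in costs if c > limit) / len(costs)


# Hypothetical results of four simulation runs, for illustration only.
rewards = [8.0, 10.0, 12.0, 10.0]
costs = [0.0, 2.0, 0.0, 0.0]

mean, std_err = summarize_runs(rewards)
p_violation = violation_probability(costs, limit=0.5)
print("Mean reward:", mean, "+/-", std_err)
print("Estimated violation probability:", p_violation)
```

The standard error shrinks with the square root of the number of runs, which is one way to judge whether 1000 runs are enough for the precision needed.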

Once we are done, the server connection is no longer needed, so we disconnect.

ToolboxServer.disconnect()
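If solving or simulating raises an error, the disconnect call above would be skipped. One possible way to guard against this is a small context manager, sketched below. The name server_session is hypothetical and not part of the toolbox; it only assumes that connecting and disconnecting are plain callables, as ToolboxServer.connect and ToolboxServer.disconnect appear to be.

```python
from contextlib import contextmanager


@contextmanager
def server_session(connect, disconnect):
    """Call connect() on entry and disconnect() on exit, even after an error."""
    connect()
    try:
        yield
    finally:
        disconnect()


# Intended use with the toolbox client (shown as a comment, not executed here):
#
#     with server_session(ToolboxServer.connect, ToolboxServer.disconnect):
#         expected_reward = ConstrainedMDPFiniteHorizon.solve(instance)
```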

Solving POMDPs with constraints

The example code for POMDPs with constraints can be found in TestCPOMDP.py. It follows exactly the same structure as the example for MDPs, and additional explanations are therefore omitted.

ToolboxServer.connect()

num_agents = 2
num_decisions = 10
instance = InstanceGenerator.get_cbm_instance(num_agents, num_decisions)

expected_reward = CGCP.solve(instance)
print("Expected reward:", expected_reward)

sim = CPOMDPSimulator(instance)
mean_reward = sim.run(1000)
print("Mean reward:", mean_reward)

ToolboxServer.disconnect()

Defining new domains and problem instances

New domains and problem instances can be defined by initializing CMDP and CPOMDP objects. These objects follow roughly the same structure as the objects in Java, except that there is no inheritance structure. The Python code corresponding to the Java example is as follows:

num_states = 2
num_actions = 2
num_decisions = 10
initial_state = 0

# create reward function
reward_function = [[0.0 for a in range(num_actions)] for s in range(num_states)]
reward_function[1][1] = 10.0

# create transition function
transition_destinations = [[[] for a in range(num_actions)] for s in range(num_states)]
transition_probabilities = [[[] for a in range(num_actions)] for s in range(num_states)]

transition_destinations[0][0].append(0)
transition_probabilities[0][0].append(1.0)

transition_destinations[0][1].append(0)
transition_destinations[0][1].append(1)
transition_probabilities[0][1].append(0.1)
transition_probabilities[0][1].append(0.9)

# define one cost function
num_cost_functions = 1
cost_function = [[[0.0 for a in range(num_actions)] for s in range(num_states)] for k in range(num_cost_functions)]
cost_function[0][1][1] = 2.0

# define cost limits
limits = [0.5]

# create CMDP
cmdp = CMDP(num_states, num_actions, initial_state, num_decisions)
cmdp.set_reward_function(reward_function)
cmdp.set_transitions(transition_destinations, transition_probabilities)
cmdp.set_cost_functions(cost_function)

# create instance
cmdps = []
cmdps.append(cmdp)
cmdp_instance = CMDPInstance(cmdps, num_decisions)
cmdp_instance.set_cost_limits_budget(limits)

# solve the problem
ToolboxServer.connect()
expected_reward = ConstrainedMDPFiniteHorizon.solve(cmdp_instance)
print("Expected reward:", expected_reward)

# extend to POMDP
num_observations = 2
observation_function = [[[0.0 for o in range(num_observations)] for s in range(num_states)] for a in range(num_actions)]
for a in range(num_actions):
    for s_next in range(num_states):
        o = s_next
        observation_function[a][s_next][o] = 1.0

b0 = BeliefPoint([1.0, 0.0])

cpomdp = CPOMDP(num_states, num_actions, num_observations, b0, num_decisions)
cpomdp.set_reward_function(reward_function)
cpomdp.set_transitions(transition_destinations, transition_probabilities)
cpomdp.set_cost_functions(cost_function)
cpomdp.set_observation_function(observation_function)

cpomdps = []
cpomdps.append(cpomdp)
cpomdp_instance = CPOMDPInstance(cpomdps, num_decisions)
cpomdp_instance.set_cost_limits(limits)


# solve the constrained POMDP
expected_reward = CGCP.solve(cpomdp_instance)
print("Expected reward:", expected_reward)

ToolboxServer.disconnect()
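Because the transition function is given as two parallel nested lists, it is easy to introduce mismatched lengths or probabilities that do not sum to one. The sketch below is a hypothetical helper, not part of the toolbox, that checks lists in the same [state][action] layout as above. Treating a state-action pair without successors as a problem is an assumption.

```python
def check_transitions(destinations, probabilities, num_states, num_actions, tol=1e-9):
    """Return a list of problem descriptions; an empty list means none were found."""
    problems = []
    for s in range(num_states):
        for a in range(num_actions):
            dests = destinations[s][a]
            probs = probabilities[s][a]
            if len(dests) != len(probs):
                problems.append(f"({s},{a}): {len(dests)} destinations but {len(probs)} probabilities")
                continue
            if not dests:
                # Assumption: every state-action pair needs at least one successor.
                problems.append(f"({s},{a}): no outgoing transitions")
                continue
            if any(d < 0 or d >= num_states for d in dests):
                problems.append(f"({s},{a}): destination out of range")
            if any(p < 0.0 for p in probs):
                problems.append(f"({s},{a}): negative probability")
            if abs(sum(probs) - 1.0) > tol:
                problems.append(f"({s},{a}): probabilities sum to {sum(probs)}")
    return problems


# Illustrative lists, indexed [state][action]; pair (1,1) is deliberately wrong.
destinations = [[[0], [0, 1]], [[1], [0, 1]]]
probabilities = [[[1.0], [0.1, 0.9]], [[1.0], [0.5, 0.4]]]
problems = check_transitions(destinations, probabilities, num_states=2, num_actions=2)
for problem in problems:
    print(problem)
```

Applied to the lists built in the example above, which only add transitions for state 0, this check would report the pairs of state 1 as having no outgoing transitions. Whether the toolbox requires successors for every pair is not stated here.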

Writing a solution to a file

After solving an instance, the solution can be stored in a file. This makes it possible to simulate the solution later without solving the instance again. Two commands instruct the server to write or load a solution. The write command shown below can be called after solving an instance. The read command loads an existing solution from a file.

SolutionManager.writeCMDPSolution("solution.out")
SolutionManager.readCMDPSolution("solution.out")

To simulate a loaded solution, the CMDPSimulator or CPOMDPSimulator must be instantiated with the corresponding instance. Ensuring this is entirely the responsibility of the programmer. Simulating an instance after loading a solution that does not match it produces simulation results that cannot be used.
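One way to reduce the risk of such a mismatch is to store a fingerprint of the instance definition next to the solution file and compare it before simulating. The sketch below is not part of the toolbox. The function names, the sidecar file, and the dictionary describing the instance are all hypothetical, and the description is assumed to be JSON-serializable.

```python
import hashlib
import json


def instance_fingerprint(definition):
    """Return a SHA-256 hash of a JSON-serializable instance description."""
    encoded = json.dumps(definition, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def write_fingerprint(path, definition):
    """Store the fingerprint in a sidecar file next to the solution."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(instance_fingerprint(definition))


def matches_fingerprint(path, definition):
    """Return True if the stored fingerprint matches the given description."""
    with open(path, encoding="utf-8") as handle:
        return handle.read().strip() == instance_fingerprint(definition)
```

For example, write_fingerprint("solution.out.sha256", definition) could be called right after writing the solution, and matches_fingerprint could be checked before a simulator is created for a loaded solution.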

The ConstrainedPlanningToolbox has been developed by the Algorithmics group at Delft University of Technology, The Netherlands. Please visit our website for more information.