diff --git a/CMakeLists.txt b/CMakeLists.txt index 22e1853..fbea3ed 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -60,6 +60,7 @@ set(msg_interface_srcs ) set(msg_interface_hdrs model/msg-interface/ns3-ai-msg-interface.h) set(gym_interface_srcs model/gym-interface/cpp/ns3-ai-gym-interface.cc + model/gym-interface/cpp/ns3-ai-multi-agent-gym-interface.cc model/gym-interface/cpp/ns3-ai-gym-env.cc model/gym-interface/cpp/container.cc model/gym-interface/cpp/spaces.cc @@ -67,6 +68,7 @@ set(gym_interface_srcs ) set(gym_interface_hdrs model/gym-interface/cpp/ns3-ai-gym-interface.h + model/gym-interface/cpp/ns3-ai-multi-agent-gym-interface.h model/gym-interface/cpp/ns3-ai-gym-env.h model/gym-interface/cpp/container.h model/gym-interface/cpp/spaces.h diff --git a/README.md b/README.md index e1fbb47..4d7a33d 100644 --- a/README.md +++ b/README.md @@ -29,6 +29,7 @@ greater flexibility. - High-performance data interaction module in both C++ and Python side. - A high-level [Gym interface](model/gym-interface) for using Gymnasium APIs, and a low-level [message interface](model/msg-interface) for customizing the shared data. +- Support for multi-agent reinforcement learning - Useful skeleton code to easily integrate with AI frameworks on Python side. ## Installation @@ -43,11 +44,13 @@ To get started on ns3-ai, check out the [A-Plus-B](examples/a-plus-b) example. T C++ passes two numbers to Python and their sum is passed back to C++, with the implementation using all available interfaces: Gym interface, message interface (struct-based) and message interface (vector-based). +An advanced example for [multi-agent](examples/multi-agent) reinforcement learning is also provided. ### Documentation Ready to deploy ns3-ai in your own research? Before you code, please go over the tutorials on -[Gym interface](model/gym-interface) and [message interface](model/msg-interface). They provide +[Gym interface](model/gym-interface) and [message interface](model/msg-interface). The documentation for [multi-agent environments](./docs/multi-agent.md) explains in detail how ns3-ai can be used to train multiple-agents in an ns3 simulation. +They provide step-by-step guidance on writing C++-Python interfaces, with some useful code snippets. We also created some **pure C++** examples, which uses C++-based ML frameworks to train @@ -84,6 +87,10 @@ This original work is done based on [5G NR](https://5g-lena.cttc.es/) branch in also run in LTE codebase in ns-3 mainline. We didn't reproduce all the experiments on LTE, and the results in our paper are based on NR work. +### [MULTI-AGENT](examples/multi-agent/) + +This example illustrates with a simple scenario how multi-agent environments can be created using ns3-ai. It also explains how the agents in the environment can be trained using RLlib and how the trained agents can be evaluated. + ## Other materials ### Google Summer of Code 2023 @@ -102,6 +109,9 @@ Note: this tutorial explains the original design, which is not up to date with t Join us in this [online recording](https://vimeo.com/566296651) to get better knowledge about ns3-ai. The slides introducing the ns3-ai model could also be found [here](https://www.nsnam.org/wp-content/uploads/2021/tutorials/ns3-ai-tutorial-June-2021.pdf). +## Related projects +The [defiance project](https://github.com/DEFIANCE-project) builds upon the multi-agent capabilities of ns3-ai and allows the user to realistically simulate the deployment of reinforcement learning components. 
It handles the setup and communication of these components in a flexible way. The user only needs to write minimal code to specify the observations, actions and rewards in the experiment.
+
 ## Cite Our Work
 
 Please use the following bibtex:
diff --git a/docs/multi-agent.md b/docs/multi-agent.md
new file mode 100644
index 0000000..7409ca9
--- /dev/null
+++ b/docs/multi-agent.md
@@ -0,0 +1,621 @@
+# Multi-Agent Reinforcement Learning
+
+## Background
+The `Ns3MultiAgentEnv` allows the user to
+create a multi-agent Gymnasium environment from an ns3 simulation,
+using the `OpenGymMultiAgentInterface` for inter-process
+communication. This environment can then be used to train the agents
+with reinforcement learning algorithms. We assume the reader is already
+familiar with the concepts of reinforcement learning, multi-agent
+systems, and the ns-3 simulator.
+
+## Usage Overview
+
+The following steps have to be carried out to create a multi-agent environment for a specific experiment:
+1. Create an ns-3 simulation with the desired network topology and
+   traffic.
+2. Define how each agent observes and acts within the environment.
+3. Specify when an agent performs its inference and training steps.
+4. Decide on termination criteria for the environment.
+5. Register the environment in a Python script where it can be used to
+   interact with the ns3 simulation.
+
+Steps 1 to 4 require writing **C++** code that utilizes the API of the
+`OpenGymMultiAgentInterface`. Step 5 is done in **Python** by creating
+an instance of the `Ns3MultiAgentEnv`. In the following sections, we
+will guide the user through the usage of both of these components and
+provide a minimal example to demonstrate them.
+
+## Basic Example
+
+For the scope of this documentation, we decided on the following example.
+We will create a variable number of agents in our ns3 simulation. Each
+of these agents will be instantiated with a random counter ranging from
+-42 to +42. When doing inference, each agent can decide on a number
+between -5 and +5. This number will be added to the counter of this
+agent. The goal of each agent is to reach the counter value 0; therefore,
+the reward for each agent is the negative absolute value of its counter.
+The agents infer once every second and the experiment is truncated at 60
+seconds (simulation end). The agents are first evaluated with random
+actions, then trained using the DQN algorithm, and finally evaluated
+again based on a checkpoint of the training.
+
+Because the agents behave very similarly, we introduce the `Agent` class
+in our ns3 script. The relevant methods this class provides will also be
+discussed in the following sections. Overall, it is not necessary to
+create new classes in order to work with the
+`OpenGymMultiAgentInterface`, and we will also show how it can be used
+without them.
+
+## OpenGymMultiAgentInterface
+
+In general, the `OpenGymMultiAgentInterface` is responsible for:
+- Registering agents with their corresponding observation and action
+  spaces in the environment
+- Performing inference and training steps for a given agent
+- Terminating the environment and handling the simulation end
+
+### Accessing the Interface
+
+To use the `OpenGymMultiAgentInterface` in the ns3 simulation, the user
+has to include the ns3-ai module via
+
+``` cpp
+#include <ns3/ai-module.h>
+```
+
+The interface is then provided as a singleton and can be used inside the
+simulation without the need to instantiate it.
+The user can access the interface via
+
+``` cpp
+OpenGymMultiAgentInterface::Get()
+```
+
+This returns a pointer to the interface from which the other methods can
+be accessed.
+
+### Registering Agents
+
+To register an agent with the interface, the user has to provide the following information:
+- The agent's ID
+- The observation space of the agent
+- The action space of the agent
+
+The **agent id** is an arbitrary string that is used to identify the
+agent in the simulation and in the final Python environment. The
+observation and action spaces are defined as `OpenGymSpace` objects and
+registered by providing callbacks which return the space information.
+The callbacks are passed to
+`OpenGymMultiAgentInterface::SetGetObservationSpaceCb` and
+`OpenGymMultiAgentInterface::SetGetActionSpaceCb`, respectively.
+
+The following code snippets demonstrate how the agents from our example
+are registered with the environment.
+
+First, the observation and action spaces are defined in the agent class. In this simple example,
+the observation is a single integer - the current number - while the possible action is from the discrete space [0, 10].
+The action will later be transformed to the range [-5, 5].
+
+``` cpp
+Ptr<OpenGymSpace>
+Agent::GetObservationSpace()
+{
+    auto type = TypeNameGet<int>();
+    auto shape = std::vector<uint32_t>{1};
+    auto obsSpace = CreateObject<OpenGymBoxSpace>(-INFINITY, INFINITY, shape, type);
+    return obsSpace;
+}
+
+Ptr<OpenGymSpace>
+Agent::GetActionSpace()
+{
+    auto actionSpace = CreateObject<OpenGymDiscreteSpace>(10);
+    return actionSpace;
+}
+```
+
+Then the agents are instantiated and registered in the environment.
+
+``` cpp
+auto randomNumber = CreateObject<UniformRandomVariable>();
+randomNumber->SetAttribute("Min", DoubleValue(-42));
+randomNumber->SetAttribute("Max", DoubleValue(42));
+
+std::vector<Agent*> agents;
+for (int i = 0; i < numAgents; i++)
+{
+    // create an agent that will step once a second with its
+    // counter initialized randomly and a given id
+    std::string id = "agent_" + std::to_string(i);
+    int number = randomNumber->GetInteger();
+    Time stepTime = Seconds(1);
+    auto agent = new Agent(id, number, stepTime);
+    agents.emplace_back(agent);
+
+    // register the newly created agent in the environment
+    OpenGymMultiAgentInterface::Get()->SetGetObservationSpaceCb(
+        id,
+        MakeCallback(&Agent::GetObservationSpace, agents[i]));
+    OpenGymMultiAgentInterface::Get()->SetGetActionSpaceCb(
+        id,
+        MakeCallback(&Agent::GetActionSpace, agents[i]));
+}
+```
+>[!NOTE]
+>In case the user does not want to create an extra class for the agents,
+>the callbacks can also be provided as lambda functions.
+>``` cpp
+>for (int i = 0; i < numAgents; i++)
+>{
+>    std::string id = "agent_" + std::to_string(i);
+>    OpenGymMultiAgentInterface::Get()->SetGetObservationSpaceCb(id, []() {
+>        auto type = TypeNameGet<int>();
+>        auto shape = std::vector<uint32_t>{1};
+>        auto obsSpace = CreateObject<OpenGymBoxSpace>(-INFINITY, INFINITY, shape, type);
+>        return obsSpace;
+>    });
+>    OpenGymMultiAgentInterface::Get()->SetGetActionSpaceCb(id, []() {
+>        auto actionSpace = CreateObject<OpenGymDiscreteSpace>(10);
+>        return actionSpace;
+>    });
+>}
+>```
+
+### Performing Inference
+
+To let an agent perform inference, the following information has to be provided:
+- ID of the agent
+- Observation the agent made
+- Reward signal the agent received after its previous action
+- Indication whether the agent reached a terminal state
+- Extra information that is not used for training but that the user is
+  interested in
+- Time that indicates how long the inference takes in the simulation
+- How the inferred action shall be applied in the simulation
+
+Signaling that an agent performs inference is done via
+`OpenGymMultiAgentInterface::NotifyCurrentState`. This method needs to
+be scheduled during simulation time whenever an agent should compute
+its next action.
+
+>[!NOTE]
+>The design of the interface allows only one agent to perform inference
+>per call of `NotifyCurrentState`. Still, this does not prevent two
+>agents from performing inference at the exact same time in the
+>simulation. To do so, the user simply needs to schedule two calls of
+>`NotifyCurrentState` at the same simulation time and provide the
+>respective arguments.
+
+The following code snippets demonstrate how `NotifyCurrentState` can be
+used to perform an agent step in our example:
+
+``` cpp
+void
+Agent::Step()
+{
+    OpenGymMultiAgentInterface::Get()->NotifyCurrentState(
+        m_id,
+        GetObservation(),
+        GetReward(),
+        false, // the agent does not have a terminal state
+        {},
+        Seconds(0), // we assume performing inference is instantaneous
+        MakeCallback(&Agent::ExecuteAction, this));
+
+    // We want the agents to step periodically at fixed intervals
+    Simulator::Schedule(m_stepTime, &Agent::Step, this);
+}
+```
+
+In the simulation, the step method now just needs to be invoked once for
+each agent.
+
+``` cpp
+for (const auto agent : agents)
+{
+    Simulator::Schedule(Seconds(0), &Agent::Step, agent);
+}
+```
+>[!NOTE]
+>The methods `GetObservation`, `GetReward`, and `ExecuteAction` of the newly
+>created agent class are not provided by the
+>interface itself. Again, as already demonstrated for the registration of
+>the agents, the user could also use lambda functions together with
+>`NotifyCurrentState` or even pass the corresponding values directly.
+
+>[!WARNING]
+>As already mentioned, the interface utilizes so-called spaces and
+>containers to communicate the observations and actions of agents. The
+>user needs to make sure that the observations are correctly wrapped
+>inside such a container and match the space description. For the actions,
+>the user must extract the action from the provided container (this also
+>needs to match the action space description).
+>The following code demonstrates how such an action would be extracted
+>and executed in our example:
+>``` cpp
+>void
+>Agent::ExecuteAction(Ptr<OpenGymDataContainer> action)
+>{
+>    // the action space in this case is a discrete container ranging from 0 to 10
+>    // such a container contains exactly one value
+>    auto raw_action = DynamicCast<OpenGymDiscreteContainer>(action)->GetValue();
+>    // the agent is allowed to choose a number between -5 and 5
+>    // and add it to its internal counter
+>    m_number += raw_action - 5;
+>}
+>```
+
+### Terminating the Environment
+
+The simulation of the environment can end for two possible reasons:
+1. An agent reached its terminal state
+2. The simulation ended
+
+As we have already seen, the user can signal that an agent reached its
+terminal state by setting the corresponding flag in
+`NotifyCurrentState`. When the method is called with this flag set to
+true, the simulation will be stopped and destroyed, and all agents will
+be treated as having reached their terminal state.
+
+To signal the simulation end to the environment, the user can call
+`OpenGymMultiAgentInterface::NotifySimulationEnd`. As additional
+arguments, a final reward and extra information can be provided. In
+reinforcement learning, this corresponds to the truncation of the episode.
+
+The following code snippet demonstrates how the simulation end can be
+signaled:
+
+``` cpp
+Simulator::Stop(Seconds(60));
+Simulator::Run();
+Simulator::Destroy();
+// finish the environment without giving an extra reward and
+// without providing extra information
+OpenGymMultiAgentInterface::Get()->NotifySimulationEnd(0, {});
+```
+>[!WARNING]
+>The call to `NotifySimulationEnd` must be executed as the very last
+>call in the simulation script, as it will terminate the C++ process once
+>the information has been passed to the Python environment.
+>It is also advised to include it in every experiment because it ensures
+>that the RL algorithms recognize that the episode has been truncated
+>when the simulation time is over.
+
+### Conclusion
+
+As the previous sections have shown, the `OpenGymMultiAgentInterface` is
+a powerful tool for creating multi-agent environments for reinforcement
+learning experiments. The user can define the agents, their observations
+and actions, and the simulation end criteria in a flexible way. The
+interface is designed to be easily integrated into existing ns3
+simulations and is what users interact with when designing the
+simulation part of their environment in C++. The next sections will
+demonstrate how to use the `Ns3MultiAgentEnv` to interact with the ns3
+simulation from a Python script.
+
+Also, we want to emphasize that all of these interactions only require
+the `OpenGymMultiAgentInterface`. Any additional classes or methods
+(e.g. the custom `Agent` class) are optional and only needed to keep the
+simulation script clean and structured without too much redundancy.
+
+>[!WARNING]
+>A simulation script that uses the `OpenGymMultiAgentInterface` is not
+>intended to run, and will not run properly, on its own. It is only one
+>part of the environment and is used to interact with the Python script.
+>When the script is run on its own, the simulation will fail because
+>calls to the interface will not receive a response.
+>Therefore, a Python script is necessary to complete the environment and
+>to run the simulation.
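+
+As emphasized above, the custom `Agent` class is only a convenience. To make
+the class-free option concrete, the following sketch outlines what a minimal
+simulation script could look like for the counter example without any agent
+class. It is an illustration only: the include path, the free-function names
+(`Step`, `ApplyAction`) and the global counter map are assumptions of this
+sketch and not part of the interface.
+
+``` cpp
+#include <ns3/ai-module.h>   // assumed module header, as above
+#include <ns3/core-module.h>
+
+#include <cstdlib>
+#include <map>
+#include <string>
+
+using namespace ns3;
+
+// per-agent counters; the initial values are purely illustrative
+std::map<std::string, int> g_counters = {{"agent_0", 7}, {"agent_1", -13}};
+
+void
+ApplyAction(std::string id, Ptr<OpenGymDataContainer> action)
+{
+    // same mapping as in the Agent class: shift the discrete action [0, 10] to [-5, 5]
+    g_counters[id] += DynamicCast<OpenGymDiscreteContainer>(action)->GetValue() - 5;
+}
+
+void
+Step(std::string id)
+{
+    auto obs = CreateObject<OpenGymBoxContainer<int>>(std::vector<uint32_t>{1});
+    obs->AddValue(g_counters[id]);
+    OpenGymMultiAgentInterface::Get()->NotifyCurrentState(
+        id,
+        obs,
+        -std::abs(g_counters[id]),          // reward
+        false,                              // no terminal state
+        {},                                 // no extra information
+        Seconds(0),                         // inference is assumed instantaneous
+        MakeBoundCallback(&ApplyAction, id));
+    Simulator::Schedule(Seconds(1), &Step, id); // step once per second
+}
+
+int
+main()
+{
+    for (const auto& agent : g_counters)
+    {
+        const std::string& id = agent.first;
+        // observation and action space callbacks registered as lambdas
+        OpenGymMultiAgentInterface::Get()->SetGetObservationSpaceCb(id, []() {
+            auto obsSpace = CreateObject<OpenGymBoxSpace>(-INFINITY,
+                                                          INFINITY,
+                                                          std::vector<uint32_t>{1},
+                                                          TypeNameGet<int>());
+            return obsSpace;
+        });
+        OpenGymMultiAgentInterface::Get()->SetGetActionSpaceCb(id, []() {
+            auto actionSpace = CreateObject<OpenGymDiscreteSpace>(10);
+            return actionSpace;
+        });
+        Simulator::Schedule(Seconds(0), &Step, id);
+    }
+
+    Simulator::Stop(Seconds(60));
+    Simulator::Run();
+    Simulator::Destroy();
+    OpenGymMultiAgentInterface::Get()->NotifySimulationEnd(0, {});
+    return 0;
+}
+```
+
+Like the full example, this sketch still has to be launched through an
+`Ns3MultiAgentEnv` instance on the Python side, as explained in the warning
+above.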
+ +## Ns3MultiAgentEnv + +The `Ns3MultiAgentEnv` is a Python class that is used to interact with +the ns3 simulation via the `OpenGymMultiAgentInterface`. It provides all +the abstractions of a Gymnasium environment (with slight modifications +to allow for multi-agent setup). Overall it provides the step(), +reset(), and close() methods that are necessary to interact with the +environment. Rendering is not supported in the base class as the +requirements for visualization highly depend on the underlying ns3 +simulation that is experimented with. + +The following sections will guide the user through the possible +interactions with the `Ns3MultiAgentEnv` and provide a minimal example +to demonstrate the usage. + +### Creating an Environment Instance + +The `Ns3MultiAgentEnv` can be understood as a wrapper around an ns3 +simulation that interacts with the `OpenGymMultiAgentInterface`. To +create an instance of the environment the user has to provide the build +target that will be run as the environment and the root directory where +the ns3 files are located (this directory contains for example the src, +contrib and build folders as subdirectories). + +Also, the user might want to pass additional arguments to the ns3 +simulation. These arguments can be passed as a dictionary. In our +example, the number of agents is not fixed and therefore the user can +pass the number of agents as an argument. + +The following code snippets demonstrate how an environment instance can +be created for our example. + +Preparation of the ns3 simulation to accept additional arguments: + +``` cpp +int main(int argc, char* argv[]) +{ + int numAgents = 2; + CommandLine cmd; + cmd.AddValue("numAgents", "Number of agents that act in the environment", numAgents); + cmd.Parse(argc, argv); +//... +``` + +Creation of the environment instance in the Python script: + +``` python +import os +from ns3ai_gym_env.envs.ns3_multi_agent_environment import Ns3MultiAgentEnv # this import is necessary to register the environment + +targetName = "ns3ai_multi-agent" +ns3Path = str(os.getenv("NS3_HOME")) # assuming this contains the path to the root directory of ns3 +ns3Settings: dict[str] = {"numAgents": 3} + +env:Ns3MultiAgentEnv = Ns3MultiAgentEnv(targetName=targetName, ns3Path=ns3Path, ns3Settings=ns3Settings) + +# code that interacts with the environment +# ... + +env.close() # this is necessary to free the resources of the environment +``` + +Instead of the name of the build target, the user can also directly +provide the path to the executable that should be run as the +environment. + +>[!NOTE] +>It is advised to use build targets configured with optimized +>build-profile settings. This often results in significant training +>speedups. See the +>[ns3-documentation](https://www.nsnam.org/docs/tutorial/html/getting-started.html#build-profiles) +>for more information on build profiles. + +### Interacting with the Environment + +To interact with the environment, use the `reset()` and `step()` methods +from the Gymnasium standard. An experiment starts by +resetting the environment, which will provide initial observations and +extra information. + +``` python +obs, extraInfo = env.reset() +``` + +Both, observation and extra information are provided as dictionaries +mapping from agent keys to the corresponding values. The agent keys are +the IDs that were used to register the agents in the ns3 simulation. + +The current implementation does not enforce all agents to be present in +the observation and extra information dictionaries. 
This allows for a +flexible setup where agents do not need to act synchronously. The user +therefore has to check, whether observations from a particular agent +were actually received, before he can act on them. The easiest way to do +this is to simply iterate over the observation dictionary. + +The following code snippet demonstrates how an action is randomly +sampled for each agent that shared its observation. + +``` python +terminated = truncated = False +while not terminated and not truncated: + action = {} + for agent_id, agent_obs in obs.items(): + action[agent_id] = env.action_space[agent_id].sample() + obs, reward, terminated, truncated, info = env.step(action) + terminated = terminated["__all__"] + truncated = truncated["__all__"] +``` + +Note how the action space (and equally the observation space) can be +inferred from the environment instance. + +The step method takes a dictionary of actions as input. This dictionary +maps from agent_ids to actions and it is required that only actions for +agents that shared their observations are provided (but each of these +agents needs to receive an action). The method returns the new +observations, the rewards (also as a dictionary), a dictionary +indicating whether an agent reached a terminal state, a dictionary +indicating whether an agent was stopped due to a time limit, and the new +extra information. + +The terminated and truncated dictionaries contain the special key +**\_\_all\_\_** that indicates whether all agents reached a terminal +state or were stopped due to a time limit. The user can use this +information to decide whether the environment should be reset or not. + +All in all, this enables the user to build +arbitrarily complex training or evaluation loops. + +In the following section, advanced topics will be discussed that might +be of interest to the user when working with the `Ns3MultiAgentEnv` but +are not necessary for basic usage. + +### Advanced Usage + +#### Random Seeding + +Randomness is an often desired property in reinforcement learning +experiments. To ensure reproducibility, the user can set a seed for the +random number generator in the ns3 simulation. In ns3, seeds consist of +an overall seed and a run number. + +The following code snippet demonstrates how the seed can be set in the +ns3 simulation: + +``` cpp +int seed = 1; +int seedRunNumber = 1; +CommandLine cmd; +cmd.AddValue("seed", "The seed used for reproducibility", seed); +cmd.AddValue( + "seedRunNumber", + "Counts how often the environment has been reset (used for seeding)", + seedRunNumber); +cmd.Parse(argc, argv); + +RngSeedManager::SetSeed(seed); +RngSeedManager::SetRun(seedRunNumber); +``` + +In Python, the seed can be set in the `ns3Settings` dictionary: + +``` python +ns3Settings: dict[str] = {"numAgents": 3, "seed": 1, "seedRunNumber": 1} +``` + +>[!NOTE] +>In order to achieve meaningful results it has to be ensured that the +>agents to not overfit during training. Therefore, a different seed +>should be used each time the environment is reset. This is done +>automatically when the argument `seedRunNumber` is provided to the +>`ns3Settings`. The run number is increased by one each time the +>environment is reset. + +#### Registering the Environment + +The method proposed in [Creating an Environment +Instance](#creating-an-environment-instance) is easy to use as long as +this environment shall exist in the same process as the driver Python +script. This is not the case for some distributed reinforcement learning +libraries like RLlib. 
The Gymnasium standard introduced a pattern to +deal with this issue. The user can register the environment via a string +identifier and a factory function that creates the environment instance. +The factory function is then called whenever the environment is +requested. + +For Gymnasium, registering the environment would look like this: + +``` python +import gymnasium +from ns3ai_gym_env.envs.ns3_multi_agent_environment import Ns3MultiAgentEnv # this import is necessary to register the environment + +# specify the target name, the path to the ns3 root directory and the ns3 settings +# ... + +gymnasium.envs.register( + id="Multi-Agent-Env", + entry_point="ns3ai_gym_env.envs:Ns3MultiAgentEnv", + kwargs={ + "targetName": targetName, + "ns3Path": ns3Path, + "ns3Settings": ns3Settings, + }, +) +env = gymnasium.make("Multi-Agent-Env", disable_env_checker=True) +``` + +>[!NOTE] +>When registering an environment with Gymnasium, environment checking has +>to be disabled because Gymnasium assumes that all agents will have an +>initial observation after environment reset. In the model provided by +>this library, this is not the case. + + +In Ray RLlib the environment might be registered like this: + +``` python +from ray.tune import register_env +from ns3ai_gym_env.envs.ns3_multi_agent_environment import Ns3MultiAgentEnv # this import is necessary to register the environment + +# specify the target name, the path to the ns3 root directory and the ns3 settings +# ... +register_env( + "Multi-Agent-Env", + lambda _: Ns3MultiAgentEnv( + targetName=targetName, + ns3Path=ns3Path, + ns3Settings=ns3Settings, + ), +) +``` + + +>[!NOTE] +>In case the user needs information that is inferred from an environment +>instance they can simply create a dummy instance, get the relevant +>information and immediately close the dummy instance. +>``` python +>dummy_env = Ns3MultiAgentEnv(targetName=targetName, ns3Path=ns3Path, ns3Settings=ns3Settings) +>obs_space = dummy_env.observation_space +>act_space = dummy_env.action_space +>dummy_env.close() +>``` + +#### Running Multiple Environments in Parallel + +Executing multiple experiments in parallel often is an interesting use +case (e.g. for hyperparameter optimization). Because each environment +uses shared memory for communication with the ns3 simulation, it has to +be ensured that the environments do not interfere with each other. This +can be done by naming the memory segments for each newly created +environment instance. This can be done via the argument `trial_name` +that is passed in the ns3 settings. + +Schematically, this might look similar to the following Python code +snippet: + +``` python +trial = {"trial_name": 1} +ns3Settings: dict[str] = {"numAgents": 3, "seed": 1, "seedRunNumber": 1} + +env1:Ns3MultiAgentEnv = Ns3MultiAgentEnv(targetName=targetName, ns3Path=ns3Path, ns3Settings=(ns3Settings | trial)) + +trial["trial_name"] += 1 +env2:Ns3MultiAgentEnv = Ns3MultiAgentEnv(targetName=targetName, ns3Path=ns3Path, ns3Settings=(ns3Settings | trial)) + +# create many more environment instances +``` + +In practice, however, how the user sets the trial_name for each +environment has to fit the creation process of the environment +instances. The user must ensure that the trial_name is unique for each +environment instance. + +Also, the trial name has to be set in the ns3 simulation. 
This can be +done by adding the following lines to the ns3 simulation: + +``` cpp +std::string trial_name = "0"; +CommandLine cmd; +cmd.AddValue("trial_name", "name of the trial", trial_name); +cmd.Parse(argc, argv); + +OpenGymMultiAgentInterface::Get(); +Ns3AiMsgInterface::Get()->SetNames("My Seg" + trial_name, + "My Cpp to Python Msg" + trial_name, + "My Python to Cpp Msg" + trial_name, + "My Lockable" + trial_name); +``` + +>[!NOTE] +>"My Seg", "My Cpp to Python Msg", "My Python to Cpp Msg" and "My +>Lockable" are the default names of the memory segments that are used for +>communication between the ns3 simulation and the Python environment. + +>[!NOTE] +>In case the setup is messed up and multiple environments use the same +>memory segments this will lead to strange behavior in the simulation. In +>case the segment names are not aligned between the ns3 simulation and +>the Python environment you will encounter the error message +>`boost::interprocess::bad_alloc`. + +### Conclusion + +The previous sections described how the `Ns3MultiAgentEnv` turns an ns3 +simulation into a multi-agent environment that can be interacted with +according to the Gymnasium standard. + +Check out the provided example scripts for even more information. diff --git a/examples/CMakeLists.txt b/examples/CMakeLists.txt index c121b2b..b0c0565 100644 --- a/examples/CMakeLists.txt +++ b/examples/CMakeLists.txt @@ -3,3 +3,4 @@ add_subdirectory(rate-control) add_subdirectory(rl-tcp) add_subdirectory(lte-cqi) add_subdirectory(multi-bss) +add_subdirectory(multi-agent) diff --git a/examples/multi-agent/CMakeLists.txt b/examples/multi-agent/CMakeLists.txt new file mode 100644 index 0000000..7f94081 --- /dev/null +++ b/examples/multi-agent/CMakeLists.txt @@ -0,0 +1,5 @@ +build_lib_example( + NAME ns3ai_multi-agent + SOURCE_FILES multi-agent.cc + LIBRARIES_TO_LINK ${libai} ${libcore} +) diff --git a/examples/multi-agent/multi-agent-inference.py b/examples/multi-agent/multi-agent-inference.py new file mode 100644 index 0000000..6c5801c --- /dev/null +++ b/examples/multi-agent/multi-agent-inference.py @@ -0,0 +1,76 @@ +''' +This script demonstrates how a reinforcement learning library (Ray RLlib) can be used to perform inference with a trained model in a ns3-simulation. +The script performs the following steps in order to evaluate the performance of a trained model: +1. Imports the necessary libraries and modules. +2. Restores the state of the training algorithm from a checkpoint (the environment needs to be registered in the same way as done in the training script). +3. Runs inference in multiple simulations via the policies from the resrored algorithm. +4. Closes the environment after the final simulation has ended. + +Note: Some external libraries like Ray RLlib or Tensorflow are required to run this script. 
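+
+Example invocation (the paths and values are placeholders and need to be adapted to your setup;
+the checkpoint is one produced by multi-agent-train.py):
+
+    python multi-agent-inference.py --ns3Path $NS3_HOME --checkpointPath ./checkpoints --numAgents 3 --numSimulations 10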
+''' + +import argparse + +from ns3ai_gym_env.envs.ns3_multi_agent_environment import Ns3MultiAgentEnv +from ray.rllib.algorithms.algorithm import Algorithm +from ray.rllib.utils.framework import try_import_tf +from ray.tune import register_env + +# fix for the following issue: https://github.com/ray-project/ray/issues/14533 +tf1, tf, tfv = try_import_tf() +tf1.enable_eager_execution() + +parser = argparse.ArgumentParser() +parser.add_argument("--ns3Path", type=str, required=True, help="Path to the ns3 root directory.") +parser.add_argument("--checkpointPath", type=str, required=True, help="Path to the checkpoint to restore.") +parser.add_argument("--numAgents", type=int, default=3, help="Number of agents in the simulation.") +parser.add_argument("--numSimulations", type=int, default=10, help="Number of simulations to run.") +args = parser.parse_args() + +targetName = "ns3ai_multi-agent" +ns3Settings: dict[str] = {"numAgents": args.numAgents, "seedRunNumber": 1} + +register_env( + "Multi-Agent-Env", + lambda _: Ns3MultiAgentEnv( + targetName=targetName, + ns3Path=args.ns3Path, + ns3Settings=ns3Settings, + ), +) + +restored_algo = Algorithm.from_checkpoint( + args.checkpointPath, policies_to_train=lambda _: False +) +restored_algo.restore(args.checkpointPath) + +env = Ns3MultiAgentEnv(targetName=targetName, ns3Path=args.ns3Path, ns3Settings=ns3Settings) + +for simulation in range(args.numSimulations): + simulation_reward = 0 + terminated = truncated = False + obs, info = env.reset() + step_count = 0 + while not terminated and not truncated: + action = {} + state = {} + for agent_id, agent_obs in obs.items(): + policy_id = restored_algo.config.multi_agent()["policy_mapping_fn"]( + agent_id, None, None + ) + action[agent_id] = restored_algo.compute_single_action( + observation=agent_obs, + policy_id=policy_id, + explore=False, + timestep=step_count, + ) + obs, reward, terminated, truncated, info = env.step(action) + simulation_reward += ( + list(reward.values())[0] if len(list(reward.values())) > 0 else 0 + ) + step_count += 1 + terminated = terminated["__all__"] + truncated = truncated["__all__"] + print(f"simulation {simulation} completed - mean reward: {simulation_reward / step_count}") + +env.close() diff --git a/examples/multi-agent/multi-agent-random.py b/examples/multi-agent/multi-agent-random.py new file mode 100644 index 0000000..8aa49e5 --- /dev/null +++ b/examples/multi-agent/multi-agent-random.py @@ -0,0 +1,50 @@ +''' +This script demonstrates how the Ns3MultiAgentEnv class can be used together with a specific ns3 simulation. + +The script performs the following steps in order to evaluate the performance of random agents: +1. Imports the necessary libraries and modules. +2. Sets up logging configuration. +3. Defines the configuration for the ns3-simulation that shall be run. +5. Runs multiple episodes of the environment with actions sampled randomly from the action space. +6. Closes the environment after the final simulation has ended. + +Note: This script assumes that the ns-3 simulator is already installed and the necessary dependencies are met. 
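+
+Example invocation (the path is a placeholder and needs to be adapted to your setup):
+
+    python multi-agent-random.py --ns3Path $NS3_HOME --numAgents 3 --numSimulations 10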
+''' + +import argparse +import logging + +from ns3ai_gym_env.envs.ns3_multi_agent_environment import Ns3MultiAgentEnv + +logging.basicConfig(level=logging.INFO) # verbosity can be reduced by changing this to warning +logger = logging.getLogger(__name__) + +parser = argparse.ArgumentParser() +parser.add_argument("--ns3Path", type=str, required=True, help="Path to the ns3 root directory.") +parser.add_argument("--numAgents", type=int, default=3, help="Number of agents in the simulation.") +parser.add_argument("--numSimulations", type=int, default=10, help="Number of simulations to run.") +args = parser.parse_args() + +targetName = "ns3ai_multi-agent" +ns3Settings: dict[str] = {"numAgents": args.numAgents, "seedRunNumber": 1} + +env = Ns3MultiAgentEnv(targetName=targetName, ns3Path=args.ns3Path, ns3Settings=ns3Settings) + + +for simulation in range(args.numSimulations): + simulation_reward = 0 + terminated = truncated = False + step_count = 0 + obs, info = env.reset() + while not terminated and not truncated: + action = {} + for agent_id, agent_obs in obs.items(): + action[agent_id] = env.action_space[agent_id].sample() + obs, reward, terminated, truncated, info = env.step(action) + simulation_reward += list(reward.values())[0] if len(list(reward.values())) > 0 else 0 + step_count += 1 + terminated = terminated["__all__"] + truncated = truncated["__all__"] + print(f"simulation {simulation} completed - mean reward: {simulation_reward / step_count}") + +env.close() diff --git a/examples/multi-agent/multi-agent-train.py b/examples/multi-agent/multi-agent-train.py new file mode 100644 index 0000000..7cb507b --- /dev/null +++ b/examples/multi-agent/multi-agent-train.py @@ -0,0 +1,97 @@ +""" +This example demonstrates how a reinforcement learning library (Ray RLlib) can be used to train multiple agents in a ns3-simulation with DQN. +The script performs the following steps in order to train the agents: +1. Imports the necessary libraries and modules. +2. Registers the environment using ray tune. +3. Configures the training algorithm. +4. Trains the agents for multiple iterations and prints relevant metrics. + +These are the most essential steps to train in any multi-agent environment using Ray RLlib. For advanced usage like hyperparameter tuning please refer to the Ray RLlib documentation and the advanced usage section of this modules documentation. + +Note: Some external libraries like Ray RLlib or Tensorflow are required to run this script. 
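+
+Example invocation (the paths are placeholders; the checkpoint directory is where the trained policies will be saved):
+
+    python multi-agent-train.py --ns3Path $NS3_HOME --checkpointPath ./checkpoints --numAgents 3 --numIterations 50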
+""" + +import argparse +from pprint import pprint as pp + +from ns3ai_gym_env.envs.ns3_multi_agent_environment import Ns3MultiAgentEnv +from ray.rllib.algorithms.dqn import DQNConfig +from ray.rllib.policy.policy import PolicySpec +from ray.tune import register_env + +parser = argparse.ArgumentParser() +parser.add_argument("--ns3Path", type=str, required=True, help="Path to the ns3 root directory.") +parser.add_argument("--checkpointPath", type=str, required=True, help="Path to the checkpoint to restore.") +parser.add_argument("--numAgents", type=int, default=3, help="Number of agents in the simulation.") +parser.add_argument("--numIterations", type=int, default=50, help="Number of training iterations to run.") +args = parser.parse_args() + +targetName = "ns3ai_multi-agent" +ns3Settings: dict[str] = {"numAgents": args.numAgents, "seedRunNumber": 1} + +env = Ns3MultiAgentEnv(targetName=targetName, ns3Path=args.ns3Path, ns3Settings=ns3Settings) +env_obs_space = env.observation_space +env_act_space = env.action_space +env.close() + +register_env( + "Multi-Agent-Env", + lambda _: Ns3MultiAgentEnv( + targetName=targetName, + ns3Path=args.ns3Path, + ns3Settings=ns3Settings, + ), +) + +replay_config = { + "type": "MultiAgentPrioritizedReplayBuffer", + "capacity": 60000, + "prioritized_replay_alpha": 0.5, + "prioritized_replay_beta": 0.5, + "prioritized_replay_eps": 3e-6, +} + +config = ( + DQNConfig() + .training(train_batch_size=1024, replay_buffer_config=replay_config) + .resources(num_gpus=0) + .rollouts(num_rollout_workers=1, batch_mode="complete_episodes") + .environment("Multi-Agent-Env") + .framework("tf2") + .multi_agent( + policies={ + agent_id: PolicySpec( + observation_space=env_obs_space[agent_id], + action_space=env_act_space[agent_id], + ) + for agent_id in env_obs_space.keys() + }, + policy_mapping_fn=lambda agent_id, episode, worker, **kwargs: agent_id, + ) + .debugging(log_level="ERROR") +) + +algo = config.build() + +metrics_to_print = [ + "episode_reward_mean", + "episode_reward_max", + "episode_reward_min", + "counters", +] + +for i in range(args.numIterations): + print(f"New training iteration {i} started:") + result = algo.train() + pp({k: v for k, v in result.items() if k in metrics_to_print}) + +# checkpointing +save_result = algo.save(args.checkpointPath) +path_to_checkpoint = save_result.checkpoint.path +print( + "An Algorithm checkpoint has been created inside directory: " + f"'{path_to_checkpoint}'." 
+) + +# final cleanup to free resources +algo.cleanup() diff --git a/examples/multi-agent/multi-agent.cc b/examples/multi-agent/multi-agent.cc new file mode 100644 index 0000000..0bb69a6 --- /dev/null +++ b/examples/multi-agent/multi-agent.cc @@ -0,0 +1,127 @@ +#include + +#include +#include +#include +#include + +using namespace ns3; + +class Agent +{ + public: + Agent(){}; + + Agent(const std::string id, int number, Time stepTime) + : m_id(id), + m_number(number), + m_stepTime(stepTime) + { + } + + ~Agent() + { + } + + void ExecuteAction(Ptr action) + { + // actions that are passed to the agent by the interface are abstract + // OpenGymDataContainer objects and need to be transformed to the actual object type that + // corresponds to the action space of the agent + m_number += DynamicCast(action)->GetValue() - 5; + } + + Ptr GetObservation() const + { + auto shape = std::vector{1}; + auto observation = CreateObject>( + shape); // Create a 1-dimensional + // container that holds the agents observation + observation->AddValue(m_number); + return observation; + } + + double GetReward() const + { + return -abs(m_number); // The goal of the agent is it to reach the number 0 + } + + Ptr GetObservationSpace() + { + auto type = TypeNameGet(); + auto shape = std::vector{1}; + auto obsSpace = CreateObject(-INFINITY, INFINITY, shape, type); + return obsSpace; + } + + Ptr GetActionSpace() + { + auto actionSpace = CreateObject(10); + return actionSpace; + } + + void Step() + { + OpenGymMultiAgentInterface::Get()->NotifyCurrentState( + m_id, + GetObservation(), + GetReward(), + false, + {}, + Seconds(0), + MakeCallback(&Agent::ExecuteAction, this)); + Simulator::Schedule(m_stepTime, &Agent::Step, this); + } + + private: + const std::string m_id; + int m_number; + Time m_stepTime; +}; + +int +main(int argc, char* argv[]) +{ + int numAgents = 2; + int seedRunNumber = 1; + CommandLine cmd; + cmd.AddValue("numAgents", "Number of agents that act in the environment", numAgents); + cmd.AddValue("seedRunNumber", + "Counts how often the environment has been reset (used for seeding)", + seedRunNumber); + cmd.Parse(argc, argv); + + RngSeedManager::SetSeed(42); + RngSeedManager::SetRun(seedRunNumber); + + auto randomNumber = CreateObject(); + randomNumber->SetAttribute("Min", DoubleValue(-42)); + randomNumber->SetAttribute("Max", DoubleValue(42)); + + std::vector agents; + for (int i = 0; i < numAgents; i++) + { + std::string id = "agent_" + std::to_string(i); + int number = randomNumber->GetInteger(); + Time stepTime = Seconds(1); + auto agent = new Agent(id, number, stepTime); + agents.emplace_back(agent); + + OpenGymMultiAgentInterface::Get()->SetGetObservationSpaceCb( + id, + MakeCallback(&Agent::GetObservationSpace, agents[i])); + OpenGymMultiAgentInterface::Get()->SetGetActionSpaceCb( + id, + MakeCallback(&Agent::GetActionSpace, agents[i])); + } + + for (const auto agent : agents) + { + Simulator::Schedule(Seconds(0), &Agent::Step, agent); + } + + Simulator::Stop(Seconds(60)); + Simulator::Run(); + Simulator::Destroy(); + OpenGymMultiAgentInterface::Get()->NotifySimulationEnd(-100, {}); +} diff --git a/model/gym-interface/cpp/ns3-ai-multi-agent-gym-interface.cc b/model/gym-interface/cpp/ns3-ai-multi-agent-gym-interface.cc new file mode 100644 index 0000000..257552c --- /dev/null +++ b/model/gym-interface/cpp/ns3-ai-multi-agent-gym-interface.cc @@ -0,0 +1,257 @@ +#include "ns3-ai-multi-agent-gym-interface.h" + +#include "container.h" +#include "messages.pb.h" +#include "ns3-ai-gym-env.h" +#include "spaces.h" 
+ +#include +#include +#include + +namespace ns3 +{ + +NS_LOG_COMPONENT_DEFINE("OpenGymMultiAgentInterface"); +NS_OBJECT_ENSURE_REGISTERED(OpenGymMultiAgentInterface); + +OpenGymMultiAgentInterface::OpenGymMultiAgentInterface() + : m_simEnd(false), + m_stopEnvRequested(false), + m_initSimMsgSent(false) +{ + auto interface = Ns3AiMsgInterface::Get(); + interface->SetIsMemoryCreator(false); + interface->SetUseVector(false); + interface->SetHandleFinish(false); +} + +OpenGymMultiAgentInterface::~OpenGymMultiAgentInterface() +{ +} + +TypeId +OpenGymMultiAgentInterface::GetTypeId() +{ + static TypeId tid = TypeId("OpenGymMultiAgentInterface") + .SetParent() + .SetGroupName("OpenGym") + .AddConstructor(); + return tid; +} + +void +OpenGymMultiAgentInterface::Init() +{ + // do not send init msg twice + if (m_initSimMsgSent) + { + return; + } + m_initSimMsgSent = true; + + ns3_ai_gym::MultiAgentSimInitMsg simInitMsg; + + // obs space + for (const auto& [key, value] : GetObservationSpace()) + { + (*simInitMsg.mutable_obsspaces())[key] = value->GetSpaceDescription(); + } + + // action space + for (const auto& [key, value] : GetActionSpace()) + { + (*simInitMsg.mutable_actspaces())[key] = value->GetSpaceDescription(); + } + + // get the interface + Ns3AiMsgInterfaceImpl* msgInterface = + Ns3AiMsgInterface::Get()->GetInterface(); + + // send init msg to python + msgInterface->CppSendBegin(); + msgInterface->GetCpp2PyStruct()->size = simInitMsg.ByteSizeLong(); + assert(msgInterface->GetCpp2PyStruct()->size <= MSG_BUFFER_SIZE); + simInitMsg.SerializeToArray(msgInterface->GetCpp2PyStruct()->buffer, + msgInterface->GetCpp2PyStruct()->size); + msgInterface->CppSendEnd(); + + // receive init ack msg from python + ns3_ai_gym::SimInitAck simInitAck; + msgInterface->CppRecvBegin(); + simInitAck.ParseFromArray(msgInterface->GetPy2CppStruct()->buffer, + msgInterface->GetPy2CppStruct()->size); + msgInterface->CppRecvEnd(); + + bool done = simInitAck.done(); + NS_LOG_DEBUG("Sim Init Ack: " << done); + bool stopSim = simInitAck.stopsimreq(); + if (stopSim) + { + NS_LOG_DEBUG("---Stop requested: " << stopSim); + m_stopEnvRequested = true; + Simulator::Stop(); + Simulator::Destroy(); + std::exit(0); + } +} + +void +OpenGymMultiAgentInterface::NotifyCurrentState( + const std::string agentId, + Ptr obsDataContainer, + float reward, + bool isGameOver, + const std::map& extraInfo, + Time actionDelay, + Callback> actionCallback) +{ + if (!m_initSimMsgSent) + { + Init(); + } + if (m_stopEnvRequested) + { + return; + } + ns3_ai_gym::MultiAgentEnvStateMsg envStateMsg; + // observation + ns3_ai_gym::DataContainer obsDataContainerPbMsg; + if (obsDataContainer) + { + obsDataContainerPbMsg = obsDataContainer->GetDataContainerPbMsg(); + envStateMsg.mutable_obsdata()->CopyFrom(obsDataContainerPbMsg); + } + // agent + envStateMsg.set_agentid(agentId); + // reward + envStateMsg.set_reward(reward); + // game over + envStateMsg.set_isgameover(false); + if (isGameOver) + { + envStateMsg.set_isgameover(true); + if (m_simEnd) + { + envStateMsg.set_reason(ns3_ai_gym::MultiAgentEnvStateMsg::SimulationEnd); + } + else + { + envStateMsg.set_reason(ns3_ai_gym::MultiAgentEnvStateMsg::GameOver); + } + } + // extra info + for (const auto& [key, value] : extraInfo) + { + (*envStateMsg.mutable_info())[key] = value; + } + + // get the interface + Ns3AiMsgInterfaceImpl* msgInterface = + Ns3AiMsgInterface::Get()->GetInterface(); + + // send env state msg to python + msgInterface->CppSendBegin(); + msgInterface->GetCpp2PyStruct()->size = 
envStateMsg.ByteSizeLong(); + assert(msgInterface->GetCpp2PyStruct()->size <= MSG_BUFFER_SIZE); + envStateMsg.SerializeToArray(msgInterface->GetCpp2PyStruct()->buffer, + msgInterface->GetCpp2PyStruct()->size); + + msgInterface->CppSendEnd(); + + // receive act msg from python + ns3_ai_gym::EnvActMsg envActMsg; + msgInterface->CppRecvBegin(); + + envActMsg.ParseFromArray(msgInterface->GetPy2CppStruct()->buffer, + msgInterface->GetPy2CppStruct()->size); + msgInterface->CppRecvEnd(); + + if (m_simEnd) + { + return; + } + + bool stopSim = envActMsg.stopsimreq(); + if (stopSim) + { + NS_LOG_DEBUG("---Stop requested: " << stopSim); + m_stopEnvRequested = true; + Simulator::Stop(); + Simulator::Destroy(); + NS_ABORT_MSG("Simulation stopped!"); + } + + // first step after reset is called without actions, just to get current state + ns3_ai_gym::DataContainer actDataContainerPbMsg = envActMsg.actdata(); + auto action = OpenGymDataContainer::CreateFromDataContainerPbMsg(actDataContainerPbMsg); + Simulator::Schedule(actionDelay, actionCallback.Bind(action)); +} + +void +OpenGymMultiAgentInterface::WaitForStop(float reward, + bool isGameOver, + const std::map& extraInfo) +{ + NS_LOG_FUNCTION(this); + + NotifyCurrentState( + "", + {}, + reward, + isGameOver, + extraInfo, + Seconds(0), + *[](Ptr) {}); +} + +void +OpenGymMultiAgentInterface::NotifySimulationEnd(float reward, + const std::map& extraInfo) +{ + NS_LOG_FUNCTION(this); + m_simEnd = true; + if (m_initSimMsgSent) + { + WaitForStop(reward, true, extraInfo); + } +} + +std::map> +OpenGymMultiAgentInterface::GetActionSpace() +{ + NS_LOG_FUNCTION(this); + std::map> actionSpace; + for (const auto& [agentId, callback] : m_actionSpaceCbs) + { + actionSpace[agentId] = callback(); + } + return actionSpace; +} + +std::map> +OpenGymMultiAgentInterface::GetObservationSpace() +{ + NS_LOG_FUNCTION(this); + std::map> obsSpace; + for (const auto& [agentId, callback] : m_observationSpaceCbs) + { + obsSpace[agentId] = callback(); + } + return obsSpace; +} + +void +OpenGymMultiAgentInterface::SetGetActionSpaceCb(std::string agentId, Callback> cb) +{ + m_actionSpaceCbs[agentId] = cb; +} + +void +OpenGymMultiAgentInterface::SetGetObservationSpaceCb(std::string agentId, + Callback> cb) +{ + m_observationSpaceCbs[agentId] = cb; +} + +} // namespace ns3 diff --git a/model/gym-interface/cpp/ns3-ai-multi-agent-gym-interface.h b/model/gym-interface/cpp/ns3-ai-multi-agent-gym-interface.h new file mode 100644 index 0000000..e6df4ee --- /dev/null +++ b/model/gym-interface/cpp/ns3-ai-multi-agent-gym-interface.h @@ -0,0 +1,59 @@ +#ifndef NS3_AI_MULTI_AGENT_GYM_INTERFACE_H +#define NS3_AI_MULTI_AGENT_GYM_INTERFACE_H + +#include "../ns3-ai-gym-msg.h" + +#include +#include +#include +#include +#include +#include + +namespace ns3 +{ + +class OpenGymSpace; +class OpenGymDataContainer; +class OpenGymEnv; + +class OpenGymMultiAgentInterface : public Singleton, public Object +{ + public: + OpenGymMultiAgentInterface(); + ~OpenGymMultiAgentInterface() override; + static TypeId GetTypeId(); + + void Init(); + void NotifyCurrentState(const std::string agentId, + Ptr obsDataContainer, + float reward, + bool isGameOver, + const std::map& extraInfo, + Time actionDelay, + Callback> actionCallback); + void WaitForStop(float reward, + bool isGameOver, + const std::map& extraInfo = {}); + void NotifySimulationEnd(float reward = 0, + const std::map& extraInfo = {}); + + std::map> GetActionSpace(); + std::map> GetObservationSpace(); + + void SetGetActionSpaceCb(std::string agentId, Callback> cb); 
+ void SetGetObservationSpaceCb(std::string agentId, Callback> cb); + + private: + + bool m_simEnd; + bool m_stopEnvRequested; + bool m_initSimMsgSent; + + std::map>> m_actionSpaceCbs; + std::map>> m_observationSpaceCbs; +}; + +} // end of namespace ns3 + +#endif // NS3_AI_MULTI_AGENT_GYM_INTERFACE_H diff --git a/model/gym-interface/messages.proto b/model/gym-interface/messages.proto index 9045ec5..d3cf150 100644 --- a/model/gym-interface/messages.proto +++ b/model/gym-interface/messages.proto @@ -117,6 +117,25 @@ message EnvStateMsg { string info = 5; } +message MultiAgentSimInitMsg { + map obsSpaces = 1; + map actSpaces = 2; +} + +message MultiAgentEnvStateMsg { + DataContainer obsData = 1; + float reward = 2; + bool isGameOver = 3; + + enum Reason { + SimulationEnd = 0; + GameOver = 1; + } + Reason reason = 4; + map info = 5; + string agentID = 6; +} + message EnvActMsg { DataContainer actData = 1; bool stopSimReq = 2; diff --git a/model/gym-interface/py/CMakeLists.txt b/model/gym-interface/py/CMakeLists.txt index 58451ef..972dc64 100644 --- a/model/gym-interface/py/CMakeLists.txt +++ b/model/gym-interface/py/CMakeLists.txt @@ -1,4 +1,5 @@ pybind11_add_module(ns3ai_gym_msg_py msg_py_binding.cc) +target_link_libraries(ns3ai_gym_msg_py PRIVATE ${libcore}) set_target_properties(ns3ai_gym_msg_py PROPERTIES LIBRARY_OUTPUT_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}) diff --git a/model/gym-interface/py/ns3ai_gym_env/envs/__init__.py b/model/gym-interface/py/ns3ai_gym_env/envs/__init__.py index 4cbc3d7..4e6c528 100644 --- a/model/gym-interface/py/ns3ai_gym_env/envs/__init__.py +++ b/model/gym-interface/py/ns3ai_gym_env/envs/__init__.py @@ -1 +1,4 @@ from ns3ai_gym_env.envs.ns3_environment import Ns3Env +from ns3ai_gym_env.envs.ns3_multi_agent_environment import Ns3MultiAgentEnv + +__all__ = ["Ns3Env", "Ns3MultiAgentEnv"] diff --git a/model/gym-interface/py/ns3ai_gym_env/envs/ns3_multi_agent_environment.py b/model/gym-interface/py/ns3ai_gym_env/envs/ns3_multi_agent_environment.py new file mode 100644 index 0000000..7f3ca0d --- /dev/null +++ b/model/gym-interface/py/ns3ai_gym_env/envs/ns3_multi_agent_environment.py @@ -0,0 +1,113 @@ +from typing import Any, Literal, TypeVar + +import messages_pb2 as pb +import ns3ai_gym_msg_py as py_binding +from gymnasium import spaces +from ray.rllib.env.multi_agent_env import MultiAgentEnv + +from ns3ai_gym_env.typing import copy_signature_from + +from .ns3_environment import Ns3Env + +T = TypeVar("T") + + +class Ns3MultiAgentEnv(Ns3Env, MultiAgentEnv): + @copy_signature_from(Ns3Env.__init__) + def __init__(self, *args: Any, **kwargs: Any) -> None: + self.action_space: spaces.Dict = spaces.Dict() + self.observation_space: spaces.Dict = spaces.Dict() + self.agent_selection: str | None = None + super().__init__(*args, **kwargs) + MultiAgentEnv.__init__(self) + + def initialize_env(self) -> Literal[True]: + init_msg = pb.MultiAgentSimInitMsg() + self.msgInterface.PyRecvBegin() + request = self.msgInterface.GetCpp2PyStruct().get_buffer() + init_msg.ParseFromString(request) + self.msgInterface.PyRecvEnd() + + for agent, space in init_msg.actSpaces.items(): + self.action_space[agent] = self._create_space(space) + + for agent, space in init_msg.obsSpaces.items(): + self.observation_space[agent] = self._create_space(space) + self._agent_ids = list(self.action_space.keys()) + reply = pb.SimInitAck() + reply.done = True + reply.stopSimReq = False + reply_str = reply.SerializeToString() + assert len(reply_str) <= py_binding.msg_buffer_size + + self.msgInterface.PySendBegin() 
+ self.msgInterface.GetPy2CppStruct().size = len(reply_str) + self.msgInterface.GetPy2CppStruct().get_buffer_full()[: len(reply_str)] = reply_str + self.msgInterface.PySendEnd() + return True + + def rx_env_state(self) -> None: + if self.newStateRx: + return + + state_msg = pb.MultiAgentEnvStateMsg() + self.msgInterface.PyRecvBegin() + request = self.msgInterface.GetCpp2PyStruct().get_buffer() + state_msg.ParseFromString(request) + self.msgInterface.PyRecvEnd() + + self.obsData = self._create_data(state_msg.obsData) + self.reward = state_msg.reward + self.gameOver = state_msg.isGameOver + self.gameOverReason = state_msg.reason + self.agent_selection = state_msg.agentID + + if self.gameOver: + self.send_close_command() + + self.extraInfo = dict(state_msg.info) + + self.newStateRx = True + + def send_actions(self, actions: dict[str, Any]) -> bool: + assert self.agent_selection + reply = pb.EnvActMsg() + + action_msg = self._pack_data(actions[self.agent_selection], self.action_space[self.agent_selection]) + reply.actData.CopyFrom(action_msg) + + reply_msg = reply.SerializeToString() + assert len(reply_msg) <= py_binding.msg_buffer_size + self.msgInterface.PySendBegin() + self.msgInterface.GetPy2CppStruct().size = len(reply_msg) + self.msgInterface.GetPy2CppStruct().get_buffer_full()[: len(reply_msg)] = reply_msg + self.msgInterface.PySendEnd() + self.newStateRx = False + return True + + def wrap(self, data: T) -> dict[str, T]: + assert self.agent_selection is not None + return {self.agent_selection: data} + + def step(self, actions: dict[str, Any]) -> tuple[dict[str, Any], ...]: + obs, rew, terminateds, truncateds, info = tuple(self.wrap(state) for state in super().step(actions)) + terminateds["__all__"] = all(terminated for terminated in terminateds.values()) + truncateds["__all__"] = all(truncated for truncated in truncateds.values()) + obs.pop("", "") + rew.pop("", "") + terminateds.pop("", "") + truncateds.pop("", "") + info.pop("", "") + return obs, rew, terminateds, truncateds, info + + def reset( + self, + *, + seed: int | None = None, + options: dict | None = None, + ) -> tuple[dict[str, Any], dict[str, dict[str, Any]]]: + return tuple(self.wrap(state) for state in super().reset(seed, options)) + + def get_random_action(self) -> Any: + assert self.agent_selection is not None + return self.action_space[self.agent_selection].sample() diff --git a/model/gym-interface/py/ns3ai_gym_env/typing.py b/model/gym-interface/py/ns3ai_gym_env/typing.py new file mode 100644 index 0000000..9ec3d80 --- /dev/null +++ b/model/gym-interface/py/ns3ai_gym_env/typing.py @@ -0,0 +1,12 @@ +from collections.abc import Callable +from typing import Any, ParamSpec, TypeVar, cast + +T = TypeVar("T") +P = ParamSpec("P") + + +def copy_signature_from(_origin: Callable[P, Any]) -> Callable[[Callable[..., T]], Callable[P, T]]: + def decorator(target: Callable[..., T]) -> Callable[P, T]: + return cast(Callable[P, T], target) + + return decorator diff --git a/python_utils/ns3ai_utils.py b/python_utils/ns3ai_utils.py index efe605a..f723563 100644 --- a/python_utils/ns3ai_utils.py +++ b/python_utils/ns3ai_utils.py @@ -17,6 +17,7 @@ # Hao Yin # Muyuan Shen +import logging import os import subprocess import psutil @@ -24,6 +25,9 @@ import signal +logger = logging.getLogger(__name__) + + SIMULATION_EARLY_ENDING = 0.5 # wait and see if the subprocess is running after creation @@ -61,7 +65,7 @@ def run_single_ns3(path, pname, setting=None, env=None, show_output=False): # used to kill the ns-3 script process and its child 
processes def kill_proc_tree(p, timeout=None, on_terminate=None): - print('ns3ai_utils: Killing subprocesses...') + logger.info('ns3ai_utils: Killing subprocesses...') if isinstance(p, int): p = psutil.Process(p) elif not isinstance(p, psutil.Process): @@ -134,12 +138,12 @@ def __init__(self, targetName, ns3Path, msgModule, self.proc = None self.simCmd = None - print('ns3ai_utils: Experiment initialized') + logger.info('ns3ai_utils: Experiment initialized') def __del__(self): self.kill() del self.msgInterface - print('ns3ai_utils: Experiment destroyed') + logger.info('ns3ai_utils: Experiment destroyed') # run ns3 script in cmd with the setting being input # \param[in] setting : ns3 script input parameters(default : None) @@ -147,12 +151,16 @@ def __del__(self): def run(self, setting=None, show_output=False): self.kill() self.simCmd, self.proc = run_single_ns3( - './', self.targetName, setting=setting, show_output=show_output) - print("ns3ai_utils: Running ns-3 with: ", self.simCmd) + "./", + self.targetName, + setting=setting, + show_output=show_output + ) + logger.info("ns3ai_utils: Running ns-3 with: %s", self.simCmd) # exit if an early error occurred, such as wrong target name time.sleep(SIMULATION_EARLY_ENDING) if not self.isalive(): - print('ns3ai_utils: Subprocess died very early') + logger.info('ns3ai_utils: Subprocess died very early') exit(1) signal.signal(signal.SIGINT, sigint_handler) return self.msgInterface