diff --git a/CMakeLists.txt b/CMakeLists.txt index 22e1853..fbea3ed 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -60,6 +60,7 @@ set(msg_interface_srcs ) set(msg_interface_hdrs model/msg-interface/ns3-ai-msg-interface.h) set(gym_interface_srcs model/gym-interface/cpp/ns3-ai-gym-interface.cc + model/gym-interface/cpp/ns3-ai-multi-agent-gym-interface.cc model/gym-interface/cpp/ns3-ai-gym-env.cc model/gym-interface/cpp/container.cc model/gym-interface/cpp/spaces.cc @@ -67,6 +68,7 @@ set(gym_interface_srcs ) set(gym_interface_hdrs model/gym-interface/cpp/ns3-ai-gym-interface.h + model/gym-interface/cpp/ns3-ai-multi-agent-gym-interface.h model/gym-interface/cpp/ns3-ai-gym-env.h model/gym-interface/cpp/container.h model/gym-interface/cpp/spaces.h diff --git a/README.md b/README.md index e1fbb47..4d7a33d 100644 --- a/README.md +++ b/README.md @@ -29,6 +29,7 @@ greater flexibility. - High-performance data interaction module in both C++ and Python side. - A high-level [Gym interface](model/gym-interface) for using Gymnasium APIs, and a low-level [message interface](model/msg-interface) for customizing the shared data. +- Support for multi-agent reinforcement learning - Useful skeleton code to easily integrate with AI frameworks on Python side. ## Installation @@ -43,11 +44,13 @@ To get started on ns3-ai, check out the [A-Plus-B](examples/a-plus-b) example. T C++ passes two numbers to Python and their sum is passed back to C++, with the implementation using all available interfaces: Gym interface, message interface (struct-based) and message interface (vector-based). +An advanced example for [multi-agent](examples/multi-agent) reinforcement learning is also provided. ### Documentation Ready to deploy ns3-ai in your own research? Before you code, please go over the tutorials on -[Gym interface](model/gym-interface) and [message interface](model/msg-interface). They provide +[Gym interface](model/gym-interface) and [message interface](model/msg-interface). The documentation for [multi-agent environments](./docs/multi-agent.md) explains in detail how ns3-ai can be used to train multiple-agents in an ns3 simulation. +They provide step-by-step guidance on writing C++-Python interfaces, with some useful code snippets. We also created some **pure C++** examples, which uses C++-based ML frameworks to train @@ -84,6 +87,10 @@ This original work is done based on [5G NR](https://5g-lena.cttc.es/) branch in also run in LTE codebase in ns-3 mainline. We didn't reproduce all the experiments on LTE, and the results in our paper are based on NR work. +### [MULTI-AGENT](examples/multi-agent/) + +This example illustrates with a simple scenario how multi-agent environments can be created using ns3-ai. It also explains how the agents in the environment can be trained using RLlib and how the trained agents can be evaluated. + ## Other materials ### Google Summer of Code 2023 @@ -102,6 +109,9 @@ Note: this tutorial explains the original design, which is not up to date with t Join us in this [online recording](https://vimeo.com/566296651) to get better knowledge about ns3-ai. The slides introducing the ns3-ai model could also be found [here](https://www.nsnam.org/wp-content/uploads/2021/tutorials/ns3-ai-tutorial-June-2021.pdf). +## Related projects +The [defiance project](https://github.com/DEFIANCE-project) builds upon the multi-agent capabilities of ns3-ai and allows the user to realistically simulate the deployment of reinforcement learning components. 
It handles the setup and communication of these components in a flexible way. The user only needs to write minimal code to specify the observations, actions and rewards in the experiment.
+
 ## Cite Our Work
 
 Please use the following bibtex:
diff --git a/docs/multi-agent.md b/docs/multi-agent.md
new file mode 100644
index 0000000..7409ca9
--- /dev/null
+++ b/docs/multi-agent.md
@@ -0,0 +1,621 @@
+# Multi-Agent Reinforcement Learning
+
+## Background
+The `Ns3MultiAgentEnv` allows the user to
+create a multi-agent Gymnasium environment from an ns3 simulation,
+using the `OpenGymMultiAgentInterface` for inter-process
+communication. This environment can then be used to train the agents
+with reinforcement learning algorithms. We assume the reader is already
+familiar with the concepts of reinforcement learning, multi-agent
+systems, and the ns-3 simulator.
+
+## Usage Overview
+
+The following steps have to be carried out to create a multi-agent environment for a specific experiment:
+1. Create an ns-3 simulation with the desired network topology and
+   traffic.
+2. Define how each agent observes and acts within the environment.
+3. Specify when an agent performs its inference and training steps.
+4. Decide on termination criteria for the environment.
+5. Register the environment in a Python script where it can be used to
+   interact with the ns3 simulation.
+
+Steps 1 to 4 require writing **C++** code that utilizes the API of the
+`OpenGymMultiAgentInterface`. Step 5 is done in **Python** by creating
+an instance of the `Ns3MultiAgentEnv`. In the following sections, we
+will guide the user through the usage of both of these components and
+provide a minimal example to demonstrate them.
+
+## Basic Example
+
+For the scope of this documentation, we decided on the following example.
+We will create a variable number of agents in our ns3 simulation. Each
+of these agents will be instantiated with a random counter ranging from
+-42 to +42. When doing inference, each agent can decide on a number
+between -5 and +5. This number will be added to the counter of this
+agent. The goal of each agent is to reach the counter value 0; therefore,
+the reward for each agent is the negative absolute value of its counter.
+The agents infer once every second and the experiment is truncated at 60
+seconds (simulation end). The agents are first evaluated with random
+actions, then trained using the DQN algorithm, and finally evaluated
+again based on a checkpoint of the training.
+
+Because the agents behave very similarly, we introduce the `Agent` class
+in our ns3 script. The relevant methods this class provides will also be
+discussed in the following sections. Overall, it is not necessary to
+create new classes in order to work with the
+`OpenGymMultiAgentInterface`, and we will also show how it can be used
+without them.
+
+## OpenGymMultiAgentInterface
+
+In general, the `OpenGymMultiAgentInterface` is responsible for:
+- Registering agents with their corresponding observation and action
+  spaces in the environment
+- Performing inference and training steps for a given agent
+- Terminating the environment and handling the simulation end
+
+### Accessing the Interface
+
+To use the `OpenGymMultiAgentInterface` in the ns3 simulation, the user
+has to include the ns3-ai module via
+
+``` cpp
+#include <ns3/ai-module.h>
+```
+
+The interface is then provided as a singleton and can be used inside the
+simulation without the need to instantiate it.
+The user can access the interface via
+
+``` cpp
+OpenGymMultiAgentInterface::Get()
+```
+
+This returns a pointer to the interface from which the other methods can
+be accessed.
+
+### Registering Agents
+
+To register an agent with the interface, the user has to provide the following information:
+- The agent's ID
+- The observation space of the agent
+- The action space of the agent
+
+The **agent id** is an arbitrary string that is used to identify the
+agent in the simulation and in the final Python environment. The
+observation and action spaces are defined as `OpenGymSpace` objects and
+registered by providing callbacks which return the space information.
+The callbacks are passed to
+`OpenGymMultiAgentInterface::SetGetObservationSpaceCb` and
+`OpenGymMultiAgentInterface::SetGetActionSpaceCb`, respectively.
+
+The following code snippets demonstrate how the agents from our example
+are registered with the environment.
+
+First, the observation and action spaces are defined in the agent class. In this simple example,
+the observation is a single integer - the current number - while the possible action is from the discrete space [0, 10].
+The action will later be transformed to the range [-5, 5].
+
+``` cpp
+Ptr<OpenGymSpace>
+Agent::GetObservationSpace()
+{
+    auto type = TypeNameGet<int>();
+    auto shape = std::vector<uint32_t>{1};
+    auto obsSpace = CreateObject<OpenGymBoxSpace>(-INFINITY, INFINITY, shape, type);
+    return obsSpace;
+}
+
+Ptr<OpenGymSpace>
+Agent::GetActionSpace()
+{
+    auto actionSpace = CreateObject<OpenGymDiscreteSpace>(10);
+    return actionSpace;
+}
+```
+
+Then the agents are instantiated and registered in the environment.
+
+``` cpp
+auto randomNumber = CreateObject<UniformRandomVariable>();
+randomNumber->SetAttribute("Min", DoubleValue(-42));
+randomNumber->SetAttribute("Max", DoubleValue(42));
+
+std::vector<Agent*> agents;
+for (int i = 0; i < numAgents; i++)
+{
+    // create an agent that will step once a second with its
+    // counter initialized randomly and a given id
+    std::string id = "agent_" + std::to_string(i);
+    int number = randomNumber->GetInteger();
+    Time stepTime = Seconds(1);
+    auto agent = new Agent(id, number, stepTime);
+    agents.emplace_back(agent);
+
+    // register the newly created agent in the environment
+    OpenGymMultiAgentInterface::Get()->SetGetObservationSpaceCb(
+        id,
+        MakeCallback(&Agent::GetObservationSpace, agents[i]));
+    OpenGymMultiAgentInterface::Get()->SetGetActionSpaceCb(
+        id,
+        MakeCallback(&Agent::GetActionSpace, agents[i]));
+}
+```
+>[!NOTE]
+>In case the user does not want to create an extra class for the agents,
+>the callbacks can also be provided as lambda functions.
+>``` cpp
+>for (int i = 0; i < numAgents; i++)
+>{
+>    std::string id = "agent_" + std::to_string(i);
+>    OpenGymMultiAgentInterface::Get()->SetGetObservationSpaceCb(id, []() {
+>        auto type = TypeNameGet<int>();
+>        auto shape = std::vector<uint32_t>{1};
+>        auto obsSpace = CreateObject<OpenGymBoxSpace>(-INFINITY, INFINITY, shape, type);
+>        return obsSpace;
+>    });
+>    OpenGymMultiAgentInterface::Get()->SetGetActionSpaceCb(id, []() {
+>        auto actionSpace = CreateObject<OpenGymDiscreteSpace>(10);
+>        return actionSpace;
+>    });
+>}
+>```
+
+### Performing Inference
+
+To let an agent perform inference, the following information has to be provided:
+- ID of the agent
+- Observation the agent made
+- Reward signal the agent received after its previous action
+- Indication whether the agent reached a terminal state
+- Extra information that is not used for training but that the user is
+  interested in
+- Time that indicates how long the inference takes in the simulation
+- How the inferred action shall be applied in the simulation
+
+Signaling that an agent performs inference is done via
+`OpenGymMultiAgentInterface::NotifyCurrentState`. This method needs to
+be scheduled during simulation time whenever an agent should compute
+its next action.
+
+>[!NOTE]
+>The design of the interface allows only one agent to perform inference
+>per call of `NotifyCurrentState`. Still, this does not prevent two
+>agents from performing inference at the exact same time in the
+>simulation. To do so, the user simply needs to schedule two calls of
+>`NotifyCurrentState` at the same simulation time and provide the
+>respective arguments.
+
+The following code snippets demonstrate how `NotifyCurrentState` can be
+used to perform an agent step in our example:
+
+``` cpp
+void
+Agent::Step()
+{
+    OpenGymMultiAgentInterface::Get()->NotifyCurrentState(
+        m_id,
+        GetObservation(),
+        GetReward(),
+        false, // the agent does not have a terminal state
+        {},
+        Seconds(0), // we assume performing inference is instantaneous
+        MakeCallback(&Agent::ExecuteAction, this));
+
+    // We want the agents to step periodically at fixed intervals
+    Simulator::Schedule(m_stepTime, &Agent::Step, this);
+}
+```
+
+In the simulation, the step method now just needs to be invoked once for
+each agent.
+
+``` cpp
+for (const auto agent : agents)
+{
+    Simulator::Schedule(Seconds(0), &Agent::Step, agent);
+}
+```
+>[!NOTE]
+>The methods `GetObservation`, `GetReward`, and `ExecuteAction` of the newly
+>created agent class are not provided by the
+>interface itself. Again, as already demonstrated for the registration of
+>the agents, the user could also use lambda functions together with
+>`NotifyCurrentState` or even pass the corresponding values directly.
+
+>[!WARNING]
+>As already mentioned, the interface utilizes so-called spaces and
+>containers to communicate the observations and actions of agents. The
+>user needs to make sure that the observations are correctly wrapped
+>inside such a container and match the space description. For the actions,
+>the user must extract the action from the provided container (this also
+>needs to match the action space description).
+>The following code demonstrates how such an action would be extracted
+>and executed in our example:
+>``` cpp
+>void
+>Agent::ExecuteAction(Ptr<OpenGymDataContainer> action)
+>{
+>    // the action space in this case is a discrete container ranging from 0 to 10
+>    // such a container contains exactly one value
+>    auto raw_action = DynamicCast<OpenGymDiscreteContainer>(action)->GetValue();
+>    // the agent is allowed to choose a number between -5 and 5
+>    // and add it to its internal counter
+>    m_number += raw_action - 5;
+>}
+>```
+
+### Terminating the Environment
+
+The simulation of the environment can end for two possible reasons:
+1. An agent reached its terminal state
+2. The simulation ended
+
+As we have already seen, the user can signal that an agent reached its
+terminal state by setting the corresponding flag in
+`NotifyCurrentState`. When the method is called with this flag set to
+true, the simulation will be stopped and destroyed, and all agents will
+be treated as having reached their terminal state.
+
+To signal the simulation end to the environment, the user can call
+`OpenGymMultiAgentInterface::NotifySimulationEnd`. As additional
+arguments, a final reward and extra information can be provided. In
+reinforcement learning, this corresponds to the truncation of the episode.
+
+The following code snippet demonstrates how the simulation end can be
+signaled:
+
+``` cpp
+Simulator::Stop(Seconds(60));
+Simulator::Run();
+Simulator::Destroy();
+// finish the environment without giving an extra reward and
+// without providing extra information
+OpenGymMultiAgentInterface::Get()->NotifySimulationEnd(0, {});
+```
+>[!WARNING]
+>The call to `NotifySimulationEnd` must be executed as the very last
+>call in the simulation script, as it will terminate the C++ process once
+>the information has been passed to the Python environment.
+>It is also advised to include it in every experiment because it ensures
+>that the RL algorithms recognize that the episode has been truncated
+>when the simulation time is over.
+
+### Conclusion
+
+As the previous sections have shown, the `OpenGymMultiAgentInterface` is
+a powerful tool for creating multi-agent environments for reinforcement
+learning experiments. The user can define the agents, their observations
+and actions, and the simulation end criteria in a flexible way. The
+interface is designed to be easily integrated into existing ns3
+simulations and is what users interact with when designing the
+simulation part of their environment in C++. The next sections will
+demonstrate how to use the `Ns3MultiAgentEnv` to interact with the ns3
+simulation from a Python script.
+
+Also, we want to emphasize that all of these interactions only require
+the `OpenGymMultiAgentInterface`. Any additional classes or methods
+(e.g. the custom `Agent` class) are optional and only needed to keep the
+simulation script clean and structured without too much redundancy.
+
+>[!WARNING]
+>A simulation script that uses the `OpenGymMultiAgentInterface` is not
+>intended to run, and will not run properly, on its own. It is only one
+>part of the environment and is used to interact with the Python script.
+>When the script is run on its own, the simulation will fail because
+>calls to the interface will not receive a response.
+>Therefore, a Python script is necessary to complete the environment and
+>to run the simulation.
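+
+As emphasized above, the custom `Agent` class is only a convenience. To make
+the class-free option concrete, the following sketch outlines what a minimal
+simulation script could look like for the counter example without any agent
+class. It is an illustration only: the include path, the free-function names
+(`Step`, `ApplyAction`) and the global counter map are assumptions of this
+sketch and not part of the interface.
+
+``` cpp
+#include <ns3/ai-module.h>   // assumed module header, as above
+#include <ns3/core-module.h>
+
+#include <cstdlib>
+#include <map>
+#include <string>
+
+using namespace ns3;
+
+// per-agent counters; the initial values are purely illustrative
+std::map<std::string, int> g_counters = {{"agent_0", 7}, {"agent_1", -13}};
+
+void
+ApplyAction(std::string id, Ptr<OpenGymDataContainer> action)
+{
+    // same mapping as in the Agent class: shift the discrete action [0, 10] to [-5, 5]
+    g_counters[id] += DynamicCast<OpenGymDiscreteContainer>(action)->GetValue() - 5;
+}
+
+void
+Step(std::string id)
+{
+    auto obs = CreateObject<OpenGymBoxContainer<int>>(std::vector<uint32_t>{1});
+    obs->AddValue(g_counters[id]);
+    OpenGymMultiAgentInterface::Get()->NotifyCurrentState(
+        id,
+        obs,
+        -std::abs(g_counters[id]),          // reward
+        false,                              // no terminal state
+        {},                                 // no extra information
+        Seconds(0),                         // inference is assumed instantaneous
+        MakeBoundCallback(&ApplyAction, id));
+    Simulator::Schedule(Seconds(1), &Step, id); // step once per second
+}
+
+int
+main()
+{
+    for (const auto& agent : g_counters)
+    {
+        const std::string& id = agent.first;
+        // observation and action space callbacks registered as lambdas
+        OpenGymMultiAgentInterface::Get()->SetGetObservationSpaceCb(id, []() {
+            auto obsSpace = CreateObject<OpenGymBoxSpace>(-INFINITY,
+                                                          INFINITY,
+                                                          std::vector<uint32_t>{1},
+                                                          TypeNameGet<int>());
+            return obsSpace;
+        });
+        OpenGymMultiAgentInterface::Get()->SetGetActionSpaceCb(id, []() {
+            auto actionSpace = CreateObject<OpenGymDiscreteSpace>(10);
+            return actionSpace;
+        });
+        Simulator::Schedule(Seconds(0), &Step, id);
+    }
+
+    Simulator::Stop(Seconds(60));
+    Simulator::Run();
+    Simulator::Destroy();
+    OpenGymMultiAgentInterface::Get()->NotifySimulationEnd(0, {});
+    return 0;
+}
+```
+
+Like the full example, this sketch still has to be launched through an
+`Ns3MultiAgentEnv` instance on the Python side, as explained in the warning
+above.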
+ +## Ns3MultiAgentEnv + +The `Ns3MultiAgentEnv` is a Python class that is used to interact with +the ns3 simulation via the `OpenGymMultiAgentInterface`. It provides all +the abstractions of a Gymnasium environment (with slight modifications +to allow for multi-agent setup). Overall it provides the step(), +reset(), and close() methods that are necessary to interact with the +environment. Rendering is not supported in the base class as the +requirements for visualization highly depend on the underlying ns3 +simulation that is experimented with. + +The following sections will guide the user through the possible +interactions with the `Ns3MultiAgentEnv` and provide a minimal example +to demonstrate the usage. + +### Creating an Environment Instance + +The `Ns3MultiAgentEnv` can be understood as a wrapper around an ns3 +simulation that interacts with the `OpenGymMultiAgentInterface`. To +create an instance of the environment the user has to provide the build +target that will be run as the environment and the root directory where +the ns3 files are located (this directory contains for example the src, +contrib and build folders as subdirectories). + +Also, the user might want to pass additional arguments to the ns3 +simulation. These arguments can be passed as a dictionary. In our +example, the number of agents is not fixed and therefore the user can +pass the number of agents as an argument. + +The following code snippets demonstrate how an environment instance can +be created for our example. + +Preparation of the ns3 simulation to accept additional arguments: + +``` cpp +int main(int argc, char* argv[]) +{ + int numAgents = 2; + CommandLine cmd; + cmd.AddValue("numAgents", "Number of agents that act in the environment", numAgents); + cmd.Parse(argc, argv); +//... +``` + +Creation of the environment instance in the Python script: + +``` python +import os +from ns3ai_gym_env.envs.ns3_multi_agent_environment import Ns3MultiAgentEnv # this import is necessary to register the environment + +targetName = "ns3ai_multi-agent" +ns3Path = str(os.getenv("NS3_HOME")) # assuming this contains the path to the root directory of ns3 +ns3Settings: dict[str] = {"numAgents": 3} + +env:Ns3MultiAgentEnv = Ns3MultiAgentEnv(targetName=targetName, ns3Path=ns3Path, ns3Settings=ns3Settings) + +# code that interacts with the environment +# ... + +env.close() # this is necessary to free the resources of the environment +``` + +Instead of the name of the build target, the user can also directly +provide the path to the executable that should be run as the +environment. + +>[!NOTE] +>It is advised to use build targets configured with optimized +>build-profile settings. This often results in significant training +>speedups. See the +>[ns3-documentation](https://www.nsnam.org/docs/tutorial/html/getting-started.html#build-profiles) +>for more information on build profiles. + +### Interacting with the Environment + +To interact with the environment, use the `reset()` and `step()` methods +from the Gymnasium standard. An experiment starts by +resetting the environment, which will provide initial observations and +extra information. + +``` python +obs, extraInfo = env.reset() +``` + +Both, observation and extra information are provided as dictionaries +mapping from agent keys to the corresponding values. The agent keys are +the IDs that were used to register the agents in the ns3 simulation. + +The current implementation does not enforce all agents to be present in +the observation and extra information dictionaries. 
This allows for a +flexible setup where agents do not need to act synchronously. The user +therefore has to check, whether observations from a particular agent +were actually received, before he can act on them. The easiest way to do +this is to simply iterate over the observation dictionary. + +The following code snippet demonstrates how an action is randomly +sampled for each agent that shared its observation. + +``` python +terminated = truncated = False +while not terminated and not truncated: + action = {} + for agent_id, agent_obs in obs.items(): + action[agent_id] = env.action_space[agent_id].sample() + obs, reward, terminated, truncated, info = env.step(action) + terminated = terminated["__all__"] + truncated = truncated["__all__"] +``` + +Note how the action space (and equally the observation space) can be +inferred from the environment instance. + +The step method takes a dictionary of actions as input. This dictionary +maps from agent_ids to actions and it is required that only actions for +agents that shared their observations are provided (but each of these +agents needs to receive an action). The method returns the new +observations, the rewards (also as a dictionary), a dictionary +indicating whether an agent reached a terminal state, a dictionary +indicating whether an agent was stopped due to a time limit, and the new +extra information. + +The terminated and truncated dictionaries contain the special key +**\_\_all\_\_** that indicates whether all agents reached a terminal +state or were stopped due to a time limit. The user can use this +information to decide whether the environment should be reset or not. + +All in all, this enables the user to build +arbitrarily complex training or evaluation loops. + +In the following section, advanced topics will be discussed that might +be of interest to the user when working with the `Ns3MultiAgentEnv` but +are not necessary for basic usage. + +### Advanced Usage + +#### Random Seeding + +Randomness is an often desired property in reinforcement learning +experiments. To ensure reproducibility, the user can set a seed for the +random number generator in the ns3 simulation. In ns3, seeds consist of +an overall seed and a run number. + +The following code snippet demonstrates how the seed can be set in the +ns3 simulation: + +``` cpp +int seed = 1; +int seedRunNumber = 1; +CommandLine cmd; +cmd.AddValue("seed", "The seed used for reproducibility", seed); +cmd.AddValue( + "seedRunNumber", + "Counts how often the environment has been reset (used for seeding)", + seedRunNumber); +cmd.Parse(argc, argv); + +RngSeedManager::SetSeed(seed); +RngSeedManager::SetRun(seedRunNumber); +``` + +In Python, the seed can be set in the `ns3Settings` dictionary: + +``` python +ns3Settings: dict[str] = {"numAgents": 3, "seed": 1, "seedRunNumber": 1} +``` + +>[!NOTE] +>In order to achieve meaningful results it has to be ensured that the +>agents to not overfit during training. Therefore, a different seed +>should be used each time the environment is reset. This is done +>automatically when the argument `seedRunNumber` is provided to the +>`ns3Settings`. The run number is increased by one each time the +>environment is reset. + +#### Registering the Environment + +The method proposed in [Creating an Environment +Instance](#creating-an-environment-instance) is easy to use as long as +this environment shall exist in the same process as the driver Python +script. This is not the case for some distributed reinforcement learning +libraries like RLlib. 
The Gymnasium standard introduced a pattern to +deal with this issue. The user can register the environment via a string +identifier and a factory function that creates the environment instance. +The factory function is then called whenever the environment is +requested. + +For Gymnasium, registering the environment would look like this: + +``` python +import gymnasium +from ns3ai_gym_env.envs.ns3_multi_agent_environment import Ns3MultiAgentEnv # this import is necessary to register the environment + +# specify the target name, the path to the ns3 root directory and the ns3 settings +# ... + +gymnasium.envs.register( + id="Multi-Agent-Env", + entry_point="ns3ai_gym_env.envs:Ns3MultiAgentEnv", + kwargs={ + "targetName": targetName, + "ns3Path": ns3Path, + "ns3Settings": ns3Settings, + }, +) +env = gymnasium.make("Multi-Agent-Env", disable_env_checker=True) +``` + +>[!NOTE] +>When registering an environment with Gymnasium, environment checking has +>to be disabled because Gymnasium assumes that all agents will have an +>initial observation after environment reset. In the model provided by +>this library, this is not the case. + + +In Ray RLlib the environment might be registered like this: + +``` python +from ray.tune import register_env +from ns3ai_gym_env.envs.ns3_multi_agent_environment import Ns3MultiAgentEnv # this import is necessary to register the environment + +# specify the target name, the path to the ns3 root directory and the ns3 settings +# ... +register_env( + "Multi-Agent-Env", + lambda _: Ns3MultiAgentEnv( + targetName=targetName, + ns3Path=ns3Path, + ns3Settings=ns3Settings, + ), +) +``` + + +>[!NOTE] +>In case the user needs information that is inferred from an environment +>instance they can simply create a dummy instance, get the relevant +>information and immediately close the dummy instance. +>``` python +>dummy_env = Ns3MultiAgentEnv(targetName=targetName, ns3Path=ns3Path, ns3Settings=ns3Settings) +>obs_space = dummy_env.observation_space +>act_space = dummy_env.action_space +>dummy_env.close() +>``` + +#### Running Multiple Environments in Parallel + +Executing multiple experiments in parallel often is an interesting use +case (e.g. for hyperparameter optimization). Because each environment +uses shared memory for communication with the ns3 simulation, it has to +be ensured that the environments do not interfere with each other. This +can be done by naming the memory segments for each newly created +environment instance. This can be done via the argument `trial_name` +that is passed in the ns3 settings. + +Schematically, this might look similar to the following Python code +snippet: + +``` python +trial = {"trial_name": 1} +ns3Settings: dict[str] = {"numAgents": 3, "seed": 1, "seedRunNumber": 1} + +env1:Ns3MultiAgentEnv = Ns3MultiAgentEnv(targetName=targetName, ns3Path=ns3Path, ns3Settings=(ns3Settings | trial)) + +trial["trial_name"] += 1 +env2:Ns3MultiAgentEnv = Ns3MultiAgentEnv(targetName=targetName, ns3Path=ns3Path, ns3Settings=(ns3Settings | trial)) + +# create many more environment instances +``` + +In practice, however, how the user sets the trial_name for each +environment has to fit the creation process of the environment +instances. The user must ensure that the trial_name is unique for each +environment instance. + +Also, the trial name has to be set in the ns3 simulation. 
This can be +done by adding the following lines to the ns3 simulation: + +``` cpp +std::string trial_name = "0"; +CommandLine cmd; +cmd.AddValue("trial_name", "name of the trial", trial_name); +cmd.Parse(argc, argv); + +OpenGymMultiAgentInterface::Get(); +Ns3AiMsgInterface::Get()->SetNames("My Seg" + trial_name, + "My Cpp to Python Msg" + trial_name, + "My Python to Cpp Msg" + trial_name, + "My Lockable" + trial_name); +``` + +>[!NOTE] +>"My Seg", "My Cpp to Python Msg", "My Python to Cpp Msg" and "My +>Lockable" are the default names of the memory segments that are used for +>communication between the ns3 simulation and the Python environment. + +>[!NOTE] +>In case the setup is messed up and multiple environments use the same +>memory segments this will lead to strange behavior in the simulation. In +>case the segment names are not aligned between the ns3 simulation and +>the Python environment you will encounter the error message +>`boost::interprocess::bad_alloc`. + +### Conclusion + +The previous sections described how the `Ns3MultiAgentEnv` turns an ns3 +simulation into a multi-agent environment that can be interacted with +according to the Gymnasium standard. + +Check out the provided example scripts for even more information. diff --git a/examples/CMakeLists.txt b/examples/CMakeLists.txt index c121b2b..b0c0565 100644 --- a/examples/CMakeLists.txt +++ b/examples/CMakeLists.txt @@ -3,3 +3,4 @@ add_subdirectory(rate-control) add_subdirectory(rl-tcp) add_subdirectory(lte-cqi) add_subdirectory(multi-bss) +add_subdirectory(multi-agent) diff --git a/examples/multi-agent/CMakeLists.txt b/examples/multi-agent/CMakeLists.txt new file mode 100644 index 0000000..7f94081 --- /dev/null +++ b/examples/multi-agent/CMakeLists.txt @@ -0,0 +1,5 @@ +build_lib_example( + NAME ns3ai_multi-agent + SOURCE_FILES multi-agent.cc + LIBRARIES_TO_LINK ${libai} ${libcore} +) diff --git a/examples/multi-agent/multi-agent-inference.py b/examples/multi-agent/multi-agent-inference.py new file mode 100644 index 0000000..6c5801c --- /dev/null +++ b/examples/multi-agent/multi-agent-inference.py @@ -0,0 +1,76 @@ +''' +This script demonstrates how a reinforcement learning library (Ray RLlib) can be used to perform inference with a trained model in a ns3-simulation. +The script performs the following steps in order to evaluate the performance of a trained model: +1. Imports the necessary libraries and modules. +2. Restores the state of the training algorithm from a checkpoint (the environment needs to be registered in the same way as done in the training script). +3. Runs inference in multiple simulations via the policies from the resrored algorithm. +4. Closes the environment after the final simulation has ended. + +Note: Some external libraries like Ray RLlib or Tensorflow are required to run this script. 
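+
+Example invocation (the paths and values are placeholders and need to be adapted to your setup;
+the checkpoint is one produced by multi-agent-train.py):
+
+    python multi-agent-inference.py --ns3Path $NS3_HOME --checkpointPath ./checkpoints --numAgents 3 --numSimulations 10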
+''' + +import argparse + +from ns3ai_gym_env.envs.ns3_multi_agent_environment import Ns3MultiAgentEnv +from ray.rllib.algorithms.algorithm import Algorithm +from ray.rllib.utils.framework import try_import_tf +from ray.tune import register_env + +# fix for the following issue: https://github.com/ray-project/ray/issues/14533 +tf1, tf, tfv = try_import_tf() +tf1.enable_eager_execution() + +parser = argparse.ArgumentParser() +parser.add_argument("--ns3Path", type=str, required=True, help="Path to the ns3 root directory.") +parser.add_argument("--checkpointPath", type=str, required=True, help="Path to the checkpoint to restore.") +parser.add_argument("--numAgents", type=int, default=3, help="Number of agents in the simulation.") +parser.add_argument("--numSimulations", type=int, default=10, help="Number of simulations to run.") +args = parser.parse_args() + +targetName = "ns3ai_multi-agent" +ns3Settings: dict[str] = {"numAgents": args.numAgents, "seedRunNumber": 1} + +register_env( + "Multi-Agent-Env", + lambda _: Ns3MultiAgentEnv( + targetName=targetName, + ns3Path=args.ns3Path, + ns3Settings=ns3Settings, + ), +) + +restored_algo = Algorithm.from_checkpoint( + args.checkpointPath, policies_to_train=lambda _: False +) +restored_algo.restore(args.checkpointPath) + +env = Ns3MultiAgentEnv(targetName=targetName, ns3Path=args.ns3Path, ns3Settings=ns3Settings) + +for simulation in range(args.numSimulations): + simulation_reward = 0 + terminated = truncated = False + obs, info = env.reset() + step_count = 0 + while not terminated and not truncated: + action = {} + state = {} + for agent_id, agent_obs in obs.items(): + policy_id = restored_algo.config.multi_agent()["policy_mapping_fn"]( + agent_id, None, None + ) + action[agent_id] = restored_algo.compute_single_action( + observation=agent_obs, + policy_id=policy_id, + explore=False, + timestep=step_count, + ) + obs, reward, terminated, truncated, info = env.step(action) + simulation_reward += ( + list(reward.values())[0] if len(list(reward.values())) > 0 else 0 + ) + step_count += 1 + terminated = terminated["__all__"] + truncated = truncated["__all__"] + print(f"simulation {simulation} completed - mean reward: {simulation_reward / step_count}") + +env.close() diff --git a/examples/multi-agent/multi-agent-random.py b/examples/multi-agent/multi-agent-random.py new file mode 100644 index 0000000..8aa49e5 --- /dev/null +++ b/examples/multi-agent/multi-agent-random.py @@ -0,0 +1,50 @@ +''' +This script demonstrates how the Ns3MultiAgentEnv class can be used together with a specific ns3 simulation. + +The script performs the following steps in order to evaluate the performance of random agents: +1. Imports the necessary libraries and modules. +2. Sets up logging configuration. +3. Defines the configuration for the ns3-simulation that shall be run. +5. Runs multiple episodes of the environment with actions sampled randomly from the action space. +6. Closes the environment after the final simulation has ended. + +Note: This script assumes that the ns-3 simulator is already installed and the necessary dependencies are met. 
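+
+Example invocation (the path is a placeholder and needs to be adapted to your setup):
+
+    python multi-agent-random.py --ns3Path $NS3_HOME --numAgents 3 --numSimulations 10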
+''' + +import argparse +import logging + +from ns3ai_gym_env.envs.ns3_multi_agent_environment import Ns3MultiAgentEnv + +logging.basicConfig(level=logging.INFO) # verbosity can be reduced by changing this to warning +logger = logging.getLogger(__name__) + +parser = argparse.ArgumentParser() +parser.add_argument("--ns3Path", type=str, required=True, help="Path to the ns3 root directory.") +parser.add_argument("--numAgents", type=int, default=3, help="Number of agents in the simulation.") +parser.add_argument("--numSimulations", type=int, default=10, help="Number of simulations to run.") +args = parser.parse_args() + +targetName = "ns3ai_multi-agent" +ns3Settings: dict[str] = {"numAgents": args.numAgents, "seedRunNumber": 1} + +env = Ns3MultiAgentEnv(targetName=targetName, ns3Path=args.ns3Path, ns3Settings=ns3Settings) + + +for simulation in range(args.numSimulations): + simulation_reward = 0 + terminated = truncated = False + step_count = 0 + obs, info = env.reset() + while not terminated and not truncated: + action = {} + for agent_id, agent_obs in obs.items(): + action[agent_id] = env.action_space[agent_id].sample() + obs, reward, terminated, truncated, info = env.step(action) + simulation_reward += list(reward.values())[0] if len(list(reward.values())) > 0 else 0 + step_count += 1 + terminated = terminated["__all__"] + truncated = truncated["__all__"] + print(f"simulation {simulation} completed - mean reward: {simulation_reward / step_count}") + +env.close() diff --git a/examples/multi-agent/multi-agent-train.py b/examples/multi-agent/multi-agent-train.py new file mode 100644 index 0000000..7cb507b --- /dev/null +++ b/examples/multi-agent/multi-agent-train.py @@ -0,0 +1,97 @@ +""" +This example demonstrates how a reinforcement learning library (Ray RLlib) can be used to train multiple agents in a ns3-simulation with DQN. +The script performs the following steps in order to train the agents: +1. Imports the necessary libraries and modules. +2. Registers the environment using ray tune. +3. Configures the training algorithm. +4. Trains the agents for multiple iterations and prints relevant metrics. + +These are the most essential steps to train in any multi-agent environment using Ray RLlib. For advanced usage like hyperparameter tuning please refer to the Ray RLlib documentation and the advanced usage section of this modules documentation. + +Note: Some external libraries like Ray RLlib or Tensorflow are required to run this script. 
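+
+Example invocation (the paths are placeholders; the checkpoint directory is where the trained policies will be saved):
+
+    python multi-agent-train.py --ns3Path $NS3_HOME --checkpointPath ./checkpoints --numAgents 3 --numIterations 50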
+""" + +import argparse +from pprint import pprint as pp + +from ns3ai_gym_env.envs.ns3_multi_agent_environment import Ns3MultiAgentEnv +from ray.rllib.algorithms.dqn import DQNConfig +from ray.rllib.policy.policy import PolicySpec +from ray.tune import register_env + +parser = argparse.ArgumentParser() +parser.add_argument("--ns3Path", type=str, required=True, help="Path to the ns3 root directory.") +parser.add_argument("--checkpointPath", type=str, required=True, help="Path to the checkpoint to restore.") +parser.add_argument("--numAgents", type=int, default=3, help="Number of agents in the simulation.") +parser.add_argument("--numIterations", type=int, default=50, help="Number of training iterations to run.") +args = parser.parse_args() + +targetName = "ns3ai_multi-agent" +ns3Settings: dict[str] = {"numAgents": args.numAgents, "seedRunNumber": 1} + +env = Ns3MultiAgentEnv(targetName=targetName, ns3Path=args.ns3Path, ns3Settings=ns3Settings) +env_obs_space = env.observation_space +env_act_space = env.action_space +env.close() + +register_env( + "Multi-Agent-Env", + lambda _: Ns3MultiAgentEnv( + targetName=targetName, + ns3Path=args.ns3Path, + ns3Settings=ns3Settings, + ), +) + +replay_config = { + "type": "MultiAgentPrioritizedReplayBuffer", + "capacity": 60000, + "prioritized_replay_alpha": 0.5, + "prioritized_replay_beta": 0.5, + "prioritized_replay_eps": 3e-6, +} + +config = ( + DQNConfig() + .training(train_batch_size=1024, replay_buffer_config=replay_config) + .resources(num_gpus=0) + .rollouts(num_rollout_workers=1, batch_mode="complete_episodes") + .environment("Multi-Agent-Env") + .framework("tf2") + .multi_agent( + policies={ + agent_id: PolicySpec( + observation_space=env_obs_space[agent_id], + action_space=env_act_space[agent_id], + ) + for agent_id in env_obs_space.keys() + }, + policy_mapping_fn=lambda agent_id, episode, worker, **kwargs: agent_id, + ) + .debugging(log_level="ERROR") +) + +algo = config.build() + +metrics_to_print = [ + "episode_reward_mean", + "episode_reward_max", + "episode_reward_min", + "counters", +] + +for i in range(args.numIterations): + print(f"New training iteration {i} started:") + result = algo.train() + pp({k: v for k, v in result.items() if k in metrics_to_print}) + +# checkpointing +save_result = algo.save(args.checkpointPath) +path_to_checkpoint = save_result.checkpoint.path +print( + "An Algorithm checkpoint has been created inside directory: " + f"'{path_to_checkpoint}'." 
+) + +# final cleanup to free resources +algo.cleanup() diff --git a/examples/multi-agent/multi-agent.cc b/examples/multi-agent/multi-agent.cc new file mode 100644 index 0000000..0bb69a6 --- /dev/null +++ b/examples/multi-agent/multi-agent.cc @@ -0,0 +1,127 @@ +#include + +#include +#include +#include +#include + +using namespace ns3; + +class Agent +{ + public: + Agent(){}; + + Agent(const std::string id, int number, Time stepTime) + : m_id(id), + m_number(number), + m_stepTime(stepTime) + { + } + + ~Agent() + { + } + + void ExecuteAction(Ptr action) + { + // actions that are passed to the agent by the interface are abstract + // OpenGymDataContainer objects and need to be transformed to the actual object type that + // corresponds to the action space of the agent + m_number += DynamicCast(action)->GetValue() - 5; + } + + Ptr GetObservation() const + { + auto shape = std::vector{1}; + auto observation = CreateObject>( + shape); // Create a 1-dimensional + // container that holds the agents observation + observation->AddValue(m_number); + return observation; + } + + double GetReward() const + { + return -abs(m_number); // The goal of the agent is it to reach the number 0 + } + + Ptr GetObservationSpace() + { + auto type = TypeNameGet(); + auto shape = std::vector{1}; + auto obsSpace = CreateObject(-INFINITY, INFINITY, shape, type); + return obsSpace; + } + + Ptr GetActionSpace() + { + auto actionSpace = CreateObject(10); + return actionSpace; + } + + void Step() + { + OpenGymMultiAgentInterface::Get()->NotifyCurrentState( + m_id, + GetObservation(), + GetReward(), + false, + {}, + Seconds(0), + MakeCallback(&Agent::ExecuteAction, this)); + Simulator::Schedule(m_stepTime, &Agent::Step, this); + } + + private: + const std::string m_id; + int m_number; + Time m_stepTime; +}; + +int +main(int argc, char* argv[]) +{ + int numAgents = 2; + int seedRunNumber = 1; + CommandLine cmd; + cmd.AddValue("numAgents", "Number of agents that act in the environment", numAgents); + cmd.AddValue("seedRunNumber", + "Counts how often the environment has been reset (used for seeding)", + seedRunNumber); + cmd.Parse(argc, argv); + + RngSeedManager::SetSeed(42); + RngSeedManager::SetRun(seedRunNumber); + + auto randomNumber = CreateObject(); + randomNumber->SetAttribute("Min", DoubleValue(-42)); + randomNumber->SetAttribute("Max", DoubleValue(42)); + + std::vector agents; + for (int i = 0; i < numAgents; i++) + { + std::string id = "agent_" + std::to_string(i); + int number = randomNumber->GetInteger(); + Time stepTime = Seconds(1); + auto agent = new Agent(id, number, stepTime); + agents.emplace_back(agent); + + OpenGymMultiAgentInterface::Get()->SetGetObservationSpaceCb( + id, + MakeCallback(&Agent::GetObservationSpace, agents[i])); + OpenGymMultiAgentInterface::Get()->SetGetActionSpaceCb( + id, + MakeCallback(&Agent::GetActionSpace, agents[i])); + } + + for (const auto agent : agents) + { + Simulator::Schedule(Seconds(0), &Agent::Step, agent); + } + + Simulator::Stop(Seconds(60)); + Simulator::Run(); + Simulator::Destroy(); + OpenGymMultiAgentInterface::Get()->NotifySimulationEnd(-100, {}); +} diff --git a/model/gym-interface/cpp/ns3-ai-multi-agent-gym-interface.cc b/model/gym-interface/cpp/ns3-ai-multi-agent-gym-interface.cc new file mode 100644 index 0000000..257552c --- /dev/null +++ b/model/gym-interface/cpp/ns3-ai-multi-agent-gym-interface.cc @@ -0,0 +1,257 @@ +#include "ns3-ai-multi-agent-gym-interface.h" + +#include "container.h" +#include "messages.pb.h" +#include "ns3-ai-gym-env.h" +#include "spaces.h" 
+ +#include +#include +#include + +namespace ns3 +{ + +NS_LOG_COMPONENT_DEFINE("OpenGymMultiAgentInterface"); +NS_OBJECT_ENSURE_REGISTERED(OpenGymMultiAgentInterface); + +OpenGymMultiAgentInterface::OpenGymMultiAgentInterface() + : m_simEnd(false), + m_stopEnvRequested(false), + m_initSimMsgSent(false) +{ + auto interface = Ns3AiMsgInterface::Get(); + interface->SetIsMemoryCreator(false); + interface->SetUseVector(false); + interface->SetHandleFinish(false); +} + +OpenGymMultiAgentInterface::~OpenGymMultiAgentInterface() +{ +} + +TypeId +OpenGymMultiAgentInterface::GetTypeId() +{ + static TypeId tid = TypeId("OpenGymMultiAgentInterface") + .SetParent() + .SetGroupName("OpenGym") + .AddConstructor(); + return tid; +} + +void +OpenGymMultiAgentInterface::Init() +{ + // do not send init msg twice + if (m_initSimMsgSent) + { + return; + } + m_initSimMsgSent = true; + + ns3_ai_gym::MultiAgentSimInitMsg simInitMsg; + + // obs space + for (const auto& [key, value] : GetObservationSpace()) + { + (*simInitMsg.mutable_obsspaces())[key] = value->GetSpaceDescription(); + } + + // action space + for (const auto& [key, value] : GetActionSpace()) + { + (*simInitMsg.mutable_actspaces())[key] = value->GetSpaceDescription(); + } + + // get the interface + Ns3AiMsgInterfaceImpl* msgInterface = + Ns3AiMsgInterface::Get()->GetInterface(); + + // send init msg to python + msgInterface->CppSendBegin(); + msgInterface->GetCpp2PyStruct()->size = simInitMsg.ByteSizeLong(); + assert(msgInterface->GetCpp2PyStruct()->size <= MSG_BUFFER_SIZE); + simInitMsg.SerializeToArray(msgInterface->GetCpp2PyStruct()->buffer, + msgInterface->GetCpp2PyStruct()->size); + msgInterface->CppSendEnd(); + + // receive init ack msg from python + ns3_ai_gym::SimInitAck simInitAck; + msgInterface->CppRecvBegin(); + simInitAck.ParseFromArray(msgInterface->GetPy2CppStruct()->buffer, + msgInterface->GetPy2CppStruct()->size); + msgInterface->CppRecvEnd(); + + bool done = simInitAck.done(); + NS_LOG_DEBUG("Sim Init Ack: " << done); + bool stopSim = simInitAck.stopsimreq(); + if (stopSim) + { + NS_LOG_DEBUG("---Stop requested: " << stopSim); + m_stopEnvRequested = true; + Simulator::Stop(); + Simulator::Destroy(); + std::exit(0); + } +} + +void +OpenGymMultiAgentInterface::NotifyCurrentState( + const std::string agentId, + Ptr obsDataContainer, + float reward, + bool isGameOver, + const std::map& extraInfo, + Time actionDelay, + Callback> actionCallback) +{ + if (!m_initSimMsgSent) + { + Init(); + } + if (m_stopEnvRequested) + { + return; + } + ns3_ai_gym::MultiAgentEnvStateMsg envStateMsg; + // observation + ns3_ai_gym::DataContainer obsDataContainerPbMsg; + if (obsDataContainer) + { + obsDataContainerPbMsg = obsDataContainer->GetDataContainerPbMsg(); + envStateMsg.mutable_obsdata()->CopyFrom(obsDataContainerPbMsg); + } + // agent + envStateMsg.set_agentid(agentId); + // reward + envStateMsg.set_reward(reward); + // game over + envStateMsg.set_isgameover(false); + if (isGameOver) + { + envStateMsg.set_isgameover(true); + if (m_simEnd) + { + envStateMsg.set_reason(ns3_ai_gym::MultiAgentEnvStateMsg::SimulationEnd); + } + else + { + envStateMsg.set_reason(ns3_ai_gym::MultiAgentEnvStateMsg::GameOver); + } + } + // extra info + for (const auto& [key, value] : extraInfo) + { + (*envStateMsg.mutable_info())[key] = value; + } + + // get the interface + Ns3AiMsgInterfaceImpl* msgInterface = + Ns3AiMsgInterface::Get()->GetInterface(); + + // send env state msg to python + msgInterface->CppSendBegin(); + msgInterface->GetCpp2PyStruct()->size = 
envStateMsg.ByteSizeLong(); + assert(msgInterface->GetCpp2PyStruct()->size <= MSG_BUFFER_SIZE); + envStateMsg.SerializeToArray(msgInterface->GetCpp2PyStruct()->buffer, + msgInterface->GetCpp2PyStruct()->size); + + msgInterface->CppSendEnd(); + + // receive act msg from python + ns3_ai_gym::EnvActMsg envActMsg; + msgInterface->CppRecvBegin(); + + envActMsg.ParseFromArray(msgInterface->GetPy2CppStruct()->buffer, + msgInterface->GetPy2CppStruct()->size); + msgInterface->CppRecvEnd(); + + if (m_simEnd) + { + return; + } + + bool stopSim = envActMsg.stopsimreq(); + if (stopSim) + { + NS_LOG_DEBUG("---Stop requested: " << stopSim); + m_stopEnvRequested = true; + Simulator::Stop(); + Simulator::Destroy(); + NS_ABORT_MSG("Simulation stopped!"); + } + + // first step after reset is called without actions, just to get current state + ns3_ai_gym::DataContainer actDataContainerPbMsg = envActMsg.actdata(); + auto action = OpenGymDataContainer::CreateFromDataContainerPbMsg(actDataContainerPbMsg); + Simulator::Schedule(actionDelay, actionCallback.Bind(action)); +} + +void +OpenGymMultiAgentInterface::WaitForStop(float reward, + bool isGameOver, + const std::map& extraInfo) +{ + NS_LOG_FUNCTION(this); + + NotifyCurrentState( + "", + {}, + reward, + isGameOver, + extraInfo, + Seconds(0), + *[](Ptr) {}); +} + +void +OpenGymMultiAgentInterface::NotifySimulationEnd(float reward, + const std::map& extraInfo) +{ + NS_LOG_FUNCTION(this); + m_simEnd = true; + if (m_initSimMsgSent) + { + WaitForStop(reward, true, extraInfo); + } +} + +std::map> +OpenGymMultiAgentInterface::GetActionSpace() +{ + NS_LOG_FUNCTION(this); + std::map> actionSpace; + for (const auto& [agentId, callback] : m_actionSpaceCbs) + { + actionSpace[agentId] = callback(); + } + return actionSpace; +} + +std::map> +OpenGymMultiAgentInterface::GetObservationSpace() +{ + NS_LOG_FUNCTION(this); + std::map> obsSpace; + for (const auto& [agentId, callback] : m_observationSpaceCbs) + { + obsSpace[agentId] = callback(); + } + return obsSpace; +} + +void +OpenGymMultiAgentInterface::SetGetActionSpaceCb(std::string agentId, Callback> cb) +{ + m_actionSpaceCbs[agentId] = cb; +} + +void +OpenGymMultiAgentInterface::SetGetObservationSpaceCb(std::string agentId, + Callback> cb) +{ + m_observationSpaceCbs[agentId] = cb; +} + +} // namespace ns3 diff --git a/model/gym-interface/cpp/ns3-ai-multi-agent-gym-interface.h b/model/gym-interface/cpp/ns3-ai-multi-agent-gym-interface.h new file mode 100644 index 0000000..e6df4ee --- /dev/null +++ b/model/gym-interface/cpp/ns3-ai-multi-agent-gym-interface.h @@ -0,0 +1,59 @@ +#ifndef NS3_AI_MULTI_AGENT_GYM_INTERFACE_H +#define NS3_AI_MULTI_AGENT_GYM_INTERFACE_H + +#include "../ns3-ai-gym-msg.h" + +#include +#include +#include +#include +#include +#include + +namespace ns3 +{ + +class OpenGymSpace; +class OpenGymDataContainer; +class OpenGymEnv; + +class OpenGymMultiAgentInterface : public Singleton, public Object +{ + public: + OpenGymMultiAgentInterface(); + ~OpenGymMultiAgentInterface() override; + static TypeId GetTypeId(); + + void Init(); + void NotifyCurrentState(const std::string agentId, + Ptr obsDataContainer, + float reward, + bool isGameOver, + const std::map& extraInfo, + Time actionDelay, + Callback> actionCallback); + void WaitForStop(float reward, + bool isGameOver, + const std::map& extraInfo = {}); + void NotifySimulationEnd(float reward = 0, + const std::map& extraInfo = {}); + + std::map> GetActionSpace(); + std::map> GetObservationSpace(); + + void SetGetActionSpaceCb(std::string agentId, Callback> cb); 
+ void SetGetObservationSpaceCb(std::string agentId, Callback> cb); + + private: + + bool m_simEnd; + bool m_stopEnvRequested; + bool m_initSimMsgSent; + + std::map>> m_actionSpaceCbs; + std::map>> m_observationSpaceCbs; +}; + +} // end of namespace ns3 + +#endif // NS3_AI_MULTI_AGENT_GYM_INTERFACE_H diff --git a/model/gym-interface/messages.proto b/model/gym-interface/messages.proto index 9045ec5..d3cf150 100644 --- a/model/gym-interface/messages.proto +++ b/model/gym-interface/messages.proto @@ -117,6 +117,25 @@ message EnvStateMsg { string info = 5; } +message MultiAgentSimInitMsg { + map obsSpaces = 1; + map actSpaces = 2; +} + +message MultiAgentEnvStateMsg { + DataContainer obsData = 1; + float reward = 2; + bool isGameOver = 3; + + enum Reason { + SimulationEnd = 0; + GameOver = 1; + } + Reason reason = 4; + map info = 5; + string agentID = 6; +} + message EnvActMsg { DataContainer actData = 1; bool stopSimReq = 2; diff --git a/model/gym-interface/py/CMakeLists.txt b/model/gym-interface/py/CMakeLists.txt index 58451ef..972dc64 100644 --- a/model/gym-interface/py/CMakeLists.txt +++ b/model/gym-interface/py/CMakeLists.txt @@ -1,4 +1,5 @@ pybind11_add_module(ns3ai_gym_msg_py msg_py_binding.cc) +target_link_libraries(ns3ai_gym_msg_py PRIVATE ${libcore}) set_target_properties(ns3ai_gym_msg_py PROPERTIES LIBRARY_OUTPUT_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}) diff --git a/model/gym-interface/py/ns3ai_gym_env/envs/__init__.py b/model/gym-interface/py/ns3ai_gym_env/envs/__init__.py index 4cbc3d7..4e6c528 100644 --- a/model/gym-interface/py/ns3ai_gym_env/envs/__init__.py +++ b/model/gym-interface/py/ns3ai_gym_env/envs/__init__.py @@ -1 +1,4 @@ from ns3ai_gym_env.envs.ns3_environment import Ns3Env +from ns3ai_gym_env.envs.ns3_multi_agent_environment import Ns3MultiAgentEnv + +__all__ = ["Ns3Env", "Ns3MultiAgentEnv"] diff --git a/model/gym-interface/py/ns3ai_gym_env/envs/ns3_multi_agent_environment.py b/model/gym-interface/py/ns3ai_gym_env/envs/ns3_multi_agent_environment.py new file mode 100644 index 0000000..7f3ca0d --- /dev/null +++ b/model/gym-interface/py/ns3ai_gym_env/envs/ns3_multi_agent_environment.py @@ -0,0 +1,113 @@ +from typing import Any, Literal, TypeVar + +import messages_pb2 as pb +import ns3ai_gym_msg_py as py_binding +from gymnasium import spaces +from ray.rllib.env.multi_agent_env import MultiAgentEnv + +from ns3ai_gym_env.typing import copy_signature_from + +from .ns3_environment import Ns3Env + +T = TypeVar("T") + + +class Ns3MultiAgentEnv(Ns3Env, MultiAgentEnv): + @copy_signature_from(Ns3Env.__init__) + def __init__(self, *args: Any, **kwargs: Any) -> None: + self.action_space: spaces.Dict = spaces.Dict() + self.observation_space: spaces.Dict = spaces.Dict() + self.agent_selection: str | None = None + super().__init__(*args, **kwargs) + MultiAgentEnv.__init__(self) + + def initialize_env(self) -> Literal[True]: + init_msg = pb.MultiAgentSimInitMsg() + self.msgInterface.PyRecvBegin() + request = self.msgInterface.GetCpp2PyStruct().get_buffer() + init_msg.ParseFromString(request) + self.msgInterface.PyRecvEnd() + + for agent, space in init_msg.actSpaces.items(): + self.action_space[agent] = self._create_space(space) + + for agent, space in init_msg.obsSpaces.items(): + self.observation_space[agent] = self._create_space(space) + self._agent_ids = list(self.action_space.keys()) + reply = pb.SimInitAck() + reply.done = True + reply.stopSimReq = False + reply_str = reply.SerializeToString() + assert len(reply_str) <= py_binding.msg_buffer_size + + self.msgInterface.PySendBegin() 
+ self.msgInterface.GetPy2CppStruct().size = len(reply_str) + self.msgInterface.GetPy2CppStruct().get_buffer_full()[: len(reply_str)] = reply_str + self.msgInterface.PySendEnd() + return True + + def rx_env_state(self) -> None: + if self.newStateRx: + return + + state_msg = pb.MultiAgentEnvStateMsg() + self.msgInterface.PyRecvBegin() + request = self.msgInterface.GetCpp2PyStruct().get_buffer() + state_msg.ParseFromString(request) + self.msgInterface.PyRecvEnd() + + self.obsData = self._create_data(state_msg.obsData) + self.reward = state_msg.reward + self.gameOver = state_msg.isGameOver + self.gameOverReason = state_msg.reason + self.agent_selection = state_msg.agentID + + if self.gameOver: + self.send_close_command() + + self.extraInfo = dict(state_msg.info) + + self.newStateRx = True + + def send_actions(self, actions: dict[str, Any]) -> bool: + assert self.agent_selection + reply = pb.EnvActMsg() + + action_msg = self._pack_data(actions[self.agent_selection], self.action_space[self.agent_selection]) + reply.actData.CopyFrom(action_msg) + + reply_msg = reply.SerializeToString() + assert len(reply_msg) <= py_binding.msg_buffer_size + self.msgInterface.PySendBegin() + self.msgInterface.GetPy2CppStruct().size = len(reply_msg) + self.msgInterface.GetPy2CppStruct().get_buffer_full()[: len(reply_msg)] = reply_msg + self.msgInterface.PySendEnd() + self.newStateRx = False + return True + + def wrap(self, data: T) -> dict[str, T]: + assert self.agent_selection is not None + return {self.agent_selection: data} + + def step(self, actions: dict[str, Any]) -> tuple[dict[str, Any], ...]: + obs, rew, terminateds, truncateds, info = tuple(self.wrap(state) for state in super().step(actions)) + terminateds["__all__"] = all(terminated for terminated in terminateds.values()) + truncateds["__all__"] = all(truncated for truncated in truncateds.values()) + obs.pop("", "") + rew.pop("", "") + terminateds.pop("", "") + truncateds.pop("", "") + info.pop("", "") + return obs, rew, terminateds, truncateds, info + + def reset( + self, + *, + seed: int | None = None, + options: dict | None = None, + ) -> tuple[dict[str, Any], dict[str, dict[str, Any]]]: + return tuple(self.wrap(state) for state in super().reset(seed, options)) + + def get_random_action(self) -> Any: + assert self.agent_selection is not None + return self.action_space[self.agent_selection].sample() diff --git a/model/gym-interface/py/ns3ai_gym_env/typing.py b/model/gym-interface/py/ns3ai_gym_env/typing.py new file mode 100644 index 0000000..9ec3d80 --- /dev/null +++ b/model/gym-interface/py/ns3ai_gym_env/typing.py @@ -0,0 +1,12 @@ +from collections.abc import Callable +from typing import Any, ParamSpec, TypeVar, cast + +T = TypeVar("T") +P = ParamSpec("P") + + +def copy_signature_from(_origin: Callable[P, Any]) -> Callable[[Callable[..., T]], Callable[P, T]]: + def decorator(target: Callable[..., T]) -> Callable[P, T]: + return cast(Callable[P, T], target) + + return decorator diff --git a/python_utils/ns3ai_utils.py b/python_utils/ns3ai_utils.py index efe605a..f723563 100644 --- a/python_utils/ns3ai_utils.py +++ b/python_utils/ns3ai_utils.py @@ -17,6 +17,7 @@ # Hao Yin # Muyuan Shen +import logging import os import subprocess import psutil @@ -24,6 +25,9 @@ import signal +logger = logging.getLogger(__name__) + + SIMULATION_EARLY_ENDING = 0.5 # wait and see if the subprocess is running after creation @@ -61,7 +65,7 @@ def run_single_ns3(path, pname, setting=None, env=None, show_output=False): # used to kill the ns-3 script process and its child 
processes def kill_proc_tree(p, timeout=None, on_terminate=None): - print('ns3ai_utils: Killing subprocesses...') + logger.info('ns3ai_utils: Killing subprocesses...') if isinstance(p, int): p = psutil.Process(p) elif not isinstance(p, psutil.Process): @@ -134,12 +138,12 @@ def __init__(self, targetName, ns3Path, msgModule, self.proc = None self.simCmd = None - print('ns3ai_utils: Experiment initialized') + logger.info('ns3ai_utils: Experiment initialized') def __del__(self): self.kill() del self.msgInterface - print('ns3ai_utils: Experiment destroyed') + logger.info('ns3ai_utils: Experiment destroyed') # run ns3 script in cmd with the setting being input # \param[in] setting : ns3 script input parameters(default : None) @@ -147,12 +151,16 @@ def __del__(self): def run(self, setting=None, show_output=False): self.kill() self.simCmd, self.proc = run_single_ns3( - './', self.targetName, setting=setting, show_output=show_output) - print("ns3ai_utils: Running ns-3 with: ", self.simCmd) + "./", + self.targetName, + setting=setting, + show_output=show_output + ) + logger.info("ns3ai_utils: Running ns-3 with: %s", self.simCmd) # exit if an early error occurred, such as wrong target name time.sleep(SIMULATION_EARLY_ENDING) if not self.isalive(): - print('ns3ai_utils: Subprocess died very early') + logger.info('ns3ai_utils: Subprocess died very early') exit(1) signal.signal(signal.SIGINT, sigint_handler) return self.msgInterface