diff --git a/.gitignore b/.gitignore index 2ee1107b..a1124042 100644 --- a/.gitignore +++ b/.gitignore @@ -1402,10 +1402,16 @@ htmlcov *.lock .coverage* -project_test.py +## DI-engine +*total_config.py +openaigym* + +## Carla *.csv *.avi + +.vscode +project_test.py *episode_metainfo.json *measurements.lmdb *index.txt -openaigym* diff --git a/CHANGELOG b/CHANGELOG index 620eb674..227dd7e2 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,3 +1,9 @@ +## v0.3.3 (2022.6.5) +- Update readme, reorganize info +- Add MetaDriveTrajEnv and doc +- Add utils for MetaDrive +- Modify utils for macro env + ## v0.3.2 (2022.4.24) - Update banner logo - Update to DI-engine 0.3, modify env properties diff --git a/README.md b/README.md index 71caaef1..e4faa8fc 100644 --- a/README.md +++ b/README.md @@ -2,21 +2,11 @@ -Updated on 2022.4.16 DI-drive-v0.3.2 (beta) - -DI-drive - Decision Intelligence Platform for Autonomous Driving simulation. - -DI-drive is application platform under [OpenDILab](http://opendilab.org/) - -![icon](./docs/figs/big_cam_auto.png) - -## Introduction - -**DI-drive** is an open-source application platform under **OpenDILab**. DI-drive applies different simulator/datasets/cases in **Decision Intelligence** Training & Testing for **Autonomous Driving** Policy. +**DI-drive** is an open-source Decision Intelligence Platform for Autonomous Driving simulation. DI-drive applies different simulators/datasets/cases in **Decision Intelligence** Training & Testing for **Autonomous Driving** Policy. It aims to - run Imitation Learning, Reinforcement Learning, GAIL etc. in a single platform and simple unified entry -- apply Decision Intelligence in any parts of driving simulation +- apply Decision Intelligence in any part of the driving simulation - suit most of the driving simulators input & output - run designed driving cases and scenarios @@ -24,22 +14,61 @@ and most importantly, to **put these all together!** **DI-drive** uses [DI-engine](https://github.com/opendilab/DI-engine), a Reinforcement Learning platform to build most of the running modules and demos. **DI-drive** currently supports [Carla](http://carla.org), -an open-source Autonomous Drining simulator to operate driving simualtion, and [MetaDrive](https://decisionforce.github.io/metadrive/), -a diverse driving scenarios for Generalizable Reinforcement Learning. Users can specify any of them to run in global config under `core`. +an open-source Autonomous Driving simulator to operate driving simulation, and [MetaDrive](https://decisionforce.github.io/metadrive/), +a diverse driving scenarios for Generalizable Reinforcement Learning. DI-drive is an application platform under [OpenDILab](http://opendilab.org/) + +![icon](./docs/figs/big_cam_auto.png) +
Visualization of Carla driving in DI-drive
+
+## Outline
+
+ * [Installation](#installation)
+ * [Quick Start](#quick-start)
+ * [Model Zoo](#model-zoo)
+ * [Casezoo](#di-drive-casezoo)
+ * [File Structure](#file-structure)
+ * [Contributing](#contributing)
+ * [License](#license)
+ * [Citation](#citation)
 ## Installation
-**DI-drive** needs to have the following modules installed:
+**DI-drive** runs with **Python >= 3.5** and **DI-engine >= 0.3.1** (PyTorch is needed by DI-engine). You can install DI-drive from the source code:
+
+```bash
+git clone https://gitlab.bj.sensetime.com/open-XLab/cell/xad.git
+cd xad
+pip install -e .
+```
+
+DI-engine and PyTorch will be installed automatically.
+
+In addition, at least one of the two simulators (Carla or MetaDrive) must be runnable in DI-drive. [MetaDrive](https://decisionforce.github.io/metadrive/) can be easily installed via `pip`.
+If a [Carla](http://carla.org) server is used for simulation, users also need to install the 'Carla Python API'. You can use either one of them or both. Make sure to modify the activated simulators in `core.__init__.py` to avoid import errors (see the sketch below).
-- Pytorch
-- DI-engine
+Please refer to the [installation guide](https://opendilab.github.io/DI-drive/installation/index.html) for details about the installation of **DI-drive**.
-[MetaDrive](https://decisionforce.github.io/metadrive/) can be easily installed via `pip`.
-If [Carla](http://carla.org) server is used for simulation, users need to install 'Carla Python API' in addition.
-Please refer to the [documentation](https://opendilab.github.io/DI-drive/) for details about installation and user guide of **DI-drive**.
-We provide IL and RL tutorials, and full guidance for quick run existing policy for beginners.
+## Quick Start
-Please refer to [FAQ](https://opendilab.github.io/DI-drive/faq/index.html) for frequently asked questions.
+### Carla
+
+Users can check the installation of Carla and watch the visualization by running an 'auto' policy in a provided town map. You need to start a Carla server first and change the Carla host and port in `auto_run.py` to yours. Then run:
+
+```bash
+cd demo/auto_run
+python auto_run.py
+```
+
+### MetaDrive
+
+After installing MetaDrive, you can start RL training in the MetaDrive Macro Environment by running the following code:
+
+```bash
+cd demo/metadrive
+python macro_env_dqn_train.py
+```
+
+We provide detailed guidance for IL and RL experiments in all simulators, as well as quick runs of existing policies for beginners, in our [documentation](https://opendilab.github.io/DI-drive/). Please refer to it if you have further questions.
 ## Model Zoo
@@ -58,15 +87,99 @@ Please refer to [FAQ](https://opendilab.github.io/DI-drive/faq/index.html) for f
 ## DI-drive Casezoo
-**DI-drive Casezoo** is a scenario set for training and testing of Autonomous Driving policy in simulator.
-**Casezoo** combines data collected by real vehicles and Shanghai Lingang road license test Scenarios.
-**Casezoo** supports both evaluating and training, whick makes the simulation closer to real driving.
+**DI-drive Casezoo** is a scenario set for training and testing Autonomous Driving policies in simulators.
+**Casezoo** combines data collected from actual vehicles and Shanghai Lingang road license test scenarios.
+**Casezoo** supports both evaluating and training, which makes the simulation closer to real driving.
 Please see [casezoo instruction](docs/casezoo_instruction.md) for details about **Casezoo**.
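+
+As noted in the Installation section, only the simulators activated in `core/__init__.py` are imported at startup. The snippet below is a minimal sketch of that switch; the exact layout of the file may differ from this illustration, but `SIMULATORS` is the collection checked by `core/envs/__init__.py` when registering environments.
+
+```python
+# core/__init__.py (sketch): keep only the simulators you actually have installed.
+SIMULATORS = ['metadrive']            # MetaDrive only
+# SIMULATORS = ['carla', 'metadrive'] # if the Carla Python API is installed as well
+```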
-## Contributing +## File Structure + +``` +DI-drive +|-- .gitignore +|-- .style.yapf +|-- CHANGELOG +|-- LICENSE +|-- README.md +|-- format.sh +|-- setup.py +|-- core +| |-- data +| | |-- base_collector.py +| | |-- benchmark_dataset_saver.py +| | |-- bev_vae_dataset.py +| | |-- carla_benchmark_collector.py +| | |-- cict_dataset.py +| | |-- cilrs_dataset.py +| | |-- lbc_dataset.py +| | |-- benchmark +| | |-- casezoo +| | |-- srunner +| |-- envs +| | |-- base_drive_env.py +| | |-- drive_env_wrapper.py +| | |-- md_macro_env.py +| | |-- md_traj_env.py +| | |-- scenario_carla_env.py +| | |-- simple_carla_env.py +| |-- eval +| | |-- base_evaluator.py +| | |-- carla_benchmark_evaluator.py +| | |-- serial_evaluator.py +| | |-- single_carla_evaluator.py +| |-- models +| | |-- bev_speed_model.py +| | |-- cilrs_model.py +| | |-- common_model.py +| | |-- lbc_model.py +| | |-- model_wrappers.py +| | |-- mpc_controller.py +| | |-- pid_controller.py +| | |-- vae_model.py +| | |-- vehicle_controller.py +| |-- policy +| | |-- auto_policy.py +| | |-- base_carla_policy.py +| | |-- cilrs_policy.py +| | |-- lbc_policy.py +| |-- simulators +| | |-- base_simulator.py +| | |-- carla_data_provider.py +| | |-- carla_scenario_simulator.py +| | |-- carla_simulator.py +| | |-- fake_simulator.py +| | |-- srunner +| |-- utils +| |-- data_utils +| |-- env_utils +| |-- learner_utils +| |-- model_utils +| |-- others +| |-- planner +| |-- simulator_utils +|-- demo +| |-- auto_run +| |-- cict +| |-- cilrs +| |-- implicit +| |-- latent_rl +| |-- lbc +| |-- metadrive +| |-- simple_rl +|-- docs +| |-- casezoo_instruction.md +| |-- figs +| |-- source +``` + +## Join and Contribute + +We appreciate all contributions to improve DI-drive, both algorithms and system designs. Welcome to OpenDILab community! Scan the QR code and add us on Wechat: + + -We appreciate all contributions to improve DI-drive, both algorithms and system designs. +Or you can contact us with [slack](https://opendilab.slack.com/join/shared_invite/zt-v9tmv4fp-nUBAQEH1_Kuyu_q4plBssQ#/shared-invite/email) or email (opendilab.contact@gmail.com). 
## License diff --git a/core/__init__.py b/core/__init__.py index 2ed7d248..98129c81 100644 --- a/core/__init__.py +++ b/core/__init__.py @@ -1,5 +1,5 @@ __TITLE__ = "DI-drive" -__VERSION__ = "0.3.2" +__VERSION__ = "0.3.3" __DESCRIPTION__ = "Decision AI Auto-Driving Platform" __AUTHOR__ = "OpenDILab Contributors" __AUTHOR_EMAIL__ = "opendilab.contact@gmail.com" diff --git a/core/envs/__init__.py b/core/envs/__init__.py index a024a772..d058d00d 100644 --- a/core/envs/__init__.py +++ b/core/envs/__init__.py @@ -22,8 +22,10 @@ if 'metadrive' in SIMULATORS: from .md_macro_env import MetaDriveMacroEnv + from .md_traj_env import MetaDriveTrajEnv env_map.update({ "Macro-v1": 'core.envs.md_macro_env:MetaDriveMacroEnv', + "Traj-v1": 'core.envs.md_traj_env:MetaDriveTrajEnv' }) for k, v in env_map.items(): diff --git a/core/envs/md_macro_env.py b/core/envs/md_macro_env.py index 335a5b73..012df298 100644 --- a/core/envs/md_macro_env.py +++ b/core/envs/md_macro_env.py @@ -6,14 +6,12 @@ from gym import spaces from collections import defaultdict from typing import Union, Dict, AnyStr, Tuple, Optional -from gym.envs.registration import register import logging from ding.utils import ENV_REGISTRY from core.utils.simulator_utils.md_utils.discrete_policy import DiscreteMetaAction from core.utils.simulator_utils.md_utils.agent_manager_utils import MacroAgentManager -from core.utils.simulator_utils.md_utils.engine_utils import initialize_engine, close_engine, \ - engine_initialized, set_global_random_seed, MacroBaseEngine +from core.utils.simulator_utils.md_utils.engine_utils import MacroEngine from core.utils.simulator_utils.md_utils.traffic_manager_utils import TrafficMode from metadrive.envs.base_env import BaseEnv @@ -22,7 +20,7 @@ # from metadrive.manager.traffic_manager import TrafficMode from metadrive.component.pgblock.first_block import FirstPGBlock from metadrive.constants import DEFAULT_AGENT, TerminationState -from metadrive.component.vehicle.base_vehicle import BaseVehicle +from metadrive.engine.base_engine import BaseEngine from metadrive.utils import Config, merge_dicts, get_np_random, clip from metadrive.envs.base_env import BASE_DEFAULT_CONFIG @@ -142,7 +140,7 @@ def __init__(self, config: dict = None) -> None: #self.action_space = self.action_type.space() # lazy initialization, create the main vehicle in the lazy_init() func - self.engine: Optional[MacroBaseEngine] = None + self.engine: Optional[BaseEngine] = None self._top_down_renderer = None self.episode_steps = 0 # self.current_seed = None @@ -446,9 +444,10 @@ def lazy_init(self): :return: None """ # It is the true init() func to create the main vehicle and its module, to avoid incompatible with ray - if engine_initialized(): + if MacroEngine.singleton is not None: return - self.engine = initialize_engine(self.config) + MacroEngine.singleton = MacroEngine(self.config) + self.engine = MacroEngine.singleton # engine setup self.setup_engine() # other optional initialization diff --git a/core/envs/md_traj_env.py b/core/envs/md_traj_env.py new file mode 100644 index 00000000..c2a00320 --- /dev/null +++ b/core/envs/md_traj_env.py @@ -0,0 +1,885 @@ +import os +import copy +import time +import gym +import numpy as np +from gym import spaces +from collections import defaultdict +from typing import Union, Dict, AnyStr, Tuple, Optional +import logging + +from ding.utils import ENV_REGISTRY +from core.utils.simulator_utils.md_utils.discrete_policy import DiscreteMetaAction +from core.utils.simulator_utils.md_utils.agent_manager_utils import 
TrajAgentManager +from core.utils.simulator_utils.md_utils.engine_utils import TrajEngine +from core.utils.simulator_utils.md_utils.traffic_manager_utils import TrafficMode +from metadrive.constants import RENDER_MODE_NONE, DEFAULT_AGENT, REPLAY_DONE, TerminationState +from metadrive.envs.base_env import BaseEnv +from metadrive.component.map.base_map import BaseMap +from metadrive.component.map.pg_map import parse_map_config, MapGenerateMethod +# from metadrive.manager.traffic_manager import TrafficMode +from metadrive.component.pgblock.first_block import FirstPGBlock +from metadrive.utils import Config, merge_dicts, get_np_random, clip, concat_step_infos +from metadrive.envs.base_env import BASE_DEFAULT_CONFIG +from metadrive.obs.top_down_obs_multi_channel import TopDownMultiChannel +from metadrive.engine.base_engine import BaseEngine +from metadrive.utils.utils import auto_termination + +DIDRIVE_DEFAULT_CONFIG = dict( + # ===== Generalization ===== + start_seed=0, + use_render=False, + environment_num=10, + + # ===== Map Config ===== + map='SSSSSSSSSS', # int or string: an easy way to fill map_config + random_lane_width=True, + random_lane_num=True, + map_config={ + BaseMap.GENERATE_TYPE: MapGenerateMethod.BIG_BLOCK_SEQUENCE, + BaseMap.GENERATE_CONFIG: 'SSSSSSSSSS', # None, # it can be a file path / block num / block ID sequence + BaseMap.LANE_WIDTH: 3.5, + BaseMap.LANE_NUM: 3, + "exit_length": 70, + }, + + # ===== Traffic ===== + traffic_density=0.3, + on_screen=False, + rgb_clip=True, + need_inverse_traffic=False, + traffic_mode=TrafficMode.Synch, # "Respawn", "Trigger" + random_traffic=True, # Traffic is randomized at default. + # this will update the vehicle_config and set to traffic + traffic_vehicle_config=dict( + show_navi_mark=False, + show_dest_mark=False, + enable_reverse=False, + show_lidar=False, + show_lane_line_detector=False, + show_side_detector=False, + ), + + # ===== Object ===== + accident_prob=0., # accident may happen on each block with this probability, except multi-exits block + + # ===== Others ===== + use_AI_protector=False, + save_level=0.5, + is_multi_agent=False, + vehicle_config=dict(spawn_lane_index=(FirstPGBlock.NODE_1, FirstPGBlock.NODE_2, 0)), + + # ===== Agent ===== + target_vehicle_configs={ + DEFAULT_AGENT: dict(use_special_color=True, spawn_lane_index=(FirstPGBlock.NODE_1, FirstPGBlock.NODE_2, 2)) + }, + + # ===== Reward Scheme ===== + # See: https://github.com/decisionforce/metadrive/issues/283 + success_reward=10.0, # 10.0, + out_of_road_penalty=1.0, # 5.0, + crash_vehicle_penalty=1.0, # 1.0, + crash_object_penalty=5.0, # 5.0, + run_out_of_time_penalty=5.0, # 5.0, + + # Transition reward + driving_reward=0.1, # 0.1 + speed_reward=0.2, # 0.2 + heading_reward=0.3, + jerk_bias=15.0, + jerk_dominator=45.0, # 50.0 + jerk_importance=0.6, # 0.6 + + # ===== Reward Switch ===== + # whether to use a reward option or not + use_speed_reward=True, + use_heading_reward=False, + use_jerk_reward=False, + use_lateral=True, + lateral_scale=0.25, + + # ===== Cost Scheme ===== + crash_vehicle_cost=1.0, + crash_object_cost=1.0, + out_of_road_cost=1.0, + + # ===== Termination Scheme ===== + out_of_route_done=True, + physics_world_step_size=1e-1, + const_episode_max_step=False, + episode_max_step=150, + avg_speed=6.5, + + # ===== Trajectory length ===== + seq_traj_len=10, + show_seq_traj=False, + + # ===== traj_control_mode = 'acc', # another type is 'jerk' ===== + # if we choose traj_control_mode = 'acc', then the current state is [0,0,0,v] and the control signal + # is 
throttle and steer + # If not, we will use jerk control, the current state we have vel, acc, current steer, and the control + # signal is jerk and steer rate (delta_steer) + traj_control_mode='acc', + + # ===== Expert data saving ===== + save_expert_data=False, + expert_data_folder=None, +) + + +@ENV_REGISTRY.register("md_traj") +class MetaDriveTrajEnv(BaseEnv): + """ + MetaDrive single-agent trajectory env. The agent is controlled by a trajectory + (a list of waypoints) of a time period. Its length determines the times of env steps + the agent will track it. The vehicle will execute actions along the trajectory by 'move_to' + method of the simulator rather than physical controlling. The position is calculated + from the trajectory with kinematic constraints before the vehicle is transmitted. + The observation is a 5-channel top-down view image and a vector of structure + information by default. This env is registered and can be used via `gym.make`. + + :Arguments: + - config (Dict): Env config dict. + + :Interfaces: reset, step, close, render, seed + """ + + @classmethod + def default_config(cls) -> "Config": + #config = super(MetaDriveTrajEnv, cls).default_config() + config = Config(BASE_DEFAULT_CONFIG) + config.update(DIDRIVE_DEFAULT_CONFIG) + config.register_type("map", str, int) + config["map_config"].register_type("config", None) + return config + + def __init__(self, config: dict = None): + merged_config = self._merge_extra_config(config) + global_config = self._post_process_config(merged_config) + self.config = global_config + + # if self.config["seq_traj_len"] == 1: + # self.config["episode_max_step"] = self.config["episode_max_step"] * 10 + # if self.config["seq_traj_len"] == 20: + # self.config["episode_max_step"] = self.config["episode_max_step"] // 2 + + # agent check + self.num_agents = self.config["num_agents"] + self.is_multi_agent = self.config["is_multi_agent"] + if not self.is_multi_agent: + assert self.num_agents == 1 + assert isinstance(self.num_agents, int) and (self.num_agents > 0 or self.num_agents == -1) + + # observation and action space + self.agent_manager = TrajAgentManager( + init_observations=self._get_observations(), init_action_space=self._get_action_space() + ) + self.action_type = DiscreteMetaAction() + #self.action_space = self.action_type.space() + + # lazy initialization, create the main vehicle in the lazy_init() func + self.engine: Optional[BaseEngine] = None + self._top_down_renderer = None + self.episode_steps = 0 + # self.current_seed = None + + # In MARL envs with respawn mechanism, varying episode lengths might happen. 
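+        # Episode bookkeeping initialized below: per-agent done flags, cumulative rewards
+        # and episode lengths, plus the kinematic vector ``z_state`` that is packed into
+        # the observation dict. ``z_state`` has 6 entries: indices 0-2 are kept at zero,
+        # [3] is speed, [4] is acceleration and [5] is a steering angle estimated from the
+        # yaw rate and speed (see ``update_current_state``); 'acc' control mode only uses
+        # the first 4 entries.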
+ self.dones = None + self.episode_rewards = defaultdict(float) + self.episode_lengths = defaultdict(int) + + self.start_seed = self.config["start_seed"] + self.env_num = self.config["environment_num"] + + self.time = 0 + self.step_num = 0 + self.episode_rwd = 0 + self.vel_speed = 0.0 + self.z_state = np.zeros(6) + self.avg_speed = self.config["avg_speed"] + + self.episode_max_step = self.config["episode_max_step"] + if self.config["save_expert_data"]: + assert self.config["expert_data_folder"] is not None + self.single_transition_list = [] + + # define a action type, and execution style + # Now only one action will be taken, cosin function, and we set dt equals self.engine.dt + # now that in this situation, we directly set trajectory len equals to simulation frequency + + def step(self, actions: Union[np.ndarray, Dict[AnyStr, np.ndarray]]): + self.episode_steps += 1 + macro_actions = self._preprocess_macro_waypoints(actions) + step_infos = self._step_macro_simulator(macro_actions) + o, r, d, i = self._get_step_return(actions, step_infos) + self.step_num = self.step_num + 1 + self.episode_rwd = self.episode_rwd + r + #print('step number is: {}'.format(self.step_num)) + #o = o.transpose((2,0,1)) + return o, r, d, i + + def _update_pen_state(self, vehicle): + vehicle.pen_state['position'] = copy.deepcopy(vehicle.prev_state['position']) + vehicle.pen_state['yaw'] = copy.deepcopy(vehicle.prev_state['yaw']) + vehicle.pen_state['speed'] = copy.deepcopy(vehicle.prev_state['speed']) + + def compute_jerk(self, vehicle): + v_t0 = vehicle.pen_state['speed'] + theta_t0 = vehicle.pen_state['yaw'] + v_t1 = vehicle.prev_state['speed'] + theta_t1 = vehicle.prev_state['yaw'] + v_t2 = vehicle.curr_state['speed'] + theta_t2 = vehicle.curr_state['yaw'] + t_inverse = 1.0 / self.config['physics_world_step_size'] + jerk_x = ( + v_t2 * np.cos(theta_t2) - 2 * v_t1 * np.cos(theta_t1) + v_t0 * np.cos(theta_t0) + ) * t_inverse * t_inverse + jerk_y = ( + v_t2 * np.sin(theta_t2) - 2 * v_t1 * np.sin(theta_t1) + v_t0 * np.sin(theta_t0) + ) * t_inverse * t_inverse + jerk_array = np.array([jerk_x, jerk_y]) + jerk = np.linalg.norm(jerk_array) + return jerk + + def calc_onestep_reward(self, vehicle): + # vehicle.curr_state['position'] = vehicle.position + # vehicle.curr_state['yaw'] = vehicle.heading_theta + # vehicle.curr_state['speed'] = vehicle.speed / 3.6 + + if vehicle.lane in vehicle.navigation.current_ref_lanes: + current_lane = vehicle.lane + positive_road = 1 + else: + current_lane = vehicle.navigation.current_ref_lanes[0] + current_road = vehicle.navigation.current_road + positive_road = 1 if not current_road.is_negative_road() else -1 + long_last, _ = current_lane.local_coordinates(vehicle.prev_state['position']) + long_now, lateral_now = current_lane.local_coordinates(vehicle.position) + + # reward for lane keeping, without it vehicle can learn to overtake but fail to keep in lane + if self.config["use_lateral"]: + lateral_factor = clip(1 - 0.5 * abs(lateral_now) / vehicle.navigation.get_current_lane_width(), 0.0, 1.0) + else: + lateral_factor = 1.0 + + reward = 0.0 + #lateral_factor + reward += self.config["driving_reward"] * (long_now - long_last) * lateral_factor * positive_road + max_spd = 10 + reward += self.config["speed_reward"] * (vehicle.curr_state['speed'] / max_spd) * positive_road + speed_rwd = -0.06 if vehicle.curr_state['speed'] < self.avg_speed else 0 + reward += speed_rwd + + v_t0 = vehicle.pen_state['speed'] + theta_t0 = vehicle.pen_state['yaw'] + v_t1 = vehicle.prev_state['speed'] + theta_t1 = 
vehicle.prev_state['yaw'] + v_t2 = vehicle.curr_state['speed'] + theta_t2 = vehicle.curr_state['yaw'] + t_inverse = 1.0 / self.config['physics_world_step_size'] + jerk_x = ( + v_t2 * np.cos(theta_t2) - 2 * v_t1 * np.cos(theta_t1) + v_t0 * np.cos(theta_t0) + ) * t_inverse * t_inverse + jerk_y = ( + v_t2 * np.sin(theta_t2) - 2 * v_t1 * np.sin(theta_t1) + v_t0 * np.sin(theta_t0) + ) * t_inverse * t_inverse + jerk_array = np.array([jerk_x, jerk_y]) + jerk = np.linalg.norm(jerk_array) + #jerk_penalty = np.tanh( (jerk)/ 50) + jerk_penalty = max(np.tanh((jerk - self.config["jerk_bias"]) / self.config["jerk_dominator"]), 0) + jerk_penalty = self.config["jerk_importance"] * jerk_penalty + reward -= jerk_penalty + + if vehicle.arrive_destination: + reward = +self.config["success_reward"] + elif vehicle.macro_succ: + reward = +self.config["success_reward"] + elif self._is_out_of_road(vehicle): + reward = -self.config["out_of_road_penalty"] + elif vehicle.crash_vehicle: + reward = -self.config["crash_vehicle_penalty"] + elif vehicle.macro_crash: + reward = -self.config["crash_vehicle_penalty"] + elif vehicle.crash_object: + reward = -self.config["crash_object_penalty"] + elif self.step_num >= self.episode_max_step: + reward = -self.config["run_out_of_time_penalty"] + #print('reward: {}'.format(reward)) + return reward + + def _merge_extra_config(self, config: Union[dict, "Config"]) -> "Config": + config = self.default_config().update(config, allow_add_new_key=True) + if config["vehicle_config"]["lidar"]["distance"] > 50: + config["max_distance"] = config["vehicle_config"]["lidar"]["distance"] + return config + + def _post_process_config(self, config): + config = super(MetaDriveTrajEnv, self)._post_process_config(config) + if not config["rgb_clip"]: + logging.warning( + "You have set rgb_clip = False, which means the observation will be uint8 values in [0, 255]. " + "Please make sure you have parsed them later before feeding them to network!" + ) + config["map_config"] = parse_map_config( + easy_map_config=config["map"], new_map_config=config["map_config"], default_config=self.default_config() + ) + config["vehicle_config"]["rgb_clip"] = config["rgb_clip"] + config["vehicle_config"]["random_agent_model"] = config["random_agent_model"] + if config.get("gaussian_noise", 0) > 0: + assert config["vehicle_config"]["lidar"]["gaussian_noise"] == 0, "You already provide config!" + assert config["vehicle_config"]["side_detector"]["gaussian_noise"] == 0, "You already provide config!" + assert config["vehicle_config"]["lane_line_detector"]["gaussian_noise"] == 0, "You already provide config!" + config["vehicle_config"]["lidar"]["gaussian_noise"] = config["gaussian_noise"] + config["vehicle_config"]["side_detector"]["gaussian_noise"] = config["gaussian_noise"] + config["vehicle_config"]["lane_line_detector"]["gaussian_noise"] = config["gaussian_noise"] + if config.get("dropout_prob", 0) > 0: + assert config["vehicle_config"]["lidar"]["dropout_prob"] == 0, "You already provide config!" + assert config["vehicle_config"]["side_detector"]["dropout_prob"] == 0, "You already provide config!" + assert config["vehicle_config"]["lane_line_detector"]["dropout_prob"] == 0, "You already provide config!" 
+ config["vehicle_config"]["lidar"]["dropout_prob"] = config["dropout_prob"] + config["vehicle_config"]["side_detector"]["dropout_prob"] = config["dropout_prob"] + config["vehicle_config"]["lane_line_detector"]["dropout_prob"] = config["dropout_prob"] + target_v_config = copy.deepcopy(config["vehicle_config"]) + if not config["is_multi_agent"]: + target_v_config.update(config["target_vehicle_configs"][DEFAULT_AGENT]) + config["target_vehicle_configs"][DEFAULT_AGENT] = target_v_config + return config + + def _get_observations(self): + return {DEFAULT_AGENT: self.get_single_observation(self.config["vehicle_config"])} + + def done_function(self, vehicle_id: str): + vehicle = self.vehicles[vehicle_id] + done = False + done_info = dict( + crash_vehicle=False, crash_object=False, crash_building=False, out_of_road=False, arrive_dest=False + ) + if vehicle.arrive_destination: + done = True + logging.info("Episode ended! Reason: arrive_dest.") + done_info[TerminationState.SUCCESS] = True + elif hasattr(vehicle, 'macro_succ') and vehicle.macro_succ: + done = True + logging.info("Episode ended! Reason: arrive_dest.") + done_info[TerminationState.SUCCESS] = True + elif hasattr(vehicle, 'macro_crash') and vehicle.macro_crash: + done = True + logging.info("Episode ended! Reason: crash vehicle ") + done_info[TerminationState.CRASH_VEHICLE] = True + if self._is_out_of_road(vehicle): + done = True + logging.info("Episode ended! Reason: out_of_road.") + done_info[TerminationState.OUT_OF_ROAD] = True + if vehicle.crash_vehicle: + done = True + logging.info("Episode ended! Reason: crash vehicle ") + done_info[TerminationState.CRASH_VEHICLE] = True + if vehicle.crash_object: + done = True + done_info[TerminationState.CRASH_OBJECT] = True + logging.info("Episode ended! Reason: crash object ") + if vehicle.crash_building: + done = True + done_info[TerminationState.CRASH_BUILDING] = True + logging.info("Episode ended! Reason: crash building ") + if self.step_num >= self.episode_max_step: + done = True + done_info[TerminationState.CRASH_BUILDING] = True + logging.info("Episode ended! Reason: crash building ") + + # for compatibility + # crash almost equals to crashing with vehicles + done_info[TerminationState.CRASH] = ( + done_info[TerminationState.CRASH_VEHICLE] or done_info[TerminationState.CRASH_OBJECT] + or done_info[TerminationState.CRASH_BUILDING] + ) + done_info['complete_ratio'] = clip(self.already_go_dist / self.navi_distance + 0.05, 0.0, 1.0) + done_info['seq_traj_len'] = self.config['seq_traj_len'] + + return done, done_info + + def cost_function(self, vehicle_id: str): + vehicle = self.vehicles[vehicle_id] + step_info = dict() + step_info["cost"] = 0 + if self._is_out_of_road(vehicle): + step_info["cost"] = self.config["out_of_road_cost"] + elif vehicle.crash_vehicle: + step_info["cost"] = self.config["crash_vehicle_cost"] + elif vehicle.crash_object: + step_info["cost"] = self.config["crash_object_cost"] + elif self.step_num > self.config["episode_max_step"]: + step_info['cost'] = 1 + return step_info['cost'], step_info + + def _is_out_of_road(self, vehicle): + # A specified function to determine whether this vehicle should be done. 
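+        # Conditions: touching a yellow or white continuous line, leaving the lane
+        # surface, or crashing into the sidewalk; with ``out_of_route_done`` enabled,
+        # leaving the planned route also counts as out of road.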
+ # return vehicle.on_yellow_continuous_line or (not vehicle.on_lane) or vehicle.crash_sidewalk + ret = vehicle.on_yellow_continuous_line or vehicle.on_white_continuous_line or \ + (not vehicle.on_lane) or vehicle.crash_sidewalk + if self.config["out_of_route_done"]: + ret = ret or vehicle.out_of_route + return ret + + def reward_function(self, vehicle_id: str): + """ + Override this func to get a new reward function + :param vehicle_id: id of BaseVehicle + :return: reward + """ + vehicle = self.vehicles[vehicle_id] + step_info = dict() + if self._compute_navi_dist: + self.navi_distance = self.get_navigation_len(vehicle) + if not self.config['const_episode_max_step']: + self.episode_max_step = self.get_episode_max_step(self.navi_distance, self.avg_speed) + self._compute_navi_dist = False + #self.update_current_state(vehicle) + # Reward for moving forward in current lane + if vehicle.lane in vehicle.navigation.current_ref_lanes: + current_lane = vehicle.lane + positive_road = 1 + else: + current_lane = vehicle.navigation.current_ref_lanes[0] + current_road = vehicle.navigation.current_road + positive_road = 1 if not current_road.is_negative_road() else -1 + long_last, _ = current_lane.local_coordinates(vehicle.last_macro_position) + long_now, lateral_now = current_lane.local_coordinates(vehicle.position) + self.already_go_dist += (long_now - long_last) + #print('already_go_dist: {}'.format(self.already_go_dist)) + avg_lateral_cum = self.compute_avg_lateral_cum(vehicle, current_lane) + # use_lateral_penalty = False + # # reward for lane keeping, without it vehicle can learn to overtake but fail to keep in lane + if self.config["use_lateral"]: + lateral_factor = clip( + 1 - 0.5 * abs(avg_lateral_cum) / vehicle.navigation.get_current_lane_width(), 0.0, 1.0 + ) + #lateral_factor = clip(1 - 2 * abs(lateral_now) / vehicle.navigation.get_current_lane_width(), 0.0, 1.0) + else: + lateral_factor = 1.0 + # use_lateral_penalty = True + reward = 0.0 + driving_reward = 0.0 + speed_reward = 0.0 + heading_reward = 0.0 + jerk_reward = 0.0 + # Generally speaking, driving reward is a necessity + driving_reward += self.config["driving_reward"] * (long_now - long_last) * lateral_factor * positive_road + # # Speed reward + if self.config["use_speed_reward"]: + max_spd = 10 + speed_list = self.compute_speed_list(vehicle) + for speed in speed_list: + speed_reward += self.config["speed_reward"] * (speed / max_spd) * positive_road + if speed < self.avg_speed: + #speed_reward -= 0.00 #0.06 + speed_reward = speed_reward + if self.config["use_heading_reward"]: + # Heading Reward + heading_error_list = self.compute_heading_error_list(vehicle, current_lane) + for heading_error in heading_error_list: + heading_reward += self.config["heading_reward"] * (0 - np.abs(heading_error)) + if self.config["use_jerk_reward"]: + jerk_list = self.compute_jerk_list(vehicle) + for jerk in jerk_list: + #jerk_reward += (0.03 - 0.6 * np.tanh(jerk / 100.0)) + #jerk_reward += (0.03 - self.config["jerk_importance"] * np.tanh(jerk / self.config["jerk_dominator"])) + jerk_penalty = max(np.tanh((jerk - self.config["jerk_bias"]) / self.config["jerk_dominator"]), 0) + jerk_penalty = self.config["jerk_importance"] * jerk_penalty + jerk_reward -= jerk_penalty + reward = driving_reward + speed_reward + heading_reward + jerk_reward + # print('driving reward: {}'.format(driving_reward)) + # print('speed reward: {}'.format(speed_reward)) + # print('heading reward: {}'.format(heading_reward)) + # print('jerk reward: {}'.format(jerk_reward)) + 
step_info["step_reward"] = reward + if vehicle.arrive_destination: + reward = +self.config["success_reward"] + elif vehicle.macro_succ: + reward = +self.config["success_reward"] + elif self._is_out_of_road(vehicle): + reward = -self.config["out_of_road_penalty"] + elif vehicle.crash_vehicle: + reward = -self.config["crash_vehicle_penalty"] + elif vehicle.macro_crash: + reward = -self.config["crash_vehicle_penalty"] + elif vehicle.crash_object: + reward = -self.config["crash_object_penalty"] + elif self.step_num >= self.episode_max_step: + reward = -self.config["run_out_of_time_penalty"] + return reward, step_info + + def get_navigation_len(self, vehicle): + checkpoints = vehicle.navigation.checkpoints + road_network = vehicle.navigation.map.road_network + total_dist = 0 + assert len(checkpoints) >= 2 + for check_num in range(0, len(checkpoints) - 1): + front_node = checkpoints[check_num] + end_node = checkpoints[check_num + 1] + cur_lanes = road_network.graph[front_node][end_node] + target_lane_num = int(len(cur_lanes) / 2) + target_lane = cur_lanes[target_lane_num] + target_lane_length = target_lane.length + total_dist += target_lane_length + return total_dist + + def compute_jerk_list(self, vehicle): + jerk_list = [] + #vehicle = self.vehicles[vehicle_id] + v_t0 = vehicle.penultimate_state['speed'] + theta_t0 = vehicle.penultimate_state['yaw'] + v_t1 = vehicle.traj_wp_list[0]['speed'] + theta_t1 = vehicle.traj_wp_list[0]['yaw'] + v_t2 = vehicle.traj_wp_list[1]['speed'] + theta_t2 = vehicle.traj_wp_list[1]['yaw'] + t_inverse = 1.0 / self.config['physics_world_step_size'] + first_point_jerk_x = ( + v_t2 * np.cos(theta_t2) - 2 * v_t1 * np.cos(theta_t1) + v_t0 * np.cos(theta_t0) + ) * t_inverse * t_inverse + first_point_jerk_y = ( + v_t2 * np.sin(theta_t2) - 2 * v_t1 * np.sin(theta_t1) + v_t0 * np.sin(theta_t0) + ) * t_inverse * t_inverse + jerk_list.append(np.array([first_point_jerk_x, first_point_jerk_y])) + # plus one because we store the current value as first, which means the whole trajectory is seq_traj_len + 1 + for i in range(2, self.config['seq_traj_len'] + 1): + v_t0 = vehicle.traj_wp_list[i - 2]['speed'] + theta_t0 = vehicle.traj_wp_list[i - 2]['yaw'] + v_t1 = vehicle.traj_wp_list[i - 1]['speed'] + theta_t1 = vehicle.traj_wp_list[i - 1]['yaw'] + v_t2 = vehicle.traj_wp_list[i]['speed'] + theta_t2 = vehicle.traj_wp_list[i]['yaw'] + point_jerk_x = ( + v_t2 * np.cos(theta_t2) - 2 * v_t1 * np.cos(theta_t1) + v_t0 * np.cos(theta_t0) + ) * t_inverse * t_inverse + point_jerk_y = ( + v_t2 * np.sin(theta_t2) - 2 * v_t1 * np.sin(theta_t1) + v_t0 * np.sin(theta_t0) + ) * t_inverse * t_inverse + jerk_list.append(np.array([point_jerk_x, point_jerk_y])) + #final_jerk_value = 0 + step_jerk_list = [] + for jerk in jerk_list: + #final_jerk_value += np.linalg.norm(jerk) + step_jerk_list.append(np.linalg.norm(jerk)) + return step_jerk_list + + def update_current_state(self, vehicle_id): + vehicle = self.vehicles[vehicle_id] + t_inverse = 1.0 / self.config['physics_world_step_size'] + theta_t1 = vehicle.traj_wp_list[-2]['yaw'] + theta_t2 = vehicle.traj_wp_list[-1]['yaw'] + v_t1 = vehicle.traj_wp_list[-2]['speed'] + v_t2 = vehicle.traj_wp_list[-1]['speed'] + v_state = np.zeros(6) + v_state[3] = v_t2 + v_state[4] = (v_t2 - v_t1) * t_inverse + theta_dot = (theta_t2 - theta_t1) * t_inverse + v_state[5] = np.arctan(2.5 * theta_dot / v_t2) if v_t2 > 0.001 else 0.0 + self.z_state = v_state + + def compute_heading_error_list(self, vehicle, lane): + heading_error_list = [] + for i in range(1, 
self.config['seq_traj_len'] + 1): + theta = vehicle.traj_wp_list[i]['yaw'] + long_now, lateral_now = lane.local_coordinates(vehicle.traj_wp_list[i]['position']) + road_heading_theta = lane.heading_theta_at(long_now) + theta_error = self.wrap_angle(theta - road_heading_theta) + heading_error_list.append(np.abs(theta_error)) + return heading_error_list + + def compute_speed_list(self, vehicle): + speed_list = [] + for i in range(1, self.config['seq_traj_len'] + 1): + speed = vehicle.traj_wp_list[i]['speed'] + speed_list.append(speed) + return speed_list + + def compute_avg_lateral_cum(self, vehicle, lane): + # Compute lateral distance for each wp + # average the factor by seq traj len + # For example, if traj len is 10, then i = 1, 2, ... 10 + lateral_cum = 0 + for i in range(1, self.config['seq_traj_len'] + 1): + long_now, lateral_now = lane.local_coordinates(vehicle.traj_wp_list[i]['position']) + lateral_cum += np.abs(lateral_now) + avg_lateral_cum = lateral_cum / float(self.config['seq_traj_len']) + return avg_lateral_cum + + def switch_to_third_person_view(self) -> None: + if self.main_camera is None: + return + self.main_camera.reset() + if self.config["prefer_track_agent"] is not None and self.config["prefer_track_agent"] in self.vehicles.keys(): + new_v = self.vehicles[self.config["prefer_track_agent"]] + current_track_vehicle = new_v + else: + if self.main_camera.is_bird_view_camera(): + current_track_vehicle = self.current_track_vehicle + else: + vehicles = list(self.engine.agents.values()) + if len(vehicles) <= 1: + return + if self.current_track_vehicle in vehicles: + vehicles.remove(self.current_track_vehicle) + new_v = get_np_random().choice(vehicles) + current_track_vehicle = new_v + self.main_camera.track(current_track_vehicle) + return + + def switch_to_top_down_view(self): + self.main_camera.stop_track() + + def _get_step_return(self, actions, engine_info): + # update obs, dones, rewards, costs, calculate done at first ! 
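+        # Observation packing: for 'jerk' control the per-vehicle obs is a dict with the
+        # stacked top-down image under 'birdview' and the full 6-dim z_state under
+        # 'vehicle_state'; for 'acc' control only the first 4 entries of z_state are used;
+        # any other traj_control_mode falls back to the raw image observation.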
+ obses = {} + done_infos = {} + cost_infos = {} + reward_infos = {} + rewards = {} + for v_id, v in self.vehicles.items(): + o = self.observations[v_id].observe(v) + self.update_current_state(v_id) + self.vel_speed = v.last_spd + if self.config["traj_control_mode"] == 'jerk': + o_dict = {} + o_dict['birdview'] = o + # v_state = np.zeros(4) + # v_state[3] = v.last_spd + v_state = self.z_state + o_dict['vehicle_state'] = v_state + #o_dict['speed'] = v.last_spd + elif self.config["traj_control_mode"] == 'acc': + o_dict = {} + o_dict['birdview'] = o + # v_state = np.zeros(4) + # v_state[3] = v.last_spd + v_state = self.z_state[:4] + o_dict['vehicle_state'] = v_state + #o_dict['speed'] = v.last_spd + else: + o_dict = o + obses[v_id] = o_dict + + done_function_result, done_infos[v_id] = self.done_function(v_id) + rewards[v_id], reward_infos[v_id] = self.reward_function(v_id) + _, cost_infos[v_id] = self.cost_function(v_id) + done = done_function_result or self.dones[v_id] + self.dones[v_id] = done + + should_done = engine_info.get(REPLAY_DONE, False + ) or (self.config["horizon"] and self.episode_steps >= self.config["horizon"]) + termination_infos = self.for_each_vehicle(auto_termination, should_done) + + step_infos = concat_step_infos([ + engine_info, + done_infos, + reward_infos, + cost_infos, + termination_infos, + ]) + + if should_done: + for k in self.dones: + self.dones[k] = True + + dones = {k: self.dones[k] for k in self.vehicles.keys()} + for v_id, r in rewards.items(): + self.episode_rewards[v_id] += r + step_infos[v_id]["episode_reward"] = self.episode_rewards[v_id] + self.episode_lengths[v_id] += 1 + step_infos[v_id]["episode_length"] = self.episode_lengths[v_id] + if not self.is_multi_agent: + return self._wrap_as_single_agent(obses), self._wrap_as_single_agent(rewards), \ + self._wrap_as_single_agent(dones), self._wrap_as_single_agent(step_infos) + else: + return obses, rewards, dones, step_infos + + def setup_engine(self): + super(MetaDriveTrajEnv, self).setup_engine() + self.engine.accept("b", self.switch_to_top_down_view) + self.engine.accept("q", self.switch_to_third_person_view) + from core.utils.simulator_utils.md_utils.traffic_manager_utils import MacroTrafficManager + from core.utils.simulator_utils.md_utils.map_manager_utils import MacroMapManager + self.engine.register_manager("map_manager", MacroMapManager()) + self.engine.register_manager("traffic_manager", MacroTrafficManager()) + + def _reset_global_seed(self, force_seed=None): + current_seed = force_seed if force_seed is not None else \ + get_np_random(self._DEBUG_RANDOM_SEED).randint(self.start_seed, self.start_seed + self.env_num) + self.seed(current_seed) + + def _preprocess_macro_actions(self, actions: Union[np.ndarray, Dict[AnyStr, np.ndarray]]) \ + -> Union[np.ndarray, Dict[AnyStr, np.ndarray]]: + if not self.is_multi_agent: + # print('action.dtype: {}'.format(type(actions))) + #print('action: {}'.format(actions)) + actions = int(actions) + actions = {v_id: actions for v_id in self.vehicles.keys()} + else: + if self.config["vehicle_config"]["action_check"]: + # Check whether some actions are not provided. + given_keys = set(actions.keys()) + have_keys = set(self.vehicles.keys()) + assert given_keys == have_keys, "The input actions: {} have incompatible keys with existing {}!".format( + given_keys, have_keys + ) + else: + # That would be OK if extra actions is given. This is because, when evaluate a policy with naive + # implementation, the "termination observation" will still be given in T=t-1. 
And at T=t, when you + # collect action from policy(last_obs) without masking, then the action for "termination observation" + # will still be computed. We just filter it out here. + actions = {v_id: actions[v_id] for v_id in self.vehicles.keys()} + return actions + + def _preprocess_macro_waypoints(self, waypoint_list: Union[np.ndarray, Dict[AnyStr, np.ndarray]]) \ + -> Union[np.ndarray, Dict[AnyStr, np.ndarray]]: + if not self.is_multi_agent: + # print('action.dtype: {}'.format(type(actions))) + #print('action: {}'.format(actions)) + actions = waypoint_list + actions = {v_id: actions for v_id in self.vehicles.keys()} + return actions + + def _step_macro_simulator(self, actions): + #simulation_frequency = 30 # 60 80 + simulation_frequency = self.config['seq_traj_len'] + policy_frequency = 1 + frames = int(simulation_frequency / policy_frequency) + self.time = 0 + wps = actions + for frame in range(frames): + # we use frame to update robot position, and use wps to represent the whole trajectory + assert len(self.vehicles.items()) == 1 + onestep_o = np.zeros((200, 200, 5)) + for v_id, v in self.vehicles.items(): + onestep_o = self.observations[v_id].observe(v) + self._update_pen_state(v) + scene_manager_before_step_infos = self.engine.before_step_traj(frame, wps) + self.engine.step() + + onestep_a = scene_manager_before_step_infos['default_agent']['raw_action'] + onestep_rwd = 0 + for v_id, v in self.vehicles.items(): + onestep_rwd = self.calc_onestep_reward(v) + single_transition = {'state': onestep_o, 'action': onestep_a, 'reward': onestep_rwd} + if self.config["save_expert_data"]: + self.single_transition_list.append(single_transition) + scene_manager_after_step_infos = self.engine.after_step() + #scene_manager_after_step_infos = self.engine.after_step() + return merge_dicts( + scene_manager_after_step_infos, scene_manager_before_step_infos, allow_new_keys=True, without_copy=True + ) + + def _get_reset_return(self): + ret = {} + self.engine.after_step() + o = None + o_reset = None + print('episode reward: {}'.format(self.episode_rwd)) + #self.episode_rwd = 0 + self.step_num = 0 + for v_id, v in self.vehicles.items(): + self.observations[v_id].reset(self, v) + ret[v_id] = self.observations[v_id].observe(v) + o = self.observations[v_id].observe(v) + if self.config["save_expert_data"] and len(self.single_transition_list) > 50: + print('success: {}'.format(v.macro_succ)) + print('traj len: {}'.format(len(self.single_transition_list))) + folder_name = self.config["expert_data_folder"] + file_num = len(os.listdir(folder_name)) + new_file_name = "expert_data_%02i.pickle" % file_num + new_file_path = os.path.join(folder_name, new_file_name) + traj_dict = {"transition_list": self.single_transition_list, "episode_rwd": self.episode_rwd} + import pickle + with open(new_file_path, "wb") as fp: + pickle.dump(traj_dict, fp) + + self.update_current_state(v_id) + self.vel_speed = 0 + if self.config["traj_control_mode"] == 'jerk': + o_dict = {} + o_dict['birdview'] = o + # v_state = np.zeros(4) + # v_state[3] = v.last_spd + v_state = self.z_state + o_dict['vehicle_state'] = v_state + #o_dict['speed'] = v.last_spd + elif self.config["traj_control_mode"] == 'acc': + o_dict = {} + o_dict['birdview'] = o + # v_state = np.zeros(4) + # v_state[3] = v.last_spd + v_state = self.z_state[:4] + o_dict['vehicle_state'] = v_state + #o_dict['speed'] = v.last_spd + else: + o_dict = o + o_reset = o_dict + if hasattr(v, 'macro_succ'): + v.reset_state_stack() + v.macro_succ = False + if hasattr(v, 'macro_crash'): + 
v.macro_crash = False + v.penultimate_state = {} + v.penultimate_state['position'] = np.array([0, 0]) + v.penultimate_state['yaw'] = 0 + v.penultimate_state['speed'] = 0 + v.traj_wp_list = [] + v.traj_wp_list.append(copy.deepcopy(v.penultimate_state)) + v.traj_wp_list.append(copy.deepcopy(v.penultimate_state)) + v.last_spd = 0 + + if self.config["save_expert_data"]: + self.single_transition_list = [] + self.episode_rwd = 0.0 + self.already_go_dist = 0 + self._compute_navi_dist = True + self.navi_distance = 100.0 + self.remove_init_stop = True + self.episode_max_step = self.config['episode_max_step'] + if self.remove_init_stop: + return o_reset + return o_reset + + def lazy_init(self): + """ + Only init once in runtime, variable here exists till the close_env is called + :return: None + """ + # It is the true init() func to create the main vehicle and its module, to avoid incompatible with ray + if TrajEngine.singleton is not None: + return + TrajEngine.singleton = TrajEngine(self.config) + self.engine = TrajEngine.singleton + # engine setup + self.setup_engine() + # other optional initialization + self._after_lazy_init() + + def get_single_observation(self, _=None): + o = TopDownMultiChannel( + self.config["vehicle_config"], + self.config["on_screen"], + self.config["rgb_clip"], + frame_stack=3, + post_stack=10, + frame_skip=1, + resolution=(200, 200), + max_distance=50 + ) + #o = TopDownMultiChannel(vehicle_config, self, False) + return o + + def wrap_angle(self, angle_in_rad): + #angle_in_rad = angle_in_degree / 180.0 * np.pi + while (angle_in_rad > np.pi): + angle_in_rad -= 2 * np.pi + while (angle_in_rad <= -np.pi): + angle_in_rad += 2 * np.pi + return angle_in_rad + + def get_episode_max_step(self, distance, average_speed=6.5): + average_dist_per_step = float(self.config['seq_traj_len'] + ) * average_speed * self.config['physics_world_step_size'] + max_step = int(distance / average_dist_per_step) + 1 + return max_step diff --git a/core/utils/simulator_utils/md_utils/agent_manager_utils.py b/core/utils/simulator_utils/md_utils/agent_manager_utils.py index 89a1bce2..10b2911a 100644 --- a/core/utils/simulator_utils/md_utils/agent_manager_utils.py +++ b/core/utils/simulator_utils/md_utils/agent_manager_utils.py @@ -1,5 +1,7 @@ from metadrive.manager.agent_manager import AgentManager from core.utils.simulator_utils.md_utils.macro_policy import ManualMacroDiscretePolicy +from core.utils.simulator_utils.md_utils.traj_policy import TrajPolicy +from core.utils.simulator_utils.md_utils.vehicle_utils import MDDefaultVehicle from metadrive.utils.space import ParameterSpace, VehicleParameterSpace from metadrive.component.vehicle.vehicle_type import DefaultVehicle from metadrive.utils import Config, safe_clip_for_small_array @@ -66,3 +68,47 @@ def _get_vehicles(self, config_dict: dict): policy = self._get_policy(obj) self.engine.add_policy(obj.id, policy) return ret + + +class TrajAgentManager(AgentManager): + + def _get_policy(self, obj): + policy = TrajPolicy(obj, self.generate_seed()) + return policy + + def before_step(self, frame=0, wps=None): + # not in replay mode + self._agents_finished_this_frame = dict() + step_infos = {} + for agent_id in self.active_agents.keys(): + policy = self.engine.get_policy(self._agent_to_object[agent_id]) + if agent_id in wps.keys(): + waypoints = wps[agent_id] + self.get_agent(agent_id).before_macro_step(frame) + action = policy.act(agent_id, frame, waypoints) + step_infos[agent_id] = policy.get_action_info() + 
step_infos[agent_id].update(self.get_agent(agent_id).before_step(action)) + + finished = set() + for v_name in self._dying_objects.keys(): + self._dying_objects[v_name][1] -= 1 + if self._dying_objects[v_name][1] == 0: # Countdown goes to 0, it's time to remove the vehicles! + v = self._dying_objects[v_name][0] + self._remove_vehicle(v) + finished.add(v_name) + for v_name in finished: + self._dying_objects.pop(v_name) + return step_infos + + def _get_vehicles(self, config_dict: dict): + from metadrive.component.vehicle.vehicle_type import random_vehicle_type, vehicle_type + ret = {} + # v_type = random_vehicle_type(self.np_random) if self.engine.global_config["random_agent_model"] else \ + # vehicle_type[self.engine.global_config["vehicle_config"]["vehicle_model"]] + v_type = MDDefaultVehicle + for agent_id, v_config in config_dict.items(): + obj = self.spawn_object(v_type, vehicle_config=v_config) + ret[agent_id] = obj + policy = self._get_policy(obj) + self.engine.add_policy(obj.id, policy) + return ret diff --git a/core/utils/simulator_utils/md_utils/engine_utils.py b/core/utils/simulator_utils/md_utils/engine_utils.py index 7ff18123..9c889040 100644 --- a/core/utils/simulator_utils/md_utils/engine_utils.py +++ b/core/utils/simulator_utils/md_utils/engine_utils.py @@ -5,7 +5,7 @@ from typing import Callable, Optional, Union, List, Dict, AnyStr -class MacroBaseEngine(BaseEngine): +class MacroEngine(BaseEngine): def before_step_macro(self, actions=None) -> Dict: """ @@ -21,47 +21,17 @@ def before_step_macro(self, actions=None) -> Dict: return step_infos -def initialize_engine(env_global_config): - cls = MacroBaseEngine - if cls.singleton is None: - # assert cls.global_config is not None, "Set global config before initialization BaseEngine" - cls.singleton = cls(env_global_config) - else: - raise PermissionError("There should be only one BaseEngine instance in one process") - return cls.singleton +class TrajEngine(BaseEngine): - -def get_engine(): - return MacroBaseEngine.singleton - - -def get_object(object_name): - return get_engine().get_objects([object_name]) - - -def engine_initialized(): - return False if MacroBaseEngine.singleton is None else True - - -def close_engine(): - if MacroBaseEngine.singleton is not None: - MacroBaseEngine.singleton.close() - MacroBaseEngine.singleton = None - - -def get_global_config(): - engine = get_engine() - return engine.global_config.copy() - - -def set_global_random_seed(random_seed: Optional[int]): - """ - Update the random seed and random engine - All subclasses of Randomizable will hold the same random engine, after calling this function - :param random_seed: int, random seed - """ - engine = get_engine() - if engine is not None: - engine.seed(random_seed) - else: - logging.warning("BaseEngine is not launched, fail to sync seed to engine!") + def before_step_traj(self, frame=0, wps=None) -> Dict: + """ + Update states after finishing movement + :return: if this episode is done + """ + step_infos = {} + for manager in self._managers.values(): + if (manager.__class__.__name__ == 'TrajAgentManager'): + step_infos.update(manager.before_step(frame, wps)) + else: + step_infos.update(manager.before_step()) + return step_infos diff --git a/core/utils/simulator_utils/md_utils/navigation_utils.py b/core/utils/simulator_utils/md_utils/navigation_utils.py new file mode 100644 index 00000000..56d4f6a4 --- /dev/null +++ b/core/utils/simulator_utils/md_utils/navigation_utils.py @@ -0,0 +1,204 @@ +import numpy as np +from panda3d.core import TransparencyAttrib, 
LineSegs, NodePath + +from metadrive.component.road_network import Road +from metadrive.component.vehicle_navigation_module.node_network_navigation import NodeNetworkNavigation +from metadrive.utils import norm +from metadrive.utils.coordinates_shift import panda_position +from metadrive.utils.math_utils import wrap_to_pi +from metadrive.constants import RENDER_MODE_ONSCREEN, CamMask + + +class TrajNodeNavigation(NodeNetworkNavigation): + + def __init__( + self, + engine, + show_navi_mark: bool = False, + random_navi_mark_color=False, + show_dest_mark=False, + show_line_to_dest=False, + seq_traj_len=30, + show_seq_traj=False, + ): + super(NodeNetworkNavigation, + self).__init__(engine, show_navi_mark, random_navi_mark_color, show_dest_mark, show_line_to_dest) + self._show_traj = show_seq_traj + self.seq_traj_len = seq_traj_len + if self._show_traj: + self._init_trajs() + + #self.drawd = False + + self.LINE_TO_DEST_HEIGHT += 1 + self.activate_car_pos_marker = False + + def _init_trajs(self): + for i in range(self.seq_traj_len): + init_line = LineSegs() + init_line.setColor(self.navi_mark_color[0], self.navi_mark_color[1], self.navi_mark_color[2], 0.7) + self.__dict__['traj_{}'.format(i)] = NodePath(init_line.create()) + self.__dict__['traj_{}'.format(i)].reparentTo(self.origin) + init_line = LineSegs() + init_line.setColor(0.5, 0.5, 0.5, 0.7) + self.current_pos_marker = NodePath(init_line.create()) + self.current_pos_marker.reparentTo(self.origin) + + def _draw_trajectories(self, wp_list): + for i in range(self.seq_traj_len): + lines = LineSegs() + lines.setColor(self.navi_mark_color[0], self.navi_mark_color[1], self.navi_mark_color[2], 0.7) + #lines.moveTo(panda_position(wp_list[i][0], self.LINE_TO_DEST_HEIGHT+4)) + lines.moveTo(panda_position((wp_list[i][0], wp_list[i][1]), self.LINE_TO_DEST_HEIGHT)) + lines.drawTo(panda_position((wp_list[i + 1][0], wp_list[i + 1][1]), self.LINE_TO_DEST_HEIGHT)) + lines.setThickness(2) + self.__dict__['traj_{}'.format(i)].removeNode() + self.__dict__['traj_{}'.format(i)] = NodePath(lines.create(False)) + self.__dict__['traj_{}'.format(i)].hide(CamMask.Shadow | CamMask.RgbCam) + self.__dict__['traj_{}'.format(i)].reparentTo(self.origin) + + def convert_wp_to_world_coord(self, index, rbt_pos, rbt_heading, wp): + theta = np.arctan2(wp[1], wp[0]) + rbt_heading = np.arctan2(rbt_heading[1], rbt_heading[0]) + theta = wrap_to_pi(rbt_heading) + wrap_to_pi(theta) + norm_len = norm(wp[0], wp[1]) + position = rbt_pos + heading = np.sin(theta) * norm_len + side = np.cos(theta) * norm_len + return position[0] + side, position[1] + heading + + def convert_waypoint_list_coord(self, rbt_pos, rbt_heading, wp_list): + wp_w_list = [] + for wp in wp_list: + wp_w = self.convert_wp_to_world_coord(0, rbt_pos, rbt_heading, wp) + wp_w_list.append(wp_w) + return wp_w_list + + # def draw_path(self, rbt_pos, rbt_heading): + # wp_list = self.get_waypoint_list() + # wp_list = self.convert_waypoint_list_coord(rbt_pos, rbt_heading, wp_list) + # self._draw_trajectories(wp_list) + + def draw_car_path(self, wp_list): + for i in range(self.seq_traj_len): + lines = LineSegs() + lines.setColor(self.navi_mark_color[0], self.navi_mark_color[1], self.navi_mark_color[2], 0.7) + #lines.moveTo(panda_position(wp_list[i][0], self.LINE_TO_DEST_HEIGHT+4)) + lines.moveTo(panda_position((wp_list[i][0], wp_list[i][1]), self.LINE_TO_DEST_HEIGHT)) + lines.drawTo(panda_position((wp_list[i + 1][0], wp_list[i + 1][1]), self.LINE_TO_DEST_HEIGHT)) + lines.setThickness(2) + 
self.__dict__['traj_{}'.format(i)].removeNode() + self.__dict__['traj_{}'.format(i)] = NodePath(lines.create(False)) + self.__dict__['traj_{}'.format(i)].hide(CamMask.Shadow | CamMask.RgbCam) + self.__dict__['traj_{}'.format(i)].reparentTo(self.origin) + + def show_car_pos(self, wp_list, current_time_step): + #print(current_time_step) + cx = wp_list[current_time_step][0] + cy = wp_list[current_time_step][1] + ncx = wp_list[current_time_step + 1][0] + ncy = wp_list[current_time_step + 1][1] + theta = np.arctan2(ncy - cy, ncx - cx) + theta = 0.0 + lines = LineSegs() + lines.setColor(0.9, 0.9, 0.9, 1.0) + #lines.moveTo(panda_position(wp_list[i][0], self.LINE_TO_DEST_HEIGHT+4)) + lines.moveTo(panda_position((cx - 0.1 * np.sin(theta), cy + 0.1 * np.cos(theta)), self.LINE_TO_DEST_HEIGHT)) + lines.drawTo(panda_position((cx + 0.1 * np.sin(theta), cy - 0.1 * np.cos(theta)), self.LINE_TO_DEST_HEIGHT)) + lines.setThickness(10) + self.current_pos_marker.removeNode() + self.current_pos_marker = NodePath(lines.create(False)) + self.current_pos_marker.hide(CamMask.Shadow | CamMask.RgbCam) + self.current_pos_marker.reparentTo(self.origin) + for i in range(self.seq_traj_len): + lines = LineSegs() + if current_time_step > i: + lines.setColor(1.0, 0.4, 0.0, 0.7) + else: + lines.setColor(0.0, 0.7, 1.0, 0.7) + #lines.setColor(self.navi_mark_color[0], self.navi_mark_color[1], self.navi_mark_color[2], 0.7) + #lines.moveTo(panda_position(wp_list[i][0], self.LINE_TO_DEST_HEIGHT+4)) + lines.moveTo(panda_position((wp_list[i][0], wp_list[i][1]), self.LINE_TO_DEST_HEIGHT)) + lines.drawTo(panda_position((wp_list[i + 1][0], wp_list[i + 1][1]), self.LINE_TO_DEST_HEIGHT)) + lines.setThickness(2) + self.__dict__['traj_{}'.format(i)].removeNode() + self.__dict__['traj_{}'.format(i)] = NodePath(lines.create(False)) + self.__dict__['traj_{}'.format(i)].hide(CamMask.Shadow | CamMask.RgbCam) + self.__dict__['traj_{}'.format(i)].reparentTo(self.origin) + + def get_waypoint_list(self): + x = np.arange(0, 50, 0.1) + LENGTH = 4.51 + y = 1 * np.cos(x) - 1 + x = x + LENGTH / 2 + lst = [] + for i in range(x.shape[0]): + lst.append([x[i], y[i]]) + return lst + + def update_localization(self, ego_vehicle): + position = ego_vehicle.position + lane, lane_index = self._update_current_lane(ego_vehicle) + long, _ = lane.local_coordinates(position) + need_update = self._update_target_checkpoints(lane_index, long) + assert len(self.checkpoints) >= 2 + + # target_road_1 is the road segment the vehicle is driving on. + if need_update: + target_road_1_start = self.checkpoints[self._target_checkpoints_index[0]] + target_road_1_end = self.checkpoints[self._target_checkpoints_index[0] + 1] + target_lanes_1 = self.map.road_network.graph[target_road_1_start][target_road_1_end] + self.current_ref_lanes = target_lanes_1 + self.current_road = Road(target_road_1_start, target_road_1_end) + + # target_road_2 is next road segment the vehicle should drive on. 
+ target_road_2_start = self.checkpoints[self._target_checkpoints_index[1]] + target_road_2_end = self.checkpoints[self._target_checkpoints_index[1] + 1] + target_lanes_2 = self.map.road_network.graph[target_road_2_start][target_road_2_end] + + if target_road_1_start == target_road_2_start: + # When we are in the final road segment that there is no further road to drive on + self.next_road = None + self.next_ref_lanes = None + else: + self.next_road = Road(target_road_2_start, target_road_2_end) + self.next_ref_lanes = target_lanes_2 + + self._navi_info.fill(0.0) + half = self.navigation_info_dim // 2 + self._navi_info[:half], lanes_heading1, checkpoint = self._get_info_for_checkpoint( + lanes_id=0, ref_lane=self.current_ref_lanes[0], ego_vehicle=ego_vehicle + ) + + self._navi_info[half:], lanes_heading2, _ = self._get_info_for_checkpoint( + lanes_id=1, + ref_lane=self.next_ref_lanes[0] if self.next_ref_lanes is not None else self.current_ref_lanes[0], + ego_vehicle=ego_vehicle + ) + + if hasattr(ego_vehicle, 'v_indx') and self._show_traj: + #print(ego_vehicle.v_indx) + if ego_vehicle.v_indx == 0: + #self.draw_car_path(ego_vehicle.v_wps) + self.activate_car_pos_marker = True + if self.activate_car_pos_marker: + self.show_car_pos(ego_vehicle.v_wps, ego_vehicle.v_indx) + + # if ego_vehicle.v_indx == 4: + # print('zt') + + # if ego_vehicle.v_indx == 0: + # self.draw_car_path(ego_vehicle.v_wps) + + # if self.drawd is False: + # self.draw_path(ego_vehicle.position, ego_vehicle.heading) + # self.drawd = True + + if self._show_navi_info: + # Whether to visualize little boxes in the scene denoting the checkpoints + pos_of_goal = checkpoint + self._goal_node_path.setPos(pos_of_goal[0], -pos_of_goal[1], 1.8) + self._goal_node_path.setH(self._goal_node_path.getH() + 3) + self.navi_arrow_dir = [lanes_heading1, lanes_heading2] + dest_pos = self._dest_node_path.getPos() + #self._draw_line_to_dest(start_position=ego_vehicle.position, end_position=(dest_pos[0], -dest_pos[1])) diff --git a/core/utils/simulator_utils/md_utils/traj_policy.py b/core/utils/simulator_utils/md_utils/traj_policy.py new file mode 100644 index 00000000..6d92cfcc --- /dev/null +++ b/core/utils/simulator_utils/md_utils/traj_policy.py @@ -0,0 +1,107 @@ +import numpy as np + +from metadrive.policy.base_policy import BasePolicy +from metadrive.utils.math_utils import not_zero, wrap_to_pi, point_distance +from metadrive.utils import norm + + +class TrajPolicy(BasePolicy): + + def __init__(self, control_object, random_seed): + super(TrajPolicy, self).__init__(control_object=control_object, random_seed=random_seed) + self.last_heading = 0 + + def convert_wp_to_world_coord(self, rbt_pos, rbt_heading, wp, visual=False): + compose_visual = 0 + if visual: + compose_visual += 0 # 4.51 / 2 + theta = np.arctan2(wp[1], wp[0] + compose_visual) + rbt_heading = rbt_heading # np.arctan2(rbt_heading[1], rbt_heading[0]) + theta = wrap_to_pi(rbt_heading) + wrap_to_pi(theta) + norm_len = norm(wp[0] + compose_visual, wp[1]) + position = rbt_pos + #position += 4.51 /2 + heading = np.sin(theta) * norm_len + side = np.cos(theta) * norm_len + return position[0] + side, position[1] + heading + + def convert_waypoint_list_coord(self, rbt_pos, rbt_heading, wp_list, visual=False): + wp_w_list = [] + LENGTH = 4.51 + for wp in wp_list: + # wp[0] = wp[0] + LENGTH / 2 + wp_w = self.convert_wp_to_world_coord(rbt_pos, rbt_heading, wp, visual) + wp_w_list.append(wp_w) + return wp_w_list + + def act(self, *args, **kwargs): + if (self.control_object.arrive_destination and 
hasattr(self.control_object, 'macro_succ')):
+            self.control_object.macro_succ = True
+        if (self.control_object.crash_vehicle and hasattr(self.control_object, 'macro_crash')):
+            self.control_object.macro_crash = True
+        if (len(args) >= 3):
+            macro_action = args[1]
+            frame = args[1]
+            wp_list = args[2]
+            ego_vehicle = self.control_object
+            if frame == 0:
+                self.base_pos = ego_vehicle.position
+                self.base_heading = ego_vehicle.heading_theta
+                self.control_object.v_wps = self.convert_waypoint_list_coord(
+                    self.base_pos, self.base_heading, wp_list, True
+                )
+                self.control_object.penultimate_state = self.control_object.traj_wp_list[
+                    -2]  # if len(wp_list)>2 else self.control_object.traj_wp_list[-1]
+                new_state = {}
+                new_state['position'] = ego_vehicle.position
+                new_state['yaw'] = ego_vehicle.heading_theta
+                new_state['speed'] = ego_vehicle.last_spd
+                self.control_object.traj_wp_list = []
+                self.control_object.traj_wp_list.append(new_state)
+            self.control_object.v_indx = frame
+            wp_list = self.convert_waypoint_list_coord(self.base_pos, self.base_heading, wp_list)
+            current_pos = np.array(wp_list[frame])
+            target_pos = np.array(wp_list[frame + 1])
+            diff = target_pos - current_pos
+            norm = np.sqrt(diff[0] * diff[0] + diff[1] * diff[1])
+
+            if abs(norm) < 0.001:
+                heading_theta_at = self.last_heading
+            else:
+                direction = diff / norm
+                heading_theta_at = np.arctan2(direction[1], direction[0])
+            self.last_heading = heading_theta_at
+            steering = 0  # self.steering_conrol_traj(lateral, heading_theta_at)
+            throttle_brake = 0  # self.speed_control(target_vel)
+            # print('target_vel: {}'.format(target_vel))
+            # print('target_heading_theta: {}'.format(heading_theta_at))
+            ttarget_pos = self.base_pos + target_pos
+            hheading_theata_at = heading_theta_at + self.base_heading
+            # print('target frame: {} with position ({}, {}) and orientation {}'.format(
+            #     frame, target_pos[0], target_pos[1], heading_theta_at))
+
+            # one-step state update, for trex
+            ego_vehicle.prev_state['position'] = ego_vehicle.curr_state['position']
+            ego_vehicle.prev_state['yaw'] = ego_vehicle.curr_state['yaw']
+            ego_vehicle.prev_state['speed'] = ego_vehicle.curr_state['speed']
+
+            ego_vehicle.set_position(target_pos)
+            ego_vehicle.set_heading_theta(heading_theta_at)
+            ego_vehicle.last_spd = norm / ego_vehicle.physics_world_step_size
+            new_state = {}
+            new_state['position'] = target_pos
+            new_state['yaw'] = heading_theta_at
+            new_state['speed'] = ego_vehicle.last_spd
+            self.control_object.traj_wp_list.append(new_state)
+
+            # one-step state update, for trex
+            ego_vehicle.curr_state['position'] = target_pos
+            ego_vehicle.curr_state['yaw'] = heading_theta_at
+            ego_vehicle.curr_state['speed'] = ego_vehicle.last_spd
+
+        #print(ego_vehicle.physics_world_step_size)
+        #ego_vehicle.last_spd = norm / 0.03 * 3.6
+        #ego_vehicle.set_velocity(heading_theta_at, norm / 0.03 *3.6)
+
+        #throttle_brake = throttle_brake if self.stop_label is False else -1
+        return [steering, throttle_brake]
diff --git a/core/utils/simulator_utils/md_utils/vehicle_utils.py b/core/utils/simulator_utils/md_utils/vehicle_utils.py
index aa5385c4..4d55bc08 100644
--- a/core/utils/simulator_utils/md_utils/vehicle_utils.py
+++ b/core/utils/simulator_utils/md_utils/vehicle_utils.py
@@ -1,6 +1,82 @@
+import copy
+import numpy as np
+from typing import Union, Dict, AnyStr, Tuple
+
 from metadrive.component.vehicle.base_vehicle import BaseVehicle
 from metadrive.utils import Config, safe_clip_for_small_array
-from typing import Union, Dict, AnyStr, Tuple
+from 
core.utils.simulator_utils.md_utils.navigation_utils import TrajNodeNavigation +from metadrive.component.vehicle.vehicle_type import DefaultVehicle + + +class MDDefaultVehicle(DefaultVehicle): + + def __init__(self, vehicle_config: Union[dict, Config] = None, name: str = None, random_seed=None): + super(MDDefaultVehicle, self).__init__(vehicle_config, name, random_seed) + self.macro_succ = False + self.macro_crash = False + self.last_spd = 0 + self.last_macro_position = self.last_position + self.v_wps = [[0, 0], [1, 1]] + self.v_indx = 1 + self.physics_world_step_size = self.engine.global_config["physics_world_step_size"] + self.penultimate_state = {} + + self.penultimate_state['position'] = np.array([0, 0]) # self.last_position + self.penultimate_state['yaw'] = 0 + self.penultimate_state['speed'] = 0 + self.traj_wp_list = [] + self.traj_wp_list.append(copy.deepcopy(self.penultimate_state)) + self.traj_wp_list.append(copy.deepcopy(self.penultimate_state)) + + self.pen_state = {} + self.prev_state = {} + self.curr_state = {} + self.pen_state['position'] = np.array([0, 0]) # self.last_position + self.pen_state['yaw'] = 0 + self.pen_state['speed'] = 0 + self.prev_state['position'] = np.array([0, 0]) # self.last_position + self.prev_state['yaw'] = 0 + self.prev_state['speed'] = 0 + self.curr_state['position'] = np.array([0, 0]) # self.last_position + self.curr_state['yaw'] = 0 + self.curr_state['speed'] = 0 + + def before_macro_step(self, macro_action): + if macro_action is not None: + self.last_macro_position = self.position + else: + pass + return + + def reset_state_stack(self): + self.pen_state['position'] = np.array([0, 0]) # self.last_position + self.pen_state['yaw'] = 0 + self.pen_state['speed'] = 0 + self.prev_state['position'] = np.array([0, 0]) # self.last_position + self.prev_state['yaw'] = 0 + self.prev_state['speed'] = 0 + self.curr_state['position'] = np.array([0, 0]) # self.last_position + self.curr_state['yaw'] = 0 + self.curr_state['speed'] = 0 + + def add_navigation(self): + # if not self.config["need_navigation"]: + # return + # navi = NodeNetworkNavigation if self.engine.current_map.road_network_type == NodeRoadNetwork \ + # else EdgeNetworkNavigation + navi = TrajNodeNavigation + # print('seq len len ') + # print(self.engine.global_config["seq_traj_len"]) + #print('navigation rigister') + self.navigation = navi( + self.engine, + show_navi_mark=self.engine.global_config["vehicle_config"]["show_navi_mark"], + random_navi_mark_color=self.engine.global_config["vehicle_config"]["random_navi_mark_color"], + show_dest_mark=self.engine.global_config["vehicle_config"]["show_dest_mark"], + show_line_to_dest=self.engine.global_config["vehicle_config"]["show_line_to_dest"], + seq_traj_len=self.engine.global_config["seq_traj_len"], + show_seq_traj=self.engine.global_config["show_seq_traj"] + ) class MacroBaseVehicle(BaseVehicle): diff --git a/docs/figs/qr.png b/docs/figs/qr.png new file mode 100644 index 00000000..6b426d12 Binary files /dev/null and b/docs/figs/qr.png differ diff --git a/docs/source/api_doc/envs.rst b/docs/source/api_doc/envs.rst index 29a20141..1e45caf4 100644 --- a/docs/source/api_doc/envs.rst +++ b/docs/source/api_doc/envs.rst @@ -28,6 +28,11 @@ MetaDriveMacroEnv .. autoclass:: core.envs.MetaDriveMacroEnv +MetaDriveTrajEnv +===================== +.. autoclass:: core.envs.MetaDriveTrajEnv + + DriveEnvWrapper ======================= .. 
autoclass:: core.envs.DriveEnvWrapper diff --git a/docs/source/conf.py b/docs/source/conf.py index 94b4570d..3adb7ebd 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -23,8 +23,8 @@ author = 'OpenDILab' # The full version, including alpha/beta/rc tags -version = '0.3.2' -release = '0.3.2' +version = '0.3.3' +release = '0.3.3' # -- General configuration --------------------------------------------------- diff --git a/docs/source/index.rst b/docs/source/index.rst index 70b76489..86a754c1 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -14,7 +14,7 @@ DI-drive Documentation Decision Intelligence Platform for Autonomous Driving simulation. -Last updated on 2022.2.25 +Last updated on 2022.6.5 -----