diff --git a/docs/chronics.rst b/docs/chronics.rst
index 42885255..8a13f567 100644
--- a/docs/chronics.rst
+++ b/docs/chronics.rst
@@ -1,7 +1,9 @@
 .. currentmodule:: grid2op.Chronics
 
-Chronics
-===================================
+.. _time-series-module:
+
+Time series (formerly called "chronics")
+=========================================
 
 This page is organized as follow:
 
diff --git a/docs/mdp.rst b/docs/mdp.rst
index 94eed2f0..d85a193e 100644
--- a/docs/mdp.rst
+++ b/docs/mdp.rst
@@ -109,7 +109,7 @@ MDP):
     :nowrap:
 
     \begin{align*}
-        \min_{\pi \in \Pi} ~& \sum_{t=1}^T r_t \\
+        \min_{\pi \in \Pi} ~& \sum_{t=1}^T \mathbb{E} r_t \\
         \text{s.t.} ~ \\
             & \forall t, a_t \sim \pi (s_{t}) & \text{policy produces the action} \\
             & \forall t, s_{t+1} \sim \mathcal{L}_S(s_t, a_t) & \text{environment produces next state} \\
@@ -134,14 +134,17 @@ This simulator is able to compute some informations that are part of the state
 space :math:`\mathcal{S}` (*eg* flows on powerlines, active production value of generators etc.)
 and thus are used in the computation of the transition kernel.
 
-TODO how to model it.
+We can model this simulator with a function :math:`\text{Sim}` that takes as input some data from an
+"input space" :math:`\mathcal{S}_{\text{im}}^{(\text{in})}` and returns
+data in :math:`\mathcal{S}_{\text{im}}^{(\text{out})}`.
 
-.. This simulator is also used when implementing the transition kernel. Some part of the state space
-
-
-.. other information given by the Environment (see :ref:`environment-module` for details about the
-.. way the `Environment` is coded and refer to :class:`grid2op.Action._backendAction._BackendAction` for list
-.. of all available informations informatically available for the solver).
+.. note::
+    In grid2op we don't force the "shape" of :math:`\mathcal{S}_{\text{im}}^{(\text{in})}`, including
+    the format used to read the grid file from the hard drive, the solved equations, the way
+    these equations are used.
Everything here is "free" and grid2op only needs that the simulator
+    (wrapped in a `Backend`) understands the "format" sent by grid2op (through a
+    :class:`grid2op.Action._backendAction._BackendAction`) and is able to expose
+    to grid2op some of its internal variables (accessed with the `***_infos()` methods of the backend)
 
 
 To make a parallel with similar concepts "simulator",
@@ -153,21 +156,63 @@ here excepts that it solves powerflows.
 
 Some Time Series
 +++++++++++++++++
-TODO
+Another type of data that we need to define "the" grid2op MDP is the "time series", implemented in the `chronics`
+grid2op module documented on the page
+:ref:`time-series-module` with some complements given in the :ref:`doc_timeseries` page as well.
+
+These time series define what exactly would happen if the grid was a
+"copper plate" without any constraints. Said differently, they provide what each consumer would
+consume and what each producer would produce if they could all be connected together with
+infinite "bandwidth", without any constraints on the powerlines etc.
+
+In particular, grid2op supposes that these "time series" are balanced, in the sense that the producers
+produce just the right amount (electrical power cannot really be stored) for the consumers to consume,
+at each step. It also supposes that all the "constraints" of the producers are met.
+
+These time series are typically generated outside of grid2op, for example using `chronix2grid `_
+python package (or anything else).
+
+
+Formally, we denote by :math:`\mathcal{X}_t` the value of all these time series at time :math:`t`. These
+exogenous data consist of:
+
+- generator active production (in MW), for each generator
+- load active power consumption (in MW), for each load
+- load reactive consumption (in MVAr), for each load
+- \* generator voltage setpoint / target (in kV)
+
+..
note::
+    \* for this last part, this can be adapted "on demand" by the environment through the `voltage controler` module.
+    But for the sake of modeling, this can be modeled as being external / exogenous data.
+
+To draw a parallel with similar concepts in other RL environments, these "time series" can represent the layout of the maze
+in pacman, the positions of the platforms in "mario-like" 2d games, the different turns and the width of the road in a car game etc.
+This is the "base" of the levels in most games.
+
+Finally, for most released environments, a lot of different :math:`\mathcal{X}` are available. By default, each time the
+environment is "reset" (the user wants to move to the next scenario), a new :math:`\mathcal{X}` is used (this behaviour
+can be changed, see the section :ref:`environment-module-chronics-info` of the documentation for more information).
 
 .. _mdp-def:
 
 Modeling sequential decisions
 -------------------------------
 
-TODO
+As we said in the introduction of this page, we will model a given scenario in grid2op.
We have at our disposal:
+- a simulator, which is represented as a function :math:`\text{Sim} : \mathcal{S}_{\text{im}}^{(\text{in})} \to \mathcal{S}_{\text{im}}^{(\text{out})}`
+- some time series :math:`\mathcal{X} = \left\{ \mathcal{X}_t \right\}_{1 \leq t \leq T}`
 
-Inputs
-~~~~~~~~~~
+And we need to define the MDP through the definition of:
 
-Markov Decision process
-~~~~~~~~~~~~~~~~~~~~~~~~
+- :math:`\mathcal{S}`, the "state space"
+- :math:`\mathcal{A}`, the "action space"
+- :math:`\mathcal{L}_S(s, a)`, sometimes called "transition kernel", is the probability
+  distribution (over :math:`\mathcal{S}`) that gives the next
+  state after taking action :math:`a` in state :math:`s`
+- :math:`\mathcal{L}_r(s, s', a)`, sometimes called "reward kernel",
+  is the probability distribution (over :math:`[0, 1]`) that gives
+  the reward :math:`r` after taking action :math:`a` in state :math:`s` which leads to state :math:`s'`
 
 Extensions
 -----------