Merge pull request #62 from sdu-cfei/issue_57_parallelize

Parallelize GA + add FMPy
sdu-cfei · Oct 22, 2020 · d576cfd · d576cfd
2 parents b2e0ac7 + b6b70ef
commit d576cfd
Showing 54 changed files with 1,840 additions and 1,335 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,3 +1,4 @@
+venv/
 .idea/
 *.ipynb
 *.pyc
@@ -6,6 +7,7 @@
 **/workdir/**
 build/
 .vscode/
+*.swp
 
 # Setuptools distribution folder.
 dist/

diff --git a/CHANGES.txt b/CHANGES.txt
@@ -1,36 +1,41 @@
-Changes in v. 0.0.9:
+Changes in v.0.1
+====================
+- parallel genetic algorithm added (based on modestga)
+- FMPy instead of pyFMI
+
+Changes in v.0.0.9:
 ====================
 - it is possible now to estimate just 1 parameter (fixed bug in plot_pop_evo())
 
-Changes in v. 0.0.8:
+Changes in v.0.0.8:
 ====================
 - Version used in the ModestPy paper
 - Added interface to SciPy algorithms
 
-Changes in v. 0.0.7:
+Changes in v.0.0.7:
 ====================
 - added SQP method
 - modified interface of the Estimation class to facilitate multi-algorithm pipelines
 
-Changes in v. 0.0.6:
+Changes in v.0.0.6:
 ====================
 - LHS initialization of GA
 - random seed
 - many small bug fixes
 
-Changes in v. 0.0.5:
+Changes in v.0.0.5:
 ====================
 - Decreased tolerance of CVode solver in PyFMI
 
-Changes in v. 0.0.4:
+Changes in v.0.0.4:
 ====================
 - New pattern search plot (parameter evolution) added to Estimation.py
 - GA/PS default parameters tuned
 
-Changes in v. 0.0.3:
+Changes in v.0.0.3:
 ====================
 - Tolerance criteria for GA and PS exposed in the Estimation API.
 
-Changes in v. 0.0.2:
+Changes in v.0.0.2:
 ====================
 - Estimation class imported directly in __init__.py to allow imports like "from modestpy import Estimation".
diff --git a/README.rst b/README.rst
@@ -14,67 +14,33 @@ Features:
 
 - combination of global and local search methods (genetic algorithm, pattern search, truncated Newton method, L-BFGS-B, sequential least squares),
 - suitable also for non-continuous and non-differentiable models,
-- compatible with both Python 2.7 and 3 (tested up to 3.5).
+- scalable to multiple cores (genetic algorithm from `modestga <https://github.com/krzysztofarendt/modestga>`_),
+- Python 3.
 
-Installation with conda (recommended)
--------------------------------------
+Installation with pip (recommended)
+-----------------------------------
 
-It is now possible to install ModestPy through ``conda``:
+It is now possible to install ModestPy with a single command:
 
 ::
 
-   conda config --add channels conda-forge
-   conda install modestpy
-
-Installation with conda and pip
--------------------------------
-
-This procedure has been tested on Debian 9 and Ubuntu 16.04 with Python 3.
-
-It is advised to use ``conda`` to install the required dependencies.
-``modestpy`` itself can be installed using ``pip`` inside the ``conda`` environment.
-
-Create separate environment (optional):
-
-::
-
-    conda create --name modestpy
-    conda activate modestpy
-
-Install dependencies:
-
-::
+    pip install modestpy
 
-    conda install scipy pandas numpy matplotlib
-    conda install -c chria pyfmi
-    conda install -c conda-forge pydoe
-
-Install ``modestpy``:
+Alternatively:
 
 ::
 
-    python -m pip install modestpy
-
-Installation with pip
----------------------
-
-This procedure has been tested on Windows 7 with Python 2.
-
-Install ``pyfmi`` as part of `JModelica <http://www.jmodelica.org/>`__.
-
-To install ``modestpy`` use ``pip`` (other dependencies will be installed automatically):
-
-::
+    pip install https://github.com/sdu-cfei/modest-py/archive/master.zip
 
-    python -m pip install modestpy
+Installation with conda
+-----------------------
 
-To get the latest development version download directly from GitHub repository:
+Conda installation is less frequently tested, but should work:
 
 ::
 
-    python -m pip install https://github.com/sdu-cfei/modest-py/archive/master.zip
-
-Note, that JModelica installs Python and libraries in a separate directory than the standard Python distribution. Therefore either the path to those libraries needs to be added to PYTHONPATH or ModestPy needs to be installed inside the JModelica distribution.
+   conda config --add channels conda-forge
+   conda install modestpy
 
 Test your installation
 ----------------------
@@ -98,20 +64,25 @@ Usage
 -----
 
 Users are supposed to call only the high level API included in
-``modestpy.Estimation``. The API is fully discussed in `this
-wiki <https://github.com/sdu-cfei/modest-py/wiki/modestpy-API>`__. You
-can also check out this `simple example </examples/simple>`__. The basic
-usage is as follows:
+``modestpy.Estimation``. The API is fully discussed in the `docs <docs/documentation.md>`__.
+You can also check out this `simple example </examples/simple>`__.
+The basic usage is as follows:
 
 .. code:: python
 
-    >>> from modestpy import Estimation
-    >>> session = Estimation(workdir, fmu_path, inp, known, est, ideal)
-    >>> estimates = session.estimate()
-    >>> err, res = session.validate()
+    from modestpy import Estimation
+
+    if __name__ == "__main__":
+        session = Estimation(workdir, fmu_path, inp, known, est, ideal)
+        estimates = session.estimate()
+        err, res = session.validate()
+
+More control is possible via optional arguments, as discussed in the `documentation
+<docs/documentation.md>`__.
 
-More control is possible via optional arguments, as discussed in the `documentation 
-<https://github.com/sdu-cfei/modest-py/wiki/modestpy-API>`__.
+The ``if __name__ == "__main__":`` wrapper is needed on Windows, because ``modestpy``
+relies on ``multiprocessing``. You can find more explanation on why this is needed
+`here <https://docs.python.org/3/library/multiprocessing.html#multiprocessing-programming>`__.
 
 ``modestpy`` automatically saves results in the working
 directory including csv files with estimates and some useful plots,

diff --git a/bin/test.py b/bin/test.py
@@ -1,4 +1,7 @@
 #!/usr/bin/env python
 from modestpy.test import run
+from modestpy.loginit import config_logger
 
-run.tests()
+if __name__ == "__main__":
+    config_logger(filename='unit_tests.log', level='DEBUG')
+    run.tests()
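
Note on the ``if __name__ == "__main__":`` guard added in README.rst and bin/test.py: the sketch below illustrates why it matters for code built on ``multiprocessing``. It does not use modestpy; ``cost`` and ``evaluate_population`` are hypothetical stand-ins for an expensive model evaluation and for a parallel optimizer step.

```python
import multiprocessing as mp


def cost(x):
    """Toy cost function standing in for an expensive model simulation."""
    return (x - 2.0) ** 2


def evaluate_population(population, workers=2):
    """Evaluate the cost of every candidate in separate worker processes."""
    with mp.Pool(processes=workers) as pool:
        # map() preserves the order of the input candidates.
        return pool.map(cost, population)


if __name__ == "__main__":
    # On platforms that start workers with the "spawn" method (e.g. Windows),
    # each child re-imports this module. Without the guard, the line below
    # would run again in every child instead of only in the parent process.
    print(evaluate_population([0.0, 1.0, 2.0, 3.0]))
```

Only definitions live at module level here; everything that starts processes sits behind the guard, which is the same layout the updated README example and bin/test.py follow.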
diff --git a/docs/documentation.md b/docs/documentation.md
@@ -0,0 +1,161 @@
+# modestpy
+## Introduction
+
+Users are supposed to use only the `modestpy.Estimation` class and its two
+methods `estimate()` and `validate()`. The class defines a single interface
+for different optimization algorithms. Currently, the available algorithms are:
+- parallel genetic algorithm (MODESTGA) - recommended,
+- legacy single-process genetic algorithm (GA),
+- pattern search (PS),
+- SciPy solvers (e.g. 'TNC', 'L-BFGS-B', 'SLSQP').
+
+The methods can be used in a sequence, e.g. MODESTGA+PS (default),
+using the argument `methods`. All estimation settings are set during instantiation.
+Results of estimation and validation are saved in the working directory `workdir`
+(it must exist).
+
+## Learn by examples
+
+First define the following variables:
+
+* `workdir` (`str`) - path to the working directory (it must exist)
+* `fmu_path` (`str`) - path to the FMU compiled for your platform
+* `inp` (`pandas.DataFrame`) - inputs, index given in seconds and named `time`
+* `est` (`dict(str : tuple(float, float, float))`) - dictionary mapping parameter names to tuples (initial guess, lower bound, upper bound)
+* `known` (`dict(str : float)`) - dictionary mapping parameter names to known values
+* `ideal` (`pandas.DataFrame`) - ideal solution (usually measurements), index given in seconds and named `time`
+
+Indexes of `inp` and `ideal` must be equal, i.e. `inp.index.equals(ideal.index)` must be `True`.
+Columns in `inp` and `ideal` must have the same names as model inputs and outputs, respectively.
+All model inputs must be present in `inp`, but only chosen outputs may be included in `ideal`.
+Data for each variable present in `ideal` are used to calculate the error function that is minimized by **modestpy**.
+
+Now the parameters can be estimated using default settings:
+
+```python
+>>> from modestpy import Estimation
+>>> session = Estimation(workdir, fmu_path, inp, known, est, ideal)
+>>> estimates = session.estimate()  # Returns dict(str: float)
+>>> err, res = session.validate()   # Returns tuple(dict(str: float), pandas.DataFrame)
+```
+
+All results are also saved in `workdir`.
+
+By default all data from `inp` and `ideal` (all rows) are used in both estimation and validation.
+To slice the data into separate learning and validation periods, additional arguments need to be defined:
+
+* `lp_n` (`int`) - number of learning periods, randomly selected within `lp_frame`
+* `lp_len` (`float`) - length of single learning period
+* `lp_frame` (`tuple(float, float)`) - beginning and end of learning time frame
+* `vp` (`tuple(float, float)`) - validation period
+
+Often model parameters are used to define the initial conditions in the model,
+for example the initial temperature. The initial values have to be read from the measured data stored in `ideal`.
+You can do this with the optional argument `ic_param`:
+
+* `ic_param` (`dict(str : str)`) - maps model parameters to column names in `ideal`
+
+Estimation algorithms (MODESTGA, PS, SciPy solvers) can be tuned by overwriting specific keys in `modestga_opts`, `ps_opts` and `scipy_opts`.
+The default options are:
+
+```python
+# Default MODESTGA options
+MODESTGA_OPTS = {
+    'workers': 3,              # CPU cores to use
+    'generations': 50,         # Max. number of generations
+    'pop_size': 30,            # Population size
+    'mut_rate': 0.01,          # Mutation rate
+    'trm_size': 3,             # Tournament size
+    'tol': 1e-3,               # Solution tolerance
+    'inertia': 100,            # Max. number of non-improving generations
+    'ftype': 'RMSE'
+}
+
+# Default PS options
+PS_OPTS = {
+    'maxiter':  500,
+    'rel_step': 0.02,
+    'tol':      1e-11,
+    'try_lim':  1000,
+    'ftype':    'RMSE'
+}
+
+# Default SCIPY options
+SCIPY_OPTS = {
+    'solver': 'L-BFGS-B',
+    'options': {'disp': True,
+                'iprint': 2,
+                'maxiter': 150,
+                'full_output': True},
+    'ftype': 'RMSE'
+}
+```
+
+## Docstrings
+
+```python
+class Estimation(object):
+    """Public interface of `modestpy`.
+
+    Index in DataFrames `inp` and `ideal` must be named 'time'
+    and given in seconds. The index name assertion check is
+    implemented to avoid situations in which a user reads DataFrame
+    from a csv and forgets to use `DataFrame.set_index(column_name)`
+    (it happens quite often...).
+
+    Currently available estimation methods:
+        - MODESTGA  - parallel genetic algorithm (default GA in modestpy)
+        - GA_LEGACY - single-process genetic algorithm (legacy implementation, discouraged)
+        - PS        - pattern search (Hooke-Jeeves)
+        - SCIPY     - interface to algorithms available through
+                      scipy.optimize.minimize()
+
+    Parameters:
+    -----------
+    workdir: str
+        Output directory, must exist
+    fmu_path: str
+        Absolute path to the FMU
+    inp: pandas.DataFrame
+        Input data, index given in seconds and named 'time'
+    known: dict(str: float)
+        Dictionary with known parameters (`parameter_name: value`)
+    est: dict(str: tuple(float, float, float))
+        Dictionary defining estimated parameters,
+        (`par_name: (guess value, lo limit, hi limit)`)
+    ideal: pandas.DataFrame
+        Ideal solution (usually measurements),
+        index in seconds and named `time`
+    lp_n: int or None
+        Number of learning periods, one if `None`
+    lp_len: float or None
+        Length of a single learning period, entire `lp_frame` if `None`
+    lp_frame: tuple of floats or None
+        Learning period time frame, entire data set if `None`
+    vp: tuple(float, float) or None
+        Validation period, entire data set if `None`
+    ic_param: dict(str, str) or None
+        Mapping between model parameters used for IC and variables from
+        `ideal`
+    methods: tuple(str, str)
+        List of methods to be used in the pipeline
+    ga_opts: dict
+        Genetic algorithm options
+    ps_opts: dict
+        Pattern search options
+    scipy_opts: dict
+        SciPy solver options
+    ftype: string
+        Cost function type. Currently 'NRMSE' (advised for multi-objective
+        estimation) or 'RMSE'.
+    seed: None or int
+        Random number seed. If None, current time or OS specific
+        randomness is used.
+    default_log: bool
+        If true, use default logging settings. Use false if you want to
+        use own logging.
+    logfile: str
+        If default_log=True, this argument can be used to specify the log
+        file name
+    """
+```
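
Note on the option dictionaries documented in docs/documentation.md: the text says algorithms are tuned by "overwriting specific keys". The sketch below shows one way that override semantic can work. The defaults are copied from the `MODESTGA_OPTS` listing in the diff; `merge_opts` is a hypothetical helper, and how modestpy merges options internally is not shown in this part of the diff.

```python
# Defaults copied from the MODESTGA_OPTS listing in docs/documentation.md.
MODESTGA_DEFAULTS = {
    'workers': 3,
    'generations': 50,
    'pop_size': 30,
    'mut_rate': 0.01,
    'trm_size': 3,
    'tol': 1e-3,
    'inertia': 100,
    'ftype': 'RMSE',
}


def merge_opts(defaults, overrides=None):
    """Return a copy of `defaults` with the keys in `overrides` replaced.

    Hypothetical helper (not part of modestpy): unknown keys raise
    KeyError so that typos in option names surface early.
    """
    overrides = overrides or {}
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown option(s): {sorted(unknown)}")
    # Later entries win, so user values replace the defaults.
    return {**defaults, **overrides}


# Only the keys that differ from the defaults need to be given.
modestga_opts = merge_opts(MODESTGA_DEFAULTS, {'workers': 2, 'generations': 20})
```

The resulting dictionary would presumably be passed through the `modestga_opts` argument named in the documentation text; note that the docstring in the same file lists `ga_opts`, so the exact argument name should be checked against the code.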
diff --git a/examples/lin/README.md b/examples/lin/README.md
@@ -0,0 +1,3 @@
+The charts in `showcase/` show the behavior of GA and PS when the cost function is convex. The charts were generated by finding the parameters of the model `resources/lin_model.mo`, but the Python code used to generate these charts is no longer here.
+
+See `examples/simple/` for an example with code.