
Commit
Merge pull request #405 from matthiaskoenig/develop
pkdb-v0.6.0
matthiaskoenig authored Jun 24, 2019
2 parents c183053 + 1022a4a commit 0a2c003
Showing 111 changed files with 3,398 additions and 1,886 deletions.
5 changes: 3 additions & 2 deletions .env.local
@@ -7,10 +7,11 @@
PKDB_DOCKER_COMPOSE_YAML=docker-compose-develop.yml
PKDB_DJANGO_CONFIGURATION=local

PKDB_API_BASE="http://0.0.0.0:8000"
PKDB_API_BASE=http://0.0.0.0:8000
FRONTEND_BASE=http://0.0.0.0:8080

PKDB_SECRET_KEY="cgasj6yjpkagcgasj6yjpkagcgasj6yjpkag"
PKDB_DEFAULT_PASSWORD="pkdb"
PKDB_ADMIN_PASSWORD=pkdb_admin

PKDB_DB_NAME=postgres
PKDB_DB_PASSWORD=postgres
1 change: 1 addition & 0 deletions .gitignore
@@ -16,6 +16,7 @@ __pycache__/
# Environments
# ----------------------------
.env
.env.production
frontend/.env.production

# ----------------------------
2 changes: 1 addition & 1 deletion .travis.yml
@@ -12,7 +12,7 @@ env:
PKDB_API_BASE="http://0.0.0.0:8000"

PKDB_SECRET_KEY="cgasj6yjpkagcgasj6yjpkagcgasj6yjpkag"
PKDB_DEFAULT_PASSWORD="pkdb"
PKDB_ADMIN_PASSWORD="pkdb"

PKDB_DB_NAME=postgres
PKDB_DB_PASSWORD=postgres
172 changes: 139 additions & 33 deletions CURATION.md
@@ -20,6 +20,32 @@ the latest data on file changes therefore allowing rapid iteration of curation and
validation. Information on how to setup the `watch_study` script is provided
in https://github.com/matthiaskoenig/pkdb_data

As an initial setup, create a virtual environment with `pkdb_data` installed.
```
git clone https://github.com/matthiaskoenig/pkdb_data.git
cd pkdb_data
mkvirtualenv pkdb_data --python=python3.6
pip install -e .
```
The next step is to export your credentials via environment variables.
Create a `.env` file with the following content:
```
API_BASE=https://develop.pk-db.com
USER=PKDB_USERNAME
PASSWORD=PKDB_PASSWORD
```
Then export the variables via
```
set -a && source .env
```
To check the environment variables use
```
echo $API_BASE
echo $USER
echo $PASSWORD
```

To watch a given study use
```
# activate virtualenv with watch_study script
workon pkdb_data
@@ -31,6 +57,9 @@ workon pkdb_data
(pkdb_data) watch_study -s $STUDYFOLDER
```

Here is the example output for the successfully uploaded study `Renner2007`. On file changes the study will be re-uploaded.
![Interactive curation with watch_study](./docs/curation/watch_study.png "Interactive curation with watch_study")

## 1. PDF, Reference, Figures & Tables

For upload, a certain folder structure and organisation of the `$STUDYFOLDER` is expected.
@@ -39,8 +68,8 @@ The first step is to create the folder and the basic files in the folder
- folder name is `STUDYNAME`, e.g., `Albert1974`
- folder contains the pdf as `STUDYNAME.pdf`, e.g., `Albert1974.pdf`
- folder contains additional files associated with the study, i.e.,
- tables, named `STUDYNAME_Tab[1-9]*.png`, e.g., `Albert1974_Tab1.png`
- figures, named `STUDYNAME_Fig[1-9]*.png`, e.g., `Albert1974_Fig2.png`
- tables, named `STUDYNAME_Tab[1-9]*.png`, e.g., `Albert1974_Tab2.png`
- figures, named `STUDYNAME_Fig[1-9]*.png`, e.g., `Albert1974_Fig1.png`
- excel file, named `STUDYNAME.xlsx`, e.g., `Albert1974.xlsx`

In addition, the folder can contain data files,
@@ -52,6 +81,7 @@ Information about the study is stored in the `study.json` which we will create
in the following steps. Information about the reference for the study is stored
in the `reference.json` (this file is created automatically and should not be altered).

![Overview study files](./docs/curation/curation_files.png "Overview study files")

## 2. Initial study information (`study.json`)

@@ -72,12 +102,9 @@ contains all the relevant information
"access": "public || private",
"creator": "mkoenig",
"curators": [
"mkoenig",
"janekg"
["mkoenig", 0.5]
],
"collaborators": [],
"substances": [],
"keywords": [],
"descriptions": [],
"comments": [],
"groupset": {
@@ -106,9 +133,10 @@ contains all the relevant information
* Fill in the basic information for the study, i.e., the `name` field with the `$STUDYNAME`; the `creator`, `curators` and `collaborators` fields with existing users (the creator is a single user, whereas curators and collaborators are lists of users); and the `sid` and `reference` fields with the `PubmedId` of the study.
* Substances which are used in the study should be listed in the `substances`. Substances must be existing substances which can be looked up at https://develop.pk-db.com/#/curation
* `keywords` relevant for the study should be mentioned in the `keywords` list. Keywords must be existing keywords which can be looked up at https://develop.pk-db.com/#/curation
* The `reference` field is optional. If no pubmed entry exists for the publication, a `reference.json` should be built manually (please ask what to do in such a case).
* The `access` field provides information on who can see the study. `public` provides access to everyone, `private` only to the `creator`, `curators` and `collaborators`.
* The `curators` field is a list consisting of either curator names (e.g. `mkoenig`) or a curator name with a curation score between 0.0 and 5.0 (e.g. `[mkoenig, 3.5]`).
* The `licence` field provides information on the licence of the publication. This is either `open` in case of Open Access publications or `closed` otherwise. Images and the PDF are only shown publicly if the publication is Open Access.

After this initial information is created in the `study.json` we can start running the `watch_study` script.
```
@@ -121,7 +149,8 @@ Descriptions are quotations from the publication to substantiate and support the
`comments` provide the possibility to store information from individual curators. Examples of comments are noting incorrect units, missing data, or strange conversions of data.


## 3. Curation of groups
## 3. Curation of groups and individuals
### groups
The next step is the curation of the `group` information, i.e., which groups were studied. The information is stored in the `groupset` of the following form.
Retrieve group information from the publication and copy it into the descriptions of the groupset. The top group containing all subjects must be called `all`.

@@ -131,7 +160,7 @@ An overview over the available `characteristica` and possible choices is available
```json
{
"groupset": {
"description": [
"descriptions": [
"The subjects were six healthy volunteers, three males and three females, aged 21.0 ± 0.9 years (range 20 to 22 years) and weighing 63 ± 11 kg (range 50 to 76 kg). All were nonsmokers. Subjects abstained from all methylxanthine containing foods and beverages during the entire period of the study. Compliance with this requirement was assessed by questioning at each presentation for blood sampling or urine delivery."
],
"comments": [],
@@ -141,19 +170,19 @@ An overview over the available `characteristica` and possible choices is available
"name": "all",
"characteristica": [
{
"category": "species",
"measurement_type": "species",
"choice": "homo sapiens"
},
{
"category": "healthy",
"measurement_type": "healthy",
"choice": "Y"
},
{
"category": "overnight fast",
"measurement_type": "overnight fast",
"choice": "Y"
},
{
"category": "fasted",
"measurement_type": "fasted",
"min": "10",
"max": "14",
"unit": "hr"
@@ -173,7 +202,7 @@ All available fields for characteristica on group are:
All available fields for characteristica on group are:
```json
{
"category": "categorial",
"measurement_type": "categorial",
"choice": "categorial|string",
"count": "int",
"mean": "double",
@@ -186,9 +215,39 @@ All available fields for characteristica on group are:
"unit": "categorial"
}
```
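A characteristica entry thus carries either a categorical `choice`, a single `value`, or summary statistics (`mean`, `median`, `min`, `max`, `sd`, `se`, `cv`) together with a `unit`. A small sketch of how these variants can be distinguished (the helper is an illustrative assumption, not PK-DB's actual validation code):

```python
# Illustrative helper (an assumption, not PK-DB code): classify a
# characteristica by which of its fields are present.
def characteristica_kind(c):
    if "choice" in c:
        return "categorical"
    if "value" in c:
        return "single value"
    if any(key in c for key in ("mean", "median", "min", "max", "sd", "se", "cv")):
        return "statistics"
    return "unknown"

print(characteristica_kind({"measurement_type": "species", "choice": "homo sapiens"}))  # categorical
print(characteristica_kind({"measurement_type": "fasted", "min": "10", "max": "14", "unit": "hr"}))  # statistics
```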
### individuals
Individuals are curated very similarly to groups, with the exception that individuals must belong
to a given group, i.e., the `group` attribute must be set. Individuals are most often defined based on spreadsheet mappings.
See, for instance, the individuals below, which are defined via a table.

```json
"individuals": [
{
"name": "col==subject",
"group": "col==group",
"source": "Akinyinka2000_Tab1.csv",
"format": "TSV",
"figure": "Akinyinka2000_Tab1.png",
"characteristica": [
{
"measurement_type": "age",
"value": "col==age",
"unit": "yr"
},
{
"measurement_type": "weight",
"value": "col==weight",
"unit": "kg"
},
{
"measurement_type": "sex",
"choice": "col==sex"
}
]
}
]
```
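The `col==` prefix maps a field to a column of the source spreadsheet, so a single JSON template expands to one individual per table row. A sketch of how such a mapping could be resolved (the `resolve` helper and the example table values are illustrative assumptions, not the actual upload code):

```python
import csv
import io

# Hypothetical resolver for the "col==NAME" convention: values prefixed with
# "col==" are looked up per row in the source table; everything else is constant.
def resolve(value, row):
    if isinstance(value, str) and value.startswith("col=="):
        return row[value[len("col=="):]]
    return value

# Example table in the spirit of Akinyinka2000_Tab1.csv (made-up values).
TSV = (
    "subject\tgroup\tage\tweight\tsex\n"
    "S1\tall\t24\t70\tM\n"
    "S2\tall\t31\t58\tF\n"
)

template = {"name": "col==subject", "group": "col==group",
            "age": "col==age", "age_unit": "yr"}

individuals = [
    {key: resolve(value, row) for key, value in template.items()}
    for row in csv.DictReader(io.StringIO(TSV), delimiter="\t")
]
print(individuals[0])  # {'name': 'S1', 'group': 'all', 'age': '24', 'age_unit': 'yr'}
```

Constant fields (like `age_unit`) pass through unchanged, while `col==` fields vary per row; this is why one template definition suffices for an arbitrary number of individuals.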

Individuals are curated like groups with the exception that individuals must belong
to a given group, i.e., the `group` attribute must be set. Individuals are most often defined via excel spreadsheets.

## 4. Curation of interventions/interventionset
```json
@@ -202,7 +261,7 @@
"route": "iv",
"form": "solution",
"application": "single dose",
"category": "dosing",
"measurement_type": "dosing",
"value": "0.5",
"unit": "g/kg",
"time": "0",
@@ -217,7 +276,7 @@
{

"name": "string",
"category": "categorial",
"measurement_type": "categorial",

"substance": "categorial (substance)",
"route": "categorial {oral, iv}",
@@ -240,15 +299,41 @@ All available fields for intervention and interventionset are:
```
* TODO: document after next iteration

## 5. Curation of output/outputset
Data from Figures should be digitized using plot digitizer. See guidelines in
## 5. Curation of outputs and time courses
The actual data in a publication is available from tables, from figures, or stated in the text.
All information should be curated by means of excel spreadsheets, i.e., data must be digitized and transferred from the
PDF into a spreadsheet.

- Use Excel (LibreOffice/OpenOffice) spreadsheets to store data, with all digitized data is stored in excel spreadsheets
- Use Excel (LibreOffice/OpenOffice) spreadsheets to store digitized data
- Change language settings to use US numbers and formats (i.e., '.' as decimal separator). Always use the point ('.') as decimal separator, never the comma (','), i.e., 1.234 instead of 1,234.
- PlotDigitizer to digitize figures (https://sourceforge.net/projects/plotdigitizer/)
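If a spreadsheet was accidentally exported with comma decimal separators, the values have to be normalized before further processing. A minimal sketch (assuming plain decimal commas and no thousands separators):

```python
# Normalize a decimal-comma number string to a float (assumes ',' is the
# decimal separator and no thousands separators are present).
def normalize_number(s):
    return float(s.replace(",", "."))

print(normalize_number("1,234"))  # 1.234
print(normalize_number("63,5"))   # 63.5
```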

For all figures and tables from which data is extracted, individual images (`png`) must be stored in the study folder, i.e.,
- tables, named `STUDYNAME_Tab[1-9]*.png`, e.g., `Albert1974_Tab2.png`
- figures, named `STUDYNAME_Fig[1-9]*.png`, e.g., `Albert1974_Fig1.png`
Use the screenshot functionality of the PDF viewer and save the images with an image program like `gimp` after removing unnecessary borders.

![Overview study files](./docs/curation/curation_files.png "Overview study files")

### Figures
- Use PlotDigitizer to digitize figures (https://sourceforge.net/projects/plotdigitizer/)
- Open the image to digitize (`STUDYNAME_Fig[1-9]*.png`)
- Use the Zoom function to enlarge the image if necessary (easier to click on data points)
- First, the axes have to be calibrated (make sure to set logarithmic axes where necessary); calibration should be done very carefully because it has a systematic effect (bias) on all digitized data points.
- Digitize one curve at a time
- Digitize mean and error separately (the actual error can then be calculated in excel as `abs(mean-error)`)
- In case of time courses adapt the time points to the actual times reported in the publication
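The error-bar step above can be sketched as follows: digitize the mean curve and the upper error-bar curve separately, then recover the error as `abs(mean - error)` per time point (the curve values below are made up for illustration):

```python
# Recover the error from separately digitized mean and upper error-bar curves.
mean  = [0.0, 4.2, 7.9, 6.1, 3.3]   # digitized mean curve
upper = [0.0, 5.0, 9.1, 7.0, 3.9]   # digitized upper error-bar curve

# Round to a reasonable number of digits, as recommended for digitization.
sd = [round(abs(m - u), 3) for m, u in zip(mean, upper)]
print(sd)  # [0.0, 0.8, 1.2, 0.9, 0.6]
```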

![Figure digitization](./docs/curation/figure_digitization.png "Figure digitization (axes have been calibrated and first mean curve digitized)")

Some tips for digitization of figures:
- Check that the axes do not have breaks or mixed scales (sometimes an axis is continued or not scaled equally); such axes create problems in the digitization.
- Use the same axis calibration for all curves in a figure, i.e., always digitize all data from a single figure panel completely in one go (otherwise the curves acquire different biases).
- The lower left corner is not always `(0,0)`; use the minimal x-tick and y-tick with available numerical values.
- Set the number of digits to a reasonable value (2-3 digits).

### Tables

```json
{
"source": "Akinyinka2000_Tab3.csv",
"format": "TSV",
@@ -260,26 +345,47 @@ Data from Figures should be digitized using plot digitizer. See guidelines in
],
"substance": "paraxanthine",
"tissue": "col==tissue",
"pktype": "cmax || tmax || auc_inf || thalf",
"measurement_type": "cmax || tmax || auc_inf || thalf",
"mean": "col==cmax || col==tmax || col==aucinf || col==thalf",
"sd": "col==cmax_sd || col==tmax_sd || col==aucinf_sd || col==thalf_sd",
"unit": "\u00b5g/ml || hr || \u00b5g*hr/ml || hr"
}
```

```json
{
"timecourses": [
{
"group": "all",
"groupby": "intervention",
"interventions": "col==intervention",
"source": "Albert1974_Fig1.tsv",
"format": "TSV",
"figure": "Albert1974_Fig1.png",
"substance": "paracetamol",
"tissue": "plasma",
"measurement_type": "concentration",
"time": "col==time_min",
"time_unit": "min",
"mean": "col==apap",
"cv": "col==apap_cv",
"unit": "µg/ml"
}
]
}
```
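When only the coefficient of variation is reported (the `cv` column above), the standard deviation can be recovered as `sd = cv * mean` per time point, assuming `cv` is stored as a fraction rather than a percentage. A small sketch with made-up values:

```python
# Recover sd from mean and coefficient of variation (cv), per time point.
# Assumes cv is a dimensionless fraction (not a percentage).
mean = [10.0, 8.0, 4.0]    # e.g. concentration in µg/ml
cv   = [0.1, 0.25, 0.5]    # coefficient of variation

sd = [m * c for m, c in zip(mean, cv)]
print(sd)  # [1.0, 2.0, 2.0]
```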


## Units for curation
Pint units:
https://github.com/hgrecco/pint/blob/master/pint/default_en_0.6.txt
Units are crucial to make sense of the data set, so many `characteristica`, `interventions` and `outputs/timecourses`
require unit information.
We are using a pint unit model. If units are missing or incorrect, the `watch_study` script reports the
respective issues.
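The kind of check performed on upload can be pictured with the following minimal sketch (the allowed-units table is an illustrative assumption; the real validation parses units with the pint registry rather than a lookup table):

```python
# Illustrative unit check (an assumption, not PK-DB's actual code, which
# validates units with the pint library).
ALLOWED_UNITS = {
    "concentration": {"µg/ml", "ng/ml", "mmol/l"},
    "auc_inf": {"µg*hr/ml"},
    "thalf": {"hr", "min"},
}

def check_unit(measurement_type, unit):
    """Return None if the unit is acceptable, otherwise an error message."""
    allowed = ALLOWED_UNITS.get(measurement_type, set())
    if unit not in allowed:
        return (f"invalid unit '{unit}' for '{measurement_type}', "
                f"allowed: {sorted(allowed)}")
    return None

print(check_unit("thalf", "hr"))   # valid -> None
print(check_unit("thalf", "kg"))   # invalid -> error message
```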


## Open issues
- individual set
- time course data
- column data

## Frequently asked questions (FAQ)
# How to encode multiple substances which are given (e.g., caffeine and acetaminophen are given)?
# Frequently asked questions (FAQ)
## How to encode multiple substances which are given (e.g., caffeine and acetaminophen are given)?
- encode as individual interventions of caffeine and acetaminophen, i.e., single interventions
with the respective doses. In the outputs a list of interventions is provided, i.e., in this example the interventions for caffeine and acetaminophen.

23 changes: 18 additions & 5 deletions README.md
@@ -90,16 +90,30 @@ or to run migrations
```bash
docker-compose run --rm backend python manage.py makemigrations
```
## Authentication data dump
```bash

docker-compose -f $PKDB_DOCKER_COMPOSE_YAML run --rm backend ./manage.py dumpdata auth --indent 2 > ./backend/pkdb_app/fixtures/auth.json
docker-compose -f $PKDB_DOCKER_COMPOSE_YAML run --rm backend ./manage.py dumpdata users --indent 2 > ./backend/pkdb_app/fixtures/users.json
docker-compose -f $PKDB_DOCKER_COMPOSE_YAML run --rm backend ./manage.py dumpdata rest_email_auth --indent 2 > ./backend/pkdb_app/fixtures/rest_email_auth.json
```

## Authentication data load
```bash

docker-compose -f $PKDB_DOCKER_COMPOSE_YAML run --rm backend ./manage.py loaddata auth pkdb_app/fixtures/auth.json
docker-compose -f $PKDB_DOCKER_COMPOSE_YAML run --rm backend ./manage.py loaddata users pkdb_app/fixtures/users.json
docker-compose -f $PKDB_DOCKER_COMPOSE_YAML run --rm backend ./manage.py loaddata rest_email_auth pkdb_app/fixtures/rest_email_auth.json
```
## REST services
PKDB provides a REST API which allows simple interaction with the database.
An overview of the REST endpoints is available from
```
http://localhost:8000/api/v1/
http://localhost:8123/api/v1/
```

Elastic Search engine is running on `localhost:9200` but is also reachable via django views.
General examples can be found here: https://django-elasticsearch-dsl-drf.readthedocs.io/en/0.16.2/basic_usage_examples.html
The REST API supports Elasticsearch queries, with examples
available from https://django-elasticsearch-dsl-drf.readthedocs.io/en/0.16.2/basic_usage_examples.html

Query examples:
```
@@ -120,6 +134,5 @@ http://localhost:8000/api/v1/substances_elastic/suggest/?search:name=cod

## Fill database
The database is filled using the `pkdb_data` repository at https://github.com/matthiaskoenig/pkdb_data

## Read

© 2017-2019 Jan Grzegorzewski & Matthias König.
2 changes: 1 addition & 1 deletion backend/pkdb_app/_version.py
@@ -1,4 +1,4 @@
"""
Definition of version string.
"""
__version__ = "0.5.2"
__version__ = "0.6.0"
