
Commit

Merge pull request #475 from deeppavlov/dev
Release v1.5.1
dilyararimovna authored May 29, 2023
2 parents 9b2708f + 633cd43 commit 528e063
Showing 211 changed files with 786 additions and 314 deletions.
4 changes: 2 additions & 2 deletions .env
@@ -20,15 +20,15 @@ TEXT_QA_URL=http://text-qa:8078/model
BADLIST_ANNOTATOR_URL=http://badlisted-words:8018/badlisted_words_batch
COMET_ATOMIC_SERVICE_URL=http://comet-atomic:8053/comet
COMET_CONCEPTNET_SERVICE_URL=http://comet-conceptnet:8065/comet
-MASKED_LM_SERVICE_URL=http://masked-lm:8088/respond
+MASKED_LM_SERVICE_URL=http://masked-lm:8102/respond
DP_WIKIDATA_URL=http://wiki-parser:8077/model
DP_ENTITY_LINKING_URL=http://entity-linking:8075/model
KNOWLEDGE_GROUNDING_SERVICE_URL=http://knowledge-grounding:8083/respond
WIKIDATA_DIALOGUE_SERVICE_URL=http://wikidata-dial-service:8092/model
NEWS_API_ANNOTATOR_URL=http://news-api-annotator:8112/respond
WIKI_FACTS_URL=http://wiki-facts:8116/respond
FACT_RANDOM_SERVICE_URL=http://fact-random:8119/respond
-INFILLING_SERVICE_URL=http://infilling:8122/respond
+INFILLING_SERVICE_URL=http://infilling:8106/respond
DIALOGPT_CONTINUE_SERVICE_URL=http://dialogpt:8125/continue
PROMPT_STORYGPT_SERVICE_URL=http://prompt-storygpt:8127/respond
STORYGPT_SERVICE_URL=http://storygpt:8126/respond
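These service URLs follow the pattern `http://<container-name>:<port>/<endpoint>`, so a port change in `.env` only takes effect if the service itself moves to the same port. A minimal sketch of the compose-side wiring that has to agree with the new `masked-lm` URL (abbreviated; the full definitions appear in the compose hunks later in this diff):

```yaml
# Sketch: the three compose settings that must match
# MASKED_LM_SERVICE_URL=http://masked-lm:8102/respond
masked-lm:
  build:
    args:
      SERVICE_PORT: 8102                 # build arg the service image listens on
  command: flask run -h 0.0.0.0 -p 8102  # Flask bound to the same port
  ports:
    - 8102:8102                          # host:container mapping for dev access
```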
1 change: 1 addition & 0 deletions MODELS.md
@@ -11,3 +11,4 @@ Here you may find a list of models that currently available for use in Generativ
| Open-Assistant Pythia 12B | transformers-lm-oasst12b | [link](https://huggingface.co/OpenAssistant/pythia-12b-sft-v8-7k-steps) | yes | 12B | 26GB (half-precision) | 5,120 tokens | An open-source English-only instruction-based large language model which is NOT good at answering math and coding questions. NB: free of charge. This model is up and running on our servers and can be used for free. |
| GPT-4 | openai-api-gpt4 | [link](https://platform.openai.com/docs/models/gpt-4) | no (paid access via API) | supposedly, 175B | - (cannot be run locally) | 8,192 tokens | A multilingual instruction-based large language model which is capable of code generation and other complex tasks. More capable than any GPT-3.5 model, able to do more complex tasks, and optimized for chat. NB: paid. You must provide your OpenAI API key to use the model. Your OpenAI account will be charged according to your usage. |
| GPT-4 32K | openai-api-gpt4-32k | [link](https://platform.openai.com/docs/models/gpt-4) | no (paid access via API) | supposedly, 175B | - (cannot be run locally) | 32,768 tokens | A multilingual instruction-based large language model which is capable of code generation and other complex tasks. Same capabilities as the base gpt-4 model but with 4x the context length. NB: paid. You must provide your OpenAI API key to use the model. Your OpenAI account will be charged according to your usage. |
+| GPT-JT 6B | transformers-lm-gptjt | [link](https://huggingface.co/togethercomputer/GPT-JT-6B-v1) | yes | 6B | 26GB | 2,048 tokens | An open-source English-only large language model which was fine-tuned for instruction following but is NOT capable of code generation. NB: free of charge. This model is up and running on our servers and can be used for free. |
34 changes: 18 additions & 16 deletions README.md
@@ -260,22 +260,24 @@ Dream Architecture is presented in the following image:
| Wiki Facts | 1.7 GB RAM | model that extracts related facts from Wikipedia and WikiHow pages |

## Services
| Name | Requirements | Description |
|------|--------------|-------------|
| DialoGPT | 1.2 GB RAM, 2.1 GB GPU | generative service based on a Transformers generative model; the model is set in the docker compose argument `PRETRAINED_MODEL_NAME_OR_PATH` (for example, `microsoft/DialoGPT-small` with 0.2-0.5 sec response time on GPU) |
| DialoGPT Persona-based | 1.2 GB RAM, 2.1 GB GPU | generative service based on a Transformers generative model; the model was pre-trained on the PersonaChat dataset to generate a response conditioned on several sentences of the socialbot's persona |
| Image Captioning | 4 GB RAM, 5.4 GB GPU | creates a text representation of a received image |
| Infilling | 1 GB RAM, 1.2 GB GPU | (turned off but the code is available) generative service based on an Infilling model; for the given utterance, returns an utterance where `_` tokens in the original text are replaced with generated tokens |
| Knowledge Grounding | 2 GB RAM, 2.1 GB GPU | generative service based on the BlenderBot architecture, providing a response to the context while taking into account an additional text paragraph |
| Masked LM | 1.1 GB RAM, 1 GB GPU | (turned off but the code is available) |
| Seq2seq Persona-based | 1.5 GB RAM, 1.5 GB GPU | generative service based on a Transformers seq2seq model; the model was pre-trained on the PersonaChat dataset to generate a response conditioned on several sentences of the socialbot's persona |
| Sentence Ranker | 1.2 GB RAM, 2.1 GB GPU | ranking model given as `PRETRAINED_MODEL_NAME_OR_PATH` which, for a pair of sentences, returns a float score of correspondence |
| StoryGPT | 2.6 GB RAM, 2.15 GB GPU | generative service based on a fine-tuned GPT-2; for the given set of keywords, returns a short story using those keywords |
| GPT-3.5 | 100 MB RAM | generative service based on the OpenAI API; the model is set in the docker compose argument `PRETRAINED_MODEL_NAME_OR_PATH` (in particular, in this service, `text-davinci-003` is used) |
| ChatGPT | 100 MB RAM | generative service based on the OpenAI API; the model is set in the docker compose argument `PRETRAINED_MODEL_NAME_OR_PATH` (in particular, in this service, `gpt-3.5-turbo` is used) |
| Prompt StoryGPT | 3 GB RAM, 4 GB GPU | generative service based on a fine-tuned GPT-2; for the given topic represented by one noun, returns a short story on that topic |
| GPT-J 6B | 1.5 GB RAM, 24.2 GB GPU | generative service based on a Transformers generative model; the model is set in the docker compose argument `PRETRAINED_MODEL_NAME_OR_PATH` (in particular, in this service, the [GPT-J model](https://huggingface.co/EleutherAI/gpt-j-6B) is used) |
| BLOOMZ 7B | 2.5 GB RAM, 29 GB GPU | generative service based on a Transformers generative model; the model is set in the docker compose argument `PRETRAINED_MODEL_NAME_OR_PATH` (in particular, in this service, the [BLOOMZ-7b1 model](https://huggingface.co/bigscience/bloomz-7b1) is used) |
| GPT-JT 6B | 2.5 GB RAM, 25.1 GB GPU | generative service based on a Transformers generative model; the model is set in the docker compose argument `PRETRAINED_MODEL_NAME_OR_PATH` (in particular, in this service, the [GPT-JT model](https://huggingface.co/togethercomputer/GPT-JT-6B-v1) is used) |
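For the Transformers-based services in this table, the model is selected entirely through the `PRETRAINED_MODEL_NAME_OR_PATH` docker compose build argument. A hedged sketch of what such a service definition looks like, assuming the `transformers-lm-gptjt` name and port `8161` referenced elsewhere in this release (the exact block is not shown in this excerpt, and the `./services/transformers_lm/` context path is an assumption):

```yaml
transformers-lm-gptjt:
  build:
    context: ./services/transformers_lm/   # assumed location of the shared service code
    args:
      SERVICE_PORT: 8161
      SERVICE_NAME: transformers_lm_gptjt
      PRETRAINED_MODEL_NAME_OR_PATH: togethercomputer/GPT-JT-6B-v1  # swap to change the model
  command: flask run -h 0.0.0.0 -p 8161
  environment:
    - CUDA_VISIBLE_DEVICES=0   # GPU services in this repo pin a device this way
    - FLASK_APP=server
  ports:
    - 8161:8161
```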


## Skills
| Name | Requirements | Description |
@@ -9,13 +9,13 @@ compose:
CONFIG: fact_retrieval_rus.json
COMMIT: c8264bf82eaa3ed138395ab68f71d47a4175f2fc
TOP_N: 20
-SERVICE_PORT: 8130
+SERVICE_PORT: 8110
SRC_DIR: annotators/fact_retrieval_rus
CUDA_VISIBLE_DEVICES: '0'
FLASK_APP: server
context: ./
dockerfile: annotators/fact_retrieval_rus/Dockerfile
-command: flask run -h 0.0.0.0 -p 8130
+command: flask run -h 0.0.0.0 -p 8110
environment:
- CUDA_VISIBLE_DEVICES=0
- FLASK_APP=server
@@ -29,5 +29,5 @@ compose:
- ./annotators/fact_retrieval_rus:/src
- ~/.deeppavlov:/root/.deeppavlov
ports:
-- 8130:8130
+- 8110:8110
proxy: null
4 changes: 3 additions & 1 deletion annotators/spacy_annotator/requirements.txt
@@ -7,4 +7,6 @@ spacy==3.2.0
typer==0.4.1
click<=8.0.4
jinja2<=3.0.3
-Werkzeug<=2.0.3
+Werkzeug<=2.0.3
+typing-inspect==0.8.0
+typing_extensions==4.5.0
@@ -8,12 +8,12 @@ compose:
build:
context: ./annotators/toxic_classification_ru/
args:
-SERVICE_PORT: 8126
+SERVICE_PORT: 8118
PRETRAINED_MODEL_NAME_OR_PATH: s-nlp/russian_toxicity_classifier
LANGUAGE: RU
CUDA_VISIBLE_DEVICES: '0'
FLASK_APP: server
-command: flask run -h 0.0.0.0 -p 8126
+command: flask run -h 0.0.0.0 -p 8118
environment:
- CUDA_VISIBLE_DEVICES=0
- FLASK_APP=server
@@ -27,5 +27,5 @@ compose:
- ./annotators/toxic_classification_ru:/src
- ~/.deeppavlov/cache:/root/.cache
ports:
-- 8126:8126
+- 8118:8118
proxy: null
4 changes: 2 additions & 2 deletions assistant_dists/dream/dev.yml
@@ -446,14 +446,14 @@ services:
- "./services/infilling:/src"
- "~/.deeppavlov:/root/.deeppavlov"
ports:
-- 8139:8139
+- 8106:8106
masked-lm:
volumes:
- "./services/masked_lm:/src"
- "./common:/src/common"
- "~/.deeppavlov/cache:/root/.cache"
ports:
-- 8141:8141
+- 8102:8102
dff-template-skill:
volumes:
- "./skills/dff_template_skill:/src"
8 changes: 4 additions & 4 deletions assistant_dists/dream/docker-compose.override.yml
@@ -1355,9 +1355,9 @@ services:
build:
context: ./services/infilling/
args:
-SERVICE_PORT: 8139
+SERVICE_PORT: 8106
SERVICE_NAME: infilling
-command: flask run -h 0.0.0.0 -p 8139
+command: flask run -h 0.0.0.0 -p 8106
environment:
- CUDA_VISIBLE_DEVICES=0
- FLASK_APP=server
@@ -1373,10 +1373,10 @@ services:
build:
context: ./services/masked_lm/
args:
-SERVICE_PORT: 8141
+SERVICE_PORT: 8102
SERVICE_NAME: masked_lm
PRETRAINED_MODEL_NAME_OR_PATH: "bert-base-uncased"
-command: flask run -h 0.0.0.0 -p 8141
+command: flask run -h 0.0.0.0 -p 8102
environment:
- CUDA_VISIBLE_DEVICES=0
- FLASK_APP=server
2 changes: 1 addition & 1 deletion assistant_dists/dream_multimodal/dev.yml
@@ -75,5 +75,5 @@ services:
- "./skills/dff_image_skill:/src"
- "./common:/src/common"
ports:
-- 8188:8188
+- 8124:8124
version: "3.7"
6 changes: 3 additions & 3 deletions assistant_dists/dream_multimodal/docker-compose.override.yml
@@ -4,7 +4,7 @@ services:
environment:
WAIT_HOSTS: "dff-program-y-skill:8008, sentseg:8011, convers-evaluation-selector:8009,
dff-intent-responder-skill:8012, intent-catcher:8014, badlisted-words:8018,
-spelling-preprocessing:8074, dialogpt:8125, sentence-ranker:8128, image-captioning:8123, dff-image-skill:8188"
+spelling-preprocessing:8074, dialogpt:8125, sentence-ranker:8128, image-captioning:8123, dff-image-skill:8124"
WAIT_HOSTS_TIMEOUT: ${WAIT_TIMEOUT:-1200}
HIGH_PRIORITY_INTENTS: 1
RESTRICTION_FOR_SENSITIVE_CASE: 1
@@ -216,12 +216,12 @@ services:
env_file: [.env]
build:
args:
-SERVICE_PORT: 8188
+SERVICE_PORT: 8124
SERVICE_NAME: dff_image_skill
LANGUAGE: EN
context: .
dockerfile: ./skills/dff_image_skill/Dockerfile
-command: gunicorn --workers=1 server:app -b 0.0.0.0:8188 --reload
+command: gunicorn --workers=1 server:app -b 0.0.0.0:8124 --reload
deploy:
resources:
limits:
2 changes: 1 addition & 1 deletion assistant_dists/dream_multimodal/pipeline_conf.json
@@ -318,7 +318,7 @@
"connector": {
"protocol": "http",
"timeout": 2.0,
"url": "http://dff-image-skill:8188/respond"
"url": "http://dff-image-skill:8124/respond"
},
"dialog_formatter": "state_formatters.dp_formatters:dff_image_skill_formatter",
"response_formatter": "state_formatters.dp_formatters:skill_with_attributes_formatter_service",
8 changes: 4 additions & 4 deletions assistant_dists/dream_persona_prompted/README.md
@@ -76,10 +76,10 @@ If one wants to create a new prompted distribution (distribution containing prom
to an unused one.
3. Choose the generative service to be used. For that one needs to:
1. in `dream/assistant_dists/dream_custom_prompted/` folder in files `docker-compose.override.yml`, `dev.yml`
-replace `transformers-lm-gptj` container description to a new one.
+replace the `transformers-lm-gptjt` container description with a new one.
In particular, one may replace in `PRETRAINED_MODEL_NAME_OR_PATH` parameter
-a utilized Language Model (LM) `GPT-J` with another one from `Transformers` library.
-Please change a port (`8130` for `transformers-lm-gptj`) to unused ones.
+a utilized Language Model (LM) `GPT-JT` with another one from the `Transformers` library.
+Please change the port (`8161` for `transformers-lm-gptjt`) to an unused one.
2. in all prompted skills' containers descriptions change `GENERATIVE_SERVICE_URL` to your generative model.
Take into account that the service name is constructed as `http://<container-name>:<port>/<endpoint>`.
4. For each prompted skill, one needs to create an input state formatter. To do that, one needs to:
@@ -99,7 +99,7 @@ If one wants to create a new prompted distribution (distribution containing prom
"connector": {
"protocol": "http",
"timeout": 4.5,
"url": "http://dff-dream-persona-gpt-j-prompted-skill:8134/respond"
"url": "http://dff-dream-persona-gpt-jt-prompted-skill:8134/respond"
},
"dialog_formatter": {
"name": "state_formatters.dp_formatters:dff_prompted_skill_formatter",
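To make the README's steps 3.i-3.ii above concrete: a hedged sketch of a prompted skill's compose entry pointing at the GPT-JT generative service. The skill name and port `8135` are illustrative placeholders; the `transformers-lm-gptjt` name and port `8161` come from the instructions above, and the URL follows the documented `http://<container-name>:<port>/<endpoint>` pattern.

```yaml
dff-my-persona-prompted-skill:          # placeholder skill name
  build:
    args:
      SERVICE_PORT: 8135                # illustrative; must be an unused port
      SERVICE_NAME: dff_my_persona_prompted_skill
      GENERATIVE_SERVICE_URL: http://transformers-lm-gptjt:8161/respond  # <container-name>:<port>/<endpoint>
  command: gunicorn --workers=1 server:app -b 0.0.0.0:8135 --reload
  ports:
    - 8135:8135
```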
2 changes: 1 addition & 1 deletion assistant_dists/dream_persona_prompted/cpu.yml
@@ -8,7 +8,7 @@ services:
environment:
DEVICE: cpu
CUDA_VISIBLE_DEVICES: ""
-transformers-lm-gptj:
+transformers-lm-gptjt:
environment:
DEVICE: cpu
CUDA_VISIBLE_DEVICES: ""

