Merge pull request #216 from deeppavlov/dev
Release v0.4.1
dilyararimovna authored Nov 22, 2022
2 parents 4d5cb36 + c0588b9 commit 5c36b99
Showing 24 changed files with 906 additions and 196 deletions.
28 changes: 14 additions & 14 deletions README.md
@@ -58,7 +58,7 @@ This is a generative-based socialbot that uses [English DialoGPT model](https://

### Dream Russian

-Russian version of DeepPavlov Dream Socialbot. This is a generative-based socialbot that uses [Russian DialoGPT model](https://huggingface.co/Grossmend/rudialogpt3_medium_based_on_gpt2) to generate most of the responses. It also contains intent catcher and responder components to cover special user requests.
+Russian version of DeepPavlov Dream Socialbot. This is a generative-based socialbot that uses [Russian DialoGPT by DeepPavlov](https://huggingface.co/DeepPavlov/rudialogpt3_medium_based_on_gpt2_v2) to generate most of the responses. It also contains intent catcher and responder components to cover special user requests.
[Link to the distribution.](https://github.com/deeppavlov/dream/tree/main/assistant_dists/dream_russian)

# Quick Start
@@ -301,23 +301,23 @@ Dream Architecture is presented in the following image:

## Annotators

-| Name | Requirements | Description |
-|------------------------|--------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Badlisted words | 50 MiB RAM | detects obscene Russian words from the badlist |
-| Entity detection | 3 GiB RAM | extracts entities and their types from utterances |
-| Entity linking | 500 MiB RAM, ?? GiB GPU | finds Wikidata entity ids for the entities detected with Entity Detection |
-| Intent catcher | 900 MiB RAM | classifies user utterances into a number of predefined intents which are trained on a set of phrases and regexps |
-| NER | 1.7 GiB RAM, 4.9 Gib GPU | extracts person names, names of locations, organizations from uncased text using ruBert-based (pyTorch) model |
-| Sentseg | 2.4 GiB RAM, 4.9 Gib GPU | recovers punctuation using ruBert-based (pyTorch) model and splits into sentences |
-| Spacy Annotator | 250 MiB RAM | token-wise annotations by Spacy |
-| Spelling preprocessing | 4.4 GiB RAM | Russian Levenshtein correction model |
-| Wiki parser | 100 MiB RAM | extracts Wikidata triplets for the entities detected with Entity Linking |
-| DialogRPT | 3.8 GiB RAM, 2 GiB GPU | DialogRPT model which is based on Russian DialoGPT (see https://huggingface.co/Grossmend/rudialogpt3_medium_based_on_gpt2) and fine-tuned on Russian Pikabu Comment sequences |
+| Name | Requirements | Description |
+|------------------------|--------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Badlisted words | 50 MiB RAM | detects obscene Russian words from the badlist |
+| Entity detection | 3 GiB RAM | extracts entities and their types from utterances |
+| Entity linking | 500 MiB RAM, ?? GiB GPU | finds Wikidata entity ids for the entities detected with Entity Detection |
+| Intent catcher | 900 MiB RAM | classifies user utterances into a number of predefined intents which are trained on a set of phrases and regexps |
+| NER | 1.7 GiB RAM, 4.9 Gib GPU | extracts person names, names of locations, organizations from uncased text using ruBert-based (pyTorch) model |
+| Sentseg | 2.4 GiB RAM, 4.9 Gib GPU | recovers punctuation using ruBert-based (pyTorch) model and splits into sentences |
+| Spacy Annotator | 250 MiB RAM | token-wise annotations by Spacy |
+| Spelling preprocessing | 4.4 GiB RAM | Russian Levenshtein correction model |
+| Wiki parser | 100 MiB RAM | extracts Wikidata triplets for the entities detected with Entity Linking |
+| DialogRPT | 3.8 GiB RAM, 2 GiB GPU | DialogRPT model which is based on [Russian DialoGPT by DeepPavlov](https://huggingface.co/DeepPavlov/rudialogpt3_medium_based_on_gpt2_v2) and fine-tuned on Russian Pikabu Comment sequences |

## Skills & Services
| Name | Requirements | Description |
|------------------------|---------------------------|-------------------------------------------------------------------------------------------------------------------------------------|
-| DialoGPT | 2.8 GiB RAM, 2 GiB GPU | Russian DialoGPT model https://huggingface.co/Grossmend/rudialogpt3_medium_based_on_gpt2 |
+| DialoGPT | 2.8 GiB RAM, 2 GiB GPU | [Russian DialoGPT by DeepPavlov](https://huggingface.co/DeepPavlov/rudialogpt3_medium_based_on_gpt2_v2) |
| Dummy Skill | a part of agent container | a fallback skill with multiple non-toxic candidate responses and random Russian questions |
| Personal Info skill | 40 MiB RAM | queries and stores user's name, birthplace, and location |
| DFF Generative skill | 50 MiB RAM | **[New DFF version]** generative skill which uses DialoGPT service to generate 3 different hypotheses |
20 changes: 10 additions & 10 deletions README_ru.md
@@ -59,7 +59,7 @@ Deepy GoBot Base содержит аннотатор исправления оп

### Dream Russian
Русскоязычная версия DeepPavlov Dream Socialbot. Данная версия основана на нейросетевой генерации с использованием
-[Russian DialoGPT модели](https://huggingface.co/Grossmend/rudialogpt3_medium_based_on_gpt2).
+[Russian DialoGPT модели by DeepPavlov](https://huggingface.co/DeepPavlov/rudialogpt3_medium_based_on_gpt2_v2).
Дистрибутив также содержит компоненты для детектирования запросов пользователя и выдачи специальных ответов на них.
[Link to the distribution.](https://github.com/deeppavlov/dream/tree/main/assistant_dists/dream_russian)

@@ -199,15 +199,15 @@ docker-compose -f docker-compose.yml -f assistant_dists/dream/docker-compose.ove
| DialogRPT | 3.9 GiB RAM, 2.2 GiB GPU | Сервис оценки вероятности реплики понравиться пользователю (updown) на основе ранжирующей модели DialogRPT, которая дообучена на основе генеративной модели Russian DialoGPT на комментариев с сайта Пикабу. |

## Навыки и Сервисы (Skills & Services)
-| Name | Requirements | Description |
-|----------------------|---------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| DialoGPT | 2.8 GiB RAM, 2.2 GiB GPU | Сервис генерации реплики по текстовому контексту диалога на основе предобученной модели Russian [DialoGPT](https://huggingface.co/Grossmend/rudialogpt3_medium_based_on_gpt2) |
-| Dummy Skill | a part of agent container | Навык для генерации ответов-заглушек и выдачис лучайных вопросов из базы в каечстве linking-questions. |
-| Personal Info Skill | 40 MiB RAM | Сценарный навык для извлечения и запоминания основной личной информации о пользователе. |
-| DFF Generative Skill | 50 MiB RAM | **[New DFF version]** навык, выдающий 5 гипотез, выданных сервисом DialoGPT |
-| DFF Intent Responder | 50 MiB RAM | **[New DFF version]** Сценарный навык на основе DFF для ответа на специальные намерения пользователя. |
-| DFF Program Y Skill | 80 MiB RAM | **[New DFF version]** Сценарный навык на основе DFF для ответа на общие вопросы в виде AIML компоненты. |
-| DFF Friendship Skill | 70 MiB RAM | **[New DFF version]** Сценарный навык на основе DFF приветственной части диалога с пользователем. |
+| Name | Requirements | Description |
+|----------------------|---------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| DialoGPT | 2.8 GiB RAM, 2.2 GiB GPU | Сервис генерации реплики по текстовому контексту диалога на основе предобученной модели [Russian DialoGPT by DeepPavlov](https://huggingface.co/DeepPavlov/rudialogpt3_medium_based_on_gpt2_v2) |
+| Dummy Skill | a part of agent container | Навык для генерации ответов-заглушек и выдачис лучайных вопросов из базы в каечстве linking-questions. |
+| Personal Info Skill | 40 MiB RAM | Сценарный навык для извлечения и запоминания основной личной информации о пользователе. |
+| DFF Generative Skill | 50 MiB RAM | **[New DFF version]** навык, выдающий 5 гипотез, выданных сервисом DialoGPT |
+| DFF Intent Responder | 50 MiB RAM | **[New DFF version]** Сценарный навык на основе DFF для ответа на специальные намерения пользователя. |
+| DFF Program Y Skill | 80 MiB RAM | **[New DFF version]** Сценарный навык на основе DFF для ответа на общие вопросы в виде AIML компоненты. |
+| DFF Friendship Skill | 70 MiB RAM | **[New DFF version]** Сценарный навык на основе DFF приветственной части диалога с пользователем. |


# Публикации
5 changes: 3 additions & 2 deletions assistant_dists/dream_russian/docker-compose.override.yml
@@ -9,6 +9,7 @@ services:
dff-friendship-skill:8086, entity-detection:8103, dialogpt:8091,
dff-template-skill:8120, spacy-annotator:8125, dialogrpt:8122, toxic-classification:8126"
WAIT_HOSTS_TIMEOUT: ${WAIT_TIMEOUT:-480}
+LANGUAGE: RU

dff-program-y-skill:
env_file: [.env]
@@ -317,7 +318,7 @@ services:
context: ./services/dialogpt_RU/
args:
SERVICE_PORT: 8091
-PRETRAINED_MODEL_NAME_OR_PATH: "Grossmend/rudialogpt3_medium_based_on_gpt2"
+PRETRAINED_MODEL_NAME_OR_PATH: DeepPavlov/rudialogpt3_medium_based_on_gpt2_v2
LANGUAGE: RU
command: flask run -h 0.0.0.0 -p 8091
environment:
@@ -354,7 +355,7 @@ services:
args:
SERVICE_PORT: 8122
PRETRAINED_MODEL_FNAME: dialogrpt_ru_ckpt_v0.pth
-TOKENIZER_NAME_OR_PATH: "Grossmend/rudialogpt3_medium_based_on_gpt2"
+TOKENIZER_NAME_OR_PATH: DeepPavlov/rudialogpt3_medium_based_on_gpt2_v2
command: flask run -h 0.0.0.0 -p 8122
environment:
- CUDA_VISIBLE_DEVICES=0
2 changes: 1 addition & 1 deletion assistant_dists/dream_russian/pipeline_conf.json
@@ -300,7 +300,7 @@
"dff_generative_skill": {
"connector": {
"protocol": "http",
-"timeout": 2,
+"timeout": 4,
"url": "http://dff-generative-skill:8092/respond"
},
"dialog_formatter": "state_formatters.dp_formatters:dff_generative_skill_formatter",
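The `timeout` bump in `pipeline_conf.json` is easier to review when connector timeouts are listed per service. The following is a minimal sketch that assumes only the fragment visible in this hunk; the helper name `connector_timeouts` is hypothetical and not part of the repository, and the unit of `timeout` (presumably seconds) is not stated in the diff.

```python
import json

# Fragment reconstructed from the hunk above; the real file contains many more services.
CONNECTOR_FRAGMENT = """
{
  "dff_generative_skill": {
    "connector": {
      "protocol": "http",
      "timeout": 4,
      "url": "http://dff-generative-skill:8092/respond"
    }
  }
}
"""


def connector_timeouts(services):
    """Map each service name to its connector timeout, skipping services without one."""
    return {
        name: service["connector"]["timeout"]
        for name, service in services.items()
        if "timeout" in service.get("connector", {})
    }


timeouts = connector_timeouts(json.loads(CONNECTOR_FRAGMENT))
print(timeouts)
```

Running the same helper over the old and new versions of the file would show which services had their timeouts changed.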
4 changes: 2 additions & 2 deletions assistant_dists/dream_russian/test.yml
@@ -45,9 +45,9 @@ services:
- CUDA_VISIBLE_DEVICES=7
dialogpt:
environment:
-      - CUDA_VISIBLE_DEVICES=7
+      - CUDA_VISIBLE_DEVICES=6
dialogrpt:
environment:
-      - CUDA_VISIBLE_DEVICES=7
+      - CUDA_VISIBLE_DEVICES=6
dff-template-skill:
version: '3.7'
4 changes: 0 additions & 4 deletions common/utils.py
@@ -764,10 +764,8 @@ def get_topics(annotated_utterance, probs=False, default_probs=None, default_lab
answer_probs, answer_labels = default_probs, default_labels

if probs:
-logger.info(f"Result in get_topics: {answer_probs}")
return answer_probs
else:
-logger.info(f"Result in get_topics: {answer_labels}")
return answer_labels


@@ -862,10 +860,8 @@ def get_intents(annotated_utterance, probs=False, default_probs=None, default_la
answer_probs, answer_labels = default_probs, default_labels

if probs:
-logger.info(f"Result in get_intents: {answer_probs}")
return answer_probs
else:
-logger.info(f"Result in get_intents: {answer_labels}")
return answer_labels


Original file line number Diff line number Diff line change
@@ -50,6 +50,7 @@
"I didn't get it. Sorry",
]
LANGUAGE = getenv("LANGUAGE", "EN")
+GREETING_FIRST = int(getenv("GREETING_FIRST", 1))


@app.route("/respond", methods=["POST"])
@@ -366,7 +367,7 @@ def select_response(candidates, scores, confidences, is_toxics, dialog, all_prev
best_human_attributes = best_candidate.get("human_attributes", {})
best_bot_attributes = best_candidate.get("bot_attributes", {})

-if len(dialog["bot_utterances"]) == 0 and greeting_spec[LANGUAGE] not in best_text:
+if len(dialog["bot_utterances"]) == 0 and greeting_spec[LANGUAGE] not in best_text and GREETING_FIRST:
# add greeting to the first bot uttr, if it's not already included
best_text = f"{HI_THIS_IS_DREAM[LANGUAGE]} {best_text}"
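
The change to `select_response` makes the automatic greeting optional via a `GREETING_FIRST` environment flag. The sketch below isolates that logic so it can be exercised without the service. The function names `read_flag` and `add_greeting`, and the contents of the two dictionaries, are illustrative assumptions; only the flag parsing (`int(getenv(...))` with a default of `1`) and the three-part condition come from the diff.

```python
import os

# Illustrative stand-ins for the selector's per-language constants (values are assumptions).
GREETING_SPEC = {"EN": "this is a Dream Socialbot"}
HI_THIS_IS_DREAM = {"EN": "Hi, this is a Dream Socialbot!"}


def read_flag(name, default=1, environ=None):
    """Parse an integer on/off flag the way the diff does: int(getenv(name, default))."""
    env = os.environ if environ is None else environ
    return int(env.get(name, default))


def add_greeting(best_text, bot_utterances, language, greeting_first):
    """Prepend the greeting only on the first bot turn, if it is absent and the flag is on."""
    is_first_turn = len(bot_utterances) == 0
    already_greets = GREETING_SPEC[language] in best_text
    if is_first_turn and not already_greets and greeting_first:
        return f"{HI_THIS_IS_DREAM[language]} {best_text}"
    return best_text


flag_on = read_flag("GREETING_FIRST", environ={})
flag_off = read_flag("GREETING_FIRST", environ={"GREETING_FIRST": "0"})
print(add_greeting("How are you?", [], "EN", flag_on))
print(add_greeting("How are you?", [], "EN", flag_off))
```

Because the flag is parsed with `int`, it must be set to a numeric string such as `0` or `1`; a value like `false` would raise `ValueError` at import time.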

3 changes: 3 additions & 0 deletions services/dialogpt_RU/Dockerfile
@@ -17,6 +17,9 @@ RUN pip install -r /src/requirements.txt

COPY . /src

+RUN python -c "from transformers import AutoTokenizer; AutoTokenizer.from_pretrained('${PRETRAINED_MODEL_NAME_OR_PATH}');"
+RUN python -c "from transformers import AutoModelForCausalLM; AutoModelForCausalLM.from_pretrained('${PRETRAINED_MODEL_NAME_OR_PATH}');"

HEALTHCHECK --interval=5s --timeout=90s --retries=3 CMD curl --fail 127.0.0.1:${SERVICE_PORT}/healthcheck || exit 1


2 changes: 1 addition & 1 deletion services/dialogpt_RU/requirements.txt
@@ -1,4 +1,4 @@
-transformers==4.0.1
+transformers==4.11.0
sentencepiece==0.1.94
flask==1.1.1
gunicorn==19.9.0