Merge pull request #14 from deeppavlovteam/dev
Release v0.3.0
dilyararimovna authored Sep 15, 2022
2 parents 2b50d6d + 56c2c5b commit 3e3dd71
Showing 112 changed files with 2,510 additions and 1,050 deletions.
1 change: 1 addition & 0 deletions .env
@@ -31,3 +31,4 @@ WIKI_FACTS_URL=http://wiki-facts:8116/respond
 FACT_RANDOM_SERVICE_URL=http://fact-random:8119/respond
 INFILLING_SERVICE_URL=http://infilling:8122/respond
 DIALOGPT_SERVICE_URL=http://dialogpt:8091/respond
+DIALOGPT_CONTINUE_SERVICE_URL=http://dialogpt:8125/continue
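The new variable points at the same `dialogpt` host as `DIALOGPT_SERVICE_URL`, but on a different port and path. A minimal sketch, not part of the commit, of how such `.env` entries could be split into host, port, and path for a quick sanity check; `parse_env` and `endpoint` are hypothetical helper names, and the sample text is copied from the hunk above.

```python
from urllib.parse import urlsplit

# Sample lines copied from the .env hunk above.
ENV_TEXT = """\
FACT_RANDOM_SERVICE_URL=http://fact-random:8119/respond
INFILLING_SERVICE_URL=http://infilling:8122/respond
DIALOGPT_SERVICE_URL=http://dialogpt:8091/respond
DIALOGPT_CONTINUE_SERVICE_URL=http://dialogpt:8125/continue
"""


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip()
    return entries


def endpoint(url):
    """Return (host, port, path) for a service URL."""
    parts = urlsplit(url)
    return parts.hostname, parts.port, parts.path


services = {name: endpoint(url) for name, url in parse_env(ENV_TEXT).items()}
for name, (host, port, path) in services.items():
    print(f"{name}: host={host} port={port} path={path}")
```

Listing the entries this way makes it easy to spot two variables that share a host but differ in port or route, as the two DialoGPT entries do here.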
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -8,12 +8,12 @@

 #### Create a new issue

-First, make sure the issue doesn't exist [in the list](https://github.com/deepmipt/dream/issues) yet. If a related issue doesn't exist, you can [open a new one](https://github.com/deepmipt/dream/issues/new).
+First, make sure the issue doesn't exist [in the list](https://github.com/deeppavlovteam/dream/issues) yet. If a related issue doesn't exist, you can [open a new one](https://github.com/deeppavlovteam/dream/issues/new).


 #### Solve an issue

-Scan through our [existing issues](https://github.com/deepmipt/dream/issues) to find one that interests you. You can narrow down the search using `labels` as filters. If you find an issue to work on, you are welcome to open a PR with a fix.
+Scan through our [existing issues](https://github.com/deeppavlovteam/dream/issues) to find one that interests you. You can narrow down the search using `labels` as filters. If you find an issue to work on, you are welcome to open a PR with a fix.


 #### Fork and make changes
162 changes: 84 additions & 78 deletions README.md

Large diffs are not rendered by default.

36 changes: 18 additions & 18 deletions README_ru.md
@@ -55,20 +55,20 @@ Deepy GoBot Base содержит аннотатор исправления оп
 Mini version of DeepPavlov Dream Socialbot.
 This version is based on neural generation using the [English DialoGPT model](https://huggingface.co/microsoft/DialoGPT-medium).
 The distribution also contains components for detecting user requests and returning special responses to them.
-[Link to the distribution.](https://github.com/deepmipt/dream/tree/main/assistant_dists/dream_mini)
+[Link to the distribution.](https://github.com/deeppavlovteam/dream/tree/main/assistant_dists/dream_mini)

 ### Dream Russian
 Russian-language version of DeepPavlov Dream Socialbot. This version is based on neural generation using the
 [Russian DialoGPT model](https://huggingface.co/Grossmend/rudialogpt3_medium_based_on_gpt2).
 The distribution also contains components for detecting user requests and returning special responses to them.
-[Link to the distribution.](https://github.com/deepmipt/dream/tree/main/assistant_dists/dream_russian)
+[Link to the distribution.](https://github.com/deeppavlovteam/dream/tree/main/assistant_dists/dream_russian)

 # Quick Start

 ### Clone the repository

 ```
-git clone https://github.com/deepmipt/dream.git
+git clone https://github.com/deeppavlovteam/dream.git
 ```


@@ -184,27 +184,27 @@ docker-compose -f docker-compose.yml -f assistant_dists/dream/docker-compose.ove

 ## Annotators

-| Name                   | Requirements             | Description |
-|------------------------|--------------------------|-------------|
-| Badlisted words        | 50 MiB RAM               | Obscene-word detection annotator based on lemmatization with pymorphy2 and lookup in a dictionary of obscene words. |
-| Entity Detection       | 3 GiB RAM                | Annotator that extracts non-named entities and determines their type for lowercased Russian text, based on the ruBERT neural model (PyTorch). |
-| Entity Linking         | 300 MiB RAM              | Annotator that links entities extracted by Entity Detection (finds their Wikidata ids), based on a distilled ruBERT model. |
-| Intent Catcher         | 1.8 MiB RAM, 4.9 Gib GPU | Annotator that detects special user intents, based on the multilingual Universal Sentence Encoding model. |
-| NER                    | 1.8 GiB RAM, 4.9 Gib GPU | Annotator that extracts named entities for lowercased Russian text, based on the Conversational ruBERT neural model (PyTorch). |
-| Sentseg                | 2.4 GiB RAM, 4.9 Gib GPU | Annotator that restores punctuation for lowercased Russian text, based on the ruBERT neural model (PyTorch). The model is trained on Russian-language subtitles. |
-| Spacy Annotator        | 250 MiB RAM              | Annotator for tokenization and token annotation, based on the spacy library and its “ru_core_news_sm” model. |
-| Spelling Preprocessing | 4.4 GiB RAM              | Annotator that corrects typos and grammatical errors, based on a Levenshtein distance model. Uses a pretrained model from the DeepPavlov library. |
-| Toxic Classification   | 1.9 GiB RAM, 1.2 Gib GPU | Toxicity classifier for filtering user utterances, [from Skoltech](https://huggingface.co/SkolkovoInstitute/russian_toxicity_classifier) |
-| Wiki Parser            | 100 MiB RAM              | Annotator that extracts Wikidata triplets for the entities extracted by Entity Detection. |
-| DialogRPT              | 3.9 GiB RAM, 2 GiB GPU   | Service that estimates the probability that an utterance will be liked by the user (updown), based on the DialogRPT ranking model, fine-tuned from the generative Russian DialoGPT model on comments from the Pikabu website. |
+| Name                   | Requirements             | Description  |
+|------------------------|--------------------------|--------------|
+| Badlisted words        | 50 MiB RAM               | Obscene-word detection annotator based on lemmatization with pymorphy2 and lookup in a dictionary of obscene words.  |
+| Entity Detection       | 3 GiB RAM                | Annotator that extracts non-named entities and determines their type for lowercased Russian text, based on the ruBERT neural model (PyTorch).  |
+| Entity Linking         | 300 MiB RAM              | Annotator that links entities extracted by Entity Detection (finds their Wikidata ids), based on a distilled ruBERT model.  |
+| Intent Catcher         | 1.8 GiB RAM, 5 Gib GPU   | Annotator that detects special user intents, based on the multilingual Universal Sentence Encoding model.  |
+| NER                    | 1.8 GiB RAM, 5 Gib GPU   | Annotator that extracts named entities for lowercased Russian text, based on the Conversational ruBERT neural model (PyTorch).  |
+| Sentseg                | 2.4 GiB RAM, 5 Gib GPU   | Annotator that restores punctuation for lowercased Russian text, based on the ruBERT neural model (PyTorch). The model is trained on Russian-language subtitles.  |
+| Spacy Annotator        | 250 MiB RAM              | Annotator for tokenization and token annotation, based on the spacy library and its “ru_core_news_sm” model.  |
+| Spelling Preprocessing | 4.5 GiB RAM              | Annotator that corrects typos and grammatical errors, based on a Levenshtein distance model. Uses a pretrained model from the DeepPavlov library.  |
+| Toxic Classification   | 1.9 GiB RAM, 1.3 Gib GPU | Toxicity classifier for filtering user utterances, [from Skoltech](https://huggingface.co/SkolkovoInstitute/russian_toxicity_classifier)  |
+| Wiki Parser            | 100 MiB RAM              | Annotator that extracts Wikidata triplets for the entities extracted by Entity Detection.  |
+| DialogRPT              | 3.9 GiB RAM, 2.2 GiB GPU | Service that estimates the probability that an utterance will be liked by the user (updown), based on the DialogRPT ranking model, fine-tuned from the generative Russian DialoGPT model on comments from the Pikabu website.  |

 ## Skills & Services
 | Name                 | Requirements              | Description |
 |----------------------|---------------------------|-------------|
-| DialoGPT             | 2.8 GiB RAM, 2 GiB GPU    | Service that generates an utterance from the textual dialogue context, based on the pretrained Russian [DialoGPT](https://huggingface.co/Grossmend/rudialogpt3_medium_based_on_gpt2) model |
+| DialoGPT             | 2.8 GiB RAM, 2.2 GiB GPU  | Service that generates an utterance from the textual dialogue context, based on the pretrained Russian [DialoGPT](https://huggingface.co/Grossmend/rudialogpt3_medium_based_on_gpt2) model |
 | Dummy Skill          | a part of agent container | Skill that generates fallback responses and returns random questions from the database as linking-questions. |
 | Personal Info Skill  | 40 MiB RAM                | Scripted skill for extracting and remembering basic personal information about the user. |
-| DFF Generative Skill | 50 MiB RAM                | **[New DFF version]** skill that returns 5 hypotheses produced by the DialoGPT service |
+| DFF Generative Skill | 50 MiB RAM                | **[New DFF version]** skill that returns 5 hypotheses produced by the DialoGPT service  |
 | DFF Intent Responder | 50 MiB RAM                | **[New DFF version]** DFF-based scripted skill that responds to special user intents. |
 | DFF Program Y Skill  | 80 MiB RAM                | **[New DFF version]** DFF-based scripted skill that answers general questions, implemented as an AIML component. |
 | DFF Friendship Skill | 70 MiB RAM                | **[New DFF version]** DFF-based scripted skill for the greeting part of a dialogue with the user. |
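The Requirements column in the tables above mixes MiB and GiB, and spells the unit both as `GiB` and `Gib`, which makes the footprint of a distribution hard to total by eye. The following is a minimal sketch, not part of the commit, that normalizes a few of the updated requirement strings to MiB. The helper name `parse_requirements` and the `ROWS` selection are hypothetical, and unit matching is case-insensitive on the assumption that `Gib` is meant as GiB.

```python
import re

# A handful of rows taken from the updated tables above.
ROWS = {
    "Entity Detection": "3 GiB RAM",
    "Intent Catcher": "1.8 GiB RAM, 5 Gib GPU",
    "Spacy Annotator": "250 MiB RAM",
    "DialogRPT": "3.9 GiB RAM, 2.2 GiB GPU",
    "Dummy Skill": "a part of agent container",
}

# Matches e.g. "1.8 GiB RAM" or "5 Gib GPU"; case-insensitive on purpose.
PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*([MG])iB\s+(RAM|GPU)", re.IGNORECASE)
MIB_PER_UNIT = {"M": 1, "G": 1024}


def parse_requirements(text):
    """Return {'RAM': MiB, 'GPU': MiB} for whatever the string declares."""
    usage = {}
    for amount, unit, resource in PATTERN.findall(text):
        usage[resource.upper()] = float(amount) * MIB_PER_UNIT[unit.upper()]
    return usage


totals = {"RAM": 0.0, "GPU": 0.0}
for name, text in ROWS.items():
    for resource, mib in parse_requirements(text).items():
        totals[resource] += mib

print(f"RAM: {totals['RAM']:.1f} MiB, GPU: {totals['GPU']:.1f} MiB")
```

Rows without a numeric requirement, such as "a part of agent container", simply contribute nothing to the totals.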
2 changes: 1 addition & 1 deletion annotators/ConversationEvaluator/Dockerfile
@@ -1,4 +1,5 @@
 FROM deeppavlov/base-gpu:0.12.0
+RUN pip install git+https://github.com/deeppavlovteam/[email protected]

 ARG CONFIG
 ARG DATA_URL=http://files.deeppavlov.ai/alexaprize_data/cobot_conveval2.tar.gz
@@ -19,5 +20,4 @@ RUN pip install -r requirements.txt
 COPY annotators/ConversationEvaluator/ ./
 COPY common/ common/

-RUN python -m deeppavlov install $CONFIG
 CMD gunicorn --workers=1 --bind 0.0.0.0:8004 --timeout=300 server:app
2 changes: 2 additions & 0 deletions annotators/ConversationEvaluator/requirements.txt
@@ -8,3 +8,5 @@ cachetools==4.0.0
 blinker==1.4
 jinja2<=3.0.3
 Werkzeug<=2.0.3
+git+https://github.com/deeppavlovteam/bert.git@feat/multi_gpu
+tensorflow==1.15.5
2 changes: 2 additions & 0 deletions annotators/DeepPavlovEmotionClassification/requirements.txt
@@ -2,3 +2,5 @@ sentry-sdk==0.13.0
 gunicorn==19.9.0
 jinja2<=3.0.3
 Werkzeug<=2.0.3
+git+https://github.com/deeppavlovteam/bert.git@feat/multi_gpu
+tensorflow==1.15.5
2 changes: 2 additions & 0 deletions annotators/DeepPavlovFactoidClassification/requirements.txt
@@ -3,3 +3,5 @@ requests==2.23.0
 gunicorn==19.9.0
 jinja2<=3.0.3
 Werkzeug<=2.0.3
+git+https://github.com/deeppavlovteam/bert.git@feat/multi_gpu
+tensorflow==1.15.5
2 changes: 2 additions & 0 deletions annotators/DeepPavlovSentimentClassification/requirements.txt
@@ -2,3 +2,5 @@ sentry-sdk==0.13.0
 gunicorn==19.9.0
 jinja2<=3.0.3
 Werkzeug<=2.0.3
+git+https://github.com/deeppavlovteam/bert.git@feat/multi_gpu
+tensorflow==1.15.5
2 changes: 2 additions & 0 deletions annotators/DeepPavlovToxicClassification/requirements.txt
@@ -3,3 +3,5 @@ requests==2.23.0
 gunicorn==19.9.0
 jinja2<=3.0.3
 Werkzeug<=2.0.3
+git+https://github.com/deeppavlovteam/bert.git@feat/multi_gpu
+tensorflow==1.15.5
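The same two lines are appended to several `requirements.txt` files above, so each file now mixes three kinds of entries: exact pins (`==`), upper bounds (`<=`), and a VCS install from a branch. A minimal sketch, not part of the commit, that classifies such lines; `classify` is a hypothetical helper, and the regular expression covers only the simple forms seen in these hunks, not the full requirements-file syntax.

```python
import re

# Lines as they appear at the end of the updated requirements files above.
REQUIREMENTS = """\
jinja2<=3.0.3
Werkzeug<=2.0.3
git+https://github.com/deeppavlovteam/bert.git@feat/multi_gpu
tensorflow==1.15.5
"""

# name, optional [extras], comparison operator, version
SPEC = re.compile(
    r"^([A-Za-z0-9_.\-]+(?:\[[^\]]+\])?)\s*(==|<=|>=|~=|!=|<|>)\s*(\S+)$"
)


def classify(line):
    """Return (kind, name_or_url, constraint) for one requirement line."""
    line = line.strip()
    if line.startswith("git+"):
        rest = line[len("git+"):]
        if "@" in rest:
            url, _, ref = rest.rpartition("@")
            return ("vcs", url, ref)
        return ("vcs", rest, None)
    match = SPEC.match(line)
    if match is None:
        return ("unpinned", line, None)
    name, op, version = match.groups()
    kind = "exact" if op == "==" else "range"
    return (kind, name, op + version)


parsed = [classify(l) for l in REQUIREMENTS.splitlines() if l.strip()]
for entry in parsed:
    print(entry)
```

A branch reference such as `feat/multi_gpu` is not a fixed revision, so the `vcs` entries are the ones whose resolved contents can change between builds.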
2 changes: 1 addition & 1 deletion annotators/IntentCatcherTransformers/Dockerfile
@@ -1,4 +1,5 @@
 FROM deeppavlov/base-gpu:0.17.2
+RUN pip install git+https://github.com/deeppavlovteam/[email protected]

 RUN apt-key del 7fa2af80 && \
 rm -f /etc/apt/sources.list.d/cuda*.list && \
@@ -23,7 +24,6 @@ COPY ./common/ ./common/
 COPY annotators/IntentCatcherTransformers/ /src
 WORKDIR /src

-RUN python -m deeppavlov install ${CONFIG_NAME}
 RUN python -m deeppavlov download ${CONFIG_NAME}
 RUN python train_model_if_not_exist.py

6 changes: 5 additions & 1 deletion annotators/IntentCatcherTransformers/requirements.txt
@@ -12,4 +12,8 @@ pandas==0.25.3
 huggingface-hub==0.0.8
 datasets==1.11.0
 scikit-learn==0.21.2
-xeger==0.3.5
+xeger==0.3.5
+transformers==4.6.0
+torch==1.6.0
+torchvision==0.7.0
+cryptography==2.8
4 changes: 2 additions & 2 deletions annotators/NER_deeppavlov/Dockerfile
@@ -1,4 +1,5 @@
 FROM deeppavlov/base-gpu:1.0.0rc1
+RUN pip install git+https://github.com/deeppavlovteam/[email protected]

 ARG CONFIG
 ARG PORT
@@ -9,13 +10,12 @@ ENV CONFIG=$CONFIG
 ENV PORT=$PORT

 COPY ./annotators/NER_deeppavlov/requirements.txt /src/requirements.txt
-RUN pip install -r /src/requirements.txt
+RUN pip install --upgrade pip && pip install -r /src/requirements.txt

 COPY $SRC_DIR /src

 WORKDIR /src

-RUN python -m deeppavlov install $CONFIG
 RUN python -m deeppavlov download $CONFIG

 CMD gunicorn --workers=1 --timeout 500 server:app -b 0.0.0.0:8021
8 changes: 7 additions & 1 deletion annotators/NER_deeppavlov/requirements.txt
@@ -4,4 +4,10 @@ gunicorn==19.9.0
 requests==2.22.0
 itsdangerous==2.0.1
 jinja2<=3.0.3
-Werkzeug<=2.0.3
+Werkzeug<=2.0.3
+transformers==4.6.0
+torch==1.6.0
+torchvision==0.7.0
+cryptography==2.8
+datasets==1.11.0
+huggingface-hub==0.0.8
6 changes: 1 addition & 5 deletions annotators/combined_classification/Dockerfile
@@ -1,4 +1,5 @@
 FROM deeppavlov/base-gpu:0.12.1
+RUN pip install git+https://github.com/deeppavlovteam/[email protected]

 #RUN rm DeepPavlov

@@ -9,24 +10,19 @@ RUN git clone https://github.com/dimakarp1996/DeepPavlov.git
WORKDIR /base/DeepPavlov
RUN git checkout pal-bert+ner


ARG CONFIG

ARG PORT
ENV CONFIG=$CONFIG
ENV PORT=$PORT


#RUN pip install -r requirements.txt
WORKDIR /src
RUN mkdir common


COPY annotators/combined_classification/ ./
COPY common/ common/
RUN ls /tmp

#RUN python -m deeppavlov install $CONFIG
RUN pip install -r requirements.txt
ARG DATA_URL=http://files.deeppavlov.ai/alexaprize_data/pal_bert_7in1/model.pth.tar
ADD $DATA_URL /tmp
2 changes: 1 addition & 1 deletion annotators/dialog_breakdown/Dockerfile
@@ -1,4 +1,5 @@
 FROM deeppavlov/base-gpu:0.12.0
+RUN pip install git+https://github.com/deeppavlovteam/[email protected]

 ARG CONFIG
 ARG PORT
@@ -18,5 +19,4 @@ COPY common/ common/

 RUN sed -i "s|$SED_ARG|g" "$CONFIG"

-RUN python -m deeppavlov install $CONFIG
 CMD gunicorn --workers=1 --bind 0.0.0.0:8082 --timeout=300 server:app
16 changes: 9 additions & 7 deletions annotators/dialog_breakdown/requirements.txt
@@ -1,7 +1,9 @@
-gunicorn==19.9.0
-sentry-sdk[flask]==0.14.1
-flask==1.1.1
-itsdangerous==2.0.1
-requests==2.22.0
-jinja2<=3.0.3
-Werkzeug<=2.0.3
+gunicorn==19.9.0
+sentry-sdk[flask]==0.14.1
+flask==1.1.1
+itsdangerous==2.0.1
+requests==2.22.0
+jinja2<=3.0.3
+Werkzeug<=2.0.3
+git+https://github.com/deeppavlovteam/bert.git@feat/multi_gpu
+tensorflow==1.15.5
4 changes: 2 additions & 2 deletions annotators/emotion_classification_deepy/Dockerfile
@@ -1,7 +1,7 @@
 FROM deeppavlov/base-gpu:0.12.0
+RUN pip install git+https://github.com/deeppavlovteam/[email protected]

 WORKDIR /app
 COPY . .

-RUN python -m deeppavlov install emo_bert.json && \
-python -m deeppavlov download emo_bert.json
+RUN python -m deeppavlov download emo_bert.json
3 changes: 3 additions & 0 deletions annotators/emotion_classification_deepy/requirements.txt
@@ -2,3 +2,6 @@ sentry-sdk==0.13.0
 gunicorn==19.9.0
 jinja2<=3.0.3
 Werkzeug<=2.0.3
+git+https://github.com/deeppavlovteam/bert.git@feat/multi_gpu
+tensorflow==1.15.5
+
1 change: 1 addition & 0 deletions annotators/entity_detection/Dockerfile
@@ -1,4 +1,5 @@
 FROM deeppavlov/base-gpu:0.12.1
+RUN pip install git+https://github.com/deeppavlovteam/[email protected]

 RUN apt-get update && apt-get install git -y
