diff --git a/README.md b/README.md
index f49afc6..ce0c86e 100644
--- a/README.md
+++ b/README.md
@@ -2,8 +2,8 @@
## 中文预训练BERT-wwm(Pre-Trained Chinese BERT with Whole Word Masking)
在自然语言处理领域中,预训练模型(Pre-trained Models)已成为非常重要的基础技术。
-为了进一步促进中文信息处理的研究发展,我们发布了基于全词遮罩(Whole Word Masking)技术的中文预训练模型BERT-wwm,以及与此技术密切相关的模型:BERT-wwm-ext,RoBERTa-wwm-ext,RoBERTa-wwm-ext-large。
+为了进一步促进中文信息处理的研究发展,我们发布了基于全词遮罩(Whole Word Masking)技术的中文预训练模型BERT-wwm,以及与此技术密切相关的模型:BERT-wwm-ext,RoBERTa-wwm-ext,RoBERTa-wwm-ext-large。
@@ -134,9 +134,9 @@ PyTorch版本则包含`pytorch_model.bin`, `bert_config.json`, `vocab.txt`文件
- [**DRCD**:篇章片段抽取型阅读理解(繁体中文)](https://github.com/DRCSolutionService/DRCD)
- [**CJRC**: 法律阅读理解(简体中文)](http://cail.cipsc.org.cn)
- [**XNLI**:自然语言推断](https://github.com/google-research/bert/blob/master/multilingual.md)
+- [**ChnSentiCorp**:情感分析](https://github.com/pengming617/bert_classification)
- [**LCQMC**:句对匹配](http://icrc.hitsz.edu.cn/info/1037/1146.htm)
- [**BQ Corpus**:句对匹配](http://icrc.hitsz.edu.cn/Article/show/175.html)
-- [**NER**:中文命名实体识别](http://sighan.cs.uchicago.edu/bakeoff2006/)
- [**THUCNews**:篇章级文本分类](http://thuctc.thunlp.org)
@@ -197,6 +197,18 @@ PyTorch版本则包含`pytorch_model.bin`, `bert_config.json`, `vocab.txt`文件
| **RoBERTa-wwm-ext-large** | **82.1 (81.3)** | **81.2 (80.6)** |
+### 情感分析:ChnSentiCorp
+| 模型 | 开发集 | 测试集 |
+| :------- | :---------: | :---------: |
+| BERT | 94.7 (94.3) | 95.0 (94.7) |
+| ERNIE | 95.4 (94.8) | 95.4 **(95.3)** |
+| **BERT-wwm** | 95.1 (94.5) | 95.4 (95.0) |
+| **BERT-wwm-ext** | 95.4 (94.6) | 95.3 (94.7) |
+| **RoBERTa-wwm-ext** | 95.0 (94.6) | 95.6 (94.8) |
+| **RoBERTa-wwm-ext-large** | **95.8 (94.9)** | **95.8** (94.9) |
### 句对分类:LCQMC, BQ Corpus
@@ -226,32 +238,18 @@ PyTorch版本则包含`pytorch_model.bin`, `bert_config.json`, `vocab.txt`文件
| **RoBERTa-wwm-ext-large** | 86.3 **(85.7)** | **85.8 (84.9)** |
-### 命名实体识别:人民日报、MSRA-NER
-| 模型 | 人民日报 | MSRA-NER |
-| :------- | :---------: | :---------: |
-| BERT | 95.2 (94.9) | 95.3 (94.9) |
-| ERNIE | **95.7 (94.5)** | **95.4 (95.1)** |
-| **BERT-wwm** | 95.3 (95.1) | **95.4 (95.1)** |
### 篇章级文本分类:THUCNews
| 模型 | 开发集 | 测试集 |
| :------- | :---------: | :---------: |
-| BERT | 97.7 (97.4) | **97.8 (97.6)** |
+| BERT | 97.7 (97.4) | 97.8 (97.6) |
| ERNIE | 97.6 (97.3) | 97.5 (97.3) |
-| **BERT-wwm** | **98.0 (97.6)** | **97.8 (97.6)** |
+| **BERT-wwm** | 98.0 (97.6) | 97.8 (97.6) |
+| **BERT-wwm-ext** | 97.7 (97.5) | 97.7 (97.5) |
+| **RoBERTa-wwm-ext** | 98.3 (97.9) | 97.7 (97.5) |
+| **RoBERTa-wwm-ext-large** | 98.3 (97.7) | 97.8 (97.6) |
## 使用建议
diff --git a/README_EN.md b/README_EN.md
index 4147e4d..b0cfb66 100644
--- a/README_EN.md
+++ b/README_EN.md
@@ -118,9 +118,9 @@ We experiment on several Chinese datasets, including sentence-level to document-
- [**DRCD**:Span-Extraction Machine Reading Comprehension (Traditional Chinese)](https://github.com/DRCSolutionService/DRCD)
- [**CJRC**: Chinese Judiciary Reading Comprehension](http://cail.cipsc.org.cn)
- [**XNLI**:Natural Language Inference](https://github.com/google-research/bert/blob/master/multilingual.md)
+- [**ChnSentiCorp**:Sentiment Analysis](https://github.com/pengming617/bert_classification)
- [**LCQMC**:Sentence Pair Matching](http://icrc.hitsz.edu.cn/info/1037/1146.htm)
- [**BQ Corpus**:Sentence Pair Matching](http://icrc.hitsz.edu.cn/Article/show/175.html)
-- [**NER**:Chinese Named Entity Recognition](http://sighan.cs.uchicago.edu/bakeoff2006/)
- [**THUCNews**:Document-level Text Classification](http://thuctc.thunlp.org)
**Note: To ensure the stability of the results, we run 10 times for each experiment and report maximum and average scores.**
@@ -178,6 +178,19 @@ We use XNLI data for testing NLI task.
| **RoBERTa-wwm-ext** | 80.0 (79.2) | 78.8 (78.3) |
| **RoBERTa-wwm-ext-large** | **82.1 (81.3)** | **81.2 (80.6)** |
+### ChnSentiCorp
+We use ChnSentiCorp data for testing sentiment analysis.
+| Model | Development | Test |
+| :------- | :---------: | :---------: |
+| BERT | 94.7 (94.3) | 95.0 (94.7) |
+| ERNIE | 95.4 (94.8) | 95.4 **(95.3)** |
+| **BERT-wwm** | 95.1 (94.5) | 95.4 (95.0) |
+| **BERT-wwm-ext** | 95.4 (94.6) | 95.3 (94.7) |
+| **RoBERTa-wwm-ext** | 95.0 (94.6) | 95.6 (94.8) |
+| **RoBERTa-wwm-ext-large** | **95.8 (94.9)** | **95.8** (94.9) |
### Sentence Pair Matching:LCQMC, BQ Corpus
#### LCQMC
@@ -203,18 +216,6 @@ We use XNLI data for testing NLI task.
| **RoBERTa-wwm-ext-large** | 86.3 **(85.7)** | **85.8 (84.9)** |
-Other experiments
-### NER
-We use People's Daily and MSRA-NER data for testing Chinese NER.
-| Model | People's Daily | MSRA |
-| :------- | :---------: | :---------: |
-| BERT | 95.2 (94.9) | 95.3 (94.9) |
-| ERNIE | 95.7 (94.5) | 95.4 (95.1) |
-| **BERT-wwm** | 95.3 (95.1) | 95.4 (95.1) |
### THUCNews
Released by Tsinghua University, which contains news in 10 categories.
@@ -223,6 +224,9 @@ Released by Tsinghua University, which contains news in 10 categories.
| BERT | 97.7 (97.4) | 97.8 (97.6) |
| ERNIE | 97.6 (97.3) | 97.5 (97.3) |
| **BERT-wwm** | 98.0 (97.6) | 97.8 (97.6) |
+| **BERT-wwm-ext** | 97.7 (97.5) | 97.7 (97.5) |
+| **RoBERTa-wwm-ext** | 98.3 (97.9) | 97.7 (97.5) |
+| **RoBERTa-wwm-ext-large** | 98.3 (97.7) | 97.8 (97.6) |
diff --git a/pics/header.png b/pics/header.png
index 5edbbf2..acc79cc 100644
Binary files a/pics/header.png and b/pics/header.png differ