SOM-NCSCM

A Chinese sentence compression (SC) dataset and a neural Chinese SC model. (EMNLP 2021 long paper, oral presentation)

PDF: https://aclanthology.org/2021.emnlp-main.33/

Chinese Sentence Compression Dataset

The folder ./Chinese SC dataset contains a parallel Chinese SC dataset from the telecommunication domain.

Personal private information and domain-related sensitive information have been masked with special tokens. (More details can be found in our paper :)

We will continue to improve and expand the Chinese SC dataset.

The SOM-NCSCM.

This is a neural Chinese SC model enhanced with a Self-Organizing Map (SOM).

We will provide A BASIC PRELIMINARY VERSION of the code soon. (Well, the model is not difficult to build :) If you run into any problems, just email us or open an issue.)
