The paper has been accepted at NeurIPS 2024 (Datasets and Benchmarks Track).
☠️ Warning: The samples presented by this paper may be considered offensive or vulgar.
The opinions and findings contained in the samples of our presented dataset should not be interpreted as representing the views expressed or implied by the authors. We acknowledge the risk of malicious actors attempting to reverse-engineer memes. We sincerely hope that users will employ the dataset responsibly and appropriately, avoiding misuse or abuse. We believe the benefits of our proposed resources outweigh the associated risks. All resources are intended solely for scientific research and are prohibited from commercial use.
To adapt to the Chinese online environment, we introduce the definition of Chinese harmful memes:
Chinese harmful memes are multimodal units consisting of an image and Chinese inline text that have the potential to cause harm to an individual, an organization, a community, a social group, or society as a whole. These memes range from offensive jokes that perpetuate harmful stereotypes about specific social entities to subtler, more general content that still has the potential to cause harm. It is important to note that Chinese harmful memes can be created and spread intentionally or unintentionally. They often reflect and reinforce underlying negative values and cultural attitudes on the Chinese Internet, which are detrimental from legal or moral perspectives.
According to the definition, we identify the most common harmful types of memes on Chinese platforms, including targeted harmful, general offense, sexual innuendo, and dispirited culture. We focus on these harmful types when constructing the dataset.
During the annotation, we label memes from two aspects: harmful types (i.e., the above four types) and modality combination (i.e., analyzing toxicity through fused or independent features, including Text-Image Fusion, Harmful Text, and Harmful Image). Finally, we present the ToxiCN MM dataset, which contains 12,000 samples.
Considering the potential risk of abuse, please fill out the following form to request the dataset: https://forms.gle/UN61ZNfTgMZKfMrv7. After we receive your request, we will send the dataset to your email as soon as possible.
The dataset labels and the captions generated by GPT-4V are saved as `train_data_discription.json` and `test_data_discription.json` in the `./data/` directory. Below we briefly describe each fine-grained label.
| Label | Description |
|---|---|
| `label` | Whether a meme is Harmful (1) or Non-harmful (0). |
| `type` | Non-harmful: 0, Targeted Harmful: 1, Sexual Innuendo: 2, General Offense: 3, Dispirited Culture: 4 |
| `modal` | Non-harmful / Text-Image Fusion: [0, 0], Only Harmful Text: [1, 0], Only Harmful Image: [0, 1], Harmful Text & Image: [1, 1] |
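As a minimal sketch of how the label scheme above can be decoded, the snippet below maps a single annotated record to human-readable names. The field names (`label`, `type`, `modal`) follow the table; the exact JSON schema of `train_data_discription.json` may differ, so check the file itself before relying on this.

```python
# Decode the fine-grained labels of one ToxiCN_MM record.
# NOTE: field names are assumptions based on the label table;
# verify against ./data/train_data_discription.json.

TYPE_NAMES = {
    0: "Non-harmful",
    1: "Targeted Harmful",
    2: "Sexual Innuendo",
    3: "General Offense",
    4: "Dispirited Culture",
}

MODAL_NAMES = {
    (0, 0): "Non-harmful / Text-Image Fusion",
    (1, 0): "Only Harmful Text",
    (0, 1): "Only Harmful Image",
    (1, 1): "Harmful Text & Image",
}

def describe(sample: dict) -> str:
    """Summarize one annotated meme record as 'harmfulness | type | modality'."""
    harmful = "Harmful" if sample["label"] == 1 else "Non-harmful"
    type_name = TYPE_NAMES[sample["type"]]
    text_harm, image_harm = sample["modal"]  # [text flag, image flag]
    modal_name = MODAL_NAMES[(text_harm, image_harm)]
    return f"{harmful} | {type_name} | {modal_name}"

# Example record (hypothetical values):
print(describe({"label": 1, "type": 2, "modal": [0, 1]}))
# Harmful | Sexual Innuendo | Only Harmful Image
```

To apply this to the released files, load them with `json.load` and iterate over the records.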
We present a Multimodal Knowledge Enhancement Detector for effective detection. It incorporates LLM-generated contextual information about meme content to enhance the detector's understanding of Chinese memes. The `requirements.txt` file lists the project's dependencies.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).
If you want to use the resources, please cite the following paper. The camera-ready version of the paper will be released after the conference:
@article{lu2024towards,
title={Towards Comprehensive Detection of Chinese Harmful Memes},
author={Lu, Junyu and Xu, Bo and Zhang, Xiaokun and Wang, Hongbo and Zhu, Haohao and Zhang, Dongyu and Yang, Liang and Lin, Hongfei},
journal={arXiv preprint arXiv:2410.02378},
year={2024}
}