To help everyone quickly get started with the end-to-end automated evaluation capabilities of YiJian-Community, we sampled a small number of low-risk questions from several open-source datasets (namely Salad-Data, jade-db, BeaverTails, UltraSafety, cvalues_rlhf, HarmfulQ, Safety-Prompts, do-not-answer, toxic-chat) and sent each question to ten open-source large models (namely GOAT-AI/GOAT-7B-Community, microsoft/Orca-2-7b, stabilityai/StableBeluga-7B, migtissera/SynthIA-7B-v1.3, PygmalionAI/pygmalion-2-7b, Local-Novel-LLM-project/Ninja-v1-NSFW-128k, KoboldAI/OPT-13B-Nerybus-Mix, microsoft/Orca-2-13b, stabilityai/StableBeluga-13B, facebook/opt-13b) to collect responses, which together form the evaluation data file samples_50_zh.jsonl. These models were chosen because preliminary testing showed they are more likely to generate risky responses.
Note that the open-source large models used to obtain risky responses have limited basic capabilities and may sometimes generate incoherent or repetitive responses.
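As a minimal sketch of this collection step (not the exact script used to build samples_50_zh.jsonl), the snippet below sends one question to one of the listed models via the Hugging Face transformers library; the model choice, prompt, and generation parameters are illustrative assumptions.

```python
# Illustrative sketch: query one of the listed open-source models for a response.
# The model name, prompt, and generation settings below are assumptions, not the
# exact configuration used to produce samples_50_zh.jsonl.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/Orca-2-7b"  # one of the ten models listed above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt_text = "a sampled risky question"  # placeholder for a question from the datasets
inputs = tokenizer(prompt_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True)

# Decode only the newly generated tokens as the model's response.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(response)
```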
The format of each evaluation data sample is as follows:
```json
{
    "prompt_text": "risky question",
    "source": "the source of the risky question",
    "references": [
        {
            "text": "the response generated by response_model for prompt_text",
            "response_model": "the name of the model that generated text",
            "label": "whether text is risky, 1 means yes and 0 means no"
        },
        {
            "text": "",
            "response_model": "",
            "label": ""
        },
        ···
    ]
}
```
The references field is mainly used by similarity-based automated evaluation methods; if other evaluation schemes are used, it can be omitted.
A large number of safety evaluation datasets have been open sourced (see, e.g., SafetyPrompts.com), and you can try different datasets to initiate evaluations.
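As a minimal sketch (assuming samples_50_zh.jsonl is in the working directory and uses the field names shown above), the following Python snippet loads the file and iterates over its samples and references:

```python
# Load the JSON Lines evaluation file and walk through its samples.
import json

samples = []
with open("samples_50_zh.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if line:
            samples.append(json.loads(line))

for sample in samples:
    print(sample["prompt_text"], "(source:", sample["source"] + ")")
    # references may be omitted when a non-similarity evaluation scheme is used
    for ref in sample.get("references", []):
        print("  ", ref["response_model"], "-> label:", ref["label"])
```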