Data annotation issue #5

12138yx · 2023-09-19T01:00:23Z

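Hello, I noticed that for some samples the annotation and the regression label contradict each other or are otherwise inconsistent. What is the reason for this?
For example: annotations=Positive; regression_labels=-0.4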

Columbine21 · 2023-09-19T03:03:00Z

Hello, which samples are these (what are the sample IDs)? We will check them.

12138yx · 2023-09-19T03:22:29Z

In the supervised dataset:
video_0001,0013,有人假扮我，我要把事情弄清楚。,-0.2,-0.2,0.0,0.0,Neutral,train
video_0002,0002,这菜还没凉，要不吃点？,0.2,0.2,0.0,0.0,Neutral,train
video_0002,0005,你们就一个球都别想进,-0.6,-0.6,-0.4,-0.4,Neutral,train
video_0002,0027,我工作以来从来没有算错过任何一笔账，我不会因为你而破例,-0.6,-0.4,-0.6,-0.4,Positive,train
....
There are 174 such samples in the training set, 61 in the validation set, and 65 in the test set.
import pickle

# Load the dataset pickle.
with open("./dataset/ch-sims2/unaligned-001.pkl", "rb") as f:
    data = pickle.load(f)

annotations = data['valid']['annotations']
regression_labels = data['valid']['regression_labels']

# Print every sample whose class annotation disagrees with the sign of its regression label.
for i in range(len(annotations)):
    if ((annotations[i] == 'Positive' and regression_labels[i] > 0) or
            (annotations[i] == 'Negative' and regression_labels[i] < 0) or
            (annotations[i] == 'Neutral' and regression_labels[i] == 0)):
        pass
    else:
        print(annotations[i])
        print(regression_labels[i])
        print(i)

Columbine21 · 2023-09-19T11:48:28Z

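Thank you very much for the feedback, and we apologize for any confusion this has caused.
The inconsistency comes from an oversight on our side: annotations is the majority vote across all annotators, whereas regression_labels is the mean of the scores after dropping the highest and the lowest. The two labels therefore disagree for some samples. All experiments in the paper use only regression_labels, because treating sentiment analysis as a regression task usually gives better results.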
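
As an illustration of how the two aggregation rules described above can diverge, here is a minimal sketch. The five scores are hypothetical and not taken from the dataset, the helper names are made up for this example, and the sketch assumes each annotator's class is derived from the sign of that annotator's score, which the thread does not state.

```python
from collections import Counter

def polarity(score):
    """Map a single score to a class name by its sign."""
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

def majority_vote(scores):
    """Most common per-annotator class (ties go to the class seen first)."""
    return Counter(polarity(s) for s in scores).most_common(1)[0][0]

def trimmed_mean(scores):
    """Mean after dropping one highest and one lowest score."""
    kept = sorted(scores)[1:-1]
    return sum(kept) / len(kept)

# Hypothetical scores from five annotators (assumption, not dataset values).
scores = [0.2, 0.2, 0.2, -1.0, -1.0]
vote = majority_vote(scores)   # three of five annotators are positive
mean = trimmed_mean(scores)    # mean of [-1.0, 0.2, 0.2] is negative
print(vote, round(mean, 2))
```

Three mildly positive scores win the vote, while the strongly negative scores that survive trimming pull the mean below zero, so the sample ends up with a Positive annotation and a negative regression label.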

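We have updated the data on Google Drive (annotations has been adjusted to match the sign of the existing regression_labels). The Baidu Netdisk copy will be re-uploaded by the first author of our paper, who has already graduated.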
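
The maintainers' actual update script is not shown in this thread. The following is only a sketch of what aligning annotations with the sign of the regression label could look like. The function name `annotation_from_label` is hypothetical, and the pairs use the first numeric value of each row quoted earlier, assumed here to be the overall regression label.

```python
def annotation_from_label(label):
    """Return the class name implied by the sign of a regression label."""
    if label > 0:
        return "Positive"
    if label < 0:
        return "Negative"
    return "Neutral"

# (annotation, regression label) pairs from the rows quoted in this thread;
# which column holds the overall label is an assumption.
samples = [
    ("Neutral", -0.2),
    ("Neutral", 0.2),
    ("Neutral", -0.6),
    ("Positive", -0.6),
]

# Recompute every annotation from its label and count how many changed.
fixed = [annotation_from_label(label) for _, label in samples]
changed = sum(old != new for (old, _), new in zip(samples, fixed))
print(fixed, changed)
```

Under this rule a sample is Neutral only when its regression label is exactly 0, which matches the consistency check posted earlier in the thread.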

If you have any other questions, feel free to open an issue at any time. Thank you very much.
