Data annotation issue #5

Open
12138yx opened this issue Sep 19, 2023 · 3 comments

Comments

@12138yx

12138yx commented Sep 19, 2023

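Hello, I noticed that for some samples the annotation and the regression label are completely opposite or otherwise inconsistent. What is the reason for this?
For example: annotations=Positive; regression_labels=-0.4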

@Columbine21
Collaborator

Hello, which sample is this (what is its sample ID)? We will check it.

@12138yx
Author

12138yx commented Sep 19, 2023

In the supervised dataset:
video_0001,0013,有人假扮我,我要把事情弄清楚。,-0.2,-0.2,0.0,0.0,Neutral,train
video_0002,0002,这菜还没凉,要不吃点?,0.2,0.2,0.0,0.0,Neutral,train
video_0002,0005,你们就一个球都别想进,-0.6,-0.6,-0.4,-0.4,Neutral,train
video_0002,0027,我工作以来从来没有算错过任何一笔账,我不会因为你而破例,-0.6,-0.4,-0.6,-0.4,Positive,train
....
There are 174 such samples in the training set, 61 in the validation set, and 65 in the test set.
import pickle

# Load the unaligned feature file
with open("./dataset/ch-sims2/unaligned-001.pkl", "rb") as f:
    data = pickle.load(f)

annotations = data['valid']['annotations']
regression_labels = data['valid']['regression_labels']

# Print every sample whose class annotation disagrees with the sign of its regression label
for i in range(len(annotations)):
    if ((annotations[i] == 'Positive' and regression_labels[i] > 0) or
            (annotations[i] == 'Negative' and regression_labels[i] < 0) or
            (annotations[i] == 'Neutral' and regression_labels[i] == 0)):
        pass
    else:
        print(annotations[i])
        print(regression_labels[i])
        print(i)
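
The per-split counts quoted above could be reproduced with a small helper like the one below. This is only a sketch: the split keys `train` and `test` are assumed to exist alongside the `valid` key used above, and `is_consistent`, `count_mismatches`, and the `toy` dictionary are hypothetical names and data made up for illustration, not part of the dataset or its tooling.

```python
def is_consistent(annotation, label):
    """Return True when the class annotation matches the sign of the label."""
    if annotation == "Positive":
        return label > 0
    if annotation == "Negative":
        return label < 0
    if annotation == "Neutral":
        return label == 0
    return False  # unknown annotation strings count as mismatches


def count_mismatches(data, splits=("train", "valid", "test")):
    """Count inconsistent samples per split (split names are an assumption)."""
    counts = {}
    for split in splits:
        anns = data[split]["annotations"]
        labels = data[split]["regression_labels"]
        counts[split] = sum(
            not is_consistent(a, float(l)) for a, l in zip(anns, labels)
        )
    return counts


# Toy stand-in for the unpickled dictionary (made-up values, not real data)
toy = {
    "train": {"annotations": ["Positive", "Neutral", "Negative"],
              "regression_labels": [-0.4, 0.0, -0.2]},
    "valid": {"annotations": ["Neutral"], "regression_labels": [0.2]},
    "test": {"annotations": ["Positive"], "regression_labels": [0.6]},
}
print(count_mismatches(toy))
```

With the real file, `count_mismatches(data)` would be called on the dictionary returned by `pickle.load`.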

@Columbine21
Collaborator

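Thank you very much for the feedback, and we apologize for any confusion this has caused.
The problem comes from an oversight on our side: annotations is the majority vote across all annotators, whereas regression_labels is the mean of the annotators' scores after dropping the highest and lowest score, so the two labels disagree for some samples. All experiments in the paper use only regression_labels, because treating sentiment analysis as a regression task usually gives better results.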
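
To see how the two aggregation rules can disagree, here is a minimal sketch. The five annotator scores are hypothetical, and mapping each annotator's score to a class by its sign is an assumption; the thread only states that annotations is a majority vote and regression_labels is a mean with the highest and lowest scores removed.

```python
from collections import Counter


def sign_class(score):
    """Map one annotator score to a class by sign (assumed mapping)."""
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


def majority_vote(scores):
    """Class chosen by the largest number of annotators."""
    votes = Counter(sign_class(s) for s in scores)
    return votes.most_common(1)[0][0]


def trimmed_mean(scores):
    """Mean after dropping one highest and one lowest score."""
    kept = sorted(scores)[1:-1]
    return sum(kept) / len(kept)


# Hypothetical scores: three mildly positive votes, two strongly negative ones
scores = [0.2, 0.2, 0.2, -1.0, -1.0]
print(majority_vote(scores))  # class label from the vote
print(trimmed_mean(scores))   # regression label from the trimmed mean
```

Three of the five annotators vote positive, so the vote yields Positive, but the trimmed mean averages 0.2, 0.2, and -1.0 and comes out negative. This is the same kind of mismatch as the Positive samples with negative regression labels reported above.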

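We have updated the data on Google Drive (annotations has been adjusted to match the sign of the existing regression_labels). The copy on Baidu Netdisk will be re-uploaded by the first author of our paper (who has since graduated).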
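
For anyone still working with an older copy of the data, the adjustment described here could be approximated locally as follows. This is a sketch and not the maintainers' actual script: `annotation_from_label` and `realign` are hypothetical helpers, and the example values are the first numeric column of the four rows quoted earlier in the thread, assuming that column is the overall regression label.

```python
def annotation_from_label(label):
    """Derive a class annotation from the sign of a regression label."""
    if label > 0:
        return "Positive"
    if label < 0:
        return "Negative"
    return "Neutral"


def realign(regression_labels):
    """Rebuild the annotations list so it agrees with the label signs."""
    return [annotation_from_label(float(x)) for x in regression_labels]


# Annotations and labels as quoted in the issue (column choice is an assumption)
old_annotations = ["Neutral", "Neutral", "Neutral", "Positive"]
regression_labels = [-0.2, 0.2, -0.6, -0.6]

new_annotations = realign(regression_labels)
print(new_annotations)
```

Since the paper's experiments use only regression_labels, this changes nothing for regression training. It only matters if the class annotations are consumed directly.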

If you have any other questions, feel free to open an issue at any time. Many thanks.
