Hi there,

This is a very interesting dataset, thank you for sharing. I have a few questions:
In the filtered datasets, does the 'score' column correspond to the softmax output for the predicted label?
For the training datasets, I assume the final digits in the filenames just indicate the seeds?
Regarding seeds, were those used for splitting the data into train and test sets? In other words, does the union of train and test always contain the same set of manually annotated sentences? A small sketch of what I mean is below.
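To make the question concrete, here is a minimal sketch of what I have in mind. The data, seeds, and use of sklearn's splitter are all placeholders for however the splits were actually produced; the point is only that the union stays fixed and the seed changes the partition:

```python
from sklearn.model_selection import train_test_split

# Placeholder pool of manually annotated (sentence, label) pairs.
annotated = [(f"sentence {i}", i % 3) for i in range(10)]

for seed in (0, 1, 2):  # placeholder seeds, not the ones encoded in the filenames
    train, test = train_test_split(annotated, test_size=0.2, random_state=seed)
    # The union is always the same annotated pool; only the partition changes.
    assert sorted(train + test) == sorted(annotated)
```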
Am I right in assuming that the datasets on HF correspond to one of the 'lab-manual-split-combine-train-XXXX.xlsx' / 'lab-manual-split-combine-test-XXXX.xlsx' pairs? If so, which seed exactly?
Am I right in assuming I can use the URL as a unique document identifier? For press conferences, for example, I find 63 unique URLs, which matches the '# Files' count presented in the paper.
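For context, this is roughly how I arrived at the 63. The file path and the 'url' column name are assumptions about my local copy of the filtered press-conference data, so treat it as a sketch:

```python
import pandas as pd

# Placeholder path and column name for the filtered press-conference sentences.
pc = pd.read_excel("filtered_data/press_conference_labeled.xlsx")
print(pc["url"].nunique())  # I get 63 here, matching '# Files' in the paper
```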
Finally, when I locally concatenate all filtered speeches/mm/pc, I find a significant number of duplicate sentences, often across documents with different time stamps. My hunch is that some sentences simply reappear in multiple documents. For example, it's reasonable to assume that the sentence below gets recycled regularly, but I still wanted to ask for your view on this (the check I ran is sketched after the quote).
"The Federal Open Market Committee seeks monetary and financial conditions that will foster price stability and promote sustainable growth in output."
Sorry for the long list of questions and apologies if I have missed something obvious in some cases.
Many thanks for your help!