This repository was archived by the owner on May 11, 2021. It is now read-only.

Should there be so many unlabelled areas? #4

Open
metazool opened this issue Oct 2, 2018 · 3 comments

Comments

Collaborator

metazool commented Oct 2, 2018

I now have an almost fully debugged version of blackstack running in Docker containers on our fork of the project at https://github.com/BritishGeologicalSurvey/blackstack.

However, I'm seeing the annotator server fall over because the SQL JOIN returns no rows. The cause is that many of the areas in my set of test documents have no labels: about 80% of them. Some of the test docs are intentionally low quality. Is this expected? And is the best fix to limit the query to labelled areas only, or to track down why the areas are not being labelled?

Contributor

jczaplew commented Oct 3, 2018

Is the join you are referring to this one: https://github.com/UW-Deepdive-Infrastructure/blackstack/blob/master/annotator/server.py#L197 ?

Collaborator Author

metazool commented Oct 5, 2018

That's the one. It is at the ON area_labels.area_id line that it fails to return data (without querying for the label, and with only the first join, it ran OK).

I had roughly 1k labels for roughly 5k areas in the test set...
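
One rough way to confirm that proportion is to count the areas that have no matching label row. The sketch below is only an illustration: it uses an in-memory SQLite database in place of the project's real database, and the `areas` table, its `area_id` column, and the `label` column are hypothetical names (only `area_labels.area_id` appears in this thread).

```python
import sqlite3


def unlabelled_fraction(conn):
    """Return (total_areas, unlabelled_areas, fraction_unlabelled)."""
    total = conn.execute("SELECT COUNT(*) FROM areas").fetchone()[0]
    # A LEFT JOIN keeps every area; areas without a label get NULLs on the
    # area_labels side, so filtering on IS NULL counts the unlabelled ones.
    unlabelled = conn.execute(
        """
        SELECT COUNT(*)
        FROM areas
        LEFT JOIN area_labels ON area_labels.area_id = areas.area_id
        WHERE area_labels.area_id IS NULL
        """
    ).fetchone()[0]
    fraction = unlabelled / total if total else 0.0
    return total, unlabelled, fraction


# Hypothetical toy data: five areas, only one of which has a label.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE areas (area_id INTEGER PRIMARY KEY);
    CREATE TABLE area_labels (area_id INTEGER, label TEXT);
    """
)
conn.executemany("INSERT INTO areas (area_id) VALUES (?)", [(i,) for i in range(1, 6)])
conn.execute("INSERT INTO area_labels (area_id, label) VALUES (1, 'Figure')")

result = unlabelled_fraction(conn)
print(result)
```

With the toy data, four of the five areas come back as unlabelled, which mirrors the roughly 80% figure described above.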

Contributor

jczaplew commented Oct 5, 2018

Interesting... that query is intended to fetch only the areas that have been labeled, in order to show p values while tagging areas. Since it isn't a functional piece of the annotator, it is safe to remove if it is causing problems.
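
To illustrate why a query like that can come back empty, here is a minimal sketch comparing an inner join with a LEFT JOIN. It is not the query from server.py: SQLite stands in for the real database, and the `areas` table, the `label` column, and the `first_label` helper are hypothetical names used only for this example.

```python
import sqlite3

# Hypothetical schema and data: three areas, only area 1 has a label.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE areas (area_id INTEGER PRIMARY KEY);
    CREATE TABLE area_labels (area_id INTEGER, label TEXT);
    INSERT INTO areas (area_id) VALUES (1), (2), (3);
    INSERT INTO area_labels (area_id, label) VALUES (1, 'Figure');
    """
)

# Inner join: only areas with a matching row in area_labels are returned.
INNER_SQL = """
    SELECT areas.area_id, area_labels.label
    FROM areas
    JOIN area_labels ON area_labels.area_id = areas.area_id
    ORDER BY areas.area_id
"""

# Left join: every area is returned; unlabelled areas get a NULL label.
LEFT_SQL = """
    SELECT areas.area_id, area_labels.label
    FROM areas
    LEFT JOIN area_labels ON area_labels.area_id = areas.area_id
    ORDER BY areas.area_id
"""


def first_label(rows):
    """Return the label of the first row, or None when the result is empty."""
    return rows[0][1] if rows else None


inner_rows = conn.execute(INNER_SQL).fetchall()
left_rows = conn.execute(LEFT_SQL).fetchall()
print(inner_rows)
print(left_rows)
print(first_label(inner_rows))
```

If no area has a label at all, the inner join returns an empty list, so any caller that indexes into the result needs a guard such as the one in `first_label`.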

I'm assuming the labels you have for areas are stored in the area_labels table?
