Bug: extractFromTemplate and value.extractCategories GREL functions produce empty columns #61

Open
trnstlntk opened this issue Oct 1, 2022 · 5 comments
Labels: bug, expression language

Comments

@trnstlntk
Contributor

I have been trying the extractFromTemplate and value.extractCategories GREL functions in various projects. Both work well in the GREL preview dialog window:

[Two screenshots: the GREL preview dialog showing results for each function]

But after clicking OK, both produce an empty column in the project itself. I haven't been able to get either function to work in any project so far. For testing purposes, here is a project in which it went wrong:
Barbalissos.openrefine.tar.gz

trnstlntk added the bug and expression language labels Oct 1, 2022
trnstlntk moved this to To be triaged in Structured Data on Commons Oct 1, 2022
trnstlntk moved this from To be triaged to 🪲 Bugs in Structured Data on Commons Oct 1, 2022
@wetneb
Member

wetneb commented Oct 1, 2022

This is because your expression returns an array, not a single value, and arrays are silently discarded when creating columns out of expressions: OpenRefine/OpenRefine#1088

@wetneb
Member

wetneb commented Oct 1, 2022

(see also OpenRefine/OpenRefine#4823, which would be one of my preferred ways to improve this)

Concretely, what you can do on your side is use extractFromTemplate(value, "Information", "Description")[0], which returns the first match as a single value instead of an array.
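
To make the difference concrete, here is a minimal Python sketch of the behaviour described above. It is only a toy model, not OpenRefine's actual code: `store_cell` and `first_or_none` are hypothetical helper names, the sample value is invented, and the guard against an empty result is an addition of this sketch rather than something GREL's `[0]` is known to do.

```python
from typing import Any, List, Optional


def store_cell(result: Any) -> Optional[str]:
    """Toy model: an array result is dropped, a single value is kept."""
    if isinstance(result, (list, tuple)):
        # Assumption taken from the thread: arrays are silently discarded,
        # which leaves the new cell empty.
        return None
    return None if result is None else str(result)


def first_or_none(results: List[str]) -> Optional[str]:
    """Python counterpart of appending [0], with a guard for no matches."""
    return results[0] if results else None


# Hypothetical extraction result: a list with a single match.
matches = ["Ruins of Barbalissos"]

print(store_cell(matches))                 # None
print(store_cell(first_or_none(matches)))  # Ruins of Barbalissos
```

The first call models the empty column from the report; the second models the effect of selecting the first element before the value is stored.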

@wetneb
Member

wetneb commented Oct 1, 2022

Or we could decide that this extractFromTemplate function should not return an array, but only its first result. That makes it impossible to fetch other results in cases where there is more than one match, so from a programmer's perspective it is a bit disappointing, but perhaps you want to prioritize having a simpler expression.

You could do the same for extractCategories, and it would only return the first category of the page. That sounds even worse than for extractFromTemplate, since files routinely have multiple categories and there is no reason why the first one should be more interesting than the others. So intuitively, it is worth explaining to users that arrays exist and how to deal with them, but that's my very biased programmer perspective :-P

@trnstlntk
Contributor Author

> Or we could decide that this extractFromTemplate function should not return an array, but only its first result. That makes it impossible to fetch other results in cases where there is more than one match, so from a programmer's perspective it is a bit disappointing, but perhaps you want to prioritize having a simpler expression.
>
> You could do the same for extractCategories, and it would only return the first category of the page. That sounds even worse than for extractFromTemplate, since files routinely have multiple categories and there is no reason why the first one should be more interesting than the others. So intuitively, it is worth explaining to users that arrays exist and how to deal with them, but that's my very biased programmer perspective :-P

I'm finally getting around to documenting this. I will go for the pragmatic approach, providing end users with easy-to-reuse recipes, as I'm noticing that onboarding / learning the whole OpenRefine workflow is already pretty challenging for average Wikimedians.

As an exercise, I tried to come up with a workaround myself that will be helpful for others too, but I'm not sure yet whether I found the smartest solution. Is something like value.extractCategories()[0,10].toString() a decent workaround, or would you recommend something even nicer? (The [0,10] slice is meant to catch up to ten values, and toString is there to circumvent the 'OpenRefine won't do arrays in cells' issue.)

@wetneb
Member

wetneb commented Nov 21, 2022

I would rather recommend something like value.extractCategories().join('#'), which should join the categories with a # symbol between them, for example Category:Art#Category:Spain#Category:Blue. Users can then easily split those values into multiple cells or columns using the corresponding functions in OpenRefine.
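
The join-then-split round trip can be sketched in Python as follows. This only illustrates the idea and is not how OpenRefine implements it: `join_categories` and `split_categories` are hypothetical helper names, and the category list is the example given above.

```python
from typing import List

SEPARATOR = "#"  # separator suggested above


def join_categories(categories: List[str], sep: str = SEPARATOR) -> str:
    """Collapse a list of categories into one cell value."""
    return sep.join(categories)


def split_categories(cell: str, sep: str = SEPARATOR) -> List[str]:
    """Recover the individual categories from a joined cell value."""
    return cell.split(sep) if cell else []


categories = ["Category:Art", "Category:Spain", "Category:Blue"]
cell = join_categories(categories)

print(cell)                    # Category:Art#Category:Spain#Category:Blue
print(split_categories(cell))  # ['Category:Art', 'Category:Spain', 'Category:Blue']
```

The round trip is only lossless if the separator never occurs inside a value. A # is a reasonable choice here because, as far as I know, MediaWiki does not allow that character in page titles.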

trnstlntk moved this from 🪲 Bugs to 2023-24 grant - candidates for (bug) fixes in Structured Data on Commons Sep 3, 2023