Draft: Support query filter and scoring #366
Codecov Report
@@ Coverage Diff @@
## main #366 +/- ##
==========================================
- Coverage 94.47% 94.25% -0.23%
==========================================
Files 39 39
Lines 3114 3131 +17
Branches 311 312 +1
==========================================
+ Hits 2942 2951 +9
- Misses 144 151 +7
- Partials 28 29 +1
I can see this is a draft, so I haven't looked at these changes in depth. If you could also rebase your commits to follow the Angular commit message convention as described in #364 (review), that would be great.
I've had a look through this PR and made some comments, mostly small changes but also one or two about implementation.
To me, it doesn't look worrying, although the o.summary definitely needs a lookup to the mapping file. In the collaboration meeting, I think you were unsure whether the changes would be controversial or not. Is there any particular part of the implementation that could cause issues between facilities?
I haven't been able to test these changes because I'm not sure what values I should use for the config. Would you be able to provide me with some example values to use as a starting point, please? Would I be able to use an existing scoring server, or do I need to set up my own?
scoring_enabled: StrictBool
scoring_server: StrictStr
scoring_group: StrictStr
Could you add example values to config.json.example, please? Keeping the scoring disabled might be best in case someone is using only the DataGateway API, not the search API.
This is my config:
"scoring_enabled": true,
"scoring_server": "http://dau-dm-01:9000/score?limit=2000",
"scoring_group": "investigation"
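As a rough illustration of the request above, the three settings could ship in an example config with scoring switched off and be checked strictly on load. This is a minimal standard-library sketch, not the project's actual config code (the diff uses StrictBool and StrictStr). The names `ScoringConfig`, `load_scoring_config` and `EXAMPLE_CONFIG`, and the `localhost` URL, are assumptions made for the example.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringConfig:
    """Hypothetical container for the three scoring settings in the diff."""

    scoring_enabled: bool
    scoring_server: str
    scoring_group: str


# Expected type per key; bool and str only, mirroring the strict types.
_EXPECTED_TYPES = {
    "scoring_enabled": bool,
    "scoring_server": str,
    "scoring_group": str,
}


def load_scoring_config(raw: str) -> ScoringConfig:
    """Parse a JSON string and reject values that are not exactly bool/str."""
    data = json.loads(raw)
    values = {}
    for key, expected_type in _EXPECTED_TYPES.items():
        value = data[key]
        # type() instead of isinstance(): no coercion of "true" or 1 to True.
        if type(value) is not expected_type:
            raise TypeError(f"{key} must be {expected_type.__name__}")
        values[key] = value
    return ScoringConfig(**values)


# Placeholder example values with scoring disabled; the URL is illustrative.
EXAMPLE_CONFIG = """
{
    "scoring_enabled": false,
    "scoring_server": "http://localhost:9000/score",
    "scoring_group": "investigation"
}
"""

config = load_scoring_config(EXAMPLE_CONFIG)
print(config.scoring_enabled)
```

With scoring disabled by default, a deployment that only uses the DataGateway API would not need a scoring server at all.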
I guess this is your own scoring server? Does this mean I'll need to set up my own instance to test your changes? Is there a PaNOSC scoring server that I could use?
No, there is not, as far as I know. Bear in mind that you need to install the scoring server and populate it with your own data in order to calculate the weights.
It is not a big deal, but it has to be done.
That's good to know. I will set up my own and test this branch when I get a chance.
entities = get_search(
    entity_name,
    filters,
    "LOWER(o.summary) like '%" + query.lower() + "%'",
Looking at this line and line 56 (where you pass "investigations" to get_score()), I guess this only works on /documents for now? Have you got a plan of how to make it work for the other endpoints?
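One possible direction, sketched here only for discussion: the filter in the diff hard-codes o.summary and concatenates the raw query into the condition, so it cannot serve other endpoints and a quote in the query would break the string literal. The sketch below looks the field up per entity and doubles single quotes. `QUERY_FIELDS` and `build_query_condition` are hypothetical names, and the dictionary merely stands in for the lookup to the mapping file mentioned in the review; LIKE wildcards in the query are not handled.

```python
# Hypothetical stand-in for a lookup to the mapping file: which field the
# free-text query is matched against for each PaNOSC entity.
QUERY_FIELDS = {
    "Document": "summary",
}


def build_query_condition(entity_name: str, query: str) -> str:
    """Build a case-insensitive LIKE condition for the given entity."""
    try:
        field = QUERY_FIELDS[entity_name]
    except KeyError:
        raise ValueError(
            f"No query field configured for {entity_name!r}",
        ) from None
    # Double single quotes so user text cannot terminate the string literal.
    escaped = query.lower().replace("'", "''")
    return f"LOWER(o.{field}) like '%{escaped}%'"


print(build_query_condition("Document", "Neutron"))
```

Supporting another endpoint would then mean adding an entry to the mapping rather than editing the condition string, and an unmapped entity fails loudly instead of silently searching the wrong field.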
You are right, and honestly I do not know. There are several reasons:
- I focused on making https://data.panosc.eu/ work. Surprisingly, I discovered that it only uses the /documents endpoint, so I did not consider adding something that is not used.
- When I tried to use the scoring app with the datasets, it did not work when calculating the score (my guess is because of the number of datasets). I created an issue: Compute gets stuck panosc-eu/panosc-search-scoring#9
So, yes, it might be needed in the future, but that is very uncertain, and first we would need the scoring app to work at the level of datasets.
We were slightly concerned about the way the score calculation behaves and that it might cause performance issues with large volumes of data. It would be good to confirm (from someone who wrote the scoring software perhaps?) which endpoint(s) the scoring needs to work on.
panosc_entity_name, icat_field_name = mappings.get_icat_mapping(
    panosc_entity_name,
    split_field,
Note for me: formatting change within a file that has functionality changes
@MRichards99 There are some linting errors with an unclear error message. I understand and share your concerns about the performance, which is why I would be happy to keep the scoring at the level of the investigations only for the moment. Cheers,
@antolinos this error message is from flake8, saying that the code doesn't meet the formatting standards set by the formatter black. If you have the Nox sessions working, you can run
Thanks for the info. I ran black and lint, so the format should be fine. However, there are a lot of problems, starting with the db_generator.
Superseded by #399
This PR will close #365
Description
Enter a description of the changes here
Testing Instructions
Add a set up instructions describing how the reviewer should test the code
- If the icatdb Generator Script Consistency Test CI job fails, is this because of a deliberate change made to the script to change generated data (which isn't actually a problem) or is there an underlying issue with the changes made?
- Does at least one commit start with fix:, feat: or BREAKING CHANGE: so a release is automatically made via GitHub Actions upon merge?
Agile Board Tracking
Connect to #{issue number}