Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: Support query filter and scoring #366

Closed
wants to merge 20 commits into from

Conversation

antolinos
Copy link
Collaborator

This PR will close #365

Description

Enter a description of the changes here

Testing Instructions

Add a set up instructions describing how the reviewer should test the code

  • Review code
  • Check GitHub Actions build
  • If icatdb Generator Script Consistency Test CI job fails, is this because of a deliberate change made to the script to change generated data (which isn't actually a problem) or is here an underlying issue with the changes made?
  • Review changes to test coverage
  • Does this change mean a new patch, minor or major version should be made? If so, does one of the commit messages feature fix:, feat: or BREAKING CHANGE: so a release is automatically made via GitHub Actions upon merge?
  • {more steps here}

Agile Board Tracking

Connect to #{issue number}

@codecov
Copy link

codecov bot commented May 19, 2022

Codecov Report

Merging #366 (676adfd) into main (cfa1996) will decrease coverage by 0.22%.
The diff coverage is 52.94%.

❗ Current head 676adfd differs from pull request most recent head df0ad83. Consider uploading reports for the commit df0ad83 to get more accurate results

@@            Coverage Diff             @@
##             main     #366      +/-   ##
==========================================
- Coverage   94.47%   94.25%   -0.23%     
==========================================
  Files          39       39              
  Lines        3114     3131      +17     
  Branches      311      312       +1     
==========================================
+ Hits         2942     2951       +9     
- Misses        144      151       +7     
- Partials       28       29       +1     
Impacted Files Coverage Δ
...gateway_api/src/search_api/query_filter_factory.py 97.01% <0.00%> (-2.23%) ⬇️
...atagateway_api/src/datagateway_api/icat/filters.py 98.28% <60.00%> (-1.13%) ⬇️
datagateway_api/src/search_api/filters.py 96.87% <60.00%> (-2.03%) ⬇️
datagateway_api/src/common/filters.py 98.00% <75.00%> (-2.00%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@MRichards99
Copy link
Collaborator

I can see this is a draft so I haven't looked at these changes in depth, but if you could also rebase your commits to follow the Angular commit message convention as described in #364 (review), that would be great

@antolinos antolinos changed the title Draft: Issue 365 Draft: Support query filter and scoring Jun 16, 2022
Copy link
Collaborator

@MRichards99 MRichards99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a look through this PR and made some comments, mostly small changes but also one or two about implementation.

To me, it doesn't look worrying, although the o.summary definitely needs a lookup to the mapping file. In the collaboration meeting, I think you were unsure whether the changes would be controversial or not, is there any particular part of the implementation that could cause issues between facilities?

I haven't been able to test these changes because I'm not sure what values I should use for the config. Would you be able to provide me with some example values to use please as a starting point? Would I be able to use an existing scoring server, or do I need to setup my own?

datagateway_api/src/api_start_utils.py Outdated Show resolved Hide resolved
datagateway_api/src/api_start_utils.py Show resolved Hide resolved
Comment on lines +136 to +138
scoring_enabled: StrictBool
scoring_server: StrictStr
scoring_group: StrictStr
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add example values to config.json.example please? Keeping the scoring disabled might be best in case of someone using DataGateway API only, not the search API.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my config:

    "scoring_enabled": true,
    "scoring_server": "http://dau-dm-01:9000/score?limit=2000",
    "scoring_group": "investigation"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is your own scoring server? Does this mean I'll need to setup my own instance to test your changes? Is there a PaNOSC scoring server that I could use?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there is not as far as I know and take into account that you need to install and populate with your own data in order to calculate the weights.
It is not a big deal but has to be done.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good to know, I will setup my own and test this branch when I get a chance

datagateway_api/src/datagateway_api/icat/filters.py Outdated Show resolved Hide resolved
entities = get_search(
entity_name,
filters,
"LOWER(o.summary) like '%" + query.lower() + "%'",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this line and line 56 (where you pass "investigations" to get_score()), I guess this only works on /documents for now? Have you got a plan of how to make it work for the other endpoints?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right and honestly I do not know. There are several reason:

  1. I focused on making https://data.panosc.eu/ to work. Surprisingly I did discover that only use /documents endpoint. So, I did not consider to add something that is not used
  2. When I tried to use the scoring app with the datasets, it did not work when calculating the score (my guess is because of the number of datasets). I created an issue:
    Compute gets stuck panosc-eu/panosc-search-scoring#9

So, yes, it might be needed in the future but it very uncertain and first we would need the scoring app to work at the level of datasets

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were slightly concerned about the way the score calculation behaves and that it might cause performance issues with large volumes of data. It would be good to confirm (from someone who wrote the scoring software perhaps?) which endpoint(s) the scoring needs to work on.

datagateway_api/src/search_api/panosc_mappings.py Outdated Show resolved Hide resolved
datagateway_api/src/search_api/helpers.py Outdated Show resolved Hide resolved
datagateway_api/src/resources/search_api_endpoints.py Outdated Show resolved Hide resolved
datagateway_api/src/search_api/helpers.py Outdated Show resolved Hide resolved
Comment on lines 194 to 197
panosc_entity_name, icat_field_name = mappings.get_icat_mapping
(
panosc_entity_name,
split_field,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for me: formatting change within a file that has functionality changes

@antolinos antolinos changed the title Draft: Support query filter and scoring Support query filter and scoring Sep 5, 2022
@antolinos
Copy link
Collaborator Author

@MRichards99 There are some linting errors with a unclear error message.
What does it mean BLK100 Black would make changes.?

I understand and share your concerns about the performance, that is why I would be happy to keep the scoring at the level of the investigations only for the moment.
Let me know what needs to be changed.

Cheers,
Alex

@antolinos antolinos changed the title Support query filter and scoring Draft: Support query filter and scoring Sep 5, 2022
@antolinos antolinos changed the title Draft: Support query filter and scoring Support query filter and scoring Sep 6, 2022
@antolinos antolinos marked this pull request as draft September 7, 2022 07:05
@antolinos antolinos changed the title Support query filter and scoring Draft: Support query filter and scoring Sep 7, 2022
@MRichards99
Copy link
Collaborator

@MRichards99 There are some linting errors with a unclear error message. What does it mean BLK100 Black would make changes.?

I understand and share your concerns about the performance, that is why I would be happy to keep the scoring at the level of the investigations only for the moment. Let me know what needs to be changed.

Cheers, Alex

@antolinos this error message is from flake8 saying that the code doesn't meet the code formatting standards set out by the formatter black. If you have the Nox sessions working, you can run nox -s black to get Black to correctly format the code (it will make changes automatically). It only takes a few seconds to run so if you're not using the nox sessions, I would be happy to run the formatter on your branch and push the changes. Just let me know if/when you'd like me to do that?

@antolinos
Copy link
Collaborator Author

@MRichards99

Thanks for the info. I ran black and lint so the format should be fine. However, lot of problems starting by the db_generator.
Do you know what am I missing?

@VKTB
Copy link
Contributor

VKTB commented Feb 24, 2023

Superseded by #399

@VKTB VKTB closed this Feb 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support query filter
3 participants