Draft: Support query filter and scoring #366

antolinos · 2022-05-19T13:19:35Z

This PR will close #365

Description

Enter a description of the changes here

Testing Instructions

Add a set up instructions describing how the reviewer should test the code

Review code
Check GitHub Actions build
If icatdb Generator Script Consistency Test CI job fails, is this because of a deliberate change made to the script to change generated data (which isn't actually a problem) or is here an underlying issue with the changes made?
Review changes to test coverage
Does this change mean a new patch, minor or major version should be made? If so, does one of the commit messages feature fix:, feat: or BREAKING CHANGE: so a release is automatically made via GitHub Actions upon merge?
{more steps here}

Agile Board Tracking

Connect to #{issue number}

codecov · 2022-05-19T13:34:53Z

Codecov Report

Merging #366 (676adfd) into main (cfa1996) will decrease coverage by 0.22%.
The diff coverage is 52.94%.

❗ Current head 676adfd differs from pull request most recent head df0ad83. Consider uploading reports for the commit df0ad83 to get more accurate results

@@            Coverage Diff             @@
##             main     #366      +/-   ##
==========================================
- Coverage   94.47%   94.25%   -0.23%     
==========================================
  Files          39       39              
  Lines        3114     3131      +17     
  Branches      311      312       +1     
==========================================
+ Hits         2942     2951       +9     
- Misses        144      151       +7     
- Partials       28       29       +1

Impacted Files	Coverage Δ
...gateway_api/src/search_api/query_filter_factory.py	`97.01% <0.00%> (-2.23%)`	⬇️
...atagateway_api/src/datagateway_api/icat/filters.py	`98.28% <60.00%> (-1.13%)`	⬇️
datagateway_api/src/search_api/filters.py	`96.87% <60.00%> (-2.03%)`	⬇️
datagateway_api/src/common/filters.py	`98.00% <75.00%> (-2.00%)`	⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

MRichards99 · 2022-05-23T14:18:22Z

I can see this is a draft so I haven't looked at these changes in depth, but if you could also rebase your commits to follow the Angular commit message convention as described in #364 (review), that would be great

MRichards99

I've had a look through this PR and made some comments, mostly small changes but also one or two about implementation.

To me, it doesn't look worrying, although the o.summary definitely needs a lookup to the mapping file. In the collaboration meeting, I think you were unsure whether the changes would be controversial or not, is there any particular part of the implementation that could cause issues between facilities?

I haven't been able to test these changes because I'm not sure what values I should use for the config. Would you be able to provide me with some example values to use please as a starting point? Would I be able to use an existing scoring server, or do I need to setup my own?

datagateway_api/src/api_start_utils.py

MRichards99 · 2022-07-19T15:00:19Z

datagateway_api/src/common/config.py

+    scoring_enabled: StrictBool
+    scoring_server: StrictStr
+    scoring_group: StrictStr


Could you add example values to config.json.example please? Keeping the scoring disabled might be best in case of someone using DataGateway API only, not the search API.

This is my config:

"scoring_enabled": true, "scoring_server": "http://dau-dm-01:9000/score?limit=2000", "scoring_group": "investigation"

I guess this is your own scoring server? Does this mean I'll need to setup my own instance to test your changes? Is there a PaNOSC scoring server that I could use?

No, there is not as far as I know and take into account that you need to install and populate with your own data in order to calculate the weights.
It is not a big deal but has to be done.

That's good to know, I will setup my own and test this branch when I get a chance

datagateway_api/src/datagateway_api/icat/filters.py

MRichards99 · 2022-07-19T15:01:45Z

datagateway_api/src/resources/search_api_endpoints.py

+                entities = get_search(
+                    entity_name,
+                    filters,
+                    "LOWER(o.summary) like '%" + query.lower() + "%'",


Looking at this line and line 56 (where you pass "investigations" to get_score()), I guess this only works on /documents for now? Have you got a plan of how to make it work for the other endpoints?

You are right and honestly I do not know. There are several reason:

I focused on making https://data.panosc.eu/ to work. Surprisingly I did discover that only use /documents endpoint. So, I did not consider to add something that is not used

When I tried to use the scoring app with the datasets, it did not work when calculating the score (my guess is because of the number of datasets). I created an issue:
Compute gets stuck panosc-eu/panosc-search-scoring#9

So, yes, it might be needed in the future but it very uncertain and first we would need the scoring app to work at the level of datasets

We were slightly concerned about the way the score calculation behaves and that it might cause performance issues with large volumes of data. It would be good to confirm (from someone who wrote the scoring software perhaps?) which endpoint(s) the scoring needs to work on.

datagateway_api/src/search_api/panosc_mappings.py

datagateway_api/src/search_api/helpers.py

datagateway_api/src/resources/search_api_endpoints.py

datagateway_api/src/search_api/helpers.py

MRichards99 · 2022-07-19T15:39:41Z

datagateway_api/src/search_api/filters.py

+                panosc_entity_name, icat_field_name = mappings.get_icat_mapping
+                (
+                    panosc_entity_name,
+                    split_field,


Note for me: formatting change within a file that has functionality changes

antolinos · 2022-09-05T13:34:03Z

@MRichards99 There are some linting errors with a unclear error message.
What does it mean BLK100 Black would make changes.?

I understand and share your concerns about the performance, that is why I would be happy to keep the scoring at the level of the investigations only for the moment.
Let me know what needs to be changed.

Cheers,
Alex

MRichards99 · 2022-09-08T11:00:11Z

@MRichards99 There are some linting errors with a unclear error message. What does it mean BLK100 Black would make changes.?

I understand and share your concerns about the performance, that is why I would be happy to keep the scoring at the level of the investigations only for the moment. Let me know what needs to be changed.

Cheers, Alex

@antolinos this error message is from flake8 saying that the code doesn't meet the code formatting standards set out by the formatter black. If you have the Nox sessions working, you can run nox -s black to get Black to correctly format the code (it will make changes automatically). It only takes a few seconds to run so if you're not using the nox sessions, I would be happy to run the formatter on your branch and push the changes. Just let me know if/when you'd like me to do that?

antolinos · 2022-09-08T12:48:40Z

@MRichards99

Thanks for the info. I ran black and lint so the format should be fine. However, lot of problems starting by the db_generator.
Do you know what am I missing?

VKTB · 2023-02-24T11:20:43Z

Superseded by #399

antolinos added 3 commits May 19, 2022 15:18

If query param is passed in the include then it is ignored

c856c9a

Some linting problems

2cebc30

Fixing several millions of linting problems

676adfd

antolinos added 7 commits May 24, 2022 09:24

Working on scoring

49a7e8e

Implementing scoring

d6b1344

Replaced endpoint from documents to Documents

0a44ab9

Added logs

464d021

Added logs

19d6ee9

Lower the query

a3ef96d

Removed itemIds

34849df

antolinos changed the title ~~Draft: Issue 365~~ Draft: Support query filter and scoring Jun 16, 2022

MRichards99 requested changes Jul 19, 2022

View reviewed changes

antolinos added 5 commits September 1, 2022 10:19

Merged

348c8da

Added config example for scoring

df25da2

Propotype for supporting the panosc query filter with the scoring server

2bcab82

Fixed some linting errors

9d2a987

Fixed some linting errors

4b0a8b4

antolinos changed the title ~~Draft: Support query filter and scoring~~ Support query filter and scoring Sep 5, 2022

antolinos changed the title ~~Support query filter and scoring~~ Draft: Support query filter and scoring Sep 5, 2022

Added scoring limit

b9f4672

antolinos changed the title ~~Draft: Support query filter and scoring~~ Support query filter and scoring Sep 6, 2022

Add missing ScoringQueryFilter

846a702

antolinos marked this pull request as draft September 7, 2022 07:05

antolinos changed the title ~~Support query filter and scoring~~ Draft: Support query filter and scoring Sep 7, 2022

Fixed problem with users

671c5e9

Run black

1ad330b

Run black and lint and black

df0ad83

VKTB closed this Feb 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Draft: Support query filter and scoring #366

Draft: Support query filter and scoring #366

antolinos commented May 19, 2022

codecov bot commented May 19, 2022 •

edited

Loading

MRichards99 commented May 23, 2022

MRichards99 left a comment

MRichards99 Jul 19, 2022

antolinos Jul 28, 2022

MRichards99 Aug 15, 2022

antolinos Sep 1, 2022

MRichards99 Sep 1, 2022

MRichards99 Jul 19, 2022

antolinos Jul 28, 2022

MRichards99 Aug 15, 2022

MRichards99 Jul 19, 2022

antolinos commented Sep 5, 2022

MRichards99 commented Sep 8, 2022

antolinos commented Sep 8, 2022

VKTB commented Feb 24, 2023

Draft: Support query filter and scoring #366

Draft: Support query filter and scoring #366

Conversation

antolinos commented May 19, 2022

Description

Testing Instructions

Agile Board Tracking

codecov bot commented May 19, 2022 • edited Loading

Codecov Report

MRichards99 commented May 23, 2022

MRichards99 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

antolinos commented Sep 5, 2022

MRichards99 commented Sep 8, 2022

antolinos commented Sep 8, 2022

VKTB commented Feb 24, 2023

codecov bot commented May 19, 2022 •

edited

Loading