Exclude contact cards (and other non-editable blocks) from HIX #3281

Open
charludo wants to merge 1 commit into develop from fix/exclude-contacts-from-hix

Conversation

charludo
Contributor

@charludo charludo commented Dec 9, 2024

Short description

The contact cards introduced in #3169 have a severe negative impact on the HIX score. This PR excludes them from the HIX calculations.

Proposed changes

  • Before sending text to Textlab, remove all divs with contenteditable="false".
  • This currently affects only contact cards, but it makes it simple to exclude other blocks in the future, should the need arise.
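
For illustration, the filtering step could look roughly like the sketch below. It uses Python's standard-library HTMLParser as a stand-in, not necessarily the HTML library this PR relies on, and the names EditableTextExtractor and text_without_non_editable_blocks are hypothetical. The idea is only to show how a contenteditable="false" div, including anything nested inside it, can be left out of the text that is scored.

```python
from html.parser import HTMLParser


class EditableTextExtractor(HTMLParser):
    """Collect text, skipping every <div contenteditable="false"> subtree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a non-editable div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.skip_depth:
            # A nested div inside a block that is already being skipped.
            self.skip_depth += 1
        elif dict(attrs).get("contenteditable") == "false":
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def text_without_non_editable_blocks(html):
    """Return whitespace-normalized text with non-editable divs removed."""
    parser = EditableTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


# Made-up sample content: a contact card between two regular paragraphs.
sample = (
    "<p>Office hours are listed below.</p>\n"
    '<div contenteditable="false"><div>Jane Doe</div><a href="tel:123">123</a><br></div>\n'
    "<p>Bring your ID.</p>"
)
print(text_without_non_editable_blocks(sample))
```

Only div tags are counted while skipping, so void elements such as br inside a card do not throw off the nesting depth.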

Side effects

  • html.text_content() was previously used only to check for empty pages. For convenience, I have changed the code to send the result of this operation to Textlab instead of the raw HTML, but I am uncertain whether there was a decision against this in the past when you added the code in question @david-venhoff - are my changes OK?

Resolved issues

Fixes: #3268


Pull Request Review Guidelines

@david-venhoff
Member

@david-venhoff - are my changes OK?

I fear that this might change the HIX scores of pages again. The last time we did this, our service team had quite a nightmare dealing with municipalities that suddenly could not translate their pages anymore because the HIX score was too low.
If we do this, we should at least run some tests to verify that this change does not decrease the HIX score compared to right now.

@JoeyStk JoeyStk self-assigned this Jan 5, 2025
Contributor

@JoeyStk JoeyStk left a comment

I don't have a lot of context about this piece of code. I hope @david-venhoff knows more about it. From what I can see, this piece of code looks good and the logic itself makes sense, but I might not be thinking of all possible side effects :/

@charludo
Contributor Author

charludo commented Jan 7, 2025

@david-venhoff - are my changes OK?

I fear that this might change the HIX scores of pages again. The last time we did this, our service team had quite a nightmare dealing with municipalities that suddenly could not translate their pages anymore because the HIX score was too low. If we do this, we should at least run some tests to verify that this change does not decrease the HIX score compared to right now.

That's a very valid concern.

I'm not sure how else to progress though, tbh. Maybe passing tostring(html) would be the better option, but that is not guaranteed to achieve text == tostring(fromstring(text)), afaik.

Removing the divs in question without parsing the HTML is an entirely different can of worms.

Frankly, I don't think we have a choice but to risk slightly changing HIX scores :( (still with prior testing, of course)
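
To illustrate the round-trip concern, here is a minimal sketch. It uses the standard-library xml.etree.ElementTree as a stand-in, so the exact normalizations differ from whatever HTML parser the CMS uses; the helper name round_trip and the sample markup are made up. The point is only that parsing and re-serializing can change the string even when the content is untouched.

```python
import xml.etree.ElementTree as ET


def round_trip(markup):
    """Parse markup and serialize it again without modifying the tree."""
    return ET.tostring(ET.fromstring(markup), encoding="unicode")


# Made-up input with single-quoted attributes and a self-closing tag.
original = "<p class='note'>Call us<br/>today</p>"
normalized = round_trip(original)

print(original)
print(normalized)
print("identical after one round trip:", normalized == original)
print("stable after a second round trip:", round_trip(normalized) == normalized)
```

The serializer picks its own quoting and empty-element style, so the first round trip rewrites the markup, while later round trips leave it unchanged. Any service that scores the raw markup could therefore see different input even for pages without contact cards.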

@MizukiTemma MizukiTemma added the "deadline" label (Needs to be fixed in the given time) Jan 7, 2025
@charludo charludo force-pushed the fix/exclude-contacts-from-hix branch 4 times, most recently from 51c3185 to 7c9d876 Compare January 11, 2025 07:59
@charludo charludo force-pushed the fix/exclude-contacts-from-hix branch from 7c9d876 to 88d84e1 Compare January 17, 2025 07:38
@charludo
Contributor Author

I think we need @osmers to chime in on this 😅
In short: we must remove contact cards, otherwise the HIX score worsens a lot. But the only reliable way to do so can slightly change the HTML content we send to Textlab, which could slightly change HIX scores compared to right now, including for pages that do not contain any contact cards. I really cannot say whether that change in score would be an increase or a decrease.

@osmers

osmers commented Jan 28, 2025

@charludo can I already test this in the test CMS? I would then go through each page of a region and see how much the HIX score changes, even without inserting contact cards. I think it would also make sense to test it with inserted contact cards, just to make sure (because municipalities will expect an increase in the HIX score there).
As David said, we definitely need to test this, and if the tests show rather bad results, in the sense of much lower HIX scores, we need to rethink how to do this...
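
A per-page comparison like the one described here could be summarized with a small helper along these lines. This is only a sketch under assumptions: the scores are taken to be already collected as page-ID-to-score mappings before and after the change, the names ScoreChange and compare_scores are hypothetical, and the threshold of 15.0 and all numbers in the usage part are placeholders, not real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreChange:
    page_id: int
    before: float
    after: float

    @property
    def delta(self):
        return self.after - self.before


def compare_scores(before, after, threshold):
    """Return (changed pages, worst drop first) and the pages newly below threshold."""
    changes = [
        ScoreChange(page_id, before[page_id], after[page_id])
        for page_id in sorted(before.keys() & after.keys())
        if before[page_id] != after[page_id]
    ]
    changes.sort(key=lambda change: change.delta)
    # Pages that passed the threshold before but fail it afterwards.
    newly_blocked = [c for c in changes if c.before >= threshold > c.after]
    return changes, newly_blocked


# Placeholder numbers for illustration only.
before = {1: 16.2, 2: 15.1, 3: 12.0}
after = {1: 16.2, 2: 14.8, 3: 12.5}
changes, newly_blocked = compare_scores(before, after, threshold=15.0)

for change in changes:
    print(f"page {change.page_id}: {change.before} -> {change.after} ({change.delta:+.2f})")
print("newly below threshold:", [c.page_id for c in newly_blocked])
```

Listing the pages that newly fall below the threshold separately makes the cases that would block translation stand out from harmless small shifts.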

@charludo
Contributor Author

@charludo can I already test this in the test CMS? I would then go through each page of a region and see how much the HIX score changes, even without inserting contact cards. I think it would also make sense to test it with inserted contact cards, just to make sure (because municipalities will expect an increase in the HIX score there). As David said, we definitely need to test this, and if the tests show rather bad results, in the sense of much lower HIX scores, we need to rethink how to do this...

Not really, no, since this is not merged yet :/

Contact Cards should definitely improve HIX compared to plain-text contact details though, yes.

@osmers

osmers commented Jan 28, 2025

Could we merge it but block/exclude it from the next release? Then Salua and I can test it :)

@david-venhoff
Member

david-venhoff commented Jan 28, 2025

It is also possible to test without merging by creating a branch that ends in -publish-dev-package:

    - publish-package:
        name: publish-dev-package
        context: pypi-test
        filters:
          branches:
            only:
              - develop
              - /.*-publish-dev-package/
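
As a rough illustration of which branch names this filter would accept, the sketch below mimics the matching in Python. It assumes that entries wrapped in slashes are regular expressions that must match the whole branch name and that other entries are compared literally; the helper branch_is_published and the example branch names are hypothetical.

```python
import re

# Entries copied from the "only" list in the snippet above.
ONLY = ["develop", "/.*-publish-dev-package/"]


def branch_is_published(branch, only=ONLY):
    """Return True if the branch name matches any entry of the filter list."""
    for entry in only:
        if len(entry) > 1 and entry.startswith("/") and entry.endswith("/"):
            # Assumption: the pattern has to match the entire branch name.
            if re.fullmatch(entry[1:-1], branch):
                return True
        elif entry == branch:
            return True
    return False


for name in (
    "develop",
    "fix/exclude-contacts-from-hix",
    "fix/exclude-contacts-from-hix-publish-dev-package",
):
    print(name, "->", branch_is_published(name))
```

Under these assumptions, pushing a copy of a feature branch with the -publish-dev-package suffix is enough to trigger the dev package job without merging.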

@charludo
Contributor Author

It is also possible to test without merging by creating a branch that ends in -publish-dev-package:

    - publish-package:
        name: publish-dev-package
        context: pypi-test
        filters:
          branches:
            only:
              - develop
              - /.*-publish-dev-package/

Huh, I had no idea!

The build was successful, but I cannot find where I can actually access it 🤔 Do we have automated deployment for these?

@david-venhoff
Member

I think it will be deployed on the CMS test server the next time it updates. I think there is also a way to update the test server manually, so that it runs the PR immediately.

@charludo
Contributor Author

I think it will be deployed on the CMS test server the next time it updates. I think there is also a way to update the test server manually, so that it runs the PR immediately.

Ohhhh so this will take this build just because it's the newest? But there's no way to switch between versions on the test server?

@david-venhoff
Member

Yeah, that is my understanding

@charludo charludo force-pushed the fix/exclude-contacts-from-hix branch from 88d84e1 to edd67bd Compare January 31, 2025 09:24
Labels
deadline Needs to be fixed in the given time
Development

Successfully merging this pull request may close these issues.

Exclude contact card from HIX score calculation
6 participants