Hyphenate addresses for search #247

Merged (1 commit) on May 23, 2024

mhieta (Collaborator) commented May 20, 2024

Description

Some Finnish addresses are not found when searched with certain words.

For example, the search word kello should return Kellokalliontie, but it does not return it at all. When the search word is hyphenated internally into the syllables kel and lo according to Finnish grammar, the search returns more accurate results, including Kellokalliontie.
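To illustrate what hyphenating according to Finnish grammar means, here is a minimal sketch of Finnish syllabification (tavutus). It is not the PR's implementation: the function name `hyphenate_fi` is hypothetical, and it applies only two simplified rules, namely that a consonant followed by a vowel starts a new syllable, and that two adjacent vowels are split unless they form a long vowel or a diphthong.

```python
from typing import List

# Finnish vowels (å is omitted for simplicity).
VOWELS = set("aeiouyäö")

# Vowel pairs that stay in one syllable: long vowels (aa, ee, ...) and diphthongs.
# Simplification: "ie", "uo" and "yö" are diphthongs only in a word's first
# syllable, but this sketch treats them as diphthongs everywhere.
NON_SPLITTING_PAIRS = {v + v for v in VOWELS} | {
    "ai", "ei", "oi", "ui", "yi", "äi", "öi",
    "au", "eu", "iu", "ou", "ey", "iy", "äy", "öy",
    "ie", "uo", "yö",
}


def hyphenate_fi(word: str) -> List[str]:
    """Split a single Finnish word into syllables using simplified rules."""
    word = word.lower()
    syllables: List[str] = []
    start = 0  # index where the current syllable begins
    for i in range(1, len(word)):
        current = word[start:i]
        # Never close a syllable that has no vowel yet (e.g. "str" in "strategia").
        if not any(ch in VOWELS for ch in current):
            continue
        ch, prev = word[i], word[i - 1]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        # Rule 1: a consonant followed by a vowel starts a new syllable.
        consonant_rule = ch not in VOWELS and nxt in VOWELS
        # Rule 2: split two adjacent vowels unless they form a long vowel or diphthong.
        vowel_rule = (
            ch in VOWELS and prev in VOWELS and prev + ch not in NON_SPLITTING_PAIRS
        )
        if consonant_rule or vowel_rule:
            syllables.append(current)
            start = i
    syllables.append(word[start:])
    return syllables


print("-".join(hyphenate_fi("Kellokalliontie")))  # kel-lo-kal-li-on-tie
print("-".join(hyphenate_fi("kello")))  # kel-lo
```

With these rules, Kellokalliontie splits into kel-lo-kal-li-on-tie, and its first two syllables are the same as those of the hyphenated search word kel-lo.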

However, this slows down the search, since almost all Finnish addresses contain tie (road) or katu (street), and searching with these words is a heavy operation. Therefore, this PR adds a generic feature for excluding predefined words from search. The words tie and katu can be added to the excluded words through a separate fixture (services/fixtures/exclusion_words.json).
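A minimal sketch of the exclusion check, assuming whole-word, case-insensitive matching against a list of words such as those loaded from services/fixtures/exclusion_words.json. The function name `find_exclusion_word` and the tokenization rule are hypothetical; the PR description does not say whether whole words or substrings are matched. Whole-word matching keeps a query such as kellokalliontie valid even though it ends in tie.

```python
import re
from typing import Iterable, Optional


def find_exclusion_word(query: str, exclusion_words: Iterable[str]) -> Optional[str]:
    """Return the first excluded word found in the query, or None.

    Hypothetical helper: matches whole words only, case-insensitively.
    """
    excluded = {w.lower() for w in exclusion_words}
    # Split on any run of characters that are not letters, digits or "_".
    for token in re.split(r"\W+", query.lower()):
        if token and token in excluded:
            return token
    return None


words = ["tie", "katu"]
print(find_exclusion_word("tie", words))  # tie
print(find_exclusion_word("kellokalliontie", words))  # None
```

When the result is not None, the search API can respond with a bad request and a message naming the word, which is what the PR describes for services/search/api.py; the sketch covers only the detection step.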

Breakdown

Requirements

  1. requirements.in
  2. requirements.txt
    • Bump django-munigeo from v0.2.76 to v0.2.83.

Search

  1. services/fixtures/exclusion_words.json
    • Fixture of words to be excluded in search.
  2. services/migrations/0117_exclusionword.py
  3. services/models/__init__.py
  4. services/models/search_rule.py
    • Add a model for storing the excluded words.
  5. services/search/api.py
    • Return a bad request response with a message if an exclusion word is found in the search query.
  6. services/search/utils.py
    • Add a function that checks whether an exclusion word is in the search query, and a function that gets an attribute recursively by following foreign key relations.
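The recursive attribute lookup can be sketched as a loop of getattr calls along a dotted path. In Django, accessing a ForeignKey attribute returns the related object, so the same loop follows foreign key relations. The name `get_attr_recursive`, the dotted-path format and the attribute names in the example are hypothetical; the PR may use a different signature or Django's double-underscore syntax.

```python
from types import SimpleNamespace
from typing import Any


def get_attr_recursive(obj: Any, path: str, default: Any = None) -> Any:
    """Follow a dotted attribute path such as "street.municipality.name".

    Returns `default` if any step along the path is missing or None.
    """
    current = obj
    for name in path.split("."):
        current = getattr(current, name, None)
        if current is None:
            return default
    return current


# Stand-ins for model instances linked by foreign keys (hypothetical fields).
municipality = SimpleNamespace(name="helsinki")
street = SimpleNamespace(name_fi="Kellokalliontie", municipality=municipality)
address = SimpleNamespace(street=street, number="1")

print(get_attr_recursive(address, "street.municipality.name"))  # helsinki
print(get_attr_recursive(address, "street.missing.name", "-"))  # -
```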

Indexing

  1. services/management/commands/index_search_columns.py
    • Hyphenate Finnish addresses, and add parameters for selecting the addresses to hyphenate by their modified_at timestamp.
  2. services/search/constants.py
    • Add constant HYPHENATE_ADDRESSES_MODIFIED_WITHIN_DAYS.
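A sketch of selecting addresses for hyphenation by their modified_at timestamp. In Django this would typically be a queryset filter such as `modified_at__gte=cutoff`; here plain dataclasses stand in for address model instances. The `AddressRecord` class, the `addresses_to_hyphenate` function, the "None selects all" behavior and the value 7 for HYPHENATE_ADDRESSES_MODIFIED_WITHIN_DAYS are assumptions, since the PR description does not give them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional

# Assumed value; the PR adds this constant but its value is not stated here.
HYPHENATE_ADDRESSES_MODIFIED_WITHIN_DAYS = 7


@dataclass
class AddressRecord:
    """Hypothetical stand-in for an address model instance."""
    full_name: str
    modified_at: datetime


def addresses_to_hyphenate(
    addresses: Iterable[AddressRecord],
    within_days: Optional[int] = HYPHENATE_ADDRESSES_MODIFIED_WITHIN_DAYS,
    now: Optional[datetime] = None,
) -> List[AddressRecord]:
    """Return addresses modified within the last `within_days` days.

    Passing within_days=None selects every address (e.g. for a full reindex).
    """
    if within_days is None:
        return list(addresses)
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=within_days)
    return [a for a in addresses if a.modified_at >= cutoff]


now = datetime(2024, 5, 23, tzinfo=timezone.utc)
records = [
    AddressRecord("Kellokalliontie 1", now - timedelta(days=2)),
    AddressRecord("Kellonsoittajankatu 5", now - timedelta(days=30)),
]
print([a.full_name for a in addresses_to_hyphenate(records, now=now)])
# ['Kellokalliontie 1']
```

Limiting the selection to recently modified addresses lets a periodic indexing run skip addresses that were already hyphenated earlier.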

Tests

  1. services/search/tests/conftest.py
    • Add the address fixture Kellonsoittajankatu and hyphenate the addresses.
  2. services/search/tests/test_api.py
    • Test excluded words and hyphenated addresses.

Also add feature to exclude words from search due to performance issues.
mhieta requested a review from japauliina on May 20, 2024 at 11:42
codecov-commenter commented:

Codecov Report

Attention: Patch coverage is 75.24752%, with 25 lines in your changes missing coverage. Please review.

Project coverage is 72.39%. Comparing base (2e9caa4) to head (0ccbdd8).
Report is 3 commits behind head on develop.

File                                                    Patch %   Lines
...rvices/management/commands/index_search_columns.py   54.83%    12 Missing and 2 partials ⚠️
services/search/utils.py                                60.00%    10 Missing ⚠️
services/models/search_rule.py                          88.88%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #247      +/-   ##
===========================================
- Coverage    72.73%   72.39%   -0.35%     
===========================================
  Files          232      234       +2     
  Lines         7299     7443     +144     
  Branches      1123     1138      +15     
===========================================
+ Hits          5309     5388      +79     
- Misses        1805     1866      +61     
- Partials       185      189       +4     
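The coverage percentages above can be reproduced from the hit, miss and partial counts, assuming Codecov's formula coverage = hits / (hits + misses + partials) and its default display of two decimals rounded down. The helper names are hypothetical.

```python
import math


def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Coverage ratio as a percentage: hits / (hits + misses + partials)."""
    return 100 * hits / (hits + misses + partials)


def round_down(value: float, decimals: int = 2) -> float:
    """Round toward negative infinity at the given precision."""
    factor = 10 ** decimals
    return math.floor(value * factor) / factor


base = coverage_pct(5309, 1805, 185)  # develop
head = coverage_pct(5388, 1866, 189)  # #247
print(round_down(base), round_down(head), round_down(head - base))
# 72.73 72.39 -0.35
```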


japauliina (Collaborator) commented:

Could you still write a short description of the issue this change fixes? This seems to be quite a special case, and I would like to understand the context a bit better.

mhieta (Collaborator, Author) commented May 23, 2024

@japauliina Updated PR description.

mhieta merged commit 42c4657 into develop on May 23, 2024
2 checks passed
mhieta deleted the feature/hyphenate-addresses branch on May 23, 2024 at 05:42