URL Search string fixes in the directory #823

Open
pgulley opened this issue Oct 15, 2024 · 1 comment
Labels: directory, documentation (Improvements or additions to documentation)

Comments

@pgulley
Member

pgulley commented Oct 15, 2024

@philbudne Made a summary histogram of the formats present in the url_search_string field in the directory:

1562705 rows where url_search_string is NULL
18 rows where url_search_string is empty string
7 rows where url_search_string starts with "http"
62 rows where url_search_string starts with "*"
212 rows where url_search_string doesn't start with "http" or "*"
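
A breakdown like the one above could be reproduced with a small bucketing function. This is only a sketch: the names `bucket` and `summarize` are hypothetical, and it assumes the url_search_string values have already been loaded into a Python iterable (with NULL represented as `None`), which is not how the original counts were necessarily produced.

```python
from collections import Counter
from typing import Iterable, Optional


def bucket(value: Optional[str]) -> str:
    """Assign one url_search_string value to a histogram bucket."""
    if value is None:
        return "NULL"
    if value == "":
        return "empty string"
    if value.startswith("http"):
        return 'starts with "http"'
    if value.startswith("*"):
        return 'starts with "*"'
    return "other"


def summarize(values: Iterable[Optional[str]]) -> Counter:
    """Count how many values fall into each bucket."""
    return Counter(bucket(v) for v in values)
```

Calling `summarize(rows)` on the column values returns a `Counter` keyed by bucket name, which can be compared against the counts listed above.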

We should decide on a standard format for these values (probably: scheme/do.ma.in[/path], with a wildcard in some non-zero position of the path), document it somewhere, and enforce that standard across the directory. Going forward, this will need some additional validation in web-search, plus a sweep across the ~300 existing entries to bring them up to date. Thinking now that this is a good 'ticketing' test case.
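
The validation step might look something like the sketch below. It is not the web-search implementation: the function name `is_valid_search_string` is hypothetical, and the rules encode assumptions that the issue leaves open, namely that the scheme is an optional `http://` or `https://` prefix, that the domain never contains a wildcard, and that `*` is allowed only inside the optional path.

```python
import re

# Assumed rules (not confirmed by the issue):
#   - optional http:// or https:// prefix
#   - domain made of dot-separated labels, with no wildcard
#   - optional path, which is the only place "*" may appear
_SEARCH_STRING_RE = re.compile(
    r"(?:https?://)?"               # optional scheme
    r"(?:[a-z0-9-]+\.)+[a-z]{2,}"   # domain, e.g. do.ma.in
    r"(?:/\S*)?",                   # optional path, may contain "*"
    re.IGNORECASE,
)


def is_valid_search_string(value: str) -> bool:
    """Return True if value matches the assumed standard format."""
    return _SEARCH_STRING_RE.fullmatch(value) is not None
```

Under these assumptions, `example.com/news/*` passes while `*.example.com/news` and the empty string are rejected, so the same check could drive both the sweep over existing rows and validation of new entries.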

@pgulley
Member Author

pgulley commented Oct 17, 2024

Current consideration is that we want the format to be do.ma.in[/path], omitting the scheme in the database and instead having the scheme interpolated in web-search (per #822).
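
As a rough illustration of that split, stored values would have any scheme stripped, and the scheme would be added back at query time. The helper names `strip_scheme` and `with_scheme` are hypothetical, and the default of `https` is an assumption; the issue does not say which scheme web-search interpolates.

```python
import re

# Matches a leading URL scheme such as "http://" or "https://".
_SCHEME_RE = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)


def strip_scheme(value: str) -> str:
    """Normalize a value for storage: drop whitespace and any leading scheme."""
    return _SCHEME_RE.sub("", value.strip(), count=1)


def with_scheme(stored: str, scheme: str = "https") -> str:
    """Interpolate a scheme onto a stored do.ma.in[/path] value.

    The default scheme is an assumption, not taken from the issue.
    """
    return f"{scheme}://{strip_scheme(stored)}"
```

For example, `strip_scheme("https://example.com/news/*")` gives `example.com/news/*`, and `with_scheme` applied to that stored value rebuilds the full URL pattern.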
