Remove source URL validation in scraper #1056

Merged 2 commits on Nov 25, 2024

fbacall (Member) commented Nov 21, 2024

Summary of changes

  • Removes the validate_url call when running a source through a Scraper.
  • Ensures no external network requests are made when running tests.
  • Adds a with_net_connection test method to allow real network calls in tests where needed (currently used to test that private_address_check is working).
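
The third change can be pictured as a block helper that lifts the network block only for the duration of one test and always restores it afterwards. The sketch below uses a hypothetical stand-in module, FakeNetGuard, in place of a real HTTP-stubbing library so that it runs on its own. The body of with_net_connection shown here is an assumption about the helper's shape, not the code from this PR.

```ruby
# Hypothetical stand-in for an HTTP-stubbing library; not part of TeSS.
module FakeNetGuard
  class BlockedError < StandardError; end

  @allowed = false # external requests are blocked by default

  class << self
    attr_accessor :allowed

    # Simulates an outgoing request: raises unless connections are allowed.
    def request(url)
      raise BlockedError, "Real HTTP connections are disabled: #{url}" unless allowed

      "response from #{url}"
    end
  end
end

# Assumed shape of the helper: allow real connections inside the block,
# then restore the previous state even if the block raises.
def with_net_connection
  previous = FakeNetGuard.allowed
  FakeNetGuard.allowed = true
  yield
ensure
  FakeNetGuard.allowed = previous
end
```

Restoring the previous state in an ensure clause means a failing test cannot leave real network access switched on for the tests that run after it.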

Motivation and context

A user created a source with a URL that returned a 301 redirect. This worked fine when testing, but was considered invalid when running through the scraper. The URL format is already validated when creating a source, and any issues when resolving the URL should be raised when running the actual Ingestor.
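
The distinction drawn above, between checking a URL's format and resolving it, can be sketched as follows. Both method names are hypothetical and neither is taken from the TeSS codebase. The second method only illustrates how a check that accepts nothing but a 200 status would reject a source whose URL answers with a 301.

```ruby
require 'uri'

# Hypothetical format-only check: accepts any syntactically valid
# http(s) URL that has a host. No network request is made.
def well_formed_http_url?(value)
  uri = URI.parse(value)
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

# Hypothetical strict resolution check: only a 200 status counts as valid,
# so a redirect (301, 302, ...) is rejected even if following it would succeed.
def strictly_resolves?(status_code)
  status_code == 200
end
```

Under these assumptions a redirecting URL passes the format check at creation time but fails the strict check, which matches the behaviour the user ran into.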

Checklist

  • I have read and followed the CONTRIBUTING guide.
  • I confirm that I have the authority necessary to make this contribution on behalf of its copyright owner and agree to license it to the TeSS codebase under the BSD license.

The source URL validation was disallowing URLs with redirects. Any errors resolving the URL will be raised during scraping anyway.
fbacall merged commit 2fa8e11 into master on Nov 25, 2024
11 checks passed
fbacall deleted the remove-source-url-validation branch on November 25, 2024 16:06