Reject 'bare' Twitter URLs #645

Open
anjackson opened this issue Feb 5, 2020 · 3 comments

@anjackson
Contributor

anjackson commented Feb 5, 2020

Many Twitter URLs have been added with no trailing slash, which scopes in all of Twitter. We do not want this, so we need to refuse to accept them. The downstream python-w3act code has been modified to drop them.

The W3ACT editor should also reject this kind of URL. We should also consider doing the same for other platforms (Facebook, others?).

NOTE that this scoping issue applies to the URL path, and any query parameters should be dropped before determining the scope. For example, we see https://www.twitter.com/name?lang=en in seeds.
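
To make the check above concrete, here is a minimal sketch in Python using the standard `urllib.parse` module. It is not the python-w3act implementation. The function name `is_bare_twitter_url`, the `TWITTER_HOSTS` set, and the reading of "bare" as "exactly one path segment with no trailing slash" are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical host list; the hosts that actually need covering are an assumption.
TWITTER_HOSTS = {"twitter.com", "www.twitter.com", "mobile.twitter.com"}


def is_bare_twitter_url(url: str) -> bool:
    """Return True for Twitter URLs of the form https://twitter.com/name.

    Only the parsed path is inspected, so query parameters such as
    ?lang=en and any fragment are ignored when deciding the scope.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in TWITTER_HOSTS:
        return False
    path = parts.path
    # Assumed definition of "bare": a single non-empty segment, no trailing slash.
    return len(path) > 1 and path.startswith("/") and "/" not in path[1:]
```

With this sketch, `https://www.twitter.com/name?lang=en` is flagged because the query string is never consulted, while `https://twitter.com/name/` is accepted.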

@nicolabingham

@anjackson, yes please.

@anjackson
Contributor Author

I've updated the ticket description to reflect the fact that regular expressions are a poor way to deal with this problem, and the URL should be properly parsed to determine the final scope.

@anjackson
Contributor Author

anjackson commented Dec 1, 2020

Noting that as per Twitter docs, usernames match [0-9a-zA-Z_]+, and if the Twitter URL is in the basic form https://twitter.com/username then W3ACT could just add in the trailing slash automatically.
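
A rough sketch of that automatic fix, again in Python with `urllib.parse`. The function name `add_trailing_slash` and the host set are hypothetical. Dropping the query string and fragment from the rewritten URL is an assumption based on the note in the ticket description, not confirmed W3ACT behaviour.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Path consisting of exactly one username segment, using the pattern quoted above.
USERNAME_PATH = re.compile(r"/[0-9a-zA-Z_]+")

# Hypothetical host list for illustration.
TWITTER_HOSTS = {"twitter.com", "www.twitter.com"}


def add_trailing_slash(url: str) -> str:
    """Rewrite https://twitter.com/username as https://twitter.com/username/.

    URLs that are not in the basic username form are returned unchanged.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in TWITTER_HOSTS and USERNAME_PATH.fullmatch(parts.path):
        # Assumption: query and fragment are discarded in the normalised seed.
        return urlunsplit((parts.scheme, parts.netloc, parts.path + "/", "", ""))
    return url
```

Under these assumptions, `https://twitter.com/username` becomes `https://twitter.com/username/`, and a deeper URL such as `https://twitter.com/name/status/1` is left as it is.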
