Reject 'bare' Twitter URLs #645

anjackson · 2020-02-05T14:13:52Z

Many Twitter URLs have been added without a trailing slash (e.g. https://twitter.com/name rather than https://twitter.com/name/), which scopes in all of Twitter. We do not want this, so we need to refuse to accept them. The downstream python-w3act code has already been modified to drop them.
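Why the missing slash matters can be sketched as follows. This assumes the crawler derives scope Heritrix-style, where a seed's implied prefix is the seed truncated after the last `/` of its path. `implied_prefix` is a hypothetical helper, and it works on plain URLs for readability, whereas real crawlers compare SURT-form prefixes.

```python
from urllib.parse import urlsplit

def implied_prefix(url: str) -> str:
    """Simplified model of a seed's implied scope prefix (an assumption, not
    W3ACT's code): keep everything up to and including the last '/' of the
    path. Query and fragment are ignored."""
    parts = urlsplit(url)
    path = parts.path or "/"
    prefix_path = path[: path.rfind("/") + 1]
    return f"{parts.scheme}://{parts.netloc}{prefix_path}"

print(implied_prefix("https://twitter.com/name"))   # https://twitter.com/
print(implied_prefix("https://twitter.com/name/"))  # https://twitter.com/name/
```

Under this rule the bare form collapses to the host root, i.e. all of Twitter, while the slashed form stays limited to the account.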

The W3ACT editor should also reject this kind of URL. We should also consider doing the same for other platforms (Facebook, others?).
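A minimal sketch of an editor-side check, assuming the same last-slash prefix rule. `PLATFORM_HOSTS`, `scopes_whole_platform` and `validate_seed` are hypothetical names, and the host list (including Facebook) is only illustrative of "other platforms".

```python
from urllib.parse import urlsplit

# Hypothetical set of platform hosts where a whole-host scope is never wanted.
PLATFORM_HOSTS = {"twitter.com", "www.twitter.com", "facebook.com", "www.facebook.com"}

def scopes_whole_platform(url: str) -> bool:
    """True if, under a last-slash prefix rule, this seed would scope in the whole host."""
    parts = urlsplit(url)  # parsing also separates out any ?query
    host = parts.hostname or ""  # urllib already lower-cases the hostname
    if host not in PLATFORM_HOSTS:
        return False
    path = parts.path or "/"
    # If the only '/' is the leading one, truncation leaves just the host root.
    return path.rfind("/") == 0

def validate_seed(url: str) -> None:
    """Raise ValueError for seeds that would scope in an entire platform."""
    if scopes_whole_platform(url):
        raise ValueError(f"Rejected {url!r}: it would scope in all of {urlsplit(url).hostname}")

validate_seed("https://twitter.com/name/")  # accepted: no exception
```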

NOTE that this scoping issue applies to the URL path, so any query parameters should be dropped before determining the scope. For example, we see https://www.twitter.com/name?lang=en in seeds.
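Dropping the query could look like this minimal sketch using the standard library's URL parser. `drop_query` is a hypothetical helper name, and it also discards any `#fragment`, on the assumption that fragments never affect scope either.

```python
from urllib.parse import urlsplit, urlunsplit

def drop_query(url: str) -> str:
    """Remove the query string and fragment so scope is decided on the path alone."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

print(drop_query("https://www.twitter.com/name?lang=en"))  # https://www.twitter.com/name
```

The result is still a bare URL with no trailing slash, so it must then be rejected or fixed like any other bare Twitter URL.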


nicolabingham · 2020-02-06T08:52:48Z

@anjackson, yes please.

anjackson · 2020-02-07T12:44:29Z

I've updated the ticket description to reflect the fact that regular expressions are a poor way to deal with this problem, and the URL should be properly parsed to determine the final scope.

anjackson · 2020-12-01T16:38:43Z

Noting that, as per the Twitter docs, usernames match [0-9a-zA-Z_]+, so if a Twitter URL is in the basic form https://twitter.com/username, W3ACT could simply add the trailing slash automatically.
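Automatic repair could be sketched as below: parse the URL first, then apply the username pattern quoted above only to the path. `normalize_twitter_seed` and `TWITTER_HOSTS` are hypothetical names. The pattern alone cannot tell a username apart from any other single-segment path, so this is a sketch rather than a complete rule.

```python
import re
from urllib.parse import urlsplit, urlunsplit

TWITTER_HOSTS = {"twitter.com", "www.twitter.com"}
# Username pattern as quoted in this ticket: [0-9a-zA-Z_]+
PROFILE_PATH = re.compile(r"/[0-9a-zA-Z_]+")

def normalize_twitter_seed(url: str) -> str:
    """Add the trailing slash to basic-form Twitter profile URLs and drop any query."""
    parts = urlsplit(url)
    if parts.hostname not in TWITTER_HOSTS:
        return url  # not Twitter: leave untouched
    path = parts.path
    if PROFILE_PATH.fullmatch(path):  # exactly /username, no trailing slash
        path += "/"
    # Query and fragment are dropped before scoping (e.g. ?lang=en).
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))

print(normalize_twitter_seed("https://www.twitter.com/name?lang=en"))  # https://www.twitter.com/name/
```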

anjackson assigned min2ha Feb 5, 2020
