HTTP Error, Gives 404 but the URL is working #98
Comments
Hello @sagefuentes, I'm dealing with the exact same issue. I have also been downloading tweets for the past few weeks, and it suddenly stopped working, giving me error 404 with a valid link. I've no idea what the cause might be... |
I am dealing with the same issue here. This is something new today and is caused by some changes/bugs on Twitter's server side. If using the command with debug=True, the URL used to get tweets is no longer available. Seeking a solution now. |
Also started having the same issue today. |
I'm having the same issue as well! Does anyone have a solution for it? |
Yes. I am having the same issue. Guess everyone is having the issue. |
I'm not sure if it is related to this issue, but some of the |
same |
Seems to be a "bigger" problem? Other scrapers also have problems. |
Here is debug enabled. It shows the actual URL being called, and it seems that Twitter has removed the |
Same problem, damn |
I forked and created a branch to allow a user-specified UA; using samples from my current browser doesn't fix the problem. I notice the search and referrer URLs shown in the debug output:
$ GetOldTweets3 --username twitter --debug
/home/inactivist/.local/bin/GetOldTweets3 --username twitter --debug
GetOldTweets3 0.0.11
Downloading tweets...
https://twitter.com/i/search/timeline?f=tweets&vertical=news&q=from%3Atwitter&src=typd&&include_available_features=1&include_entities=1&max_position=&reset_error_state=false
Host: twitter.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:63.0) Gecko/20100101 Firefox/63.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-US,en;q=0.5
X-Requested-With: XMLHttpRequest
Referer: https://twitter.com/i/search/timeline?f=tweets&vertical=news&q=from%3Atwitter&src=typd&&include_available_features=1&include_entities=1&max_position=&reset_error_state=false
Connection: keep-alive
An error occured during an HTTP request: HTTP Error 404: Not Found
Try to open in browser: https://twitter.com/search?q=%20from%3Atwitter&src=typd
$ curl -I https://twitter.com/i/search/timeline
HTTP/2 404
[snip]
EDIT: The URL used for the internal search and the one shown in the exception message aren't the same... |
I tried replacing https://twitter.com/i/search/timeline with https://twitter.com/search?. The 404 error is gone, but now there is a 400 Bad Request error. |
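For context, the two endpoints being compared in the comments above can be reconstructed with the standard library. The parameters are copied from the debug output quoted earlier in the thread; the exact percent-encoding may differ slightly from what GOT3 sends, so treat this as a rough sketch:

```python
from urllib.parse import urlencode

# Parameters copied from the GetOldTweets3 debug output quoted in this thread.
params = {
    "f": "tweets",
    "vertical": "news",
    "q": "from:twitter",
    "src": "typd",
    "include_available_features": "1",
    "include_entities": "1",
    "max_position": "",
    "reset_error_state": "false",
}

# The internal endpoint GOT3 calls (now returning 404)...
old_endpoint = "https://twitter.com/i/search/timeline?" + urlencode(params)
# ...versus the public search URL it prints in its error message (still loads).
new_endpoint = "https://twitter.com/search?" + urlencode({"q": "from:twitter", "src": "typd"})

print(old_endpoint)
print(new_endpoint)
```

This makes the mismatch noted in the EDIT above easy to see: the URL in the exception message is not the URL the library actually requests.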
Unfortunately I have the same problem; I hope we find a solution as soon as possible. |
Switching to |
Same thing for me. I get an error 404 but the URL is working. |
I have the same issue |
I am experiencing the same issue. Any plan to fix the issue? |
same issue, somebody help. |
Same issue. The same code was working a day ago; now it's giving error 404 with a valid link |
Maybe it has something to do with this: https://blog.twitter.com/developer/en_us/topics/tips/2020/understanding-the-new-tweet-payload.html |
I am having the same issue. It was more robust than Tweepy. I hope we find a solution as soon as possible. |
Unfortunately the Twitter API does not fully meet our needs, because we need full-history search without any limitations. You can search only 5000 tweets a month with the Twitter API. |
I have the same issue. Need some help here |
I see! I'm fairly new to scraping, but I'm working on an end-of-course thesis about sentiment analysis and could really use some newer tweets to help me out. I've been tinkering with GOT3's code a bit and got it to read the HTML of the search timeline; however, it's mostly unformatted. Like I said, I have little experience with scraping, so I'm really struggling to format it correctly. However, I will note my changes, for reference and for someone with more experience to pick up if they so wish:
Edit: Forgot to say this. Sometimes the application gives me a 400: Bad Request; I run it again, and it outputs the HTML as said before. |
Thank you so much @sufyanhamid, I'm happy it helped.
With this query, you can collect tweets within 5 miles surrounding the point coordinate you specify. As far as I know, you can go up to 15 miles. |
@burakoglakci Thanks for the reply. One more thing: what is the query to fetch the number of comments, retweets, and likes? Also, where can I learn how to write queries using snscrape? Kindly share this point as well. |
@burakoglakci Thank you for sharing your code! But when I run it, the error below happens. My computer is in China, and I can access Twitter only by using a VPN. Could you help me figure it out? Error retrieving https://twitter.com/search?f=live&lang=en&q=deprem+%2B+place%3A5e02a0f0d91c76d2+%2B+since%3A2020-10-31+until%3A2020-11-03+-filter%3Alinks+-filter%3Areplies&src=spelling_expansion_revert_click: ConnectTimeout(MaxRetryError("HTTPSConnectionPool(host='twitter.com', port=443): Max retries exceeded with url: /search?f=live&lang=en&q=deprem+%2B+place%3A5e02a0f0d91c76d2+%2B+since%3A2020-10-31+until%3A2020-11-03+-filter%3Alinks+-filter%3Areplies&src=spelling_expansion_revert_click (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x0000020FFB1C8D30>, 'Connection to twitter.com timed out. (connect timeout=10)'))")), retrying |
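The percent-encoded URL in the error above decodes to a plain snscrape search string. As a sketch, building such a query might look like the following; the helper function is hypothetical, while the place ID, dates, and filters are taken from the quoted URL:

```python
# Hypothetical helper that assembles a search string like the one visible
# in the error log above. The place ID and dates come from that log.
def build_place_query(keyword, place_id, since, until):
    return (f"{keyword} place:{place_id} "
            f"since:{since} until:{until} "
            f"-filter:links -filter:replies")

query = build_place_query("deprem", "5e02a0f0d91c76d2", "2020-10-31", "2020-11-03")
print(query)
# A string like this is what gets passed to a scraper such as
# sntwitter.TwitterSearchScraper(query)
```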
Anyone have a tip for getting all the tweets in an individual's timeline? Have managed to get user tweets (thank you @burakoglakci for your example) but would like to get the tweets the user retweets as well (tweet.retweetedTweet didn't get it). And for any other noobish coders out there, just in case this helps.
|
Hi! |
I found another simple alternative in case people are having trouble with snscrape. It involves the requests and bs4 (Beautiful Soup) libraries: contents = requests.get("https://mobile.twitter.com/username"), then soup = BeautifulSoup(contents.text, "html.parser") and tweets = soup.find_all("tr", {"class": "tweet-container"}). This will give you a list with the HTML for, if my counting is correct, the last 20 tweets from that account. Obviously, this will not be very useful if you need more than that, but if you don't, then this should work until GOT3 is fixed. A few things to note: 1. You have to use the mobile link; it does not work with the normal link. (This code can still be run on a desktop computer even with the mobile link.) 2. You can use .text to print/store the tweet in a variable without all the HTML code. As you can see, this code is very bare-bones, so feel free to play around with it and add anything I missed or that you think would be useful. |
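For anyone who wants the gist of this parsing approach without installing bs4, here is a stdlib-only sketch using html.parser. The sample markup is made up for illustration; a real page body would come from requests.get("https://mobile.twitter.com/username").text, and the parser is deliberately simplified (it does not handle void tags like <br> inside a row):

```python
from html.parser import HTMLParser

# Collect the text of every <tr class="tweet-container"> row, mirroring
# the soup.find_all("tr", {"class": "tweet-container"}) call above.
class TweetExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a tweet-container row
        self.tweets = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif tag == "tr" and ("class", "tweet-container") in attrs:
            self.depth = 1
            self.tweets.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.tweets[-1] += data.strip() + " "

# Made-up sample standing in for the mobile-site HTML.
sample = '<table><tr class="tweet-container"><td>hello world</td></tr></table>'
parser = TweetExtractor()
parser.feed(sample)
print(parser.tweets)  # text of each tweet row
```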
@Woolwit Thanks for sharing the additional attributes of a tweet. Kindly also share the code/query for how we can get the number of likes, retweets, and comments. |
Arizona USA id: a612c69b44b2e5da Florida USA id: 4ec01c9dbc693497 |
What is the code to get tweet like and retweet counts? |
Change from:@Username -> #hashtag to search by keyword as opposed to username. Thanks to all who made this code available! Smooth program and helpful for my current project! |
@burakoglakci: |
First, use snscrape to collect the tweets you want, including tweet IDs and links; you can save them to a CSV or TXT file. Then collect the full tweet objects using this code. |
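One detail worth spelling out in the snscrape-to-API hand-off described above: tweepy's api.statuses_lookup accepts at most 100 IDs per call, so the collected IDs have to be batched. A small stdlib-only sketch of the batching step (the lookup call itself is left as a comment since it needs API credentials):

```python
def chunk_ids(tweet_ids, size=100):
    """Split a list of tweet IDs into batches of at most `size`."""
    return [tweet_ids[i:i + size] for i in range(0, len(tweet_ids), size)]

ids = list(range(250))   # stand-in for IDs collected with snscrape
batches = chunk_ids(ids)
print(len(batches))      # 3 batches: 100 + 100 + 50

# With an authenticated tweepy API object you would then do something like:
# for batch in batches:
#     tweets = api.statuses_lookup(batch, tweet_mode="extended")
```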
Thanks for your help @burakoglakci , I'd be lost without this.
I do not get the retweets/replies/likes made by the account, only its own created tweets. Is there a way to scrape the whole thing? Would you have a list of the additional parameters I could add to the scraping? |
@DV777 Yes, the parameters attached to tweepy apply to tweets that have already been scraped. On snscrape if you remove the |
@burakoglakci Is there any way to find the longitude and latitude of tweets using snscrape? |
I just used snscrape to get tweets for individual user accounts, filtering by like count. See code here: https://github.com/elizabethhh/Twitter-Data-Mining-Astro/blob/main/testastroold.py. |
can it get over 200k tweets? |
Error retrieving https://twitter.com/search?f=live&lang=en&q=from%3A%40GeminiTerms+%2B+since%3A2015-12-02+until%3A2020-11-10-filter%3Areplies&src=spelling_expansion_revert_click: ReadTimeout(ReadTimeoutError("HTTPSConnectionPool(host='twitter.com', port=443): Read timed out. (read timeout=10)")), retrying |
I don't recommend using Tweepy with snscrape, it's not really efficient, you're basically scraping twice. When you scrape with snscrape there's a tweet object you can interact with that has a lot of information that will cover most use cases. I wouldn't recommend using tweepy's api.statuses_lookup unless you need specific information only offered through tweepy. For those still unsure about using snscrape I did write an article for scraping with snscrape that I hope clears up any confusion about using that library, there's also python scripts and Jupyter notebooks I've created to build off of. I also have a picture in the article showing all the information accessible in snscrape's tweet object. |
Brilliant, thank you Martin! |
So, is there any way to get historical tweets for a hashtag? Like the most popular hashtag for the word ripple, as an example, from 2015? Tweepy has a limit of one week's depth, and I tried GOT but have the same issue as here (404). Does anyone have another solution for building a database of historical tweets? :) Thanks! |
Yes, refer to my article as I mentioned above, where I cover the basics of using snscrape instead, because GetOldTweets3 is basically obsolete due to changes in Twitter's API: https://medium.com/better-programming/how-to-scrape-tweets-with-snscrape-90124ed006af In regards to your specific use case, with snscrape you just put whatever query you want inside the quotes inside the TwitterSearchScraper method and adjust the since and until operators to whatever time range you'd want. I created a code snippet for you below. You can take out the i>500 check if you don't want to restrict the amount of tweets and instead want every single tweet.
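The code snippet referenced in this comment did not survive the page scrape, so here is a sketch of the pattern it describes: enumerate the scraper's generator and break after 500 items. Since snscrape may not be installed everywhere, a placeholder generator stands in for sntwitter.TwitterSearchScraper(...).get_items():

```python
# Placeholder standing in for something like
# sntwitter.TwitterSearchScraper("ripple since:2015-01-01 until:2015-12-31").get_items()
def fake_get_items():
    n = 0
    while True:
        yield f"tweet {n}"
        n += 1

tweets_list = []
for i, tweet in enumerate(fake_get_items()):
    if i > 500:   # remove this check to collect every tweet the query returns
        break
    tweets_list.append(tweet)

print(len(tweets_list))  # 501 items collected before the cutoff
```

With the real scraper, each yielded item is a tweet object with fields such as date, id, and content, which you would append instead of the placeholder strings.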
|
Hello, thanks for your precious answer! :) I tried your code and I still get a bug, but now it seems to be in my internet config? Do you have an idea how to fix it? The error msg: Also, when I tried this code on another laptop it works, even though it's the same config. Thanks a lot! |
Hey! For the ones struggling to use snscrape, I put together a little library to download tweets using snscrape/tweepy according to customizable queries. Although it's still a work in progress, check this repo if you want to give it a try :) |
Hello, there
import requests
headers = {
'Connection': 'keep-alive',
'rtt': '300',
'downlink': '0.4',
'ect': '3g',
'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="97", "Chromium";v="97"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"Windows"',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Sec-Fetch-Site': 'none',
'Sec-Fetch-Mode': 'navigate',
'Sec-Fetch-User': '?1',
'Sec-Fetch-Dest': 'document',
'Accept-Language': 'en-US,en;q=0.9,ko;q=0.8',
}
response = requests.get(
'https://www.amazon.de/sp?marketplaceID=A1PA6795UKMFR9&seller=A135E02VGPPVQ&isAmazonFulfilled=1&ref=dp_merchant_link',
headers=headers
)
print(response.status_code) # 404
I would really appreciate any help from you. |
|
I am having the same issue, does anyone have a solution for it? |
I am having Twitter API errors today, though the usernames I'm searching for appear to be working. Any solutions? I work in R/rtweet, specifically using the tweetbotornot2 package. |
Hi, I had a script running over the past weeks and earlier today it stopped working. I keep receiving HTTPError 404, but the provided link in the errors still brings me to a valid page.
Code is (all mentioned variables are established and the error specifically happens with the Manager when I check via debugging):
tweetCriteria = got.manager.TweetCriteria().setQuerySearch(term)\
    .setMaxTweets(max_count)\
    .setSince(begin_timeframe)\
    .setUntil(end_timeframe)
scraped_tweets = got.manager.TweetManager.getTweets(tweetCriteria)
The error message for this is the standard 404 error
"An error occured during an HTTP request: HTTP Error 404: Not Found
Try to open in browser:" followed by the valid link
As I have changed nothing about the folder, I am wondering whether something has happened with my configuration more so than anything else, but also whether others are experiencing this.