Why are the tweets hydrated less than read? #67
Comments
You are not doing anything wrong. This is the expected behavior. I am currently using Hydrator for a project and about 50% of the tweet IDs cannot be hydrated. There could be a variety of reasons: the user could have deleted their tweet, their account could have been suspended, Twitter could have taken the tweet down, etc.
50% is high though. Delete rates I've seen are usually less than 20%. But I guess it is dataset dependent. One thing to make sure of is that you haven't corrupted the ids by opening them with Excel and saving the file. Excel can't deal with the large integers and they overflow so that the last two digits are always zero. So take a look at your ids and make sure they don't all end in zero.
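The check described above can be sketched in a few lines of Python. This is only an illustration, not part of Hydrator: the function name `looks_excel_corrupted` and its `min_zeros` and `threshold` parameters are made up for this example, and the sample ids are invented. It assumes the ids have been read as text, one per entry.

```python
def looks_excel_corrupted(ids, min_zeros=2, threshold=0.9):
    """Return True if most ids end in at least `min_zeros` zeros.

    Hypothetical helper: real tweet ids only occasionally end in "00",
    so a file where nearly every id does is suspicious.
    """
    # Ignore blank lines and surrounding whitespace.
    cleaned = [s.strip() for s in ids if s.strip()]
    if not cleaned:
        return False
    suffix = "0" * min_zeros
    flagged = sum(1 for s in cleaned if s.endswith(suffix))
    return flagged / len(cleaned) >= threshold


# Invented sample ids, kept as strings so nothing rounds them.
suspicious = ["1234567890123450000", "1234567890123460000"]
healthy = ["1234567890123456789", "1234567890123456701"]

print(looks_excel_corrupted(suspicious))
print(looks_excel_corrupted(healthy))
```

Keeping the ids as strings throughout is the design choice that matters here: once an id has been parsed as a floating-point number, the lost digits cannot be recovered.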
Thanks Ed. I think for me it's my specific dataset. I'm working with COVID tweets, and others doing the same have noticed a rather low hydration rate due to misinformation being deleted etc. I do most of my work in R / Python, so I processed the IDs there to avoid Excel causing any weirdness. That doesn't apply to my case, but maybe it could be a factor in @santoshbs's low hydration rate.
Oh, that's possible yeah. Which dataset are you working with? I can test to make sure things look ok.
Hello @edsu, is that normal? What should I do?
@AsmaZbt It is normal for the hydrated number to be less than the number of tweet ids read. The discrepancy reflects the number of tweets that have been deleted or protected since the dataset was created.
@edsu Thank you so much for the quick reply. It's not a problem for me to get less data. When I noticed that the hydrated number is less than the tweet ids read, I understood that the missing tweets are deleted or protected. However, I absolutely need the correct IDs to match them with my data. Please, is there any solution?
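One way to do that matching without risking the ids is to treat them as text from start to finish. The sketch below uses only Python's standard library; the helper names `read_ids` and `split_by_hydration`, the column name `id`, and the sample rows are assumptions made for illustration, so adjust them to whatever your hydrated CSV actually contains.

```python
import csv
import io


def read_ids(fileobj, column="id"):
    """Read one column of ids from a CSV file object, keeping them as strings."""
    reader = csv.DictReader(fileobj)
    return [row[column].strip() for row in reader if row.get(column)]


def split_by_hydration(requested_ids, hydrated_ids):
    """Split the requested ids into those that came back and those that did not."""
    hydrated = set(hydrated_ids)
    found = [i for i in requested_ids if i in hydrated]
    missing = [i for i in requested_ids if i not in hydrated]
    return found, missing


# Invented data standing in for the original id list and the hydrated CSV.
requested = ["1234567890123456789", "1234567890123456790", "1234567890123456791"]
hydrated_csv = io.StringIO("id,text\n1234567890123456790,example tweet\n")

found, missing = split_by_hydration(requested, read_ids(hydrated_csv))
print(len(found), "hydrated;", len(missing), "deleted, protected or otherwise unavailable")
```

With real files, `open(path, newline="", encoding="utf-8")` can be passed to `read_ids` in place of the `io.StringIO` object.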
@edsu I checked the values in the hydrated '.csv' file with pandas, and to my surprise the IDs do not end with 0000; they are the real IDs. So I think the problem is in Excel, because when I open the file with Excel I don't see the same values as in the dataframe (that's strange). However, after running the Hydrator tool many times, I realized that each time I run it, I get more tweets. Thank you so much for sharing this beautiful work.
Does the number always go up? Excel does overflow the tweet ids so do be careful how you use it!
Hi @edsu, yes, I checked the IDs in the hydrated file and they match the correct IDs perfectly. Thank you very much. Yes, the number always goes up. I don't understand why! Best regards
Can you share the tweet id dataset? I can take a look to see what might be happening.
I just started using Hydrator. After the first run I see that the number of tweets hydrated (~20,000 tweets) is far less than the total number of tweet ids read by the Hydrator app (~5 million tweet ids). Not sure why this is happening. Am I doing anything wrong?