Why are the tweets hydrated less than read? #67

santoshbs · 2020-11-03T11:35:48Z

I just started using Hydrator. After the first run, the number of tweets hydrated (~20,000) is far lower than the number of tweet ids read by the Hydrator app (~5 million). I'm not sure why this is happening. Am I doing anything wrong?

gavinrozzi · 2020-12-02T22:39:57Z

You are not doing anything wrong. This is the expected behavior. I am currently using Hydrator for a project and about 50% of the tweet IDs cannot be hydrated. There could be a variety of reasons: the user could have deleted the tweet, their account could have been suspended, or Twitter could have taken the tweet down.

edsu · 2020-12-02T23:37:56Z

50% is high though. Delete rates I've seen are usually less than 20%. But I guess it is dataset dependent. One thing to make sure of is that you haven't corrupted the ids by opening them in Excel and saving the file. Excel keeps only 15 significant digits, so large integers like tweet ids lose precision and their last few digits become zeros. So take a look at your ids and make sure they don't all end in zeros.
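The check described above can be sketched in a few lines of Python. This is a minimal sketch, not part of Hydrator: the function name `looks_excel_corrupted`, the `zeros` threshold, and the sample ids are all made up for illustration.

```python
def looks_excel_corrupted(ids, zeros=2):
    """Return True if every non-empty id ends in at least `zeros` zero digits."""
    cleaned = [i.strip() for i in ids if i.strip()]
    return bool(cleaned) and all(i.endswith("0" * zeros) for i in cleaned)


# Hypothetical ids: the first list is intact, the second has zeroed-out tails.
intact = ["1323591829018959872", "1334276542196592641"]
damaged = ["1323591829018950000", "1334276542196590000"]

print(looks_excel_corrupted(intact))   # False
print(looks_excel_corrupted(damaged))  # True
```

To run it on a real id file, pass the lines of the file, for example `looks_excel_corrupted(open("ids.txt"))`, where `ids.txt` is a placeholder name.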

gavinrozzi · 2020-12-02T23:46:06Z

Thanks Ed. I think for me it's my specific dataset: I'm working with COVID tweets, and others doing the same have noticed a rather low hydration rate because misinformation gets deleted, etc. I do most of my work in R / Python, so I processed the IDs there to avoid Excel causing any weirdness. That doesn't apply to my case, but it could be a factor in @santoshbs's low hydration rate.

edsu · 2020-12-03T00:02:52Z

Oh, that's possible yeah. Which dataset are you working with? I can test to make sure things look ok.

AsmaZbt · 2020-12-28T15:30:36Z

Hello @edsu,
I have an issue that I don't understand. Can you help me, please?
I got the tweet IDs from a CSV file using a data frame (Python), then I created a '.txt' file that contains one tweet ID per line.
I checked the IDs in the TXT file: they are correct and don't end with zeros.
But after using the Hydrator, I got 33,125 tweets from a total of 55,000 IDs,
and after checking the resulting CSV file, all the IDs end with four or five zeros.

Is that normal? What should I do?

edsu · 2020-12-28T17:01:38Z

@AsmaZbt does the id_str property in your hydrated data also have zeros at the end? JavaScript does not handle long integers well (see #25), so the .id value will often be incorrect.

It is normal for the hydrated number to be less than the number of tweet ids read. The discrepancy reflects the number of tweets that have been deleted or protected since the dataset was created.
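The difference between the numeric id and id_str can be illustrated with a short Python sketch. It assumes a tweet record with both fields and uses a made-up id. Python's `float` is an IEEE 754 double, the same representation JavaScript uses for its numbers, so converting through `float` mimics what happens to the numeric id there.

```python
import json

# Hypothetical tweet record; the id is made up for illustration.
raw = '{"id": 1334276542196592641, "id_str": "1334276542196592641"}'
tweet = json.loads(raw)

# Round-trip the numeric id through a double, as a JavaScript number would store it.
as_double = int(float(tweet["id"]))

print(as_double == tweet["id"])         # False: the low digits were lost
print(tweet["id_str"] == str(tweet["id"]))  # True: the string form is intact
```

Because of this, matching on the string field is the safer choice whenever the ids have passed through JavaScript or a spreadsheet.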

AsmaZbt · 2020-12-28T18:34:58Z

@edsu Thank you so much for the quick reply.
I'm not sure what you mean by id_str, but in my hydrated data I found "in_reply_to_status_id", and yes, the values end with 0000.

Getting less data is not a problem for me: when I noticed that the hydrated number is less than the number of tweet ids read, I understood that the missing tweets are deleted or protected. However, I absolutely need the correct IDs to match them with my data.

Please, is there any solution?

AsmaZbt · 2020-12-28T22:12:46Z

@edsu I checked the values in the hydrated '.csv' file with pandas, and the surprise is that the IDs do not end with 0000 but are the real IDs. So I think the problem is in Excel, because when I open the file with Excel I don't see the same values as in the dataframe (that's strange).
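One way to inspect the hydrated CSV without going through Excel is to read every column as text. The sketch below uses Python's standard `csv` module, which never converts values to numbers. The column names and ids are stand-ins, not actual Hydrator output. With pandas, passing `dtype=str` to `read_csv` is the analogous choice.

```python
import csv
import io

# Stand-in for the hydrated CSV; column names and ids are made up for illustration.
hydrated_csv = io.StringIO(
    "id,in_reply_to_status_id,text\n"
    "1334276542196592641,1334276542196592333,first reply\n"
    "1323591829018959872,,not a reply\n"
)

# DictReader yields every field as a string, so no digits are rounded away.
rows = list(csv.DictReader(hydrated_csv))
ids = [row["id"] for row in rows]
print(ids)
```

For a file on disk, replace the `io.StringIO` object with `open("hydrated.csv", newline="")`, where `hydrated.csv` is a placeholder name.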

However, after running the Hydrator several times, I realized that each time I run it, I get more tweets:
on the first run, I got 33,125,
and on the fifth run, I got 33,168. What does it mean?

thank you so much for sharing this beautiful work

edsu · 2020-12-29T03:58:20Z

Does the number always go up? Excel does overflow the tweet ids so do be careful how you use it!

AsmaZbt · 2020-12-29T13:37:44Z

Hi @edsu, yes, I checked the IDs in the hydrated file and they perfectly match the correct IDs. Thank you very much.

Yes, the number always goes up. I don't understand why!

best regards

edsu · 2020-12-29T14:11:41Z

Can you share the tweet id dataset? I can take a look to see what might be happening.
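A first step toward narrowing this down could be to compare the ids returned by two runs and list the ones that differ. This is only a diagnostic sketch: the function name `diff_runs` and the sample ids are hypothetical, and it shows which tweets changed between runs, not why.

```python
def diff_runs(first_run_ids, later_run_ids):
    """Return (ids only in the later run, ids only in the first run)."""
    first, later = set(first_run_ids), set(later_run_ids)
    return sorted(later - first), sorted(first - later)


# Hypothetical id_str values from two runs over the same id list.
run_1 = ["1001", "1002", "1003"]
run_5 = ["1001", "1002", "1003", "1004"]

gained, lost = diff_runs(run_1, run_5)
print(gained)  # ['1004']
print(lost)    # []
```

Looking up a few of the gained ids by hand can then show whether they share anything in common.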

edsu closed this as completed Sep 20, 2021
