This repository has been archived by the owner on Jul 8, 2024. It is now read-only.

Unrelated, incorrectly dated, duplicated tweets in retrieved data. Advertisements/Spam? #93

Open
PaulNoo opened this issue Jul 30, 2020 · 2 comments

Comments

PaulNoo commented Jul 30, 2020

I have been scraping a couple of different search queries this morning and something that stood out is a tweet from @FlareAudio.

The tweet shows up in the data roughly once every dozen lines. It is dated 2020-07-03, a date for which I requested no tweets. I have no idea how it got there or why I keep finding the same tweet in every request I make.

It feels like some sort of promotional tweet that gets included in the data no matter what the search term or date is.

Has anyone else come across this? I can't imagine the problem is on my end, since I've tried different dates, search queries, and code.

PaulNoo commented Jul 30, 2020

This is the tweet I'm talking about, if anyone is curious.

TweetID | Username | Date | Text
1279090694313950000 | flareaudio | 2020-07-03 16:32:40+00:00 | We're so thrilled to hear how our new product Calmer® has been helping some peoples tinnitus! Read more about Calmer here --> https://www.flareaudio.com/pages/calmer-life

@roy601912008

I have the same issue here. I get tweets that are outside the date range I requested, and the same tweets keep repeating every n rows. There are even tweets in Chinese, although I changed the language to English in "TweetManager.py".
Two months ago GetOldTweets3 worked perfectly for me, but I don't know what has changed to cause these bugs. If anyone knows the cause or a way to deal with it, please help! Thanks a lot!
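
Until the cause is known, one possible workaround is to post-filter the scraped rows: drop anything outside the requested date window and keep only the first occurrence of each tweet ID. The snippet below is a minimal sketch, not a fix for the library itself. It assumes each tweet has already been converted to a dict with the hypothetical keys `id`, `date` (a timezone-aware `datetime`) and `text`; the function name `clean_tweets` and the sample rows are made up for illustration, apart from the @FlareAudio ID and date quoted above. It does not address the wrong-language tweets.

```python
from datetime import datetime, timezone


def clean_tweets(rows, since, until):
    """Keep rows dated within [since, until) and drop repeated tweet IDs."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        # Skip tweets outside the requested window (e.g. the repeated promo tweet).
        if not (since <= row["date"] < until):
            continue
        # Skip tweets that were already emitted once.
        if row["id"] in seen_ids:
            continue
        seen_ids.add(row["id"])
        cleaned.append(row)
    return cleaned


# Illustrative sample data; only the flareaudio ID and date come from this thread.
utc = timezone.utc
rows = [
    {"id": "1001", "date": datetime(2020, 7, 29, 10, 0, tzinfo=utc), "text": "on topic"},
    {"id": "1279090694313950000", "date": datetime(2020, 7, 3, 16, 32, 40, tzinfo=utc), "text": "promo"},
    {"id": "1001", "date": datetime(2020, 7, 29, 10, 0, tzinfo=utc), "text": "on topic"},
    {"id": "1002", "date": datetime(2020, 7, 29, 12, 30, tzinfo=utc), "text": "also on topic"},
    {"id": "1279090694313950000", "date": datetime(2020, 7, 3, 16, 32, 40, tzinfo=utc), "text": "promo"},
]

since = datetime(2020, 7, 29, tzinfo=utc)
until = datetime(2020, 7, 30, tzinfo=utc)
cleaned = clean_tweets(rows, since, until)
print([row["id"] for row in cleaned])
```

Filtering on the date alone already removes the 2020-07-03 tweet here; the ID check additionally covers repeats that happen to fall inside the requested window.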
