Returning reddit user IDs for submissions and comments #556
If applied, this updates the item returned by the reddit scraper's _api_obj_to_item method to include the author's full user ID, not just their username. This is useful for users who track reddit users by these unique IDs across sources.
Returning those IDs is indeed a good idea, yeah, thanks!
date: datetime.datetime
id: str
link: typing.Optional[str]
selftext: typing.Optional[str]
subreddit: typing.Optional[str] # E.g. submission 617p51
subreddit_id: typing.Optional[str]
score: int
The score is intentionally not included. Pushshift's data is almost always outdated and therefore completely wrong. This also wasn't mentioned in either the commit message or the PR...?
@@ -20,11 +20,14 @@
@dataclasses.dataclass
class Submission(snscrape.base.Item):
    author: typing.Optional[str] # E.g. submission hf7k6
    author_id: typing.Optional[str]
I use camelCase for variables/attributes (cf. Comment.parentId), so this should be authorId, and likewise on the other lines.
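For illustration, the renamed field could look roughly like the sketch below. This is a simplified stand-in, not the scraper's actual code: the real Submission extends snscrape.base.Item and has more fields, api_obj_to_submission is a hypothetical helper standing in for _api_obj_to_item, and the 'author_fullname' key is an assumption about how the API response names the full user ID.

```python
import dataclasses
import typing


@dataclasses.dataclass
class Submission:
    # Simplified stand-in for the scraper's Submission item
    # (the real class extends snscrape.base.Item).
    author: typing.Optional[str]
    authorId: typing.Optional[str]  # camelCase, matching Comment.parentId
    id: str


def api_obj_to_submission(d: dict) -> Submission:
    # Hypothetical helper. 'author_fullname' is an assumed key name
    # for the full user ID in the API response.
    return Submission(
        author=d.get('author'),
        authorId=d.get('author_fullname'),  # None when the key is absent
        id=d['id'],
    )
```

Using dict.get keeps both author fields optional, which matches the typing.Optional annotations for entries where the author information is missing.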
If applied, this updates the item returned by the reddit scraper's _api_obj_to_item method to include the author's full user ID, not just their username. This is useful for users who track reddit users by these unique IDs across many sources.