Error with numeric subreddits in GDPR mode #80

Open
timmc opened this issue Jul 8, 2023 · 1 comment

Comments

timmc commented Jul 8, 2023

shreddit reliably chokes on parsing comments.csv when the subreddit field is all-numeric, e.g. for /r/404 or /r/2012. For example, if I change ...,technology,... to ...,2012,... in the first record of my comments.csv in the GDPR export, I get the following error:

  2023-07-08T21:34:11.153874Z  INFO  Shredding Comments...
    at src/main.rs:52

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error(Deserialize { pos: Some(Position { byte: 63, line: 1, record: 1 }), err: DeserializeError { field: None, kind: Message("data did not match any variant of untagged enum Source") } })', src/sources/gdpr.rs:17:41
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::result::unwrap_failed
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1687:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1089:23
   4: shreddit::sources::gdpr::list::{{closure}}
             at ./src/sources/gdpr.rs:17:39
   5: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:310:13
...

Maybe the csv Reader is producing a numeric type when there's an all-numeric sequence, and then deserialize fails because the types don't match?
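
To make that hypothesis concrete, here is a minimal, standard-library-only Rust sketch of what such type guessing could look like. It is a simplified model, not code from the csv or serde crates: `Inferred`, `infer_field`, and `expect_string` are hypothetical names, and the order of guesses (bool, unsigned, signed, float, text) is an assumption.

```rust
/// Simplified stand-in for the value a deserializer might produce when it
/// has to guess a CSV field's type from its text alone (hypothetical type).
#[derive(Debug, PartialEq)]
enum Inferred {
    Bool(bool),
    Unsigned(u64),
    Signed(i64),
    Float(f64),
    Text(String),
}

/// Guess a type for a raw field: bool first, then integers, then float,
/// and fall back to text only when nothing else parses.
fn infer_field(raw: &str) -> Inferred {
    if let Ok(b) = raw.parse::<bool>() {
        Inferred::Bool(b)
    } else if let Ok(n) = raw.parse::<u64>() {
        Inferred::Unsigned(n)
    } else if let Ok(n) = raw.parse::<i64>() {
        Inferred::Signed(n)
    } else if let Ok(x) = raw.parse::<f64>() {
        Inferred::Float(x)
    } else {
        Inferred::Text(raw.to_string())
    }
}

/// Mimics filling a `String` field from an already-inferred value:
/// only text is accepted, so a numeric-looking value is rejected.
fn expect_string(value: Inferred) -> Result<String, String> {
    match value {
        Inferred::Text(s) => Ok(s),
        other => Err(format!("expected a string, got {:?}", other)),
    }
}
```

In this model, `expect_string(infer_field("technology"))` succeeds, while `expect_string(infer_field("2012"))` returns an error because the field was already classified as an integer before the `String` field saw it. If something like this happens during enum deserialization, it would produce the kind of mismatch reported above.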

My current workaround is to remove the subreddit: String field from the Gdpr enum variant in comment.rs, as it's not currently used for anything. Removing the offending lines from the .csv should also work.


timmc commented Jul 8, 2023

Note that this problem doesn't happen with the id field, which can also be all-numeric and is also supposed to be a String. This might have something to do with serde and enums.
