R sample returns always the same sample #34

Open
piccolbo opened this issue Nov 13, 2014 · 4 comments

Comments

@piccolbo
Contributor

`sample` is supposed to do uniform sampling, with or without replacement.

@piccolbo piccolbo added the bug label Nov 13, 2014
@piccolbo piccolbo modified the milestone: API-cleanup Nov 18, 2014
@piccolbo piccolbo self-assigned this Nov 19, 2014
@piccolbo
Contributor Author

The problem here is that `getRandomSample` in Java and Scala requires a seed, like `takeSample` in Spark. But that is not how `sample` should work: when a seed is NOT provided, it is supposed to give independent results between calls. Furthermore, the R function provides a default `seed = 123`, hence the repeated results. I have no doubt this should be randomized: even if `sample` doesn't have a method for data frames, it's clear how it works on vectors. The question is whether we should also fix this in Java/Scala. When the seed is a mandatory argument, this is not a bug, at most a missing feature. This raises the larger question of how closely we want to align the APIs between languages.
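To make the intended behavior concrete, here is a minimal sketch in Python (chosen only for brevity; it is not the project's R, Java, or Scala code). The function name `sample` and its `seed` and `replace` parameters are hypothetical stand-ins. It assumes that an omitted seed should mean "draw fresh randomness on every call", while an explicit seed keeps results reproducible.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample(data: Sequence[T], size: int, replace: bool = False,
           seed: Optional[int] = None) -> List[T]:
    """Uniformly sample `size` items from `data`.

    Hypothetical helper for illustration only, not the project's API.
    With seed=None the generator is seeded from system entropy (or the
    clock), so separate calls give independent samples. With an explicit
    seed the result is reproducible.
    """
    rng = random.Random(seed)  # seed=None means "pick a fresh seed"
    if replace:
        # With replacement: each draw is an independent uniform choice.
        return [rng.choice(data) for _ in range(size)]
    # Without replacement: a uniform random subset of distinct positions.
    return rng.sample(data, size)


if __name__ == "__main__":
    rows = list(range(1000))
    print(sample(rows, 5))            # differs from call to call
    print(sample(rows, 5))
    print(sample(rows, 5, seed=123))  # identical on every call
    print(sample(rows, 5, seed=123))
```

A default of `seed = 123` corresponds to always taking the last two calls, which is why the reported samples never change. Defaulting to "no seed" keeps the reproducible path available to callers who ask for it.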

@piccolbo
Contributor Author

Any comments, @nhanitvn?

piccolbo added a commit that referenced this issue Nov 21, 2014
@piccolbo
Contributor Author

@nhanitvn, please review.

@ctn
Contributor

ctn commented Jun 4, 2015

@nhanitvn please review & close as appropriate.

@ctn ctn unassigned piccolbo Jun 4, 2015