The problem here is that getRandomSample in Java and Scala requires a seed, like takeSample in Spark. That is not how sample should work: when a seed is NOT provided, separate calls are supposed to be independent. On top of that, the R function defaults to seed = 123, which is why the results repeat. I have no doubt this should be randomized. Even though sample has no method for data frames, its behavior on vectors is clear. The question is whether we should also fix this in Java/Scala. Since the seed is a mandatory argument there, this is not a bug, at most a missing feature. It raises the larger question of how much we want to align the APIs between languages.
It's supposed to be uniform sampling, with or without replacement.
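
To make the proposal concrete, here is a rough sketch of what a seed-optional sampling API could look like on the Java side. The class `SampleSketch` and its `sample` overloads are hypothetical names made up for illustration; they are not the existing getRandomSample signature, and the sketch works on an in-memory list rather than a distributed data set. The idea is only that the seeded overload stays reproducible while the unseeded one draws fresh randomness on every call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration only; not part of any existing API.
final class SampleSketch {

    // Seeded variant: the same seed always yields the same sample.
    static <T> List<T> sample(List<T> data, int n, boolean withReplacement, long seed) {
        return draw(data, n, withReplacement, new Random(seed));
    }

    // Unseeded variant: a fresh Random per call, so repeated calls are independent.
    static <T> List<T> sample(List<T> data, int n, boolean withReplacement) {
        return draw(data, n, withReplacement, new Random());
    }

    // Uniform sampling of n elements, with or without replacement.
    private static <T> List<T> draw(List<T> data, int n, boolean withReplacement, Random rng) {
        if (withReplacement) {
            List<T> out = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                out.add(data.get(rng.nextInt(data.size())));
            }
            return out;
        }
        if (n > data.size()) {
            throw new IllegalArgumentException("n exceeds data size when sampling without replacement");
        }
        // Shuffle a copy and keep the first n elements.
        List<T> copy = new ArrayList<>(data);
        Collections.shuffle(copy, rng);
        return new ArrayList<>(copy.subList(0, n));
    }
}
```

With this shape, `SampleSketch.sample(data, 4, false, 123L)` returns the same rows every time, while `SampleSketch.sample(data, 4, false)` is free to differ between calls, which matches how sample behaves on vectors in R when no seed is set.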