feat: global RNG access #58
This is useful for creating new functions that are controlled by a single seed. It should also be possible to use this for parallel computation.
Explanation: `Rcpp::XPtr` registers a finalizer that deletes the underlying pointer once R's gc collects the pointer. This may delete dqrng's internal RNG.
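The ownership problem described here can be illustrated without R. The following is a minimal sketch in plain C++, assuming a hypothetical `Handle` class that stands in for `Rcpp::XPtr` and a hypothetical `ToyRng` that stands in for dqrng's internal generator; neither is part of Rcpp or dqrng. The destructor plays the role of the finalizer that R's garbage collector would run.

```cpp
#include <cassert>

// Hypothetical stand-in for dqrng's internal RNG; counts live instances.
struct ToyRng {
  static int alive;
  ToyRng() { ++alive; }
  ~ToyRng() { --alive; }
};
int ToyRng::alive = 0;

// Hypothetical stand-in for an external-pointer handle. The destructor
// models the finalizer that runs when the handle is garbage collected.
template <typename T>
class Handle {
 public:
  Handle(T* ptr, bool delete_on_release)
      : ptr_(ptr), delete_on_release_(delete_on_release) {
    assert(ptr != nullptr);
  }
  Handle(const Handle&) = delete;
  Handle& operator=(const Handle&) = delete;
  ~Handle() {
    if (delete_on_release_) delete ptr_;  // the "finalizer"
  }
  T* get() const { return ptr_; }

 private:
  T* ptr_;
  bool delete_on_release_;
};

// Returns how many ToyRng objects are alive after a temporary handle to
// `rng` has been released.
inline int alive_after_release(ToyRng* rng, bool delete_on_release) {
  {
    Handle<ToyRng> handle(rng, delete_on_release);
  }
  return ToyRng::alive;
}
```

With `delete_on_release = false` the generator survives the handle; with `true` it is destroyed even though its real owner still expects it to exist. In Rcpp, the second constructor argument of `Rcpp::XPtr` (`set_delete_finalizer`) controls whether the finalizer is registered.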
7b79e3b
to
f8346b9
Compare
I am not sure why the CI tests are suddenly failing here. Anyway, right now I am wondering why you advise against wrapping the raw pointer into a smart pointer. It is clear that one must not do that with a |
The vignette building in check might not have BH present.
This appears to happen because the new chunk is evaluated (contrary to the few chunks before it). The cause seems to be that the vignette-building environment in the check does not have the BH dependency available. I will change it so that it is also not executed. Edit: If you actually want to execute those chunks, you might need to add some additional packages to |
A shared pointer also represents ownership of the underlying object; hence the underlying object is deleted once all ownership claims are gone. In this case, the RNG should be owned only by the dqrng package, and it must not be deleted because some last For background, from https://en.cppreference.com/w/cpp/memory/shared_ptr:
|
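The ownership semantics of `std::shared_ptr` discussed in this comment can be sketched as follows. This is an illustration only, assuming a hypothetical `ToyGenerator` type; it is not how dqrng stores its RNG. It contrasts a regular (owning) `std::shared_ptr` with one constructed with a no-op deleter, which is one standard way to obtain a handle that never deletes the object.

```cpp
#include <cassert>
#include <memory>

// Hypothetical generator type that records whether it has been destroyed.
struct ToyGenerator {
  explicit ToyGenerator(bool* destroyed) : destroyed_(destroyed) {
    assert(destroyed != nullptr);
    *destroyed_ = false;
  }
  ~ToyGenerator() { *destroyed_ = true; }
  bool* destroyed_;
};

// Owning: the object is deleted when the last shared_ptr goes away.
inline std::shared_ptr<ToyGenerator> make_owning(ToyGenerator* raw) {
  return std::shared_ptr<ToyGenerator>(raw);
}

// Non-owning: a no-op deleter leaves the object's lifetime to someone else.
inline std::shared_ptr<ToyGenerator> make_non_owning(ToyGenerator* raw) {
  return std::shared_ptr<ToyGenerator>(raw, [](ToyGenerator*) {});
}
```

A non-owning handle of this kind only stays valid as long as the real owner keeps the object alive, so it shifts the problem rather than removing it.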
This answer was for the current implementation. Of course, if the original pointer would be the first |
An easier and more natural way might also be if |
This has made me think quite a lot about how the RNGs are currently stored (which is good!), and it seems like things can become quite complex.
In addition, one needs to define the use case properly: With the current "experts only" implementation one can support users who write their own C++ code, understand what is going on and have control over the execution. However, I would like to also support package authors who write code that needs a fast (potentially parallel) RNG. Currently these users have to create their own RNGs and take care of seeding them properly. I would like to offer them a mechanism that is easy to integrate with. So on the one hand, there would be clear guidance. In order to protect from the dangers of somebody calling
#include <Rcpp.h>
// [[Rcpp::depends(dqrng)]]
#include <dqrng.h>

// Handle obtained once, at load time
auto rng1 = dqrng::get_rng();

// [[Rcpp::export]]
void foo() {
  Rcpp::Rcout << (*rng1)() << std::endl;
}

// [[Rcpp::export]]
void bar() {
  // Handle obtained on every call
  auto rng2 = dqrng::get_rng();
  Rcpp::Rcout << (*rng2)() << std::endl;
}
For the second type of problems (do not modify or delete ...), it might make sense to provide a wrapper class that makes this impossible or at least hard to do, and to document that as the only supported way to use dqrng's RNG. Something along these lines:
#include <Rcpp.h>
// [[Rcpp::depends(dqrng)]]
#include <dqrng.h>

namespace dqrng {
namespace external {

class random_64bit_generator : public dqrng::random_64bit_generator {
 private:
  Rcpp::XPtr<dqrng::random_64bit_generator> gen;

 public:
  random_64bit_generator() : gen(dqrng::get_rng()) {}
  // Forward draws to dqrng's RNG
  virtual result_type operator()() { return gen->operator()(); }
  // Seeding is deliberately not supported through this wrapper
  virtual void seed(result_type seed) {
    throw std::runtime_error("Seed handling not supported for this class!");
  }
  virtual void seed(result_type seed, result_type stream) {
    throw std::runtime_error("Seed handling not supported for this class!");
  }
};

}  // namespace external
}  // namespace dqrng

// [[Rcpp::export]]
void baz() {
  dqrng::external::random_64bit_generator rng3;
  Rcpp::Rcout << rng3() << std::endl;
}
Notes:
This brings me to the final point: The sampling methods from Any comments? |
I will have to think about this when I have some time on my hands, but off the top of my head I would first try to find out whether 6. is really true. I do not know the implementation well, but I would hope that when the XPtr is copy-constructed, it will not register the finalizer again. Edit: Context |
I share your assessments in the beginning, except for no. 6, about which I am not sure (see above). A few comments:
Storing the RNG
The RNG is currently stored as
Supporting parallel RNG usage
I am not sure if the global RNG access is really that useful for writing parallel sampling algorithms; looking at the sample, stream-individual seeding is nothing you could get rid of, right?
Guidance and wrapper
I do not see a use case for which client code that would call
Idea: Write templated sampling algorithms and inject base distributions (uniform, exponential, ...) such that it is simple to use them alongside existing frameworks (base R, dqrng, ...).
#include <Rcpp.h>
// [[Rcpp::depends(dqrng)]]
#include <cmath>    // std::pow
#include <memory>   // std::make_unique
#include <utility>  // std::forward
#include <dqrng.h>

namespace internal {

template <typename _RealType, typename _UniformRealDistribution>
class pareto_distribution {
 public:
  using result_type = _RealType;

  class param_type {
    // implementation
  };

  explicit pareto_distribution(const _RealType alpha,
                               const _RealType lower_bound)
      : parm_{alpha, lower_bound} {
    // implementation
  }
  // implementation

  template <typename _Engine>
  result_type operator()(_Engine&& engine) {
    return (*this)(std::forward<_Engine>(engine), parm_);
  }

  // Inverse transform: lower_bound / U^(1 / alpha) with U uniform on the unit interval
  template <typename _Engine>
  result_type operator()(_Engine&& engine, const param_type& parm) {
    return parm.lower_bound_ /
           std::pow(unit_uniform_real_dist_(std::forward<_Engine>(engine)),
                    1. / parm.alpha_);
  }

 private:
  param_type parm_{};
  _UniformRealDistribution unit_uniform_real_dist_{_RealType{0}, _RealType{1}};
  // implementation
};

class r_uniform_real_distribution {
  // implementation to sample via R with an std-like interface
};

class r_engine {
  // implementation for an std-like interface for R engine
};

}  // namespace internal

// [[Rcpp::export]]
Rcpp::NumericVector rpareto_r(const double alpha, const double lower_bound) {
  using pareto_distribution =
      internal::pareto_distribution<double, internal::r_uniform_real_distribution>;
  // Template arguments restored; assumes the engine is the r_engine placeholder above
  auto rng = std::make_unique<internal::r_engine>();
  // wrapper-implementation
}

// [[Rcpp::export]]
Rcpp::NumericVector rpareto_dqrng(const double alpha, const double lower_bound) {
  using pareto_distribution =
      internal::pareto_distribution<double, dqrng::r_uniform_distribution>;
  Rcpp::XPtr<dqrng::random_64bit_generator> rng = dqrng::get_rng();
  // wrapper-implementation
}
Problem: Currently, this would require duplicating all of dqrng's sampling algorithms and its seeding interface to obtain the functionality. In addition, this could not be done by different packages at the same time without copying each other. Getting global access would solve this. How this is done is irrelevant; we only need to pass the underlying generator into the Call to sampling routines in
|
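The idea of a templated sampling algorithm with an injected base distribution can be tried out with the standard library alone. The sketch below is a self-contained variant of the Pareto example, assuming `std::uniform_real_distribution` as the injected uniform distribution and any standard engine (for instance `std::mt19937_64`) in place of R's or dqrng's generator. The helper `rpareto` is hypothetical, and the sketch uses `1 - U` rather than `U` to avoid a division by zero, which is a deviation from the snippet in the comment.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Sketch of a Pareto distribution with an injectable uniform distribution.
template <typename RealType = double,
          typename UniformDist = std::uniform_real_distribution<RealType>>
class pareto_distribution {
 public:
  using result_type = RealType;

  pareto_distribution(RealType alpha, RealType lower_bound)
      : alpha_(alpha), lower_bound_(lower_bound) {
    assert(alpha > 0 && lower_bound > 0);
  }

  // Inverse transform sampling: X = lower_bound / U^(1 / alpha).
  template <typename Engine>
  result_type operator()(Engine& engine) {
    // 1 - U lies in (0, 1] when U lies in [0, 1), which avoids dividing by 0.
    const RealType u = RealType{1} - unit_uniform_(engine);
    return lower_bound_ / std::pow(u, RealType{1} / alpha_);
  }

 private:
  RealType alpha_;
  RealType lower_bound_;
  UniformDist unit_uniform_{RealType{0}, RealType{1}};
};

// Hypothetical helper: draws n Pareto variates from the engine passed in.
template <typename Engine>
std::vector<double> rpareto(std::size_t n, double alpha, double lower_bound,
                            Engine& engine) {
  pareto_distribution<> dist(alpha, lower_bound);
  std::vector<double> out(n);
  for (auto& x : out) x = dist(engine);
  return out;
}
```

Because the engine is passed in by reference, the same sampling code works with any generator that satisfies the standard engine interface, which is the property that global access to dqrng's RNG would make usable from other packages.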
Another thought: As your proposed wrapper encapsulates implementation, how to store the RNG internally and how to improve the user experience for setting up and seeding the RNG in parallel usage are sort of independent of exposing the RNG for external use. Maybe this should be discussed in separate issues? |
Thanks @hsloot! You are right that these other things should be separate issues. So what do we need here:
BTW, I had another look at 6. from above and it looks like you might be correct. When an
#include <Rcpp.h>
// [[Rcpp::depends(dqrng)]]
#include <dqrng_generator.h>

Rcpp::XPtr<dqrng::random_64bit_generator> rng(new dqrng::random_64bit_wrapper<>());

Rcpp::XPtr<dqrng::random_64bit_generator> get_ptr() {
  return rng;
}

// [[Rcpp::export]]
void foo() {
  // Copy of the XPtr returned by get_ptr()
  Rcpp::XPtr<dqrng::random_64bit_generator> lrng(get_ptr());
  Rcpp::Rcout << (*lrng.checked_get())() << std::endl;
}

// [[Rcpp::export]]
void bar() {
  Rcpp::Rcout << (*rng.checked_get())() << std::endl;
}

// [[Rcpp::export]]
void baz(bool finalizer) {
  // New XPtr built from the raw pointer, with or without a finalizer
  Rcpp::XPtr<dqrng::random_64bit_generator> lrng(get_ptr().get(), finalizer);
  Rcpp::Rcout << (*lrng.checked_get())() << std::endl;
}
Calling So storing internally in a |
As you have to know that they are the same, I find it clearer to use only one type throughout so that this dependence is obvious.
More or less only renaming a few things. But I would try to simplify it and remove this distribution caller approach and just duplicate a bit of code. |
The proposed changes to the dqrng-sample algorithm are not really part of any documented public API, right? Can I just change them or do we have to be careful with revdeps? |
The sampling methods are used in https://daqana.github.io/dqrng/articles/parallel.html#pcg-multiple-streams-with-rcppparallel. So at least this would need to be updated. But I think we should leave those changes to the "usability enhancements". For now it would be great if you could include some tests and add yourself as contributor to |
The new commits contain:
Still pending:
|
The reason why this weird fix is needed is a template vs. polymorphism issue; see #65. |
Great find w.r.t. the boost specialization. Will have to look at this in more detail tomorrow. |
This looks great. Thanks a lot! |
Maybe consider putting all boost modifications in a single header and introducing a flag to activate/deactivate them? |
What I have now:
With these modifications one can use
#include <Rcpp.h>
// [[Rcpp::depends(dqrng, BH)]]
#include <algorithm>  // std::generate
#include <dqrng.h>
#include <boost/random/exponential_distribution.hpp>

// [[Rcpp::export(rng = false)]]
Rcpp::NumericVector dqrexp_boost(const std::size_t n, const double rate = 1.0) {
  using dist_t = boost::random::exponential_distribution<double>;
  using parm_t = typename dist_t::param_type;

  const auto parm = parm_t{rate};
  auto dist = dist_t{};
  auto out = Rcpp::NumericVector(Rcpp::no_init(n));
  // Accessor to dqrng's global RNG
  auto engine = dqrng::random_64bit_accessor{};
  std::generate(out.begin(), out.end(), [&dist, &parm, &engine]() {
    return dist(engine, parm);
  });
  return out;
}
And get the unmodified exponential distribution from boost. I feel uneasy about putting the constructor into
#ifndef dqrng_H
#define dqrng_H

#include "dqrng_RcppExports.h"

namespace dqrng {
// inline: the definition lives in a header that may be included in several translation units
inline random_64bit_accessor::random_64bit_accessor() : gen(dqrng::get_rng().get()) {}
}  // namespace dqrng

#endif  // dqrng_H
But this was the only way I found to get out of the circular dependency w/o introducing even more header files. I can add these directly here in the PR if that's ok with you. |
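The accessor pattern used in this comment can be sketched with the standard library only. The code below is an illustration under stated assumptions, not dqrng's implementation: the `toy` namespace, `generator_base`, `mt_generator`, `global_generator`, `accessor` and `draw_exponential` are all hypothetical names, and `std::mt19937_64` stands in for dqrng's engine. It shows how a non-owning accessor to a polymorphic, package-owned generator can still be handed to an unmodified distribution template.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <random>
#include <stdexcept>

namespace toy {

// Abstract interface, loosely modelled on dqrng::random_64bit_generator.
class generator_base {
 public:
  using result_type = std::uint64_t;
  virtual ~generator_base() = default;
  virtual result_type operator()() = 0;
  virtual void seed(result_type seed) = 0;
  static constexpr result_type min() { return 0; }
  static constexpr result_type max() {
    return std::numeric_limits<result_type>::max();
  }
};

// Concrete generator backed by std::mt19937_64 (a stand-in engine).
class mt_generator : public generator_base {
 public:
  result_type operator()() override { return engine_(); }
  void seed(result_type seed) override { engine_.seed(seed); }

 private:
  std::mt19937_64 engine_;
};

// The single package-owned generator (function-local static).
inline generator_base& global_generator() {
  static mt_generator gen;
  return gen;
}

// Non-owning accessor: forwards draws to the global generator and refuses
// seeding, so client code cannot reseed or delete the underlying RNG.
class accessor : public generator_base {
 public:
  accessor() : gen_(&global_generator()) { assert(gen_ != nullptr); }
  result_type operator()() override { return (*gen_)(); }
  void seed(result_type) override {
    throw std::runtime_error("Seed handling not supported for this class!");
  }

 private:
  generator_base* gen_;  // never deleted here
};

// Draws one exponential variate through the accessor.
inline double draw_exponential(double rate) {
  accessor engine;
  std::exponential_distribution<double> dist(rate);
  return dist(engine);
}

}  // namespace toy
```

Since every `accessor` forwards to the same underlying object, draws made through different accessors advance one shared stream, which is what makes a single seed control all sampling functions.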
* random_64bit_generator and random_64bit_accessor w/o the constructor definition go to dqrng_types.h
* this gets automatically included in dqrng_RcppExports.h
* constructor definition for random_64bit_accessor goes to dqrng.h
* boost modifications go to dqrng_distributions.h
Thanks again @hsloot! |
Provide access to dqrng's RNG for expert users: dqrng::get_rng() (Rcpp::XPtr needs to be constructed without registering a finalizer to avoid accidentally deleting the global RNG); also slightly improve the implementation
Close #41