grf: generalized random forests

This repository is in a 'beta' state and is under active development. We expect to make continual improvements to performance and usability.

Authors

This package is written and maintained by Julie Tibshirani ([email protected]), Susan Athey, and Stefan Wager.

The repository started as a fork of the ranger repository; we owe a great deal of thanks to the ranger authors for their useful and free package.

Installation

The latest release of the package can be installed through CRAN:

install.packages("grf")

Any published release can also be installed from source:

install.packages("https://raw.github.com/swager/grf/master/releases/grf_0.9.3.tar.gz", repos = NULL, type = "source")

Note that installing from source requires a compiler that implements C++11 (clang 3.3 or higher, or g++ 4.8 or higher). On Windows, the Rtools toolchain is also required.

Usage Examples

library(grf)

# Generate data.
n = 2000; p = 10
X = matrix(rnorm(n*p), n, p)
X.test = matrix(0, 101, p)
X.test[,1] = seq(-2, 2, length.out = 101)

# Perform treatment effect estimation.
W = rbinom(n, 1, 0.5)
Y = pmax(X[,1], 0) * W + X[,2] + pmin(X[,3], 0) + rnorm(n)
tau.forest = causal_forest(X, Y, W)
tau.hat = predict(tau.forest, X.test)
plot(X.test[,1], tau.hat$predictions, ylim = range(tau.hat$predictions, 0, 2), xlab = "x", ylab = "tau", type = "l")
lines(X.test[,1], pmax(0, X.test[,1]), col = 2, lty = 2)

# Estimate the conditional average treatment effect on the full sample (CATE).
estimate_average_effect(tau.forest, target.sample = "all")

# Estimate the conditional average treatment effect on the treated sample (CATT).
# Here, we don't expect much difference between the CATE and the CATT, since
# treatment assignment was randomized.
estimate_average_effect(tau.forest, target.sample = "treated")

# Add confidence intervals for heterogeneous treatment effects; growing more trees is now recommended.
tau.forest = causal_forest(X, Y, W, num.trees = 4000)
tau.hat = predict(tau.forest, X.test, estimate.variance = TRUE)
sigma.hat = sqrt(tau.hat$variance.estimates)
plot(X.test[,1], tau.hat$predictions, ylim = range(tau.hat$predictions + 1.96 * sigma.hat, tau.hat$predictions - 1.96 * sigma.hat, 0, 2), xlab = "x", ylab = "tau", type = "l")
lines(X.test[,1], tau.hat$predictions + 1.96 * sigma.hat, col = 1, lty = 2)
lines(X.test[,1], tau.hat$predictions - 1.96 * sigma.hat, col = 1, lty = 2)
lines(X.test[,1], pmax(0, X.test[,1]), col = 2, lty = 1)
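
The confidence bands in the example above are pointwise intervals based on a normal approximation: the estimate plus or minus 1.96 standard errors. As a minimal sketch of that arithmetic in base R, the helper below builds such a band from a vector of estimates and a vector of variance estimates. The name `confidence_band` and its `level` argument are hypothetical and not part of grf; the toy inputs stand in for `tau.hat$predictions` and `tau.hat$variance.estimates`.

```r
# Hypothetical helper (not part of grf): pointwise normal-approximation
# confidence band from point estimates and their variance estimates.
confidence_band = function(predictions, variance.estimates, level = 0.95) {
  predictions = as.numeric(predictions)
  variance.estimates = as.numeric(variance.estimates)
  stopifnot(length(predictions) == length(variance.estimates))
  z = qnorm(1 - (1 - level) / 2)  # roughly 1.96 when level = 0.95
  sigma.hat = sqrt(variance.estimates)
  data.frame(
    estimate = predictions,
    lower = predictions - z * sigma.hat,
    upper = predictions + z * sigma.hat
  )
}

# Toy inputs standing in for tau.hat$predictions and tau.hat$variance.estimates.
band = confidence_band(c(0.0, 0.5, 1.0), c(0.04, 0.01, 0.09))
print(band)
```

With a fitted forest, the same helper could be called as `confidence_band(tau.hat$predictions, tau.hat$variance.estimates)`, assuming predictions were computed with `estimate.variance = TRUE`.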

For examples of how to use other types of forests, including those for quantile regression and causal effect estimation using instrumental variables, please see the documentation directory.
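
As a rough illustration of the quantile regression case, the sketch below fits a forest with `quantile_forest` on simulated data in which the spread of the outcome, not its mean, depends on the first covariate. The data-generating process is made up for this example, and the shape of the object returned by `predict` is an assumption that may differ between grf versions, so the code accepts either a matrix or a list with a `predictions` element. The documentation directory remains the authoritative reference.

```r
# Guarded so the sketch is skipped when grf is not installed.
if (requireNamespace("grf", quietly = TRUE)) {
  library(grf)

  # Simulated data: the spread of Y, not its mean, depends on X[,1].
  n = 1000; p = 5
  X = matrix(rnorm(n * p), n, p)
  Y = rnorm(n) * (1 + (X[,1] > 0))
  X.test = matrix(0, 101, p)
  X.test[,1] = seq(-2, 2, length.out = 101)

  # Fit a quantile forest for the 10th, 50th and 90th percentiles.
  q.forest = quantile_forest(X, Y, quantiles = c(0.1, 0.5, 0.9))
  q.hat = predict(q.forest, X.test)

  # Assumption: depending on the grf version, predict() returns either a
  # matrix or a list with a `predictions` element; handle both.
  q.mat = as.matrix(if (is.list(q.hat)) q.hat$predictions else q.hat)

  # One line per estimated quantile, plotted against the first covariate.
  matplot(X.test[,1], q.mat, type = "l", lty = c(2, 1, 2), col = 1, xlab = "x", ylab = "y")
}
```

If the forest picks up the change in spread, the outer quantile curves should sit further apart for positive values of the first covariate than for negative ones.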

Developing

In addition to providing out-of-the-box forests for quantile regression and causal effect estimation, grf provides a framework for creating forests tailored to new statistical tasks. If you'd like to develop using grf, please consult the development guide.

References

Susan Athey, Julie Tibshirani and Stefan Wager. Generalized Random Forests, 2016. [arxiv]
