R2PMML

R package for converting R models to PMML

Features

This package supersedes the standard pmml package:

It produces valid and standards-compliant PMML markup.
It supports several model types (e.g. gbm, iForest, ranger, xgb.Booster) that are not supported by the standard pmml package.
It is extremely fast and memory-efficient. For example, it can convert a typical randomForest model to a PMML file in a few seconds, whereas the standard pmml package requires several hours to do the same.

Prerequisites

Java 1.7 or newer. The Java executable must be available on the system path.
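
A quick way to confirm this prerequisite from within R is to look up the executable on the path. This is a minimal sketch using base R only; it checks that a `java` executable can be found, not that its version is recent enough:

```r
# Locate the Java executable on the system path; the result is "" when it is not found
java_path = Sys.which("java")

if (nzchar(java_path)) {
  # Print the Java version banner so the version can be checked by eye
  system2("java", "-version")
} else {
  message("Java executable not found on the system path")
}
```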

Installation

Installing the package from its GitHub repository using the devtools package:

library("devtools")

install_git("https://github.com/jpmml/r2pmml.git")

Usage

Base functionality

Loading the package:

library("r2pmml")

Training and exporting a simple randomForest model:

library("randomForest")
library("r2pmml")

data(iris)

# Train a model using the raw Iris dataset
iris.rf = randomForest(Species ~ ., data = iris, ntree = 7)
print(iris.rf)

# Export the model to PMML
r2pmml(iris.rf, "iris_rf.pmml")
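
After the export, it can be useful to confirm that the file was written and to glance at its first lines. A minimal sketch in base R, assuming the export above wrote `iris_rf.pmml` to the current working directory:

```r
pmml_file = "iris_rf.pmml"

if (file.exists(pmml_file)) {
  # Show the size of the file and the first few lines of markup
  print(file.info(pmml_file)$size)
  print(head(readLines(pmml_file, warn = FALSE), n = 5))
} else {
  message("PMML file not found: ", pmml_file)
}
```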

Data pre-processing

The r2pmml function takes an optional argument preProcess, which associates the model with data pre-processing transformations.

Training and exporting a more sophisticated randomForest model:

library("caret")
library("randomForest")
library("r2pmml")

data(iris)

# Create a preprocessor
iris.preProcess = preProcess(iris, method = c("range"))

# Use the preprocessor to transform the raw Iris dataset to a pre-processed Iris dataset
iris.transformed = predict(iris.preProcess, newdata = iris)

# Train a model using the pre-processed Iris dataset
iris.rf = randomForest(Species ~ ., data = iris.transformed, ntree = 7)
print(iris.rf)

# Export the model to PMML.
# Pass the preprocessor as the `preProcess` argument
r2pmml(iris.rf, "iris_rf.pmml", preProcess = iris.preProcess)
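
To make the effect of the `range` method concrete, the same rescaling can be reproduced by hand. This is a sketch in base R that assumes caret's default range bounds of [0, 1]; the helper `range_scale` is a hypothetical name introduced here for illustration and is not part of caret or r2pmml:

```r
data(iris)

# Hypothetical helper: min-max scaling of a numeric vector to [0, 1]
range_scale = function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Rescale the four numeric columns; the Species column is left untouched
iris.manual = iris
iris.manual[, 1:4] = lapply(iris[, 1:4], range_scale)

summary(iris.manual)
```

Passing the preprocessor to `r2pmml` means that the exported model accepts raw feature values and applies the scaling itself.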

Model formulae

Alternatively, it is possible to associate lm, glm and randomForest models with data pre-processing transformations via model formulae.

Supported model formula features:

Interaction terms.
base::I(..) function terms:
- Logical operators &, | and !.
- Relational operators ==, !=, <, <=, >= and >.
- Arithmetic operators +, -, *, /, and %.
- Exponentiation operators ^ and **.
- The is.na function.
- Arithmetic functions abs, ceiling, exp, floor, log, log10, round and sqrt.
base::cut() and base::ifelse() function terms.
plyr::revalue() and plyr::mapvalues() function terms.

Training and exporting a glm model:

library("plyr")
library("r2pmml")

# Load and prepare the Auto-MPG dataset
auto = read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data", quote = "\"", header = FALSE, na.strings = "?", row.names = NULL, col.names = c("mpg", "cylinders", "displacement", "horsepower", "weight", "acceleration", "model_year", "origin", "car_name"))
auto$origin = as.factor(auto$origin)
auto$car_name = NULL
auto = na.omit(auto)

# Train a model
auto.glm = glm(mpg ~ (. - horsepower - weight - origin) ^ 2 + I(displacement / cylinders) + cut(horsepower, breaks = c(0, 50, 100, 150, 200, 250)) + I(log(weight)) + revalue(origin, replace = c("1" = "US", "2" = "Europe", "3" = "Japan")), data = auto)

# Export the model to PMML
r2pmml(auto.glm, "auto_glm.pmml")
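
To see what the derived features in the formula above evaluate to, the individual terms can be run on their own in base R. This is a minimal sketch; the sample values are made up for illustration and are not taken from the Auto-MPG dataset:

```r
# Hypothetical horsepower values, chosen only to illustrate the binning
horsepower = c(46, 95, 130, 230)

# Same breaks as in the model formula; each value falls into a half-open interval (lo, hi]
horsepower.binned = cut(horsepower, breaks = c(0, 50, 100, 150, 200, 250))
print(horsepower.binned)

# Hypothetical displacement and cylinder values
displacement = c(307, 97)
cylinders = c(8, 4)

# Inside I(..) the division is ordinary arithmetic rather than formula syntax
print(I(displacement / cylinders))
```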

Package `ranger`

Training and exporting a ranger model:

library("ranger")
library("r2pmml")

data(iris)

# Train a model.
# Keep the forest data structure by specifying `write.forest = TRUE`
iris.ranger = ranger(Species ~ ., data = iris, num.trees = 7, write.forest = TRUE)
print(iris.ranger)

# Export the model to PMML.
# Pass the training dataset as the `data` argument
r2pmml(iris.ranger, "iris_ranger.pmml", data = iris)

Package `xgboost`

Training and exporting an xgb.Booster model:

library("xgboost")
library("r2pmml")

data(iris)

iris_X = iris[, 1:4]
iris_y = as.integer(iris[, 5]) - 1

# Generate XGBoost feature map
iris.fmap = genFMap(iris_X)

# Generate XGBoost DMatrix
iris.DMatrix = genDMatrix(iris_y, iris_X)

# Train a model
iris.xgb = xgboost(data = iris.DMatrix, missing = NULL, objective = "multi:softmax", num_class = 3, nrounds = 13)

# Export the model to PMML.
# Pass the feature map as the `fmap` argument.
# Pass the name and category levels of the target field as `response_name` and `response_levels` arguments, respectively.
# Pass the value that represents missing values as the `missing` argument.
# Pass the optimal number of trees as the `ntreelimit` argument (analogous to the `ntreelimit` argument of the `xgboost::predict.xgb.Booster` function)
r2pmml(iris.xgb, "iris_xgb.pmml", fmap = iris.fmap, response_name = "Species", response_levels = c("setosa", "versicolor", "virginica"), missing = NULL, ntreelimit = 7, compact = TRUE)
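
The `response_levels` values must line up with the zero-based integer labels used for training. Rather than typing the levels by hand, they can be read from the factor itself. A minimal sketch in base R; the variable name `iris.levels` is introduced here for illustration:

```r
data(iris)

# Zero-based class labels, as in the training code above
iris_y = as.integer(iris$Species) - 1

# Factor levels in the same order as the integer codes 0, 1, 2
iris.levels = levels(iris$Species)
print(iris.levels)

# Cross-tabulate to confirm that each integer label maps to exactly one level
print(table(iris_y, iris$Species))
```

The resulting vector could then be passed as `response_levels = iris.levels`.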

Advanced functionality

Tweaking JVM configuration:

Sys.setenv(JAVA_TOOL_OPTIONS = "-Xms4G -Xmx8G")

r2pmml(iris.rf, "iris_rf.pmml")
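
An environment variable set with `Sys.setenv` stays in effect for the rest of the R session. If the larger heap is needed for one conversion only, the previous value can be saved and restored afterwards. A minimal sketch in base R; the variable name `old_opts` is introduced here for illustration:

```r
# Remember the current value; NA means the variable was not set
old_opts = Sys.getenv("JAVA_TOOL_OPTIONS", unset = NA)

Sys.setenv(JAVA_TOOL_OPTIONS = "-Xms4G -Xmx8G")

# ... call r2pmml(...) here ...

# Restore the previous state
if (is.na(old_opts)) {
  Sys.unsetenv("JAVA_TOOL_OPTIONS")
} else {
  Sys.setenv(JAVA_TOOL_OPTIONS = old_opts)
}
```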

Employing a custom converter class:

r2pmml(iris.rf, "iris_rf.pmml", converter = "com.mycompany.MyRandomForestConverter", converter_classpath = "/path/to/myconverter-1.0-SNAPSHOT.jar")

Please refer to the following resources for more ideas and code examples:

Converting R to PMML

De-installation

Removing the package:

remove.packages("r2pmml")

License

R2PMML is dual-licensed under the GNU Affero General Public License (AGPL) version 3.0, and a commercial license.

Additional information

R2PMML is developed and maintained by Openscoring Ltd, Estonia.

Interested in using JPMML software in your application? Please contact [email protected]
