Introduction
Rcpp is a powerful tool for writing fast C++ code to speed up R programs. However, it is not easy, or at least not straightforward, to perform numerical integration or optimization in pure C++ code inside Rcpp.
RcppNumerical integrates a number of open source numerical computing libraries into Rcpp, so that users can call convenient functions to accomplish such tasks.
- To use RcppNumerical with Rcpp::sourceCpp(), add

  // [[Rcpp::depends(RcppEigen)]]
  // [[Rcpp::depends(RcppNumerical)]]

  in the C++ source file.
- To use RcppNumerical in your package, add Imports: RcppNumerical and LinkingTo: Rcpp, RcppEigen, RcppNumerical to the DESCRIPTION file, and import(RcppNumerical) to the NAMESPACE file.
Numerical Integration
One-dimensional
The one-dimensional numerical integration code contained in RcppNumerical is based on the NumericalIntegration library developed by Sreekumar Thaithara Balan, Mark Sauder, and Matt Beall.
To compute the integral of a function, first define a functor derived from the Func class (under the namespace Numer):
class Func
{
public:
    virtual double operator()(const double& x) const = 0;
    virtual void eval(double* x, const int n) const
    {
        for(int i = 0; i < n; i++)
            x[i] = this->operator()(x[i]);
    }
    virtual ~Func() {}
};
The first function evaluates one point at a time, and the second version overwrites each point in the array with the corresponding function value. Only the second function is used by the integration code, but it is usually easier to implement the first one, since eval() already has the default implementation shown above.
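For illustration, here is a minimal sketch of such a functor (the class SquareFunc is hypothetical, not part of RcppNumerical) that implements operator() and also overrides eval() for vectorized evaluation:

#include <RcppNumerical.h>
using namespace Numer;

// Hypothetical functor for f(x) = x^2
class SquareFunc: public Func
{
public:
    double operator()(const double& x) const { return x * x; }

    // Optional: overriding eval() evaluates all points in one pass,
    // which can be cheaper when the computation vectorizes well
    void eval(double* x, const int n) const
    {
        for(int i = 0; i < n; i++)
            x[i] = x[i] * x[i];
    }
};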
RcppNumerical provides a wrapper function for the NumericalIntegration library with the following interface:
inline double integrate(
    const Func& f, const double& lower, const double& upper,
    double& err_est, int& err_code,
    const int subdiv = 100, const double& eps_abs = 1e-8, const double& eps_rel = 1e-6,
    const Integrator<double>::QuadratureRule rule = Integrator<double>::GaussKronrod41
)
- f: The functor of the integrand.
- lower, upper: Limits of integral.
- err_est: Estimate of the error (output).
- err_code: Error code (output). See inst/include/integration/Integrator.h, lines 676-704.
- subdiv: Maximum number of subintervals.
- eps_abs, eps_rel: Absolute and relative tolerance.
- rule: Integration rule. Possible values are GaussKronrod{15, 21, 31, 41, 51, 61, 71, 81, 91, 101, 121, 201}. Rules with larger values have better accuracy, but may involve more function calls; a sketch of selecting a non-default rule is shown after this list.
- Return value: The final estimate of the integral.
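For instance, to select the 61-point Gauss-Kronrod rule instead of the default 41-point one, the trailing arguments can be spelled out explicitly. This is a sketch assuming f, lower, upper, err_est, and err_code are defined as in the full example below:

// Pass the default subdiv and tolerances explicitly in order to
// reach the trailing rule argument
const double res = integrate(f, lower, upper, err_est, err_code,
                             100, 1e-8, 1e-6,
                             Integrator<double>::GaussKronrod61);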
See a full example below, which can be compiled using the Rcpp::sourceCpp() function in Rcpp.
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::depends(RcppNumerical)]]
#include <RcppNumerical.h>
using namespace Numer;

// P(0.3 < X < 0.8), X ~ Beta(a, b)
class BetaPDF: public Func
{
private:
    double a;
    double b;
public:
    BetaPDF(double a_, double b_) : a(a_), b(b_) {}

    double operator()(const double& x) const
    {
        return R::dbeta(x, a, b, 0);
    }
};

// [[Rcpp::export]]
Rcpp::List integrate_test()
{
    const double a = 3, b = 10;
    const double lower = 0.3, upper = 0.8;
    const double true_val = R::pbeta(upper, a, b, 1, 0) -
                            R::pbeta(lower, a, b, 1, 0);

    BetaPDF f(a, b);
    double err_est;
    int err_code;
    const double res = integrate(f, lower, upper, err_est, err_code);
    return Rcpp::List::create(
        Rcpp::Named("true") = true_val,
        Rcpp::Named("approximate") = res,
        Rcpp::Named("error_estimate") = err_est,
        Rcpp::Named("error_code") = err_code
    );
}
Running the integrate_test() function in R gives
integrate_test()
# $true
# [1] 0.2528108
#
# $approximate
# [1] 0.2528108
#
# $error_estimate
# [1] 2.806764e-15
#
# $error_code
# [1] 0
Note that infinite intervals are also possible in the case of one-dimensional integration:
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::depends(RcppNumerical)]]
#include <RcppNumerical.h>
using namespace Numer;
class TestInf: public Func
{
public:
    double operator()(const double& x) const
    {
        return x * x * R::dnorm(x, 0.0, 1.0, 0);
    }
};

// [[Rcpp::export]]
Rcpp::List integrate_test2(const double& lower, const double& upper)
{
    TestInf f;
    double err_est;
    int err_code;
    const double res = integrate(f, lower, upper, err_est, err_code);
    return Rcpp::List::create(
        Rcpp::Named("approximate") = res,
        Rcpp::Named("error_estimate") = err_est,
        Rcpp::Named("error_code") = err_code
    );
}
integrate(function(x) x^2 * dnorm(x), 0.5, Inf) # integrate() in R
# 0.4845702 with absolute error < 3e-08
integrate_test2(0.5, Inf)
# $approximate
# [1] 0.4845702
#
# $error_estimate
# [1] 1.633995e-08
#
# $error_code
# [1] 0
Multi-dimensional
Multi-dimensional integration in RcppNumerical is done by the Cuba library developed by Thomas Hahn.
To compute the integral of a multivariate function, one needs to define a functor that inherits from the MFunc class:
class MFunc
{
public:
    virtual double operator()(Constvec& x) = 0;
    virtual ~MFunc() {}
};
Here Constvec represents a read-only vector, with the definition
// Constant reference to a vector
typedef const Eigen::Ref<const Eigen::VectorXd> Constvec;
(Basically you can treat Constvec as a const Eigen::VectorXd. Using Eigen::Ref is mainly to avoid memory copies; see the Eigen documentation for the detailed explanation.)
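As a small illustration (the helper vec_sum below is hypothetical, not part of the library), a function taking Constvec accepts both a full Eigen::VectorXd and a contiguous sub-vector view without copying any data:

#include <RcppNumerical.h>
using namespace Numer;

// Hypothetical helper: sum the elements of a read-only vector
double vec_sum(Constvec& x)
{
    return x.sum();
}

void demo()
{
    Eigen::VectorXd v(4);
    v << 1.0, 2.0, 3.0, 4.0;
    const double s1 = vec_sum(v);          // whole vector, no copy
    const double s2 = vec_sum(v.head(2));  // contiguous view, still no copy
}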
The function provided by RcppNumerical for multi-dimensional integration is
inline double integrate(
    MFunc& f, Constvec& lower, Constvec& upper,
    double& err_est, int& err_code,
    const int maxeval = 1000,
    const double& eps_abs = 1e-6, const double& eps_rel = 1e-6
)
- f: The functor of the integrand.
- lower, upper: Limits of integral. Both are vectors of the same dimension as the argument of f.
- err_est: Estimate of the error (output).
- err_code: Error code (output). Non-zero values indicate failure of convergence.
- maxeval: Maximum number of function evaluations.
- eps_abs, eps_rel: Absolute and relative tolerance.
- Return value: The final estimate of the integral.
See the example below:
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::depends(RcppNumerical)]]
#include <RcppNumerical.h>
using namespace Numer;

// P(a1 < X1 < b1, a2 < X2 < b2), (X1, X2) ~ N([0], [1   rho])
//                                           ([0], [rho   1])
class BiNormal: public MFunc
{
private:
    const double rho;
    double const1;  // 2 * (1 - rho^2)
    double const2;  // 1 / (2 * PI) / sqrt(1 - rho^2)
public:
    BiNormal(const double& rho_) : rho(rho_)
    {
        const1 = 2.0 * (1.0 - rho * rho);
        const2 = 1.0 / (2 * M_PI) / std::sqrt(1.0 - rho * rho);
    }

    // PDF of bivariate normal
    double operator()(Constvec& x)
    {
        double z = x[0] * x[0] - 2 * rho * x[0] * x[1] + x[1] * x[1];
        return const2 * std::exp(-z / const1);
    }
};
// [[Rcpp::export]]
Rcpp::List integrate_test2()
{
    BiNormal f(0.5);  // rho = 0.5
    Eigen::VectorXd lower(2);
    lower << -1, -1;
    Eigen::VectorXd upper(2);
    upper << 1, 1;
    double err_est;
    int err_code;
    const double res = integrate(f, lower, upper, err_est, err_code);
    return Rcpp::List::create(
        Rcpp::Named("approximate") = res,
        Rcpp::Named("error_estimate") = err_est,
        Rcpp::Named("error_code") = err_code
    );
}
We can test the result in R:
library(mvtnorm)
trueval = pmvnorm(c(-1, -1), c(1, 1), sigma = matrix(c(1, 0.5, 0.5, 1), 2))

integrate_test2()
# $approximate
# [1] 0.4979718
#
# $error_estimate
# [1] 4.612333e-09
#
# $error_code
# [1] 0
as.numeric(trueval) - integrate_test2()$approximate
# [1] 2.893336e-11
Numerical Optimization
Currently, RcppNumerical provides the L-BFGS algorithm for unconstrained minimization problems, based on the LBFGS++ library.
Again, one needs to first define a functor to represent the multivariate function to be minimized.
class MFuncGrad
{
public:
    virtual double f_grad(Constvec& x, Refvec grad) = 0;
    virtual ~MFuncGrad() {}
};
As in the multi-dimensional integration case, Constvec represents a read-only vector and Refvec a writable vector. Their definitions are
// Reference to a vector
typedef Eigen::Ref<Eigen::VectorXd> Refvec;
typedef const Eigen::Ref<const Eigen::VectorXd> Constvec;
The f_grad() member function returns the function value evaluated at the vector x, and overwrites grad with the gradient.
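Because the returned value and the written gradient must stay consistent, it can be handy to verify an implementation against central finite differences during development. Below is a hypothetical helper (check_grad is not part of RcppNumerical) sketching this idea:

#include <RcppNumerical.h>
#include <algorithm>
#include <cmath>
using namespace Numer;

// Hypothetical helper: compare the analytic gradient of f at x against
// a central finite difference; returns the largest absolute deviation
double check_grad(MFuncGrad& f, Constvec& x, const double h = 1e-6)
{
    const int n = static_cast<int>(x.size());
    Eigen::VectorXd grad(n), dummy(n), xp = x, xm = x;
    f.f_grad(x, grad);
    double max_diff = 0.0;
    for(int i = 0; i < n; i++)
    {
        xp[i] += h;
        xm[i] -= h;
        const double num = (f.f_grad(xp, dummy) - f.f_grad(xm, dummy)) / (2.0 * h);
        max_diff = std::max(max_diff, std::abs(num - grad[i]));
        xp[i] = x[i];
        xm[i] = x[i];
    }
    return max_diff;  // should be tiny if the analytic gradient is correct
}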
The wrapper function for LBFGS++ is
inline int optim_lbfgs(
    MFuncGrad& f, Refvec x, double& fx_opt,
    const int maxit = 300, const double& eps_f = 1e-6, const double& eps_g = 1e-5
)
- f: The function to be minimized.
- x: In: the initial guess. Out: the best value of the variables found.
- fx_opt: Out: the function value at the output x.
- maxit: Maximum number of iterations.
- eps_f: Algorithm stops if |f_{k+1} - f_k| < eps_f * |f_k|.
- eps_g: Algorithm stops if ||g|| < eps_g * max(1, ||x||).
- Return value: Error code. Negative values indicate errors.
Below is an example that illustrates the optimization of the Rosenbrock function f(x1, x2) = 100 * (x2 - x1^2)^2 + (1 - x1)^2:
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::depends(RcppNumerical)]]
#include <RcppNumerical.h>
using namespace Numer;

// f = 100 * (x2 - x1^2)^2 + (1 - x1)^2
// True minimum: x1 = x2 = 1
class Rosenbrock: public MFuncGrad
{
public:
    double f_grad(Constvec& x, Refvec grad)
    {
        double t1 = x[1] - x[0] * x[0];
        double t2 = 1 - x[0];
        grad[0] = -400 * x[0] * t1 - 2 * t2;
        grad[1] = 200 * t1;
        return 100 * t1 * t1 + t2 * t2;
    }
};
// [[Rcpp::export]]
Rcpp::List optim_test()
{
    Eigen::VectorXd x(2);
    x[0] = -1.2;
    x[1] = 1;
    double fopt;
    Rosenbrock f;
    int res = optim_lbfgs(f, x, fopt);
    return Rcpp::List::create(
        Rcpp::Named("xopt") = x,
        Rcpp::Named("fopt") = fopt,
        Rcpp::Named("status") = res
    );
}
Calling the generated R function optim_test() gives
optim_test()
# $xopt
# [1] 0.9999683 0.9999354
#
# $fopt
# [1] 1.150395e-09
#
# $status
# [1] 0
A More Practical Example
It may be more meaningful to look at a real application of the RcppNumerical package. Below is an example that fits a logistic regression model using the L-BFGS algorithm. It also demonstrates the performance of the library.
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::depends(RcppNumerical)]]
#include <RcppNumerical.h>
using namespace Numer;

typedef Eigen::Map<Eigen::MatrixXd> MapMat;
typedef Eigen::Map<Eigen::VectorXd> MapVec;

class LogisticReg: public MFuncGrad
{
private:
    const MapMat X;
    const MapVec Y;
public:
    LogisticReg(const MapMat x_, const MapVec y_) : X(x_), Y(y_) {}

    double f_grad(Constvec& beta, Refvec grad)
    {
        // Negative log likelihood
        //   sum(log(1 + exp(X * beta))) - y' * X * beta
        Eigen::VectorXd xbeta = X * beta;
        const double yxbeta = Y.dot(xbeta);
        // X * beta => exp(X * beta)
        xbeta = xbeta.array().exp();
        const double f = (xbeta.array() + 1.0).log().sum() - yxbeta;

        // Gradient
        //   X' * (p - y), p = exp(X * beta) / (1 + exp(X * beta))
        // exp(X * beta) => p
        xbeta.array() /= (xbeta.array() + 1.0);
        grad.noalias() = X.transpose() * (xbeta - Y);

        return f;
    }
};
// [[Rcpp::export]]
Rcpp::NumericVector logistic_reg(Rcpp::NumericMatrix x, Rcpp::NumericVector y)
{
    const MapMat xx = Rcpp::as<MapMat>(x);
    const MapVec yy = Rcpp::as<MapVec>(y);
    // Negative log likelihood
    LogisticReg nll(xx, yy);
    // Initial guess
    Eigen::VectorXd beta(xx.cols());
    beta.setZero();

    double fopt;
    int status = optim_lbfgs(nll, beta, fopt);
    if(status < 0)
        Rcpp::stop("fail to converge");

    return Rcpp::wrap(beta);
}
Here is the R code to test the function:
set.seed(123)
n = 5000
p = 100
x = matrix(rnorm(n * p), n)
beta = runif(p)
xb = c(x %*% beta)
p = exp(xb) / (1 + exp(xb))
y = rbinom(n, 1, p)
system.time(res1 <- glm.fit(x, y, family = binomial())$coefficients)
# user system elapsed
# 0.263 0.006 0.269
system.time(res2 <- logistic_reg(x, y))
# user system elapsed
# 0.003 0.000 0.002
max(abs(res1 - res2))
# [1] 0.0001873564
It is much faster than the standard glm.fit() function in R! (Although glm.fit() calculates some other quantities besides beta.)
RcppNumerical also provides the fastLR() function to run fast logistic regression, which is a modified and more stable version of the code above.
system.time(res3 <- fastLR(x, y)$coefficients)
# user system elapsed
# 0.004 0.000 0.004
max(abs(res1 - res3))
# [1] 7.066969e-06