This vignette is an elaboration of the Hierarchical Normal Example Stan vignette in the bridgesampling R package (Gronau, Singmann, and Wagenmakers 2020). It shows that the LoRaD method (Wang et al. 2023) produces an estimate of the marginal likelihood nearly identical to that produced by Bridge Sampling (Meng and Wong 1996) using the bridgesampling package. The steps in this vignette are:
The first three steps are identical to what is shown in the bridgesampling vignette.
We will use three packages for this vignette: rstan, bridgesampling, and lorad.
library(rstan)
#> Loading required package: StanHeaders
#>
#> rstan version 2.32.3 (Stan version 2.26.1)
#> For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores()).
#> To avoid recompilation of unchanged Stan programs, we recommend calling
#> rstan_options(auto_write = TRUE)
#> For within-chain threading using `reduce_sum()` or `map_rect()` Stan functions,
#> change `threads_per_chain` option:
#> rstan_options(threads_per_chain = 1)
library(bridgesampling)
library(lorad)
The data for this example comprise \(n=20\) values \(\{y_i : i=1,2,\cdots, n\}\), where \[y_i \sim N(\theta_i, \sigma^2) \\ \theta_i \sim N(\mu, \tau^2)\\ \mu = 0\\ \tau^2 = \tfrac{1}{2}\]
set.seed(12345)
mu <- 0
tau2 <- 0.5
sigma2 <- 1
n <- 20
theta <- rnorm(n, mu, sqrt(tau2))
y <- rnorm(n, theta, sqrt(sigma2))
For Bayesian analyses, the priors used for the group-level mean and variance are: \[\mu \sim N(\mu_0, \tau_0^2)\\ \tau^2 \sim \mbox{InvGamma}(\alpha, \beta)\\ \mu_0 = 0\\ \tau_0^2 = 1\\ \alpha = \beta = 1\]
We will compare the following two models using marginal likelihood:
These models differ only in whether the group-level mean is fixed to
0 or is estimated. As mentioned in the original bridgesampling vignette,
it is important when using STAN to use the target += ...
method for specifying log probability densities so that normalizing
constants are included.
stancodeH0 <- 'data {
int<lower=1> n; // number of observations
vector[n] y; // observations
real<lower=0> alpha;
real<lower=0> beta;
real<lower=0> sigma2;
}
parameters {
real<lower=0> tau2; // group-level variance
vector[n] theta; // participant effects
}
model {
target += inv_gamma_lpdf(tau2 | alpha, beta);
target += normal_lpdf(theta | 0, sqrt(tau2));
target += normal_lpdf(y | theta, sqrt(sigma2));
}
'
stancodeH1 <- 'data {
int<lower=1> n; // number of observations
vector[n] y; // observations
real mu0;
real<lower=0> tau20;
real<lower=0> alpha;
real<lower=0> beta;
real<lower=0> sigma2;
}
parameters {
real mu;
real<lower=0> tau2; // group-level variance
vector[n] theta; // participant effects
}
model {
target += normal_lpdf(mu | mu0, sqrt(tau20));
target += inv_gamma_lpdf(tau2 | alpha, beta);
target += normal_lpdf(theta | mu, sqrt(tau2));
target += normal_lpdf(y | theta, sqrt(sigma2));
}
'
# compile models
stanmodelH0 <- stan_model(model_code = stancodeH0, model_name="stanmodel")
stanmodelH1 <- stan_model(model_code = stancodeH1, model_name="stanmodel")
Now use STAN to sample from the posterior distribution of both models.
# fit models
stanfitH0 <- sampling(stanmodelH0, data = list(y = y, n = n,
alpha = alpha,
beta = beta,
sigma2 = sigma2),
iter = 50000, warmup = 1000, chains = 3, cores = 1, seed = 12345)
#>
#> SAMPLING FOR MODEL 'stanmodel' NOW (CHAIN 1).
#> Chain 1:
#> Chain 1: Gradient evaluation took 2.9e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.29 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1:
#> Chain 1:
#> Chain 1: Iteration: 1 / 50000 [ 0%] (Warmup)
#> Chain 1: Iteration: 1001 / 50000 [ 2%] (Sampling)
#> Chain 1: Iteration: 6000 / 50000 [ 12%] (Sampling)
#> Chain 1: Iteration: 11000 / 50000 [ 22%] (Sampling)
#> Chain 1: Iteration: 16000 / 50000 [ 32%] (Sampling)
#> Chain 1: Iteration: 21000 / 50000 [ 42%] (Sampling)
#> Chain 1: Iteration: 26000 / 50000 [ 52%] (Sampling)
#> Chain 1: Iteration: 31000 / 50000 [ 62%] (Sampling)
#> Chain 1: Iteration: 36000 / 50000 [ 72%] (Sampling)
#> Chain 1: Iteration: 41000 / 50000 [ 82%] (Sampling)
#> Chain 1: Iteration: 46000 / 50000 [ 92%] (Sampling)
#> Chain 1: Iteration: 50000 / 50000 [100%] (Sampling)
#> Chain 1:
#> Chain 1: Elapsed Time: 0.041 seconds (Warm-up)
#> Chain 1: 1.892 seconds (Sampling)
#> Chain 1: 1.933 seconds (Total)
#> Chain 1:
#>
#> SAMPLING FOR MODEL 'stanmodel' NOW (CHAIN 2).
#> Chain 2:
#> Chain 2: Gradient evaluation took 7e-06 seconds
#> Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds.
#> Chain 2: Adjust your expectations accordingly!
#> Chain 2:
#> Chain 2:
#> Chain 2: Iteration: 1 / 50000 [ 0%] (Warmup)
#> Chain 2: Iteration: 1001 / 50000 [ 2%] (Sampling)
#> Chain 2: Iteration: 6000 / 50000 [ 12%] (Sampling)
#> Chain 2: Iteration: 11000 / 50000 [ 22%] (Sampling)
#> Chain 2: Iteration: 16000 / 50000 [ 32%] (Sampling)
#> Chain 2: Iteration: 21000 / 50000 [ 42%] (Sampling)
#> Chain 2: Iteration: 26000 / 50000 [ 52%] (Sampling)
#> Chain 2: Iteration: 31000 / 50000 [ 62%] (Sampling)
#> Chain 2: Iteration: 36000 / 50000 [ 72%] (Sampling)
#> Chain 2: Iteration: 41000 / 50000 [ 82%] (Sampling)
#> Chain 2: Iteration: 46000 / 50000 [ 92%] (Sampling)
#> Chain 2: Iteration: 50000 / 50000 [100%] (Sampling)
#> Chain 2:
#> Chain 2: Elapsed Time: 0.043 seconds (Warm-up)
#> Chain 2: 1.742 seconds (Sampling)
#> Chain 2: 1.785 seconds (Total)
#> Chain 2:
#>
#> SAMPLING FOR MODEL 'stanmodel' NOW (CHAIN 3).
#> Chain 3:
#> Chain 3: Gradient evaluation took 8e-06 seconds
#> Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds.
#> Chain 3: Adjust your expectations accordingly!
#> Chain 3:
#> Chain 3:
#> Chain 3: Iteration: 1 / 50000 [ 0%] (Warmup)
#> Chain 3: Iteration: 1001 / 50000 [ 2%] (Sampling)
#> Chain 3: Iteration: 6000 / 50000 [ 12%] (Sampling)
#> Chain 3: Iteration: 11000 / 50000 [ 22%] (Sampling)
#> Chain 3: Iteration: 16000 / 50000 [ 32%] (Sampling)
#> Chain 3: Iteration: 21000 / 50000 [ 42%] (Sampling)
#> Chain 3: Iteration: 26000 / 50000 [ 52%] (Sampling)
#> Chain 3: Iteration: 31000 / 50000 [ 62%] (Sampling)
#> Chain 3: Iteration: 36000 / 50000 [ 72%] (Sampling)
#> Chain 3: Iteration: 41000 / 50000 [ 82%] (Sampling)
#> Chain 3: Iteration: 46000 / 50000 [ 92%] (Sampling)
#> Chain 3: Iteration: 50000 / 50000 [100%] (Sampling)
#> Chain 3:
#> Chain 3: Elapsed Time: 0.04 seconds (Warm-up)
#> Chain 3: 1.716 seconds (Sampling)
#> Chain 3: 1.756 seconds (Total)
#> Chain 3:
stanfitH1 <- sampling(stanmodelH1, data = list(y = y, n = n,
mu0 = mu0,
tau20 = tau20,
alpha = alpha,
beta = beta,
sigma2 = sigma2),
iter = 50000, warmup = 1000, chains = 3, cores = 1, seed = 12345)
#>
#> SAMPLING FOR MODEL 'stanmodel' NOW (CHAIN 1).
#> Chain 1:
#> Chain 1: Gradient evaluation took 3.1e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.31 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1:
#> Chain 1:
#> Chain 1: Iteration: 1 / 50000 [ 0%] (Warmup)
#> Chain 1: Iteration: 1001 / 50000 [ 2%] (Sampling)
#> Chain 1: Iteration: 6000 / 50000 [ 12%] (Sampling)
#> Chain 1: Iteration: 11000 / 50000 [ 22%] (Sampling)
#> Chain 1: Iteration: 16000 / 50000 [ 32%] (Sampling)
#> Chain 1: Iteration: 21000 / 50000 [ 42%] (Sampling)
#> Chain 1: Iteration: 26000 / 50000 [ 52%] (Sampling)
#> Chain 1: Iteration: 31000 / 50000 [ 62%] (Sampling)
#> Chain 1: Iteration: 36000 / 50000 [ 72%] (Sampling)
#> Chain 1: Iteration: 41000 / 50000 [ 82%] (Sampling)
#> Chain 1: Iteration: 46000 / 50000 [ 92%] (Sampling)
#> Chain 1: Iteration: 50000 / 50000 [100%] (Sampling)
#> Chain 1:
#> Chain 1: Elapsed Time: 0.048 seconds (Warm-up)
#> Chain 1: 2.546 seconds (Sampling)
#> Chain 1: 2.594 seconds (Total)
#> Chain 1:
#>
#> SAMPLING FOR MODEL 'stanmodel' NOW (CHAIN 2).
#> Chain 2:
#> Chain 2: Gradient evaluation took 8e-06 seconds
#> Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds.
#> Chain 2: Adjust your expectations accordingly!
#> Chain 2:
#> Chain 2:
#> Chain 2: Iteration: 1 / 50000 [ 0%] (Warmup)
#> Chain 2: Iteration: 1001 / 50000 [ 2%] (Sampling)
#> Chain 2: Iteration: 6000 / 50000 [ 12%] (Sampling)
#> Chain 2: Iteration: 11000 / 50000 [ 22%] (Sampling)
#> Chain 2: Iteration: 16000 / 50000 [ 32%] (Sampling)
#> Chain 2: Iteration: 21000 / 50000 [ 42%] (Sampling)
#> Chain 2: Iteration: 26000 / 50000 [ 52%] (Sampling)
#> Chain 2: Iteration: 31000 / 50000 [ 62%] (Sampling)
#> Chain 2: Iteration: 36000 / 50000 [ 72%] (Sampling)
#> Chain 2: Iteration: 41000 / 50000 [ 82%] (Sampling)
#> Chain 2: Iteration: 46000 / 50000 [ 92%] (Sampling)
#> Chain 2: Iteration: 50000 / 50000 [100%] (Sampling)
#> Chain 2:
#> Chain 2: Elapsed Time: 0.049 seconds (Warm-up)
#> Chain 2: 2.274 seconds (Sampling)
#> Chain 2: 2.323 seconds (Total)
#> Chain 2:
#>
#> SAMPLING FOR MODEL 'stanmodel' NOW (CHAIN 3).
#> Chain 3:
#> Chain 3: Gradient evaluation took 8e-06 seconds
#> Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds.
#> Chain 3: Adjust your expectations accordingly!
#> Chain 3:
#> Chain 3:
#> Chain 3: Iteration: 1 / 50000 [ 0%] (Warmup)
#> Chain 3: Iteration: 1001 / 50000 [ 2%] (Sampling)
#> Chain 3: Iteration: 6000 / 50000 [ 12%] (Sampling)
#> Chain 3: Iteration: 11000 / 50000 [ 22%] (Sampling)
#> Chain 3: Iteration: 16000 / 50000 [ 32%] (Sampling)
#> Chain 3: Iteration: 21000 / 50000 [ 42%] (Sampling)
#> Chain 3: Iteration: 26000 / 50000 [ 52%] (Sampling)
#> Chain 3: Iteration: 31000 / 50000 [ 62%] (Sampling)
#> Chain 3: Iteration: 36000 / 50000 [ 72%] (Sampling)
#> Chain 3: Iteration: 41000 / 50000 [ 82%] (Sampling)
#> Chain 3: Iteration: 46000 / 50000 [ 92%] (Sampling)
#> Chain 3: Iteration: 50000 / 50000 [100%] (Sampling)
#> Chain 3:
#> Chain 3: Elapsed Time: 0.049 seconds (Warm-up)
#> Chain 3: 2.818 seconds (Sampling)
#> Chain 3: 2.867 seconds (Total)
#> Chain 3:
Now load the bridgesampling package and estimate the log marginal likelihood for each model. The null model \(\cal{H}_0\) is slightly preferred, which makes sense because the data were simulated assuming \(\mu=0\).
# compute log marginal likelihood via bridge sampling for H0
H0.bridge <- bridge_sampler(stanfitH0, silent = TRUE)
# compute log marginal likelihood via bridge sampling for H1
H1.bridge <- bridge_sampler(stanfitH1, silent = TRUE)
print(H0.bridge)
#> Bridge sampling estimate of the log marginal likelihood: -37.53668
#> Estimate obtained in 4 iteration(s) via method "normal".
print(H1.bridge)
#> Bridge sampling estimate of the log marginal likelihood: -37.7984
#> Estimate obtained in 4 iteration(s) via method "normal".
The log marginal likelihood estimates are -37.53668 for \({\cal H}_0\) and -37.7984 for \({\cal H}_1\).
In order to use the LoRaD method, we need to export the sampled
parameter values along with the unnormalized log posterior to a data
frame. The permuted=TRUE
setting causes samples from all 3
chains to be merged.
After loading the lorad package, specify what each column in the
exported parameter file represents. STAN transforms all parameters so
that they are unconstrained (e.g. it log-transforms parameters with
support restricted to the positive real numbers so that the transformed
parameter has support equal to the entire real line). This means that
the lorad package need not perform any transformations. Create a named
list that specifies unconstrained for all columns and
posterior for the column labeled lp
. The lorad
package assumes that all columns labeled posterior should be
summed to form the log posterior kernel (the posterior kernel is simply
the unnormalized posterior: the product of prior and likelihood).
The command lorad_estimate
estimates the log marginal
likelihood. This function is provided
paramsH0
) containing the sampled
parameter vectorscolspecH0
) containing the column
specificationsleft
), the last part (right
),
or a random sample (random
) of the parameter vectors in
paramsH0
The LoRaD method uses the training sample to determine the mean and covariance matrix used to standardize the parameter vectors and the radius to be used in defining the working parameter space. It then uses only a small coverage fraction of the highest density points in the remaining estimation sample to estimate the marginal likelihood. The values 0.5 and 0.1 are generally good values to use for the training fraction and coverage. Using too large a fraction for coverage risks a poor fit of the multivariate normal reference distribution to the sampled values inside the working parameter space.
colspecH0 <- c("tau2"="unconstrained", "theta.1"="unconstrained", "theta.2"="unconstrained", "theta.3"="unconstrained", "theta.4"="unconstrained", "theta.5"="unconstrained", "theta.6"="unconstrained", "theta.7"="unconstrained", "theta.8"="unconstrained", "theta.9"="unconstrained", "theta.10"="unconstrained", "theta.11"="unconstrained", "theta.12"="unconstrained", "theta.13"="unconstrained", "theta.14"="unconstrained", "theta.15"="unconstrained", "theta.16"="unconstrained", "theta.17"="unconstrained", "theta.18"="unconstrained", "theta.19"="unconstrained", "theta.20"="unconstrained", "lp__"="posterior")
results0 <- lorad_estimate(paramsH0, colspecH0, 0.5, "random", 0.1)
lorad_summary(results0)
#> This is lorad 0.0.1.0
#> Parameter sample comprises 147000 sampled points
#> Each sample point is a vector of 21 parameter values
#>
#> Training fraction is 0.5
#> Coverage specified is 0.1
#>
#> Partitioning samples into training and estimation:
#> Sample size is 147000
#> Training sample size is 73500
#> Estimation sample size 73500
#>
#> Processing training sample...
#> Lowest radial distance is 3.418842393
#> Log Delta -2.949772427
#>
#> Processing estimation sample...
#> Number of samples used is 7335
#> Nominal coverage specified is 0.100000
#> Realized coverage is 0.099796
#> Log marginal likelihood is -37.385193
#>
The log marginal likelihood for the null hypothesis, estimated by LoRaD, is -37.385193, which is quite close to the value -37.53668 estimated by the bridgesampling package.
Repeat the LoRaD analysis for the alternative hypothesis.
colspecH1 <- c("mu"="unconstrained", "tau2"="unconstrained", "theta.1"="unconstrained", "theta.2"="unconstrained", "theta.3"="unconstrained", "theta.4"="unconstrained", "theta.5"="unconstrained", "theta.6"="unconstrained", "theta.7"="unconstrained", "theta.8"="unconstrained", "theta.9"="unconstrained", "theta.10"="unconstrained", "theta.11"="unconstrained", "theta.12"="unconstrained", "theta.13"="unconstrained", "theta.14"="unconstrained", "theta.15"="unconstrained", "theta.16"="unconstrained", "theta.17"="unconstrained", "theta.18"="unconstrained", "theta.19"="unconstrained", "theta.20"="unconstrained", "lp__"="posterior")
results1 <- lorad_estimate(paramsH1, colspecH1, 0.5, "random", 0.1)
lorad_summary(results1)
#> This is lorad 0.0.1.0
#> Parameter sample comprises 147000 sampled points
#> Each sample point is a vector of 22 parameter values
#>
#> Training fraction is 0.5
#> Coverage specified is 0.1
#>
#> Partitioning samples into training and estimation:
#> Sample size is 147000
#> Training sample size is 73500
#> Estimation sample size 73500
#>
#> Processing training sample...
#> Lowest radial distance is 3.483237724
#> Log Delta -3.091635220
#>
#> Processing estimation sample...
#> Number of samples used is 7161
#> Nominal coverage specified is 0.100000
#> Realized coverage is 0.097429
#> Log marginal likelihood is -37.693758
#>
The log marginal likelihood for the altenative hypothesis was -37.693758 for lorad (compared to -37.7984 for bridgesampling).