This package provides fast algorithms to calculate the marginal
posterior probabilities of data points having a non-zero mean: πn(θi ≠ 0 ∣ X), i = 1, …, n.
Given these, it is easy to calculate the posterior mean, which is also
provided: 𝔼[θ|X] = (𝔼[θi|X], …, 𝔼[θn|X])
Main Methods
There are two main methods: general_sequence_model and
fast_spike_slab_beta.
general_sequence_model
This method can handle the general hierarchical prior described
above. This means it requires the user to provide a choice for πn as input, and
to specify whether the slab prior G should be a Laplace or a Cauchy
distribution.
The run time of this method scales as O(n2) in the
sample size n. It has been
used to handle sample sizes up to 20 000 within half an hour of computation
time.
fast_spike_slab_beta
This method is a faster special-purpose method for the spike-and-slab
prior with Λn = Beta(κ, λ)
This corresponds to the beta-binomial prior $$
\pi_n(s) \propto \frac{\Gamma(\kappa + s) \Gamma(\lambda + n - s)}{s!
(n-s)!}
$$ in the general hierarchical prior. This method further
requires specifying whether the slab prior G should be a Laplace or a Cauchy
distribution.
The run time of this method scales as O(mn3/2),
where m is a user-supplied
parameter that controls the precision of the method. Larger values of
m give more precision but slow
down the algorithm. The default value of m = 20 has been seen to provide
sufficient precision for data sets of different sizes. The method has
been used to handle sample sizes up to 100 000 within half and our of computation
time.
Advanced Usage
The package is organized as a set of supporting functions in R, which
wrap around two C++ functions that implement the main algorithms. The
C++ function corresponding to general_sequence_model is accessible
directly via SSS_hierarchical_prior and
SSS_hierarchical_prior_binomial. The C++ function corresponding
to fast_spike_slab_beta can be accessed via
SSS_discrete_spike_slab. There are further supporting functions
whose name starts with ‘SSS_’.
Implementing Custom Slab Distributions
The C++ functions do not take the data X directly as input. Instead, they
require two vectors Φ = (Φ1, …, Φn)
and Ψ = (Ψ1, …, Ψn)
that specify the densities of the data points X1, …, Xn
conditional on θi being 0 or θi being
distributed according to G,
respectively: Φi = ϕσ(Xi)
Ψi = ∫−∞∞ϕσ(Xi − θi)dG(θi),
where ϕσ(Xi)
is the density of a Gaussian with mean 0 and standard deviation σ. Advanced users may implement
their own slab distributions G
by calculating the Ψ vector
themselves. Custom values for Φ and Ψ may also be used to model
non-Gaussian noise or to incorporate into Φ a shrinkage prior rather than a
point-mass at zero.