The Modified Bessel function of the first kind and Automatic Differentiation

I was intending to write a longer blog-post about my favourite dataset of all time: the internet surf-times of a previous co-worker (and dear I say, friend? :) ). However, this will have to be delayed to another post, because in running some old TMB code, all optimisations collapsed. The old codes were built on a Linux computer around two years ago, so I first thought it was due to some difference in the operating systems, or that TMB had changed. However, on inspection the root cause was discovered: In building the likelihood, I needed the modified Bessel function of the first kind \[\begin{align} I_\nu(x) = \sum_{m=0}^\infty \frac{1}{m!\Gamma(m+\nu+1)}\left( \frac{x}{2} \right)^{2m+\nu}. \end{align}\]

This is now implemented in TMB, but at the time of writing my programs, it was not, and I therefore had to create my own implementation.

For my purposes of integrating out a latent process (using the Laplace approximation), the derivative with respecto to both \(x\) and \(\nu\) are needed for the inner problem. However, while my own old implementation of the Bessel function had a way of doing this, the current standard TMB package only allows differentiation with respect to \(x\) but not \(\nu\). As the new implementation has the same name besselI(Type x, Type nu) as my old implementation, the compiler did not complain, but the Laplace approximation was completely off.

So why are the derivatives not implemented in TMB, and how did I do it? The reasons can be found by checking the derivative expressions in at the following wolfram page; The first derivative is easily computed, using the relation \[\begin{align} \frac{\partial}{\partial x} I_\nu (x) = \frac{\nu}{x} I_{\nu}(x) + I_{\nu+1}(x) \end{align}\]

or any of the other relations using the original function itself. The problem lies in the second parameter, \(\nu\), for which there is no closed form expression, which is also the reason why it is not implemented in the standard TMB.

To my implementation: One of the great things about TMB is how it employs the fact that R is running in the background, and that R is also written in C: the function values can be drawn from R (using the \(\texttt{Rmath}\) library) since \(I_\nu(x)\) is already implemented in R as besselI(). The partial derivatives of the function must then be implemented manually. To do this, create the following code in the \(\texttt{"atomic_math.hpp"}\) in the \(\texttt{"include"}\) folder of the TMB package. For the difficult derivative, note that a finite difference approximation of the derivative with respect to \(\nu\) was used, due to the complicated expression of this term. This goes against the exactness of AD, but was tested and found to work well in practice.

/** \brief Atomic version of \f$besselI(x,\nu)\f$.
    Valid parameter range: \f$x =(x,\nu) \in \mathbb{R}_+\times\mathbb{R}\f$.
    \note Derivative wrt. \f$\nu\f$ is now implemented approximately.
    \param x Input vector of length 2.
    \return Vector of length 1.
*/
TMB_ATOMIC_VECTOR_FUNCTION(
            // ATOMIC_NAME
            besselI2
            ,
            // OUTPUT_DIM
            1
            ,
            // ATOMIC_DOUBLE
            ty[0] = Rmath::Rf_bessel_i(tx[0], tx[1], 1.0 /* Not scaled */ );
            ,
            // ATOMIC REVERSE
            Type value = ty[0];
            Type x = tx[0];
            Type nu = tx[1];
            CppAD::vector<Type> arg(2);
            arg[0] = x;
            arg[1] = nu + Type(1);
            px[0] = ( besselI2(arg)[0] + value * (nu / x) )* py[0]; // With respect to x
            arg[1] = nu + Type(0.00001);
            px[1] = ( ( besselI2(arg)[0] - besselI2(tx)[0] ) / Type(0.00001) )* py[0]; // With respect to nu, approximated
            )

Then, to make everything user friendly, add the following code snippet in \(\texttt{"convenience.hpp"}\) of the same folder.

/**
 * Modified bessel function of the first kind.
 * Differentiation is allowed with respect to both parameters, x and nu.
 */

template<class Type>
Type besselI2(Type x, Type nu){
  CppAD::vector<Type> tx(2);
  tx[0] = x;
  tx[1] = nu;
  return atomic::besselI2(tx)[0];
}

Note the naming convention besselI2, as we do not want it to conflict with the current implementation (without a derivative).

I have implemented besselI2 in the same fork as I did for the SPA option: https://github.com/Blunde1/adcomp.

The Bessel occur for many common densities, and its a good thing to be able to use without thinking about it. One such example is the Noncentral chi-squared distribution. The density is given as \[\begin{align} f(x;k,\lambda) = \frac{1}{2}e^{-(x+\lambda)} \left( \frac{x}{\lambda} \right)^{k/4-1/2} I_{k/2-1}(\sqrt{\lambda x}). \end{align}\]

Using the modified TMB version, we can build the likelihood for Noncentral chi-squared observations as so:

#include <TMB.hpp>
template <class Type>
Type objective_function<Type>::operator() ()
{
    DATA_VECTOR(x);
    PARAMETER(k);
    PARAMETER(lambda);
    
    Type jnll, dnchi, u;
    jnll = dnchi = 0;
    u = k / 2 - 1;
    for (int i = 0; i < x.size(); i++) {
        dnchi = 0.5*exp(-(x[i] + lambda) / 2) * pow(Type(x[i] / lambda), Type(u / 2)) *besselI2(sqrt(lambda*x[i]), u);
        jnll -= log(dnchi);
        dnchi = 0;
    }
    return jnll;
}

And on the R side:

# Compile c++ code and load into R
library(TMB)
compile("dbessel_I_nchisq.cpp")
## [1] 0
dyn.load(dynlib("dbessel_I_nchisq"))

n <- 1000
df = 3
ncp = 3.5
data<-rchisq(n, df, ncp)

obj<-MakeADFun(data=list(x=data),parameters=list(k=1,lambda=1), silent = T)
opt<-nlminb(obj$par,obj$fn,obj$gr,obj$he)
rep<-sdreport(obj)
knitr::kable(summary(rep, p.value = TRUE))
Estimate Std. Error z value Pr(>|z^2|)
k 2.732397 0.2905179 9.405262 0
lambda 4.073920 0.3467622 11.748456 0

I will try to make my next post about my favourite co-worker dataset – very much looking forward to that!

Avatar
Berent Å. S. Lunde
Senior Consultant | Adjunct Associate Professor
comments powered by Disqus