Bayesian Quadrature: Variance Derivation
========================================

We want to compute:

.. math::

   \mathbb{V}[Z] \approx \int\int \bar{\ell}(x)C_{\log \ell}(x, x^\prime)\bar{\ell}(x^\prime)p(x)p(x^\prime)\ \mathrm{d}x\ \mathrm{d}x^\prime

We expand :math:`\bar{\ell}` and :math:`C_{\log\ell}` in terms of the points :math:`\mathbf{x}_c` to obtain:

.. math::

   \begin{align*}
   E[\bar{\ell} C_{\log\ell}\bar{\ell}] &= \int \int \left(K_{\exp(\log\ell)}(x, \mathbf{x}_c)K_{\exp(\log\ell)}(\mathbf{x}_c, \mathbf{x}_c)^{-1}\bar{\ell}(\mathbf{x}_c)\right)\left(K_{\log\ell}(x, x^\prime)-K_{\log\ell}(x, \mathbf{x}_c)K_{\log\ell}(\mathbf{x}_c, \mathbf{x}_c)^{-1}K_{\log\ell}(\mathbf{x}_c, x^\prime)\right)\left(K_{\exp(\log\ell)}(x^\prime, \mathbf{x}_c)K_{\exp(\log\ell)}(\mathbf{x}_c, \mathbf{x}_c)^{-1}\bar{\ell}(\mathbf{x}_c)\right)p(x)p(x^\prime)\ \mathrm{d}x\ \mathrm{d}x^\prime\\\\
   &= \int \int \left(K_{\exp(\log\ell)}(x, \mathbf{x}_c)\alpha(\mathbf{x}_c)\right)\left(K_{\log\ell}(x, x^\prime)-K_{\log\ell}(x, \mathbf{x}_c)K_{\log\ell}(\mathbf{x}_c, \mathbf{x}_c)^{-1}K_{\log\ell}(\mathbf{x}_c, x^\prime)\right)\left(K_{\exp(\log\ell)}(x^\prime, \mathbf{x}_c)\alpha(\mathbf{x}_c)\right)p(x)p(x^\prime)\ \mathrm{d}x\ \mathrm{d}x^\prime\\\\
   &= \int \int \left(K_{\exp(\log\ell)}(x, \mathbf{x}_c)\alpha(\mathbf{x}_c)\right)K_{\log\ell}(x, x^\prime)\left(K_{\exp(\log\ell)}(x^\prime, \mathbf{x}_c)\alpha(\mathbf{x}_c)\right)p(x)p(x^\prime)\ \mathrm{d}x\ \mathrm{d}x^\prime\\\\
   &\ \ \ \ - \int \int \left(K_{\exp(\log\ell)}(x, \mathbf{x}_c)\alpha(\mathbf{x}_c)\right) K_{\log\ell}(x, \mathbf{x}_c)K_{\log\ell}(\mathbf{x}_c, \mathbf{x}_c)^{-1}K_{\log\ell}(\mathbf{x}_c, x^\prime)\left(K_{\exp(\log\ell)}(x^\prime, \mathbf{x}_c)\alpha(\mathbf{x}_c)\right)p(x)p(x^\prime)\ \mathrm{d}x\ \mathrm{d}x^\prime,
   \end{align*}

where :math:`\alpha(\mathbf{x}_c)=K_{\exp(\log\ell)}(\mathbf{x}_c, \mathbf{x}_c)^{-1}\bar{\ell}(\mathbf{x}_c)`.

The first term is:

.. math::

   \begin{align*}
   \int \int &\left(K_{\exp(\log\ell)}(x, \mathbf{x}_c)\alpha(\mathbf{x}_c)\right)K_{\log\ell}(x, x^\prime)\left(K_{\exp(\log\ell)}(x^\prime, \mathbf{x}_c)\alpha(\mathbf{x}_c)\right)p(x)p(x^\prime)\ \mathrm{d}x\ \mathrm{d}x^\prime\\\\
   &= \alpha(\mathbf{x}_c)^\top \left(\int\int K_{\exp(\log\ell)}(\mathbf{x}_c, x)K_{\log\ell}(x, x^\prime)K_{\exp(\log\ell)}(x^\prime, \mathbf{x}_c)p(x)p(x^\prime)\ \mathrm{d}x\ \mathrm{d}x^\prime\right)\alpha(\mathbf{x}_c)
   \end{align*}

The second term is:

.. math::

   \begin{align*}
   - \int \int &\left(K_{\exp(\log\ell)}(x, \mathbf{x}_c)\alpha(\mathbf{x}_c)\right) K_{\log\ell}(x, \mathbf{x}_c)K_{\log\ell}(\mathbf{x}_c, \mathbf{x}_c)^{-1}K_{\log\ell}(\mathbf{x}_c, x^\prime)\left(K_{\exp(\log\ell)}(x^\prime, \mathbf{x}_c)\alpha(\mathbf{x}_c)\right)p(x)p(x^\prime)\ \mathrm{d}x\ \mathrm{d}x^\prime\\\\
   &= - \int \int \left(K_{\exp(\log\ell)}(x, \mathbf{x}_c)\alpha(\mathbf{x}_c)\right) K_{\log\ell}(x, \mathbf{x}_c)\left(L_{\log\ell}(\mathbf{x}_c, \mathbf{x}_c)^{-1}\right)^\top L_{\log\ell}(\mathbf{x}_c, \mathbf{x}_c)^{-1}K_{\log\ell}(\mathbf{x}_c, x^\prime)\left(K_{\exp(\log\ell)}(x^\prime, \mathbf{x}_c)\alpha(\mathbf{x}_c)\right)p(x)p(x^\prime)\ \mathrm{d}x\ \mathrm{d}x^\prime\\\\
   &= - \int \int \left(L_{\log\ell}(\mathbf{x}_c, \mathbf{x}_c)^{-1}K_{\log\ell}(\mathbf{x}_c, x) K_{\exp(\log\ell)}(x, \mathbf{x}_c)\alpha(\mathbf{x}_c) \right)^\top \left(L_{\log\ell}(\mathbf{x}_c, \mathbf{x}_c)^{-1}K_{\log\ell}(\mathbf{x}_c, x^\prime)K_{\exp(\log\ell)}(x^\prime, \mathbf{x}_c)\alpha(\mathbf{x}_c)\right)p(x)p(x^\prime)\ \mathrm{d}x\ \mathrm{d}x^\prime\\\\
   &= - \left[L_{\log\ell}(\mathbf{x}_c, \mathbf{x}_c)^{-1}\left(\int K_{\log\ell}(\mathbf{x}_c, x) K_{\exp(\log\ell)}(x, \mathbf{x}_c)p(x)\ \mathrm{d}x \right)\alpha(\mathbf{x}_c)\right]^\top\left[L_{\log\ell}(\mathbf{x}_c, \mathbf{x}_c)^{-1}\left(\int K_{\log\ell}(\mathbf{x}_c, x) K_{\exp(\log\ell)}(x, \mathbf{x}_c)p(x)\ \mathrm{d}x \right)\alpha(\mathbf{x}_c)\right]\\\\
   &= -\beta(\mathbf{x}_c)^\top\beta(\mathbf{x}_c),
   \end{align*}

where :math:`\beta(\mathbf{x}_c)=L_{\log\ell}(\mathbf{x}_c, \mathbf{x}_c)^{-1}\left(\int K_{\log\ell}(\mathbf{x}_c, x) K_{\exp(\log\ell)}(x, \mathbf{x}_c)p(x)\ \mathrm{d}x \right)\alpha(\mathbf{x}_c)` and :math:`L_{\log\ell}(\mathbf{x}_c, \mathbf{x}_c)` is the lower-triangular Cholesky factor of :math:`K_{\log\ell}(\mathbf{x}_c, \mathbf{x}_c)`, so that :math:`K_{\log\ell}(\mathbf{x}_c, \mathbf{x}_c) = L_{\log\ell}(\mathbf{x}_c, \mathbf{x}_c)L_{\log\ell}(\mathbf{x}_c, \mathbf{x}_c)^\top`.

Putting these together, we obtain:

.. math::

   E[\bar{\ell} C_{\log\ell}\bar{\ell}]=\alpha(\mathbf{x}_c)^\top \left(\int\int K_{\exp(\log\ell)}(\mathbf{x}_c, x)K_{\log\ell}(x, x^\prime)K_{\exp(\log\ell)}(x^\prime, \mathbf{x}_c)p(x)p(x^\prime)\ \mathrm{d}x\ \mathrm{d}x^\prime\right)\alpha(\mathbf{x}_c)-\beta(\mathbf{x}_c)^\top\beta(\mathbf{x}_c)

First integral
~~~~~~~~~~~~~~

Assume both kernels are Gaussian, with :math:`K_{\exp(\log\ell)}(x, x^\prime) = h_\ell^2\mathcal{N}\left(x\ \big\vert\ x^\prime, W_\ell\right)` and :math:`K_{\log\ell}(x, x^\prime) = h_{\log\ell}^2\mathcal{N}\left(x\ \big\vert\ x^\prime, W_{\log\ell}\right)`, and that :math:`p(x)` is also Gaussian with mean :math:`\mu` and covariance :math:`\Sigma`. The integrals can then be evaluated analytically. The :math:`(i, j)` entry of the first integral is:

.. math::

   \begin{align*}
   \int\int & K_{\exp(\log\ell)}(x_{c,i}, x)K_{\log\ell}(x, x^\prime)K_{\exp(\log\ell)}(x^\prime, x_{c,j})p(x)p(x^\prime)\ \mathrm{d}x\ \mathrm{d}x^\prime\\\\
   &= h_\ell^4 h_{\log\ell}^2 \int\int \mathcal{N}\left(x_{c,i}\ \big\vert\ x, W_\ell\right)\mathcal{N}\left(x\ \big\vert\ x^\prime, W_{\log\ell}\right)\mathcal{N}\left(x^\prime\ \big\vert\ x_{c,j}, W_\ell\right)\mathcal{N}\left(x\ \big\vert\ \mu, \Sigma\right)\mathcal{N}\left(x^\prime\ \big\vert\ \mu, \Sigma\right)\ \mathrm{d}x\ \mathrm{d}x^\prime
   \end{align*}

From [O13]\_, we have:

.. math::

   \mathcal{N}\left(x_{c,i}\ \big\vert\ x, W_\ell\right)\mathcal{N}\left(x\ \big\vert\ \mu, \Sigma\right) = \mathcal{N}\left(x_{c,i}\ \big\vert\ \mu, W_\ell + \Sigma\right)\mathcal{N}\left(x\ \big\vert\ \mu + \Gamma(x_{c,i}-\mu), \Sigma -\Gamma\Sigma\right),

where :math:`\Gamma = \Sigma(W_\ell + \Sigma)^{-1}`. Applying this identity to both :math:`x_{c,i}` and :math:`x_{c,j}`, we obtain:

.. math::

   = h_\ell^4 h_{\log\ell}^2 \mathcal{N}\left(x_{c,i}\ \big\vert\ \mu, W_\ell + \Sigma\right)\mathcal{N}\left(x_{c,j}\ \big\vert\ \mu, W_\ell+\Sigma\right)\int \mathcal{N}\left(x\ \big\vert\ \mu + \Gamma(x_{c,i}-\mu), \Sigma -\Gamma\Sigma\right)\int \mathcal{N}\left(x-x^\prime\ \big\vert\ 0, W_{\log\ell}\right)\mathcal{N}\left(x^\prime\ \big\vert\ \mu + \Gamma(x_{c,j}-\mu), \Sigma -\Gamma\Sigma\right)\ \mathrm{d}x^\prime\ \mathrm{d}x

The inner integral over :math:`x^\prime` is a convolution of two Gaussians, giving us:

.. math::

   = h_\ell^4 h_{\log\ell}^2 \mathcal{N}\left(x_{c,i}\ \big\vert\ \mu, W_\ell + \Sigma\right)\mathcal{N}\left(x_{c,j}\ \big\vert\ \mu, W_\ell+\Sigma\right)\int \mathcal{N}\left(x\ \big\vert\ \mu + \Gamma(x_{c,i}-\mu), \Sigma -\Gamma\Sigma\right) \mathcal{N}\left(x\ \big\vert\ \mu + \Gamma(x_{c,j}-\mu), W_{\log\ell} + \Sigma -\Gamma\Sigma\right)\ \mathrm{d}x

The remaining integral over :math:`x` can also be rewritten in convolution form:

.. math::

   \begin{align*}
   &= h_\ell^4 h_{\log\ell}^2 \mathcal{N}\left(x_{c,i}\ \big\vert\ \mu, W_\ell + \Sigma\right)\mathcal{N}\left(x_{c,j}\ \big\vert\ \mu, W_\ell+\Sigma\right)\int \mathcal{N}\left(x_{c,i}-x\ \big\vert\ x_{c,i} - \mu - \Gamma(x_{c,i}-\mu), \Sigma -\Gamma\Sigma\right) \mathcal{N}\left(x\ \big\vert\ \mu + \Gamma(x_{c,j}-\mu), W_{\log\ell} + \Sigma -\Gamma\Sigma\right)\ \mathrm{d}x\\\\
   &= h_\ell^4 h_{\log\ell}^2 \mathcal{N}\left(x_{c,i}\ \big\vert\ \mu, W_\ell + \Sigma\right)\mathcal{N}\left(x_{c,j}\ \big\vert\ \mu, W_\ell+\Sigma\right) \mathcal{N}\left(x_{c,i}\ \big\vert\ x_{c,i} + \Gamma(x_{c,j}-x_{c,i}), W_{\log\ell} + 2\Sigma -2\Gamma\Sigma\right)\\\\
   &= h_\ell^4 h_{\log\ell}^2 \left\vert \Gamma\right\vert^{-1} \mathcal{N}\left(x_{c,i}\ \big\vert\ \mu, W_\ell + \Sigma\right)\mathcal{N}\left(x_{c,j}\ \big\vert\ \mu, W_\ell+\Sigma\right) \mathcal{N}\left(x_{c,i}\ \big\vert\ x_{c,j}, \Gamma^{-1}(W_{\log\ell} + 2\Sigma -2\Gamma\Sigma)\Gamma^{-\top}\right)
   \end{align*}

The last step uses the change of variables :math:`\mathcal{N}\left(\Gamma y\ \big\vert\ 0, A\right) = \left\vert\Gamma\right\vert^{-1}\mathcal{N}\left(y\ \big\vert\ 0, \Gamma^{-1}A\Gamma^{-\top}\right)` with :math:`y = x_{c,i} - x_{c,j}`, which requires :math:`\Gamma` to be invertible.

Second integral
~~~~~~~~~~~~~~~

The integral in :math:`\beta(\mathbf{x}_c)` can also be evaluated analytically. From [O13]\_, its :math:`(i, j)` entry is:

.. math::

   \begin{align*}
   \int K_{\log\ell}(x_{c,i}, x) K_{\exp(\log\ell)}(x, x_{c,j})p(x)\ \mathrm{d}x&=\int h_{\log\ell}^2 h_\ell^2\mathcal{N}\left(x_{c,i}\ \big\vert\ x, W_{\log\ell}\right)\mathcal{N}\left(x_{c,j}\ \big\vert\ x, W_\ell\right)\mathcal{N}\left(x\ \big\vert\ \mu, \Sigma\right)\ \mathrm{d}x\\\\
   &= h_{\log\ell}^2 h_\ell^2\int\mathcal{N}\left([x_{c,i}, x_{c,j}]\ \big\vert\ [x, x], [W_{\log\ell}, 0; 0, W_\ell]\right)\mathcal{N}\left(x\ \big\vert\ \mu, \Sigma\right)\ \mathrm{d}x\\\\
   &= h_{\log\ell}^2 h_\ell^2 \mathcal{N}\left([x_{c,i}, x_{c,j}]\ \big\vert\ [\mu, \mu], [W_{\log\ell}+\Sigma, \Sigma; \Sigma, W_\ell+\Sigma]\right)
   \end{align*}
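
The closed forms for both integrals can be sanity-checked numerically in one dimension. The following is a minimal sketch, not part of the original derivation: it assumes scalar inputs, sets both kernel output scales to 1, and replaces each integral with a Riemann sum on a fine grid. All parameter values and names (``normal_pdf``, ``mvn_pdf``, ``var_p``, ``w_l``, ``w_logl``, ``x_i``, ``x_j``) are hypothetical choices made for illustration.

```python
import numpy as np


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(x | mean, var); var is a variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def mvn_pdf(x, mean, cov):
    """Density of a multivariate Gaussian N(x | mean, cov)."""
    d = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    norm = np.sqrt(np.linalg.det(2.0 * np.pi * cov))
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm


# Hypothetical 1-D setup (all values are illustrative assumptions).
mu, var_p = 0.3, 1.2        # mean and variance of p(x)
w_l, w_logl = 0.5, 0.8      # kernel variances W_l and W_logl
x_i, x_j = 0.1, -0.4        # two points from x_c

# Integration grid wide enough that every integrand is negligible at the ends.
grid = np.linspace(mu - 12.0, mu + 12.0, 1201)
dx = grid[1] - grid[0]

# Second integral: int N(x_i | x, W_logl) N(x_j | x, W_l) N(x | mu, Sigma) dx.
numeric_2 = np.sum(
    normal_pdf(x_i, grid, w_logl)
    * normal_pdf(x_j, grid, w_l)
    * normal_pdf(grid, mu, var_p)
) * dx
closed_2 = mvn_pdf(
    [x_i, x_j],
    [mu, mu],
    [[w_logl + var_p, var_p], [var_p, w_l + var_p]],
)

# First integral: double integral over x (axis 0) and x' (axis 1).
X, Xp = np.meshgrid(grid, grid, indexing="ij")
integrand = (
    normal_pdf(x_i, X, w_l)
    * normal_pdf(X, Xp, w_logl)
    * normal_pdf(Xp, x_j, w_l)
    * normal_pdf(X, mu, var_p)
    * normal_pdf(Xp, mu, var_p)
)
numeric_1 = integrand.sum() * dx * dx

gamma = var_p / (w_l + var_p)                       # Gamma = Sigma (W_l + Sigma)^-1
a = w_logl + 2.0 * var_p - 2.0 * gamma * var_p      # W_logl + 2 Sigma - 2 Gamma Sigma
prefactor = normal_pdf(x_i, mu, w_l + var_p) * normal_pdf(x_j, mu, w_l + var_p)

# Second-to-last line of the derivation: mean x_i + Gamma (x_j - x_i).
closed_1 = prefactor * normal_pdf(x_i, x_i + gamma * (x_j - x_i), a)
# Last line: |Gamma|^-1 N(x_i | x_j, Gamma^-1 A Gamma^-T), scalar case.
closed_1_alt = prefactor * normal_pdf(x_i, x_j, a / gamma**2) / abs(gamma)

print("second integral:", numeric_2, closed_2)
print("first integral: ", numeric_1, closed_1, closed_1_alt)
```

In each printed row the grid estimate and the closed-form values should agree closely. A grid sum is used here only because it is easy to audit; it does not scale beyond a few dimensions, which is where the analytic expressions matter.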