Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Additive multivariate Gaussian processes for joint species distribution modeling with heterogeneous data

Additive multivariate Gaussian processes for joint species distribution modeling with... Bayesian Analysis (2019) Authors’ accepted version, Number , pp. Additive multivariate Gaussian processes for joint species distribution modeling with heterogeneous data Accepted for publication in Bayesian Analysis † ‡ § Jarno Vanhatalo , Marcelo Hartmann and Lari Veneranta Abstract. Species distribution models (SDM) are a key tool in ecology, conser- vation and management of natural resources. Two key components of the state-of- the-art SDMs are the description for species distribution response along environ- mental covariates and the spatial random effect that captures deviations from the distribution patterns explained by environmental covariates. Joint species distri- bution models (JSDMs) additionally include interspecific correlations which have been shown to improve their descriptive and predictive performance compared to single species models. However, current JSDMs are restricted to hierarchical gen- eralized linear modeling framework. Their limitation is that parametric models have trouble in explaining changes in abundance due, for example, highly non- linear physical tolerance limits which is particularly important when predicting species distribution in new areas or under scenarios of environmental change. On the other hand, semi-parametric response functions have been shown to improve the predictive performance of SDMs in these tasks in single species models. Here, we propose JSDMs where the responses to environmental covariates are modeled with additive multivariate Gaussian processes coded as linear models of coregionalization. These allow inference for wide range of functional forms and interspecific correlations between the responses. We propose also an efficient ap- proach for inference with Laplace approximation and parameterization of the in- terspecific covariance matrices on the euclidean space. We demonstrate the bene- fits of our model with two small scale examples and one real world case study. We use cross-validation to compare the proposed model to analogous semi-parametric single species models and parametric single and joint species models in interpo- lation and extrapolation tasks. The proposed model outperforms the alternative models in all cases. We also show that the proposed model can be seen as an extension of the current state-of-the-art JSDMs to semi-parametric models. MSC 2010 subject classifications: Primary 60G15, 60K35; secondary 62P12. Keywords: linear model of coregionalization, hierarchical model, heterogeneous data, spatial prediction, model comparison, Laplace approximation, covariance transformation. First available in Project Euclid: 3 June 2019: Available in Project Euclid: https://projecteuclid.org/euclid.ba/1559548823 Department of Mathematics and Statistics and Organismal and Evolutionary Biology Research Program, University of Helsinki, jarno.vanhatalo@helsinki.fi Department of Mathematics and Statistics and Department of Computer Science, University of Helsinki, marcelo.hartmann@helsinki.fi Natural Resources Institute Finland, Finland, lari.veneranta@luke.fi c 2019 International Society for Bayesian Analysis DOI: 10.1214/19-BA1158 imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 arXiv:1809.02432v2 [stat.ME] 8 Aug 2019 2 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 1 Introduction Species distribution models (SDMs) are key tools in the ecologists’ toolbox. They have been widely used, among other applications, to study species habitat preferences (La- timer et al., 2006), to improve identification and management of conservation areas and natural resources (Kallasvuo et al., 2017) and to evaluate species responses to environ- mental filtering under climate change scenarios (Clark et al., 2014; Kotta et al., 2019). The main goals of statistical inference in these contexts are to use species observations and information on the associated environment to infer the relationship between these two attributes and to predict over regions of unsampled locations to build thematic species distribution maps (Gelfand et al., 2006; Elith and Leathwich, 2009). Species distribution modeling is, thus, directly related to inferring species’ responses to environmental factors (Latimer et al., 2006). This is traditionally done using gen- eralized linear or additive models as well as an array of machine learning approaches such as regression trees or maximum entropy modeling (Elith and Leathwich, 2009). These approaches model each species separately and cannot account for species interac- tions nor shared responses to the environment. However, species interaction with other species is potentially as important factor as its response to environment. Moreover, in many practical situations, data from species can be patchy or scarce in which case sharing information between species can significantly improve models’ predictive per- formance (Ovaskainen and Soininen, 2011; Clark et al., 2014; Hui et al., 2013; Thorson et al., 2015; Hartmann et al., 2017). For these reasons, joint species distribution models (JSDM) have gained increasing attention in recent years (Warton et al., 2015). Dunstan et al. (2013) and Hui et al. (2013) introduced species archetype modeling where species’ responses to the environment are clustered into few archetype models. Pollock et al. (2014) proposed to use the multivariate probit regression model (Chib and Greenberg, 1998) to describe geographical co-occurrence patterns between frogs and eucalyptus trees in Australia and Clark et al. (2014) built a JSDM to infer richness and loss of species under climate change scenarios. Thorson et al. (2015) introduced a spatial latent factor model to predict spatial distribution of breeding birds and rock fish communities and recently, Ovaskainen et al. (2017) introduced the hierarchical modeling of species communities (HMSC) framework which includes detailed description of interspecific correlations in covariate responses and spatial random effects. Current JSDMs rely on the classical framework of hierarchical generalized linear models (GLMs). Even though this approach allows flexible modeling by describing the randomness of response variables with different probabilistic models (Nelder and Wed- derburn, 1972), it is still limited by its parametric assumptions. Hence, it may fail to accurately describe a species’ response to environmental conditions (Vanhatalo et al., 2012; Kotta et al., 2019). Here, we propose a semi-parametric JSDM model represented with multivariate Gaussian processes (GPs). GPs are flexible semi-parametric regres- sion models where the regression function is estimated without restrictive parametric assumptions about its form (O’Hagan, 1978; Rasmussen and Williams, 2006). Our model integrates the main strengths of semi-parametric single species models (Vanhatalo et al., 2012; Golding and Purse, 2016) and generalized linear model based JSDMs. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 3 First, we model the species responses to environmental covariates with additive mul- tivariate GPs. This allows us to capture wide range of nonlinear responses and share in- formation about these responses between species. Second, due to the hierarchical model structure, the model can simultaneously accommodate several kinds of outcome vari- ables (observations/measurements) by assigning different types of probabilistic models for them. This is important, since it allows us to exploit different types of measure- ments which commonly arise in real-case scenarios of multiple species surveys. Our presentation focuses in the mostly used probabilistic models in practice, the Bernoulli (Binomial) and Poisson with over-dispersion (Negative-Binomial). We also propose an efficient computational approach build around Laplace method (Golding and Purse, 2016; Vanhatalo et al., 2010). Lastly, we present a structured cross-validation scheme to validate and compare models’ performance in different kinds of prediction tasks. This paper is organized as follows. In Section 2 we introduce a motivating case study. In section 3, we introduce the additive multivariate GPs and in Section 4 we discuss its predictive properties and introduce the computational methods for inference. In Section 5, we illustrate the basic properties of the model through two simple examples and introduce the specific case study model. In Section 6 we present the case study results and we end by discussion and conclusions in Section 7. 2 Motivating case study: coastal species distributions in the Gulf of Bothnia As a motivating example, we study spring distribution of four fish species and three types of algae or macro-vegetation on the coastal region of the Gulf of Bothnia in the northern Baltic Sea. The Gulf of Bothnia is a brackish water basin between Sweden and Finland covering an area of approximately 600 × 120 km. Its coastal areas play a central role in the ecosystem and many Baltic fish stocks are dependent on the coastal regions for their reproduction (Veneranta et al., 2013; Kallasvuo et al., 2017). Coastal zones are also the most sensitive regions of the Baltic sea to both natural variation and anthropogenic pressures (Reusch et al., 2018). Hence, these areas are of central importance for conservation and there is need for detailed knowledge on the distribution of coastal species and predictions concerning their response to environmental changes. 2.1 Case study species and data The studied fish species are the juvenile or adult three-spined stickleback (Gasterosteus aculeatus ) and nine-spined stickleback (Pungitius pungitius ) as well as larvae of white- fish (Coregonus lavaretus ) and vendace (Coregonus albula ). Both whitefish and vendace are commercially important species and the sticklebacks are one key fish fauna in the Gulf of Bothnia ecosystem (Bergstr¨ om et al., 2015). We treat the studied vegetation and algae in functional group level comprising of diatomous algae, filamentous and macro- vegetation. These are the dominating groups of benthic vegetation in shores where larvae of Coregonids (whitefish and vendace) occur at early developmental stages. In the scale imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 4 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 Figure 1: The study region, Gulf of Bothnia, and locations for species data. The region is divided into five subregions (I-V) to be used in cross-validation. The environmental conditions (described by the environmental covariates) are heterogeneous between these regions (Veneranta et al., 2013) which allows to test models’ extrapolation performance. of Gulf of Bothnia, their occurrence also reflects the salinity, nutrient balance and wind exposure of studied area (Veneranta et al., 2013). The case study species reflect the large scale environmental gradients and the changes in the Baltic Sea environment in last decades. Sticklebacks in the Baltic Sea use the shallow coastal zone for reproduction (Bergstr¨ om et al., 2015) and high abundances of sticklebacks in the Baltic Sea have been positively correlated with high occurrence of filamentous algae (Eriksson et al., 2009, 2012). Coregonids in coastal areas prefer more oligotrophic waters (Veneranta et al., 2013). Whitefish and vendace reproduce in stony reefs and islets of Baltic Sea in late autumn and the larvae hatch at ice break-up in spring. In sheltered areas the overwintering reeds (Phragmites australis ) dominate the macro-vegetation in spring. Diatomous algae consist of several epiphytic species that have an early spring bloom at ice break up and settle rapidly over bottom when light and water temperature increase (Busse and Snoeijs, 2002). Filamentous algae in this study consist mostly of Pilayella sp. It is a fast growing annual species that dominates the exposed shores at Baltic Sea in early spring (R¨ onnberg and Bonsdorff, 2004). The whole study area was divided into sampling subareas (Figure 1). Within each sub-area, data were collected at several sampling sites distributed so that they covered the range of most important environmental covariates in that sub-area. Sampling was imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 5 Variable Description Resolution [m] Openness The average openness and exposure to winds 300 Distance to deep Distance to 20 m depth curve 200 Sandy bottom index Area weighted distance to the sandy shores 90 Ice breakup week The end date (weeks) of ice cover in 2009 1852 Chlorophyll-a Chlorophyll-a concentration 1852 River Distance to the nearest river mouth 150 Bottom class Bottom classification to 6 categories 200 Winter salinity Length of ice winter 10,000 Table 1: Environmental covariates used in the case study and their original resolution. For this study, the resolution of all the environmental covariates was scaled to 300m. conducted in 2009-2011. At each site sampling was done approximately one week after the ice break. In the scale of Gulf of Bothnia, the ice break-up happens in a period of approximately one month. The species data used in this work comprise of 160 distinct sampling sites. In 70 sites (2010 data) all species were sampled and the rest of the sites comprise samples for the fishes only (Figure 1). Fish samples comprise the number of caught fish together with information on the sampling effort at each site. The effort was measured as the volume of sampled water (m ). A more detailed description of the data collection is provided by Veneranta et al. (2013) For plant data, the bottom was photographed at 5-13 locations at depth of 30 cm within a distance of 2 m and parallel to shore line. The occurrence of a species was recorded at 16 uniformly distributed points in all these photographs. The case study plants can grow over each others so that presence of one plant at a spatial location does not exclude the presence of other plants. The whitefish and vendace data were previously analyzed by Veneranta et al. (2013) and Vanhatalo et al. (2012) but the data on other species are unpublished. 2.2 Environmental covariates Gulf of Bothnia hosts rich variety of environmental conditions. Coastal areas are affected by inflows from land as well as shallow and complex topography. In the scale of Gulf of Bothnia, there is a gradient in river influence, salinity, temperature and length of ice cover period from north to south. We used seven real-valued and one categorical abiotic environmental covariates that were available throughout the study area as raster maps. Each species observation was combined with covariates from the raster cell where the sample location fell in. These are summarized in table 1 and described and motivated in detail by Veneranta et al. (2013). Due its fresh water origin vendace is a priori sensitive to even small changes in salinity levels experienced in the Gulf of Bothnia whereas the other species respond to salinity only in the Baltic Sea scale. Hence, we used all covariates only for vendace but excluded the winter salinity from other species. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 6 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 3 Hierarchical multivariate species distribution model We start our model building following the generic hierarchical structure similar to the one presented by, e.g., Wikle (2003), Cressie and Wikle (2011) and Banerjee et al. (2015) [data|process, parameters] : π (y(x, s)| f (x, s),η) [process|parameters] : π (f (x, s)|θ) [parameters] : π(η,θ) (3.1) where the first layer of hierarchy is the probabilistic model which defines the conditional distribution for multivariate data Y (x, s), at spatial location s with associated covariates x, given the model parameters η and the multivariate latent process f (x, s). The second layer defines the prior distribution for the latent process given the process parameters θ, and the third layer defines the prior distribution for all unknown parameters. Let j ∈ J = {1,··· , J} be the species index set and J the total number of species in the study. Denote by Y (x, s) = [Y ··· Y ] the J -variate random vector with com- 1 J th T ponents Y = Y (x , s) related to the j species at spatial location s = [s s ] (coordi- j j j 1 2 nates) under the influence of environmental covariates x = [x ··· x ] where c is the j,1 j,c j j j number of environmental covariates for the species j. We denote by x the full vector of covariates and X the covariate space. The species specific covariates x are sub-vectors of x such that x ∈ X where X ⊂ X and X is c -dimensional. Here, we assume that j j j j j T 2 J given a multivariate latent process f (x, s) = [f (x , s)··· f (x , s)] : X × R → R , 1 1 J J the probabilistic model for Y = Y (x, s) factorizes as follows π (y(x, s)| f (x, s),η) = π (y | f , η ) (3.2) Y Y j j j j=1 where f = f (x , s) is fixed but an unknown realization of the j’th process with covari- j j j ates x at the spatial location s. η is the vector of parameters of the probabilistic model j j π for the species j and η = [η ··· η ] . In general any probabilistic model for data Y 1 J could be used and the observation models should be chosen according to the assumed sampling process. Here, we consider in detail the Binomial and the Negative-Binomial (over-dispersed Poisson) models used in our case study (Section 2). These models can be seen as observation models resulting from inhomogeneous point process model for species abundance (Gelfand et al., 2006; Warton and Shepherd, 2010). In practice, each sampling site consists of a small area where the observations are made. In our case study, the area covered by a sampling site is so small compared to the whole study region that the latent function is practically constant in each sampling site. The model for the count of species presences at uniformly distributed points in a sampling site (plants in our case study) is then Binomial given by (Gelfand et al., 2006) z (s) y z (s)−y π (y|f (x, s), z (s)) = p(f (x, s)) [1− p(f (x, s))] I (y) (3.3) Y B {0,...,z (s)} B B where z (s) is the total number of observation points in the sampling site at location s and p(f (x, s)) is the success probability, which will be modeled through the logit here. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 7 The natural model for the number of individuals in a closed area or volume in a sampling site (fish in our case study) is Poisson which we extend to over-dispersed Poisson using the Negative-Binomial given by (see Liu and Vanhatalo, 2018, Section 7.1 for details) r y Γ(r + y) r m(x, s) π (y|f (x, s), z (s), r) = I (y) (3.4) Y N N N 0 y!Γ(r) r + m(x, s) r + m(x, s) where m(x, s) = z (s) exp[f (x, s)] = E(Y |f (x, s), z (x), r) and r is the over-dispersion N N N parameter. The latent process f (·) corresponds to the logarithm of a species (relative) density and z (s) is the sampled volume of water in the sampling site at location s. For this parameterization V (Y |f (x, s), z (x), r) = m(x, s) + m(x, s) /r. Increasing r N N decreases the variance and when r → ∞, the model approaches the Poisson distribution. 3.1 Univariate additive latent Gaussian process First, we assume that marginally for any species j, the process model is given by f (x , s) = β + h (x ) + ε (s) (3.5) j j j,0 j j j 2 2 where β is the offset weight with distribution β |σ ∼ N (0, σ ) and h : X → R j,0 j,0 j j j,0 j,0 is a predictor function of environmental covariates. The predictor function is assumed to be additive over the covariates, h (x ) = h (x ), and each additive term is j j j,r j,r r=1 given an independent zero mean GP prior, so that h (x )|θ ∼ GP 0, k (x , x ;θ ) (3.6) j j j h j,r j,r j,r j,r r=1 0 0 where k (x , x ;θ ) = Cov(h (x ), h (x )|θ ) is a covariance function with h j,r j,r j,r j,r j,r j,r j,r j,r j,r T T T 0 parameter θ and θ = [θ ···θ ] . For example, with k (x , x ; θ ) = x j,r j h j,r j,r j,r j,1 j,c j,r j,r 0 2 2 x σ and θ = σ , the predictor function corresponds to linear model h (x ) = j,r j,r j,r j,r j,r j,r x β where β ∼ N (0, σ ) (Rasmussen and Williams, 2006). With other choices j,r j,r j,r j,r of covariance functions we can model non-linear responses in which case the model can be seen as an alternative to the traditional generalized additive models (GAMs, Hastie and Tibishirani, 1986). The general form of an additive GP prior would include also joint effects of covariates (Duvenaud et al., 2011) but this is not considered here. The term ε (s) is a spatial GP, 2 0 2 ε (s)|σ ,` ∼ GP 0, k (s, s ;` , σ ) (3.7) j j  j j j j 0 2 2 where k (s, s ;` , σ ) is a spatial covariance function with variance σ and range param- j j j eter ` . When we consider independent models for each species, that is the traditional single species SDMs (SSDMs), the processes h and  are mutually independent among j,r j all species. The model outlined this far is similar to the GP based species distribution models used by, e.g., Vanhatalo et al. (2012), Kallasvuo et al. (2017) and Golding and Purse (2016). Next, we introduce models which consider dependency. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 8 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 3.2 Additive multivariate Gaussian process priors In order to account for possible species interdependence, we first include spatial species interaction into the model through the linear model of coregionalization (LMC) (Mardia and Goodall, 1993; Gelfand et al., 2004). Write ε(s) = [ε (s)··· ε (s)] and assume that 1 J ε(s) has LMC covariance structure with species specific correlation functions k (·,· ;` ). The covariance structure of the LMC with interspecific spatial dependence is then, 0 0 0 Cov ε (s), ε 0 (s )|Σ ,` = u (j, j )k (s, s ;` ) (3.8) j j  ,l  l l=1 T T T 0 0 T with ` = [` ···` ] and u (j, j ) the entry (j, j ) of U = L L where L is ,l ,l ,l ,l 1 J ,l th the l column of the Cholesky decomposition L of the coregionalization matrix Σ , that is matrix of interspecific correlations between spatial GP. Hence, the vector-valued latent process f (x, s) = [f (x , s)··· f (x , s)] unconditional on β ,. . .,β follows a 1 1 J J 1,0 J,0 multivariate GP which we denote as c J X X 0 0 f (x, s)|Λ ∼ MGP 0, Σ + k (x , x ;θ) + k (s, s ;` )U (3.9) 1 0 h r  j ,j r r j r=1 j=1 T T T 2 2 0 where Λ = (Σ , Σ ,θ,`), Σ = diag(σ , . . . , σ ), θ = [θ ···θ ] and k (x , x ;θ) = 1 0  0 h r 1,0 J,0 1 J r r 0 0 diag k (x , x ;θ ),··· , k (x , x ;θ ) . If the predictor functions, h , were h 1,r 1,r h J,r J,r j,r 1,r 1,r J,r J,r linear, the prior (3.9) would correspond to traditional multivariate spatial model with independent linear effects and spatial LMC (Gelfand et al., 2004). We extend (3.9) to an additive multivariate GP prior where the dependence between the species specific additive predictor functions is modeled with LMC. We demonstrate this with a model where the set of covariates is equal for all species, that is, dim(X ) = c and x = x for all j. Then the model is written as c J J XX X 0 0 ˜ ˜ f (x, s)|Λ ∼ MGP 0, Σ + k (x , x ;θ )U + k (s, s ;` )U 2 0 h r r j,r h ,j j j ,j j,r r=1 j=1 j=1 (3.10) where Λ = (Λ , Σ , . . . , Σ ) and Σ is the interspecific covariance matrix between 2 1 h h h 1 c r the species specific predictor functions of r’th covariate, k (·,·;θ ) is a correlation hj,r j,r T th function related to the predictor function h , U = L L and L is the j j,r h ,j h ,j h ,j r r r h ,j column of the Cholesky decomposition L of Σ . When the set of covariates is not the h h r r same for all species, covariate specific predictive functions are correlated only between species that share those same covariates. A JSDM with multivariate GP prior (3.10) can be interpreted as extension of GP based SSDMs (Vanhatalo et al., 2012; Golding and Purse, 2016) to joint species mod- eling similarly as done with the generalized linear model based JSDMs (Pollock et al., 2014; Ovaskainen et al., 2017). The enhanced inferential ability of the multivariate ad- ditive GP compared to univariate GP models lies in its capability to infer similarity imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 9 (dissimilarity) in species specific responses to covariates and in the spatial random ef- fect. The similarity/dissimilarity of responses of two species j and j along a covariate r is indicated by a positive/negative value for correlation and hence, examining the LMC covariance matrices, Σ , can provide new insight to species to species associations. 3.3 Marginally uniform priors for correlation parameters Here, we define the prior for coregionalization covariance matrices but define the co- variance functions and rest of the priors in Section 5. A standard choice for prior for correlation matrices is the inverse Wishart distribution. However, if there is no prior information on interspecific correlations, we follow Hartmann et al. (2017) and suggest to use marginally uniform priors for the LMC covariance matrices Σ and Σ . These can be achieved by the distribution of Barnard et al. (2000) and Tokuda et al. (2012). Let P be a correlation matrix with elements ρ . A prior distribution that is marginally j,j noninformative for the correlations, that is, the marginal distribution for every ρ is j,j uniform over (−1, 1), is achieved with the distribution v J Γ( ) 1 v 2 (v−1)(J−1)−1 − 2 2 π(P|v) = |P| |P | 1 (detP ) (3.11) jj (0,∞) Γ ( ) j=1 when v = J − 1. Here, Γ (·) is the multivariate gamma function and matrix P is J jj th th obtained by removing the j column and j row ofP . When v increases, the probability density (3.11) becomes increasingly concentrated around the origin. 4 Posterior inference and predictive properties Given a set of species observations at locations s , i = 1, . . . , n , respectively for each i j j species j = 1, . . . , J , the likelihood can be written as J j Y Y π(y| f,η) = π (y | f , η ) (4.1) j j,i j,i j j j j=1 i =1 where y is the i ’th observation of species j at the i ’th spatial location s asso- j,i j j j i T T T ciated with a set of covariates x ∈ X . The observed vector y = [y ··· y ] with i j j 1 J T T T T y = [y ··· y ] collects all the species observations and f = [f . . . f ] , where j,1 j,n j j 1 J f = [f ··· f ], collects the corresponding latent variables. Hence, the likelihood j,1 j,n j j factorizes over the latent variables and the inference can be done similarly as with uni- variate GP models. Markov chain Monte Carlo (MCMC) would provide exact answers in the limit of large sample size but deterministic approximations such as Laplace approx- imation or expectation propagation algorithm have also been shown to provide accurate approximate inference for univariate GP models with much lower computational time (Nickisch and Rasmussen, 2008; Rue and Marino, 2009; Vanhatalo et al., 2010). In order to study the properties of the model, we will examine the Laplace approximation for the posterior of latent variables conditional on the hyperparameters. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 10 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 4.1 Posterior predictive inference conditional on hyperparameters Posterior of latent variables The Laplace’s method approximates the conditional posterior of the latent function values f = f (s , x ) at the spatial location s with covariates x as ∗ ∗ ∗ ∗ ∗ −1 −1 −1 π(f | y,η, Λ) ≈ N f |C C f, C − C (C + W) C (4.2) ∗ ∗ ∗,f ∗ ∗,f f,∗ where f is the (conditional) maximum a posterior (MAP) estimate of latent variables, J j X X f = arg max log π (y | f , η ) + logN (f | 0, C ) (4.3) j j,i j,i j j j j j f∈R j=1 i =1 and W is the Hessian matrix of the negative log-likelihood function evaluated at f . Here, C is the prior covariance matrix of f , C is the (prior) covariance matrix between ∗,f elements of f and f . C is the prior covariance of f . In case of multivariate additive ∗ ∗ ∗ GP (3.10) the prior covariance matrix is given by     ˜ ˜ σ J ··· 0 u (1, 1){K } ··· u (1, J ){K } n h ,j h 1,1 h ,j h 1,J 0 1 j,r j,r c J r r XX  . .   . .  . . . . . . . . C = +     . . . . . . 2 r=1 j=1 ˜ ˜ 0 ··· σ J u (J, 1){K } ··· u (J, J ){K } 0 J h ,j h J,1 h ,j h J,J j,r j,r r r   ˜ ˜ u (1, 1){K } ··· u (1, J ){K } ,j  1,1 ,j  1,J J j j   . . . . . + (4.4)   . . j=1 ˜ ˜ u (J, 1){K } ··· u (J, J ){K } ,j  J,1 ,j  J,J j j 0 0 0 where J is an n×n matrix full of ones u (j, j ) and u (j, j ) are the entry (j, j ) of U n ,l h ,l ,l ˜ ˜ 0 0 and U (see (3.8)), and{K } and{K } are correlation matrices with elements h ,l  j,j h j,j r l l,r ˜ ˜ ˜ ˜ [{K } 0 ] = k (s , s ;` ) and [{K } 0 ] = k (x , x |θ ). The j,j i ,i 0 ,l i i 0 l h j,j i ,i 0 h i ,r i 0,r l,r l j j l,r j l,r j j j j j other correlation matrices are constructed similarly. If data on all species are available at all sampling locations, the covariance matrix reduces to Kronecker product similarly as in LMC models by Mardia and Goodall (1993) and Gelfand et al. (2004), so that P P P c J J ˜ ˜ C = Σ ⊗ J + U ⊗ K + U ⊗ K where n = n for all 0 n h ,j h ,j  j r=1 j=1 j,r j=1 j j,r j = 1, . . . , J . Since, in general, the second and third term of C are full matrices, it can be seen from (4.2) and (4.3) that the posterior of f and the posterior predictive distribution of f are affected by observations of all species at any spatial locations. The marginal posterior of additive terms Typically, we are also interested in species’ responses along covariates encoded by the additive predictor functions. Their marginal posteriors, conditional on hyperparame- ters, are analogous to (4.2). The matrices C and C are only constructed using the ∗ ∗,f In case of Gaussian observation model, this equals to the true posterior density of f | y,η, Λ. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 11 correlation functions corresponding to the latent function of interest. For example, to study the response of species j along covariate x we evaluate the posterior predictive distribution for h (x ) using (4.2) so that we replace C j,r r ∗,f h i ˜ ˜ C = u (j, 1){K } ,··· , u (j, J ){K } , (4.5) h ,f h ,l h j,1 h ,l h j,J j,r r l,r l,r l=1 and similarly for C . However, in case of the traditional LMC (3.9) only the spatial random effects are correlated between species whereas the predictor functions are not. 0 2 0 Hence, Σ is diagonal for all r so that u (j, j ) = σ if l = j = j and zero otherwise. h h ,l r r l The second terms of (4.4) is then block diagonal and only the j’th block column in (4.5) is non-zero for which reason there is no information exchange between species specific predictor functions. In case of univariate GP prior all the terms in (4.4) are block diagonal and (4.2) reduces to J independent Gaussian distributions with no information exchange among species at all. Hence, the predictive performance of the univariate GP and the two multivariate GP models are very different. This is illustrated in section 5. The (marginal) posterior expectation and variance for new observations When predicting species occurrence probability or abundance, we need to marginalize over the posterior of the latent variables. The posterior expectation and variance for a new outcome for species j at a location (x , s ) conditional on hyperparameters are ∗ ∗ E[Y (x , s )| y,η, Λ] =E[E(Y (x , s )|f (x , s ), y,η, Λ)] (4.6) j ∗ ∗ j ∗ ∗ j ∗ ∗ V[Y (x , s )| y,η, Λ] =E[V(Y (x , s )|f (x , s ), y,η, Λ)] + j ∗ ∗ j ∗ ∗ j ∗ ∗ V[E(Y (x , s )|f (x , s ), y,η, Λ)] j ∗ ∗ j ∗ ∗ When the probabilistic model for species j is assumed to be the Negative-Binomial or Binomial model with logistic link function (3.3) we can find either an exact result or an analytical approximation for these expectations and variances. These are given in the Supplementary material. These solutions speed up the predictive calculations considerably compared to numerically integrating f over R. Conditional scenario predictions In some applications, one might be interested in scenario predictions conditional on changes in species composition. For example, one might be interested in how removing from or introducing specific species into an area would affect other species. In our case study setting, we could be interested in, for example, effects of management actions that would clean filamentous algae from the shoreline. Such scenario predictions would be naturally tackled with predictive causal inference (Lindley, 2002; Pearl, 2009). In predictive causal inference the parameters of the model are inferred with the available data so far and predictions made considering alternative scenarios. To illustrate this lets first introduce a short notation y = y (x , s ) and f ∗ ∗ J ,∗ J J ,∗ 1 1 1 = f (x , s ) where J denotes the set of species to be predicted and J the scenario ∗ ∗ 1 2 imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 12 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 species assumed to be “managed”. For brevity, we omit the conditioning on hyperpa- rameters and data. The conditional distribution of Y (x , s )| y is J ∗ ∗ 1 J π(y | y ) = π(y | y , f )π(y , f )df /π(y ) (4.7) J ,∗ J ,∗ J ,∗ J ,∗ J J ,∗ J 1 J 1 1 J 1 2 1 2 2 2 where Y (x , s )| y , f only depends on f . The Laplace approximation for this J ∗ ∗ J ,∗ J ,∗ 1 J 1 1 conditional predictive distribution is shown in Supplementary material. 4.2 Parameter inference We used Laplace approximation (Vanhatalo et al., 2010) to approximate the conditional posterior of latent variables π(f | y,η, Λ) and the marginal likelihood of the hyperparam- eters π(y|η, Λ) = π(y| f,η)π(f |Λ)d f . We searched for the (approximate) maximum a posterior (MAP) estimate for hyperparameters given by (η, Λ) = arg max log q(y|η, Λ) + log π(η, Λ). (4.8) η,Λ where log q(y|η, Λ) is the Laplace approximation for the log marginal likelihood for pa- rameters. This approach produces also good approximation for the posterior predictive distribution for latent variables (Vanhatalo et al., 2010; Vehtari et al., 2016), which are the main interest in this work. Hence, we fixed hyperparameters to their MAP estimate. In order to avoid constrained optimization, all the parameters were transferred to unconstrained space. We used log transformation for covariance function parameters, and for the interspecific correlation matrices we used the transformation presented by Kurowicka and Cooke (2003) and Lewandowski et al. (2009), which is a bijective map- ( ) ping between the space of correlation matrices and R . This is summarized in Sup- plementary material. We used scaled conjugate gradient optimization for locating the MAP and checked carefully that a (local) mode had really been found by verifying that gradients along all hyperparameters were zero. The required gradients of log q(y|η, Λ) were solved analytically as described by Rasmussen and Williams (2006) and Vanhat- alo et al. (2010). The Supplementary material summarizes the derivatives w.r.t. to the covariance parameters in Σ , Σ , r = 1, . . . , c. 5 Experiments Here, we first illustrate the properties of the hierarchical multivariate GP models with two simple examples. These examples highlight particular properties of the proposed model. After this we introduce the model and analysis for our case study (Section 2). We implemented all the models using the GPstuff package (Vanhatalo et al., 2013). 5.1 Demonstration with simulated spatial data Figure 2 presents simulated data and posterior predictions for spatial modelling of two species. The model follows spatial LMC; that is model (3.9), where the covariate terms imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 13 Figure 2: Illustration of spatial multivariate GP prior for JSDM with two species and spatial only component. Crosses and dots are simulated data locations of species 1 (Binomial data) and species 2 (Negative-Binomial data). See Section 5.1 for discussion. are dropped out. The spatial correlation function used in this demo is the Gaussian 0 2 2 0 −|| s− s || /2l 0 k(s, s ) = e , where || s− s || is the Euclidean distance and l is the length- scale. The first row of subplots shows the posterior predictive probability of observing species 1 (E[Y ], Y (s) ∼ Binomial) and the second row shows the expected number of 1 1 species 2 (E[Y ], Y (s) ∼ Negative-Binomial). Plots (a) and (e) show the predictions 2 2 when the species observations are from the same region but not from exactly the same locations. In this case, the prediction of multivariate GP is similar to predictions by univariate GPs. Plots (b) and (f ) show the predictions when data are available on both species from the lower left corner and additionally data on species 1 is available from upper right corner. There is positive correlation between species, which has been inferred from the data in the lower left region, so the prediction for species 2 in upper right corner is informed by data on species 1 in that region. The last two columns illustrate the conditional scenario predictions (4.7). In the third column, the training data is the same as in the first column and the expected values in plots (c) and (g) show joint scenario prediction in new region of same size and shape. In this scenario, species in the new region were observed so that species 1 is known to be in the top right, and species 2 in the bottom left. The fourth column shows the conditional scenario prediction for both of the species separately so that the grey marks show the locations where the other species would be introduced in these scenarios. 5.2 Demonstration with time series of species abundances Next, we consider the classical predator-prey system containing annual population counts of hare and lynx in the northern Canada from 1845 to 1935 (Odum, 1953). The data is not spatially explicit since the observations are total counts over the region so we model it as time series. This illustrates the behavior of individual additive co- variate effect in the multivariate GP model (3.10) with time being the covariate. Let’s denote by Y and Y the population sizes of hare and lynx at time t, and assume that 1,t 2,t imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 14 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 Figure 3: The results from the analysis of hare and lynx interaction (predator-prey system) analyzed with single and joint SDM. Plots a) and c) show the posterior mean and 95% credible interval of the latent functions and plots b) and d) show the training data, test data and posterior expectation of observations. See Section 5.2 for explanation. Y |f (t) ∼ Poisson(exp(f (t)) and Y |f (t) ∼ Poisson(exp(f (t)). We compared two 1,t 1 1 2,t 2 2 alternative priors for the latent functions, one with independent GP priors and another with joint bivariate GP prior with the LMC structure. In order to compare models’ pre- dictive performances when some species have not been observed, we removed parts of the original data from the training set, the period of 1870-1900 for hare and 1850-1870 for lynx. Figure 3 displays the result of the data analysis. The model with bivariate GP prior clearly outperforms the independent model since it predicts better the test data points in periods where training data was removed. Moreover, its predictions have also smaller uncertainty than the predictions of independent GPs in those periods of time. 5.3 The case study on coastal species distribution Case study models The plant data are modeled with the Binomial observation model (3.3) and the fish data are modeled with the Negative-Binomial observation model (3.4). In order to test the effect of different model components and the effect of semi-parametric response functions versus standard quadratic response functions we compare the following models: 1) additive univariate GP predictor functions and univ. spatial random effects (3.5), 2) additive univariate GP predictor functions and LMC spatial random effect (3.9), 3) additive multivariate GP predictor function and LMC spatial random effect (3.10), 4) additive univariate GP predictor functions only (model (3.5) without  (s)), 5) univariate spatial random effects only (model (3.5) without h (x )), j j imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 15 6) additive univariate quadratic predictors and univariate spatial random effects (model (3.5) with h (x ) = x β and x includes the covariates and their squares), j j j j j 7) additive univariate quadratic predictors and LMC spatial random effect (model (3.9) with h (x ) = x β and x includes the covariates and their squares), j j j j j 8) additive multivariate quadratic predictors and LMC spatial random effects (model (3.10) with h (x ) = x β and x includes the covariates and their squares). j j j j j 9) additive multivariate quadratic predictors only (model (3.10) without (s) with h (x ) = x β and x includes the covariates and their squares). j j j j j In each model, the spatial random effects were given the M´ atern covariance function 0 2 2 0 − 3r (s,s ) with ν = 3/2 degrees of freedom k (s, s ; σ ,` ) = σ 1 + 3 r (s, s ) e j j j j j where σ is the variance parameter, ` = [` ` ] is the vector of length-scales j,1 j,2 j j 0 0 T 2 −1 0 1/2 and r (s, s ) = [(s− s ) (diag(` ) ) (s− s )] . The continuous covariate effects in j j the additive GP models (models 1-4) were given the Gaussian correlation function 0 2 2 0 −||x −x || /2l ˜ r r j,r k (x , x ) = e . The additive linear models (models 6-8) were coded h ,r r j r 0 0 2 as additive GPs with k (x , x ) = x x σ . Even though this is not a proper correla- h r r j,r r r j,r tion function, when used in LMC it implies a generalized linear model with interspecific dependencies between weights, β , that are the key components in state-of-the-art para- metric JSDMs. See Discussion for details. For the categorical covariate (Bottom class, 0 2 0 0 table 1) we used a delta function so that k (x , x ;θ) = σ δ (x ), where δ (x ) = 1 h r x x j,r r δ r r r r if x = x and zero otherwise. This corresponds to having an own intercept for each category. We used the marginally uniform priors (section 4.1) for the between species correlations and weakly informative priors for the rest of the parameters. The variance parameters were given Half -Student-t (μ = 0, σ = 4, ν = 4) priors and the length-scale parameters were given Half -Inverse-Student-t (μ = 0, σ = 1, ν = 4). Hence, a priori more weight is given for smooth functions with small variability. Model validation and comparison We aim to provide models that give reliable posterior predictions. Hence, it is natural to compare models with the goodness of their posterior predictive distributions. This can be done efficiently with cross-validation (CV) using the log predictive density diagnostics (Vehtari and Ojanen, 2012). Let D denote the full data-set. Fix K disjoint sub-sets of D, say D , . . . ,D , such that their union is D. The K-fold CV log point-wise marginal 1 K predictive density is then L (D) = log π(y |D ) where D = {∪ D : K i ∼k(i) ∼k(i) r i=1 r=1 r 6= k(i)} and k(i) is such that y ∈ D . We compare the models with the leave-one- k(i) out CV (LOO-CV, K = n) and structured 5-fold CV. We used the MAP estimate for the hyperparameters. In 5-fold CV we found the MAP for each training set separately. The LOO-CV was conducted at the MAP found with full data and we used Laplace approximation to approximate the LOO-CV log predictive densities (Vehtari et al., 2016). Laplace approximation for LOO-CV is shown to work well in GP models and, since our data is rather large, leaving only one data point out of the training set has only negligible effect on the posterior of the hyperparameters (Vehtari et al., 2016). The rationale for calculating both LOO-CV and 5-fold CV comparison is the follow- ing. Since multiple species were sampled at every sampling site and each of the sampling imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 16 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 GP models LOO Cross Validation 5-fold Cross validation 1) Univariate GP h(x), univ. (s) -2.230 (0.082) -2.465 (0.094) 2) Univariate GP h(x), multiv. (s) -2.081 (0.080) -2.480 (0.100) 3) Multiv. GP h(x), multiv. (s) -1.669 (0.080) -2.316 (0.086) 4) Multiv. GP h(x) only -2.228(0.089) -2.590(0.012) 5) Multiv. (s) only -2.351(0.087) -2.500(0.086) 6) Univ. quadr. h(x), univ. (s) -2.262 (0.083) -2.517 (0.095) 7) Univ. quadr. h(x), multiv. (s) -2.117 (0.084) -2.484 (0.109) 8) Multiv. quadr. h(x), multiv. (s) -2.105 (0.084) -2.416 (0.099) 9) Multiv. quadr. h(x) only -10.246(1.297) -9.305(1.008) Table 2: Model comparison with leave-one-out (LOO) and 5-fold cross validation using the mean point wise log marginal predictive density statistics (and its standard error) over all species. see Section 5.3 for model descriptions. sites has other sites spatially nearby it, the LOO-CV log predictive densities are affected significantly by the spatial random effects. The LOO-CV, thus, measures models’ inter- polation performance which can be good even if the models were not able to represent well responses along covariates (predictor functions) (Vanhatalo et al., 2012; Veneranta et al., 2013). For this reason, we structured the 5-fold CV by dividing the data into five spatially distinct groups corresponding to regions I-V in Figure 1. The sampling sites in different groups are spatially so far from each others that the spatial random effects do not affect the posterior predictive distributions for test groups. Hence, the 5-fold CV tests mostly the extrapolation performance of a model, which is governed by the goodness of the predictor functions, whereas the LOO-CV tests the interpolation performance, which is governed also by the spatial random effects. 6 Results 6.1 Predictive performance of models Table 2 summarizes the CV comparisons over all species and tables 1-2 in Supplementary material show the models’ predictive performance for each species separately. Tables 3- 4 in Supplementary material show the pairwise comparison of the CV point wise log predictive densities between the best GP/parametric model (models 3 and 8) and the other GP/parametric models with both the covariate effects and spatial random effect (models 1-2 and 6-7). The results show that model 3, which includes multivariate GP predictors and multivariate spatial random effects (3.10), is the best in both LOO-CV −2 and 5-fold CV with a difference of 10 or more to the other models. Since the mean log predictive density is an average of n = 850 point-wise predictions the difference −2 of 10 corresponds to a difference of 8.5 in log (point-wise) joint predictive densities. Analogously to Bayes factors, which compare log prior predictive distribution, this can be considered a significant difference between two models (Kass and Raftery, 1995). Hence, the multivariate GP (model 3) has significantly better overall predictive performance over all species than the other models. According to posterior predictive imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 17 Figure 4: The maximum a posterior estimate of the correlation matrices in the multi- variate additive GP model with Gaussian covariance function (model 3). checks (Gelman et al., 2013) there are no serious discrepancies between its predictions and observed data. In extrapolation (5-fold CV) model 3 performs best also for all species separately. However, in interpolation (LOO-CV) it is not the best for all species separately (tables 1-2 in Supplementary material). Moreover, the semi-parametric GP models (models 1-4) work better than the corresponding parametric models (quadratic environmental response, models 6-9) in both interpolation and extrapolation in general. Dropping either spatial random effect or covariate effect out from the model decreases its performance clearly. All models work better in interpolation than in extrapolation and compared to univariate models including interspecific correlations either in spatial random effects or also in environmental predictors improves models’ performance in both of these tasks. Moreover, including interspecific correlations between environmen- tal responses, h (x ), improves the extrapolation performance relatively more than j,r r including interspecific correlations only for spatial random effects. Hence, multivariate spatial random effect improved more interpolation whereas multivariate predictors has larger effect for extrapolation as would intuitively be expected. 6.2 Effects of environmental covariates The interspecific correlations (Figure 4) show that the responses to environment are similar among Coregonids and among sticklebacks whereas there are clear differences among these groups. These fish groups respond differently to sandy bottom and distance to deep. Moreover, there is strong positive spatial correlation among Coregonids and among sticklebacks but not between these groups. All species have negative spatial correlation with filamentous algae whereas there is either weak positive or negligible spatial correlation between fish species and macrovegetation and diatomous algae. Figure 5 shows the marginal effects of the predictor functions for both univariate (SSDM) and multivariate (JSDM) GP models calculated as described in section 4.1. In general, the responses in JSDM and SSDM are similar. In most of the cases the uncertainty in response function is smaller (narrower credible regions) in JSDM than in SSDM although the differences are not large. Most of the responses are smooth but imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 18 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 Figure 5: Marginal latent responses along covariates, h . Grey corresponds to single j,r species GP model (model 1) and black to multivariate additive GP model (model 3). show patterns that would be hard to capture with a quadratic function. For example, three and nine spine stickleback, macrovegetation and filamentous algae show logistic style response to either distance to deep or openness. Moreover, the response of vendace on ice break up week is first constant but has very steep increase after week 16. The responses to ice break up or chlorophyll-a show non-linear and non-quadratic responses also for white fish, macrovegetation and filamentous algae. The MAP estimates of other hyper-parameters are summarized in Supplementary material. 6.3 Spatial predictions Figure 6 shows the distribution maps as posterior median of intensity and expected count of individuals in replicate sampling produced by SSDM (model 1) and JSDM (model 3) for vendace. The distribution maps for other species together with maps on posterior predictive variance are shown in Supplementary material. In broad scale the posterior median of the intensity looks similar with SSDM and JSDM whereas SSDM predicts larger species counts than JSDM for all species throughout the study region and the uncertainty related to SSDM predictions is larger than that of JSDM predictions. Both SSDM and JSDM predict that vendace is distributed mostly in the northern Gulf of Bothnia. However, SSDM predicts somewhat higher median intensity than JSDM also in the southern Gulf of Bothnia. In relation to median intensity similar pattern that imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 19 Figure 6: Posterior predictive median of intensity and the expected count of individuals in replicate sampling for vendace predicted by SSDM (model 1) and JSDM (model 3). JSDM predicts more restricted distribution range than the single species model is seen also in the predictions concerning sticklebacks and whitefish. In case of diatomous and filamentous algae SSDM and JSDM predict the distribution pattern similarly whereas for macrovegetation JSDM predicts slightly larger distribution ranges. The posterior distributions of spatial lenght-scale parameters were concentrated near one kilometer or less (see table 5 in Supplementary material). Hence, the differences in distribution predictions cannot be explained by the spatial random effects over large areas but the spatial random effect explains local deviations from the predictions based on covariates. 7 Discussion and concluding remarks 7.1 Case study results The GP based SDMs had better predictive performance than the parametric SDMs and JSDMs had better predictive performance than the corresponding SSDMs. The differences between SSDM and JSDM are most apparent in the distribution maps and predictor functions. The JSDM predicted in general smaller distribution ranges than SSDMs (macrovegetation being the only exception) and the posterior uncertainty in their predictions were smaller than in SSDMs. When interpreting the results, it should be remembered that the sampling was tar- imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 20 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 geted to larvae of sea-spawning Coregonids (whitefish and vendace) and planned to cover their plausible distribution range in terms of spatial (coastal areas of Gulf of Bothnia) and temporal extent (spring). Other species were sampled alongside Coregonids and the sampling area covers only limited portion of their full distribution range in the spring. Moreover, the abundance and distribution of all the studied species vary annually. Fish change their distribution areas seasonally and their larval stages last only few weeks. Vegetation and algal cover vary due changes in temperature and ice effects. For these reasons the results are most representative for larvae of whitefish and vendace. For other species the results describe their spring distribution in the shallow coastal regions only. Our results correspond rather well to the earlier knowledge on the studied species. The responses to the environmental covariates and the interspecific correlations (Fig- ures 5 and 4) indicate that whitefish and vendace larvae are, in general, distributed in different areas than sticklebacks. Most of the literature on Baltic Sea sticklebacks focus on three-spined sticklebacks whereas the nine-spined stickleback is not well stud- ied. In early spring, stickleback abundances have been found to be highest in sheltered archipelago areas, where part of the population overwinter. High abundance of stick- lebacks are typically thought to indicate structural complexity on the bottom; such as the presence of stones and boulders as well as reeds and other macrovegetation that function as shelter and provide food (Peltonen et al., 2004; Sieben et al., 2011). On contrary, highest densities of whitefish and vendace larvae are observed in open sandy shores near deep areas and in structurally simple shores, without macrovegetation, boul- ders and stones (Hudd et al., 1988; Leskel¨ a et al., 1991; Veneranta et al., 2013). The distribution of vendace is highly influenced by ice break up week and salinity so that long ice cover period and low salinity increase their abundance. Ice winter is longer and salinity is lower in the northern than in the southern Gulf of Bothnia and thus it correlates positively to Coregonid presence (Vanhatalo et al., 2012). In general, optimal habitats for Coregonid larvae are mainly located in the northernmost Gulf of Bothnia. Sticklebacks are known to prey mainly on mesozooplankton, but also on grazers (Pel- tonen et al., 2004; Sieben et al., 2011). Sticklebacks feed also on fish larvae if available (Bystr¨ om et al., 2015), but there are no studies on potential predation risk to Corego- nids. Based on our results, in the scale of coastal region of Gulf of Bothnia, sticklebacks are not thread to Coregonid larvae since their high density areas do not overlap. More- over, there was no spatial correlation between sticklebacks and coregonids (Figure 4) which could indicate interspecific interaction of any kind. The presence of sticklebacks has been connected to higher eutrophication status and stickleback reproduction and growth benefit from increasing temperature and eutrophication (Lef´ ebure et al., 2011; Candolin et al., 2008; Meier et al., 2012) On contrary, these environmental characteris- tics are assumed to affect negatively the Coregonid reproduction (M.Cingi et al., 2010; Muller ¨ , 1992; Veneranta et al., 2013; Vanhatalo et al., 2012). In our results whitefish and vendace larvae have positive response to Chlorophyl-a, which is a strong indicator for increasing eutrophication. However, in the scale of Gulf of Bothnia Chlorophyl-a concentration is typically higher near river mouths where salinity is lower and nutrient inflow high. Hence the result more likely reflects the high river influence through low salinity than preference for eutrophicated water. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 21 The vegetation and algae distribution in our results reflect the nutrient status during winter and early spring as well as effect of ice cover and ice scraping in wind exposed shallow areas. Both the SSDMs and JSDMs indicate that filamentous algae occur in all coastal areas in high densities, except in some sheltered inner coastal areas. Macroveg- etation and diatomous algae had highest presence probabilities at areas with lower filament presence probability. This pattern agrees well with the general understand- ing of these species groups. Filamentous algae are typically distributed in wind and wave exposed shores, that epiphytic diatoms and reeds that require shelter cannot tol- erate. Filamentous algae can, however, grow over macrovegetation and it is possible that macrovegetation distribution is underestimated if filamentous algae growth over macrovegetation has hided macrovegetation from the sampling pictures. The higher nu- trient levels in sheltered areas have been found to affect positively reed belt growth, and high abundances of macrovegetation species have been found from archipelago areas, lagoons, bays and river inlets (Pitk¨ anen et al., 2013; Altartouri et al., 2014). The JSDM model reflects these smaller scale occurrences better than SSDM. In our results, diato- mous algae are more common in estuaries and northern parts of the study area (see Supplementary material). This is likely explained by longer ice winter towards northern Gulf of Bothnia and longer distance to deep water. 7.2 Multivariate additive Gaussian process Next we discuss some of the similarities and differences between our model and the existing JSDMs and highlight the novel methodological contributions in this paper. Interspecific correlations From the predictive point of view, the interspecific correlations in predictive functions are attractive for many reasons. In many applications of SDMs the aim is to pre- dict species distribution over regions that include locations spatially far from the data (Record et al., 2013) or to conduct scenario predictions related to, for example, climate change or land use (Guisan et al., 2013). In these applications, predictions are based on the responses to environmental covariates. The inclusion of interspecific dependence allows information flow between species which improves the estimates for predictive functions especially for species with only scarce data (Thorson et al., 2015; Ovaskainen et al., 2017; Hui et al., 2013; Clark et al., 2014). We used the marginally uniform priors of (Barnard et al., 2000; Tokuda et al., 2012) for the interspecific correlations. This is justified by prior ignorance on correlations. Prior information on interspecific correlations could, however, be added into model with, e.g., informative inverse Wishart prior. With many species inferring the full covariance matrix is hard in which case we could use spatially dependent latent factors which induce for the 0 0 0 spatial random effects a covariance structure Cov  (s),  (s ) = λ λ k (s, s ), j qj qj q q=1 where k is the q’th spatial covariance function and λ the corresponding species specific q qj factor loading. Here, M < J so that this covariance structure is a low rank represen- tation of LMC (Lopez, 2000; Lopez et al., 2008). Latent factor representation was first introduced to SDMs by Thorson et al. (2015) and it is used also in HMSC Ovaskainen imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 22 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 et al. (2017). Recently Taylor-Rodr´ ıguez et al. (2017) proposed a clustering scheme where the species are clustered to less than J factor loading vectors. Interspecific correlation between response functions has received less interest. Typi- cally the response functions are assumed to be independent among species. Exception are species archetype models (Dunstan et al., 2013) and the HMSC framework of Ovaskainen et al. (2017). In the latter, the response functions are defined as h (x ) = x θ where j j j j· T T T θ = [θ , . . . , θ ] is a J × p matrix of regression weights with hierarchical prior. The 1· J· species specific weights are given Gaussian prior θ ∼ N (μ , V ) where μ is the ex- j· j· j j· pected response of a species that can be common for all species, common within groups of species or modelled through species traits μ = τ γ where τ is a vector of trait jr r j values of species j and γ ∼ N (0, V ) are the effects of traits to response along covariate r γ 2 2 r. With V = σ I and common prior mean μ = μ ∼ N (0, σ ) for all j ∈ 1, . . . , J , J×J j· the induced covariance between species specific additive response functions is 0 0 0 Cov (h (x ), h 0 (x )) =E[Cov (x θ , x θ 0 )] + Cov (E[x θ ],E[x θ 0 ]) j,r r j ,r r jr j r r jr j r r r r 2 0 2 0 =(σ + δ (j )σ )x x j r μ r 0 T 0 2 0 0 0 and with trait dependent prior mean Cov (h (x ), h (x )) = (τ V τ +δ (j )σ )x x . j,r r j ,r γ j j r r j r Hence, these alternatives have the same covariance structure as an additive multivari- ate GP (3.10) with k = x x for all j = 1, . . . , J , and interspecific covariances hj,r r r 2 0 2 T 0 2 Σ = σ + δ (j )σ and Σ = τ V τ 0 + δ (j )σ . Hence, the hierarchical prior struc- h j h γ j j r μ r j ture of HMSC could easily be extended to semiparametric models as well if we had trait information from our species. The benefit would be that these hierarchical model structures contain ecologically relevant prior information and, hence, using the induced interspecific covariance structure in the additive multivariate GP model would make it ecologically more realistic and potentially improve its predictive performance. The HMSC framework has many other features as well, such as phylogenetic relationships Ovaskainen et al. (2017), which could in principle be incorporated with our approach as well. The structure of interspecific correlations is likely to be especially important in scenario based predictive analyses, such as climate change predictions. Since our model does not make any mechanistic assumptions on species interde- pendencies the results concerning interspecific correlations have to be interpreted with care. For example, a positive correlation between spatial random effects could arize due mutualism, predator prey association or similar response to unobserved environmental covariates. Hence, interpreting these correlations should be done in light of more gen- eral ecological knowledge on the species. This is, however, a common challenge with all current JSDMs (Thorson et al., 2015; Dunstan et al., 2013; Ovaskainen et al., 2017; Taylor-Rodr´ ıguez et al., 2017). Moreover, the property of our model, as well as most other JSDM, is that if the training data shows positive or negative spatial correlation between two species this positive correlation is assumed to hold everywhere in the study domain. In many cases this might be an unrealistic assumption for which reason one in- teresting future development would be to extend our models to allow species-to-species associations to depend on the environmental context(Fox and Dunson, 2015; Tikhonov et al., 2017). However, the colocalization patterns are explained by both environmental imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 23 covariates and spatial random effects and regional differences in colocalization patterns can be explained by different environmental covariates in different regions. The interspecific correlation in our approach are modelled on the latent variable level and, hence, are not directly comparable to interspecific correlations in data. Some authors have argued that this is a conceptual weakness of latent variable models since it makes interpretating their results hard and thus less transparent. An alternative would be to model the correlations in the observation error model (see, e.g., Ovaskainen et al., 2010; Pollock et al., 2014). Clark et al. (2016) proposed generalized joint attribute model (GJAM) to fix this interpretation challenge by aiming to model the correlations between species on the data scale. However, for example, in case of presence absence observations GJAM corresponds to multivariate probit model that is the marginal likelihood of our model in case of Bernoulli observation model. It is also unclear how to disentangle the process underlying the species occurrence and abundance from the observation process leading to the actual data in GJAM approach. Hence, despite the challenge of inter- pretating the correlations, we prefer the hierarchical latent variable modeling since it provides a coherent approach to separate these two processes. Response functions The response functions along environmental covariates provide basis to understand how environment affects species distribution and to predict the species distribution in new areas. Linear and other parametric models are efficient in finding overall trends in re- sponses but have trouble in locating abrupt changes and change points due, for example, physical tolerance limits. Such limits, for example, along temperature and salinity are typical for large variety of taxa in aquatic domains (MacKenzie et al., 2007; Kotta et al., 2019). With SSDMs Vanhatalo et al. (2012), Shelton et al. (2014), Golding and Purse (2016) and Kallasvuo et al. (2017) have demonstrated the benefits of semiparametric models in such situations. The GP approach for modeling environmental responses is similar to generalized additive models (Guisan et al., 2002) but the latter have not been implemented as JSDMs. The flexibility of semiparametric models provides also challenges compared to the more restricted parametric models. The prior for covariance function parameters has to be set with care (Golding and Purse, 2016) and we may also need to pose monotonicity constraints to the response functions Kotta et al. (2019) in order to prevent overfitting. Inferring the responses reliably requires typically more data compared to parametric models. The multivariate additive GP helps in these challenges as well since the interspecific correlations increase the effectively amount of data to be used to infer each response function and hence decreases the uncertainty in them. Additive multivariate GP framework opens also new questions related to the choice of covariance functions and interspecific correlations. A typical choice for general pur- pose covariance function is a radial basis function. However, predictions using these covariance functions revert to prior predictive distributions when predicting beyond covariate range covered by data. Hence, in extrapolation tasks stationary covariance functions may not be the optimal choice (Vanhatalo et al., 2012; Kallasvuo et al., 2017) and combining the multivariate GP models with functional constraints (e.g. Kotta et al., 2019) could improve their predictive performance further. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 24 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 Spatial random effects Our model comparison results show clearly that including spatial random effects into model improves their predictive performance in both inter- and extrapolation. This re- sult is well in line with earlier works demonstrating that environmental conditions alone may not sufficiently explain species distribution and spatial random effects improve their predictive performance (Latimer et al., 2006; Vanhatalo et al., 2012; Clark et al., 2014; Thorson et al., 2015; Kallasvuo et al., 2017). This is reasonable, since the distribution of species is shaped by the interplay of environmental covariates, stochastic processes and species interaction (Ovaskainen et al., 2017). Hence, a justified model should account for random processes and as demonstrated also by our results also to species interac- tions in these processes. However, adding spatial random effects into model may lead to problems in model identifibiality which we discuss in the next section. Posterior inference and predictive performance The JSDM built with multivariate additive GP had clearly the best overall predictive performance. Moreover, the uncertainty in the predictions by JSDMs were smaller than in the SSDMs which, together with the best log predictive density performance, indicates that the predictions were also more accurate. These findings have important implications to practical use of SDMs for management and other purposes. For example Kallasvuo et al. (2017) used SSDMs to classify coastal areas of Finland to unsuitable, suitable and highly productive spawning regions for four commercially important fish species. They based their estimates on the expected numbers of larvae in the coastal region which, as shown by Figure 6 is highly sensitive to the uncertainty of the predictions. Moreover, by using SSDMs we can overestimate the total biomass of all species (Clark et al., 2014). An inherent challenge with the type of models considered here is the model identifib- iality. Spatial random effects are known to affect the fixed effects estimates (Hodges and Reich, 2010) and in some cases the spatial random effect can actually capture variability otherwise explainable with fixed effects (Paciorek, 2010; Bose et al., 2018). Moreover, with uniform (uninformative) priors the length-scale and variance hyperparameters of Mat´ ern class of covariance functions are non-identifiable (Zhang, 2004) in general. In order to alleviate these identifibiality challenges we used weakly informative priors for the variance and length-scale parameters defined through the principles proposed by Gelman (2006), Simpson et al. (2017) and Fuglstad et al. (2018). Our priors for the covariance function parameters favor small variance and large length-scale parameter values. The former penalizes for large magnitudes and the latter penalizes for wiggly responses along covariates or space. In case of covariate responses this is ecologically justified since according to ecological niche theory species’ responses to environmental covariates are typically either monotonic or have only one mode (Ovaskainen et al., 2016). Moreover, Fuglstad et al. (2018) show that these type of priors work well for spa- tial random effects with Mat´ ern covariance functions in general. Favoring longer spatial length-scales is justified also from the model identifibiality point of view. Identifiability between fixed and spatial random effects is of less concern if spatial random effect varies with larger spatial range than the environmental covariates (Paciorek, 2010, Section 3). imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 25 Since the environmental covariates in Finnish coastal region vary with relatively small spatial range, our weakly informative prior effectively favors models that explain this variation with covariate responses instead of with spatial random effects. Our model comparison suggests that the models with both covariate effects and spatial random effects lead to most reliable inference on environmental responses and spatial random effects. A model’s extrapolation performance is governed mainly by the environmental covariates so better extrapolation performance indicates more reliable response func- tions. Similarly more accurate interpolation performance indicates more reliable joint effect of environmental covariates and spatial random effect. An evident challenge with multivariate additive GP is the computation. The core element in the multivariate additive GP is the covariance matrix induced by the GP components. With many species and sampling sites the covariance matrix can become so large that it makes the implementation infeasible. Here we used Laplace method to speed up the computation by decreasing the number of costly posterior density calculations compared to full MCMC. However, in order to scale up the methods for large data sets involving thousands of sampling sites, the model would need to be implemented even more efficiently. One option to reduce the computational time could be to exploit the property that linear LMC can be parameterized through species specific conditional distributions (Gelfand et al., 2004). We could also implement multivariate GPs with sparse GP approximations (Vanhatalo et al., 2010; Alvarez et al., 2010) and replace the full rank LMC model with latent factor model (Thorson et al., 2015; Ovaskainen et al., 2017). However, we leave these considerations for future. 7.3 Summary Our JSDM based on multivariate additive GP combines the key ideas from semi- parametric SSDMs and state-of-the-art parametric JSDMs. In our case study, it showed superior predictive performance compared to existing GP based SSDMs and paramet- ric SSDMs and JSDMs in both interpolation and extrapolation. Hence, the multivariate additive GP can be seen as the first step towards integration of the semi-parametric SSDMs and hierarchical JSDMs. We propose also an efficient approach for inference by utilizing Laplace approximation and gradient based optimization for hyperparame- ters. The multivariate additive GP model is not restricted only to species distribution modeling but can be used in wide variety of other applications as well. Supplementary Material The supplementary material contains additional mathematical formulation of the method- ology proposed in this paper and additional figures and tables for the case study analysis. (http://www.some-url-address.org/dowload/0000.zip). Acknowledgments This work has bee funded by the Academy of Finland (grant 317255) and Research Funds of the University of Helsinki (decision No. 465/51/2014). imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 26 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 References Altartouri, A., Nurminen, L., and Jolma, A. (2014). “Modeling the role of the close- range effect and environmental variables in the occurrence and spread of Phragmites australis in four sites on the Finnish coast of the Gulf of Finland and the Archipelago Sea.” Ecology and Evolution , 4: 987–1005. 21 Alvarez, M. A., Luengo, D., Titsias, M. K., and Lawrence., N. D. (2010). “Efficient multi- output Gaussian processes through variational inducing kernels.” JMLR Workshop and conference proceedings , 9: 25–32. 25 Banerjee, S., Carlin, B. P., and Gelfand, A. E. (2015). Hierarchical Modelling and Analysis for Spatial Data . Chapman Hall/CRC, second edition. 6 Barnard, J., McCulloch, R., and Meng, X.-L. (2000). “Modelling covariance matrices in terms of standard deviations and correlations, with applications to shrinkage.” Statistical Sinica . 9, 21 Bergstr¨ om, U., Olsson, J., Casini, M., Eriksson, B. K., Fredriksson, R., Wennhage, H., and Appelberg, M. (2015). “Stickleback increase in the Baltic Sea – A thorny issue for coastal predatory fish.” Estuarine, Coastal and Shelf Science , 163: 134–142. 3, 4 Bose, M., Hodges, J. S., and Banerjee, S. (2018). “Toward a diagnostic toolkit for linear models with Gaussian-process distributed random effects.” Biometrics , 74(3): 863–873. 24 Busse, S. and Snoeijs, P. (2002). “Gradient responses of diatom communities in the Bothnian Bay, northern Baltic Sea.” Nova Hedwigia , 3-4: 501–525. 4 Bystr¨ om, P., Bergstr¨ om, U., Hj¨ alten, A., St˚ ahl, S., Jonsson, D., and Olsson, J. (2015). “Declining coastal piscivore populations in the Baltic Sea: Where and when do stick- lebacks matter?” Ambio, 44: 462–471. 20 Candolin, U., Engstr¨ omOst, J., and Salesto, T. (2008). “Humaninduced eutrophication enhances reproductive success through effects on parenting ability in sticklebacks.” Oikos , 117: 459–465. 20 Chib, S. and Greenberg, E. (1998). “Analysis of multivariate probit models.” Biometrika , 85(2): 347–361. 2 Clark, J. S., Gelfand, A., Woodall, C. W., and Zhu, K. (2014). “More than the sum of the parts: forest climate response from joint species distribution models.” Ecological Applications , 24(5): 990–999. 2, 21, 24 Clark, J. S., Nemergut, D., Seyednasrollah, B., Turner, P. J., and Zhang, S. (2016). “Generalized joint attribute modeling for biodiversity analysis: median-zero, multi- variate, multifarious data.” Ecological Monographs , 87(1): 34–56. 23 Cressie, N. and Wikle, C. K. (2011). Statistics for Spatial-Temporal Data . Wiley Series in Probability and Statistics. 6 Dunstan, P. K., Foster, S. D., Hui, F. K. C., and Warton, D. I. (2013). “Finite Mixture of Regression Modeling for High-Dimensional Count and Biomass Data in Ecology.” imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 27 Journal of Agricultural, Biological, and Environmental Statistics , 18(3): 357–375. 2, Duvenaud, D., Nickisch, H., and Rasmussen, C. (2011). “Additive Gaussian Processes.” Neural Information Processing Systems . 7 Elith, J. and Leathwich, J. R. (2009). “Species Distributions Models: Ecological Ex- planation and Predictions Across Space and Time.” The Annual Review of Ecology, Evolution and Systematics , 40(677-697). 2 Eriksson, B. K., Ljunggren, L., Sandstrom, A., Johansson, G., Mattila, J., Rubach, A. E., R˚ aberg, S., and Snickars, M. (2009). “Declines in predatory fish promote bloomforming macroalgae.” Ecological Applications , 19: 1975e1988. 4 Eriksson, B. K., Rubach, A., Batsleer, J., and Hillebrand, H. (2012). “Cascading preda- tor control interacts with productivity to determine the trophic level of biomass ac- cumulation in a benthic food web.” Ecological Research , 27: 203e210. 4 Fox, E. B. and Dunson, D. B. (2015). “Bayesian Nonparametric Covariance Regression.” Journal of Machine Learning research , 16(1): 2501–2542. 22 Fuglstad, G.-A., Simpson, D., Lindgren, F., and Rue, H. (2018). “Constructing priors that penalize the complexity of Gaussian random fields.” Journal of the American Statistical Association , 1(1): 1–8. 24 Gelfand, A. E., Jr, J. A. S., Wu, S., Latimer, A., Lewis, P. O., Rebelo, A. G., and Holder, M. (2006). “Explaining Species Distribution Patterns through Hierarchical Modelling.” Bayesian Analysis , 1(1): 41–92. 2, 6 Gelfand, A. E., Schmidt, A. M., Banerjee, S., and Sirmans, C. F. (2004). “Nonstationary multivariate process modeling through spatially varying coregionalization.” Test , 13(2): 263–312. 8, 10, 25 Gelman, A. (2006). “Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper).” Bayesian Analysis , 1(3): 515–534. 24 Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis . Chapman & Hall/CRC, third edition. 17 Golding, N. and Purse, B. V. (2016). “Fast and flexible Bayesian species distribution modelling using Gaussian processes.” Methods in Ecology and Evolution , 7: 598–608. 2, 3, 7, 8, 23 Guisan, A., Edwards, T. C., and Hastie, T. (2002). “Generalized linear and generalized additive models in studies of species distributions: setting the scene.” Ecological Modelling , 157(2-3): 89–100. 23 Guisan, A., Tingley, R., Baumgartner, J. B., Naujokaitis-Lewis, I., Sutcliffe, P. R., Tulloch, A. I. T., Regan, T. J., Brotons, L., McDonald-Madden, E., Mantyka-Pringle, C., Martin, T. G., Rhodes, J. R., Maggini, R., Setterfield, S. A., Elith, J., Schwartz, M. W., Wintle, B. A., Broennimann, O., Austin, M., Ferrier, S., Kearney, M. R., Possingham, H. P., and Buckley, Y. M. (2013). “Predicting species distributions for conservation decisions.” Ecology Letters , 16(12): 1424–1435. 21 imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 28 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 Hartmann, M., Hosack, G. R., Hillary, R. M., and Vanhatalo, J. (2017). “Gaussian process framework for temporal dependence and discrepancy functions in Ricker-type population growth models.” Ann. Appl. Statist., 11(3): 1375–1402. 2, 9 Hastie, T. and Tibishirani, R. (1986). “Generalize additive models.” Statistical Science , 1(3): 297–318. 7 Hodges, J. S. and Reich, B. J. (2010). “Adding Spatially-Correlated Errors Can Mess Up the Fixed Effect You Love.” The American Statistician , 64(4): 325–334. 24 Hudd, R., Lehtonen, H., and Kurttila, I. (1988). “Growth and abundance of fry; factors which influence the year-class strength of whitefish (Coregonus lavaretus widegreni ) in the southern Bothnian Bay (Baltic).” Finnish Fisheries Research , 9: 213–220. 20 Hui, F. K. C., Warton, D. I., Foster, S. D., and Dunstan, P. K. (2013). “To mix or not to mix: Comparing the predictive performance of mixture models vs. separate species distribution models.” Ecology , 94(9): 1913–1919. 2, 21 Kallasvuo, M., Vanhatalo, J., and Veneranta, L. (2017). “Modeling the spatial dis- tribution of larval fish abundance provides essential information for management.” Canadian Journal of Fisheries and Aquatic Sciences , 74: 636–649. 2, 3, 7, 23, 24 Kass, R. E. and Raftery, A. E. (1995). “Bayes Factors.” Journal of the American Statistical Association , 90(430): 773–795. 16 Kotta, J., Vanhatalo, J., J¨ anes, H., Orav-Kotta, H., Rugiu, L., Jormalainen, V., Bobsien, I., Viitasalo, M., Virtanen, E., Sandman, A. N., Isaeus, M., Leidenberger, S., Jonsson, P., and Johannesson, K. (2019). “Integrating experimental and distribution data to predict future species patterns.” Scientific Reports , 9(1): 1821. 2, 23 Kurowicka, D. and Cooke, R. (2003). “A parameterization of positive definite matrices in terms of partial correlation vines.” Linear Algebra and its Applications , 372(Sup- plement C): 225–251. 12 Latimer, A. M., Wu, S., Gelfand, A. E., and Silander, Jr., J. A. (2006). “Building Statistical Models to Analyze Species Distributions.” Ecological Applications , 16(1): 33–50. 2, 24 Lef´ ebure, R., Larsson, S., and Bystr¨ om, P. (2011). “A temperature dependent growth model for the threespined stickleback Gasterosteus aculeatus.” Journal of fish biology , 79: 1815–1827. 20 Leskel¨ a, A., Hudd, R., Lehtonen, H., Huhmarniemi, A., and Sandstr¨ om, O. (1991). “Habitats of whitefish (Coregonus lavaretus (L.) s.l.) larvae in the Gulf of Bothnia.” Aqua Fennica , 21: 145–151. 20 Lewandowski, D., Kurowicka, D., and Joe, H. (2009). “Generating random correla- tion matrices based on vines and extended onion method.” Journal of Multivariate Analysis , 100(9): 1989–2001. 12 Lindley, D. V. (2002). “Seeing and Doing: The Concept of Causation.” International Statistical Review , 70(2): 191–214. 11 imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 29 Liu, J. and Vanhatalo, J. (2018). “Bayesian model-based spatiotemporal survey design for log-Gaussian Cox process.” arXiv e-prints , arXiv:1808.09200. 7 Lopez, H. F. (2000). “Bayesian Analysis in Latent Factor and Longitudinal Models.” Ph.D. thesis, Institute of Statistics and Decision Sciences, Duke University. 21 Lopez, H. F., Salazar, E., and Gamerman, D. (2008). “Spatial Dynamic Factor Analy- sis.” Bayesian Statistics , 3(4): 759–792. 21 MacKenzie, B. R., Gislason, H., M¨ ollmann, C., and K¨ oster, F. W. (2007). “Impact of 21st century climate change on the Baltic Sea fish community and fisheries.” Global Change Biology , 13: 1348–1367. 23 Mardia, K. V. and Goodall, C. R. (1993). “Spatio-Temporal analysis of Multivariate Environmental Monitoring Data.” Multivariate Environmental Statistics , 347–386. 8, 10 M.Cingi, S., Kein¨ anen, and Vuorinen, P. J. (2010). “Elevated water temperature impairs fertilization and embryonic development of whitefish Coregonus lavaretus.” Journal of Fish Biology , 76: 502–521. 20 Meier, H. E. M., Hordoir, R., Andersson, H. C., Dieterich, C., Eilola, K., ..., B. G. G., and Schimanke, S. (2012). “Modeling the combined impact of changing climate and changing nutrient loads on the Baltic Sea environment in an ensemble of transient simulations for 19612099.” Climate Dynamics , 39: 2421–2441. 20 Muller, ¨ R. (1992). “Trophic state and its implications for natural reproduction of salmonid fish.” In Dynamics and Use of Lacustrine Ecosystems . Springer, Dordrecht. Nelder, J. A. and Wedderburn, R. W. M. (1972). “Generalized Linear Models.” Journal of the Royal Statistical Association , 135(3): 370–384. 2 Nickisch, H. and Rasmussen, C. E. (2008). “Approximations for binary Gaussian process classification.” Journal of Machine learning , 9: 2035–2078. 9 Odum, E. P. (1953). Fundamentals of ecology . Wiley Subscription Services, Inc., A Wiley Company. 13 O’Hagan, A. (1978). “Curve Fitting and Optimal Design for Prediction.” Journal of Royal Statistical Society B , 40(1): 1–42. 2 Ovaskainen, O., de Knegt, H. J., and del Mar Delgado, M. (2016). Quantitative Ecology and Evolutionary Biology -Integrating models with data . Oxford University Press. 24 Ovaskainen, O., Hottola, J., and Siitonen, J. (2010). “Modeling species co-occurence by multivariate logistic regression generates new hypotheses on fungal interaction.” Ecology , 91(9): 2514–2521. 23 Ovaskainen, O. and Soininen, J. (2011). “Making more out of sparse data: Hierarchical modelling of species communities.” Ecology , 92(2): 289–295. 2 Ovaskainen, O., Tikhonov, G., Norberg, A., Guillaume Blanchet, F., Duan, L., Dunson, D., Roslin, T., and Abrego, N. (2017). “How to make more out of community data? imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 30 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 A conceptual framework and its implementation as models and software.” Ecology Letters , 20(5): 561–576. 2, 8, 21, 22, 24, 25 Paciorek, C. J. (2010). “The Importance of Scale for Spatial-Confounding Bias and Precision of Spatial Regression Estimators.” Statistical Science , 25(1): 107–125. 24 Pearl, J. (2009). “Causal inference in statistics: An overview.” Statistics Surveys , 3: 96–146. 11 Peltonen, H., Vinni, M., Lappalainen, A., and Pnni, J. (2004). “Spatial feeding patterns of herring (Clupea harengus L.), sprat (Sprattus sprattus L.), and the three-spined stickleback (emphGasterosteus aculeatus L.) in the Gulf of Finland, Baltic Sea.” ICES Journal of Marine Science , 61: 966–971. 20 Pitk¨ anen, H., Peuraniemi, M., Westerbom, M., Kilpi, M., and von Numers, M. (2013). “Longterm changes in distribution and frequency of aquatic vascular plants and charo- phytes in an estuary in the Baltic Sea.” Annales Botanica Fennica , 50: 1e54. 21 Pollock, L. J., Tingley, R., Morris, W. K., Golding, N., Hara, R. B. O., Parris, K. M., Vesk, P. A., and McCarthy, M. A. (2014). “Understanding co-occurence by modelling species simultaneously with joint species distribution model (JSDM).” Methods in Ecology and Evolution , 5: 397–406. 2, 8, 23 Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning . The MIT Press. 2, 7, 12 Record, S., Fitzpatrick, M. C., Finley, A. O., Veloz, S., and Ellison, A. M. (2013). “Should species distribution models account for spatial autocorrelation? A test of model projections across eight millennia of climate change.” Global Ecology and Biogeography , 22(6): 760–771. 21 Reusch, T. B. H., Dierking, J., Andersson, H. C., Bonsdorff, E., Carstensen, J., Casini, M., Czajkowski, M., Hasler, B., Hinsby, K., Hyyti¨ ainen, K., Johannesson, K., Jomaa, S., Jormalainen, V., Kuosa, H., Kurland, S., Laikre, L., MacKenzie, B. R., Margon- ski, P., Melzner, F., Oesterwind, D., Ojaveer, H., Refsgaard, J. C., Sandstr¨ om, A., Schwarz, G., Tonderski, K., Winder, M., and Zandersen, M. (2018). “The Baltic Sea as a time machine for the future coastal ocean.” Science Advances , 4(5). 3 R¨ onnberg, C. and Bonsdorff, E. (2004). “Baltic Sea eutrophication: area-specific eco- logical consequences.” Hydrobiologia , 514: 227–241. 4 Rue, H. and Marino, S. (2009). “Approximate Bayesian Inference for latent Gaussian models by using integrated Laplace approximantions.” Journal of the Royal Statistical Society , 71(2): 319–392. 9 Shelton, A. O., Thorson, J. T., Ward, E. J., and Feist, B. E. (2014). “Spatial semipara- metric models improve estimates of species abundance and distribution.” Canadian Journal of Fisheries and Aquatic Sciences , 71(July): 1655–1666. 23 Sieben, K., Ljunggren, L., Bergstr¨ om, U., and Eriksson, B. K. (2011). “A meso-predator release of stickleback promotes recruitment of macroalgae in the Baltic Sea.” Journal of Experimental Marine Biology and Ecology , 397: 79e84. 20 imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 31 Simpson, D. P., Rue, H. v., Riebler, A., Martins, T. G., and Sørbye, S. H. (2017). “Penal- ising model component complexity: A principled, practical approach to constructing priors.” Statistical Science , 32(1): 1–28. 24 Taylor-Rodr´ ıguez, D., Kaufeld, K., Schliep, E. M., Clark, J. S., and Gelfand, A. E. (2017). “Joint Species Distribution Modeling: Dimension Reduction Using Dirichlet Processes.” Bayesian Analysis , 12(4): 939–967. 22 Thorson, J. T., Scheuerell, M. D., Shelton, A. O., See, K. E., Skaug, H. J., and Kris- tensen, K. (2015). “Spatial factor analysis: a new tool for estimating joint species distributions and correlations in species range.” Methods in Ecology and Evolution , 6(6): 627–637. 2, 21, 22, 24, 25 Tikhonov, G., Abrego, N., Dunson, D., and Ovaskainen, O. (2017). “Using joint species distribution models for evaluating how species-to-species associations depend on the environmental context.” Methods in Ecology and Evolution , 8(4): 443–452. 22 Tokuda, T., Goodrich, B., Mechelen, I. V., and Gelman, A. (2012). “Visualizing Distri- butions of Covariance Matrices.” 9, 21 Vanhatalo, J., Pietil¨ ainen, V., and Vehtari, A. (2010). “Approximate inference for disease mapping with sparse Gaussian processes.” Statistics in Medicine , 29(15): 1580–1607. 3, 9, 12, 25 Vanhatalo, J., Riihim¨ aki, J., Hartikainen, J., Jyl¨ anki, P., Tolvanen, V., and Vehtari, A. (2013). “GPstuff : Bayesian Modeling with Gaussian Processes.” Journal of Machine Learning Research , 14: 1175–1179. 12 Vanhatalo, J., Veneranta, L., and Hudd, R. (2012). “Species distribution modelling with Gaussian processes: A case study with youngest stages of sea spawning whitefish (Coregonus lavatus L. s.l.) larvae.” Ecological Modelling , (228): 49–58. 2, 5, 7, 8, 16, 20, 23, 24 Vehtari, A., Mononen, T., Tolvanen, V., Sivula, T., and Winther, O. (2016). “Bayesian Leave-One-Out Cross-Validation Approximations for Gaussian Latent Variable Mod- els.” Journal of Machine Learning Research , 17: 1–38. 12, 15 Vehtari, A. and Ojanen, J. (2012). “A survey of Bayesian predictive methods for model assessment, selection and comparison.” Statistics Surveys , 6: 141–228. 15 Veneranta, L., Hudd, R., and Vanhatalo, J. (2013). “Reproduction areas of sea-spawning coregionids reflect the environment in shallow coastal areas.” Marine Ecology Progress Series , 477: 231–250. 3, 4, 5, 16, 20 Warton, D. I., Blanchet, F. G., O’Hara, R. B., Ovaskainen, O., Taskinen, S., Walker, S. C., and Hui, F. K. (2015). “So Many Variables: Joint Modeling in Community Ecology.” Trends in Ecology and Evolution , 30(12): 766–779. 2 Warton, D. I. and Shepherd, L. C. (2010). “Poisson point process models solve the ”pseudo-absence problem” for presence-only data in ecology.” Annals of Applied Statistics , 4(3): 1383–1402. 6 imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 32 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 Wikle, C. K. (2003). “Hierarchical Models in Environmental Science.” International Statistical Review , 71(2): 181–199. 6 Zhang, H. (2004). “Inconsistent Estimation and Asymptotically Equal Interpolations in Model-Based Geostatistics.” Journal of the American Statistical Association , 99(465): 250–261. 24 imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Bayesian Analysis (2018) , Number , pp. 1 Supplementary Material: Additive multivariate Gaussian process for joint species distribution modeling with heterogeneous data ∗ † ‡ Jarno Vanhatalo , Marcelo Hartmann and Lari Veneranta 1 The posterior expectation and variance for new observations First we approximate the logistic function with p(f ) ≈ p ˜(f ) = aΦ(f/v )+(1−a)Φ(f/v ) 1 2 where a ∈ (0, 1) and v , v ∈ (0,∞) are parameters of the approximation (see Demi- 1 2 −4 denko, 2004). This approximation is used since it has small error bound (≤ |10 |, Demidenko (2004)) and can considerably speed-up marginalization over f for large datasets. Using Lemma 2 in the Section 4, we obtain, μ μ j,∗ j,∗ √ √ E[Y (x , s )| y,η, Λ] = z aΦ + z (1− a)Φ (1.1) j ∗ ∗ j,∗ j,∗ 2 2 Σ +v Σ +v j,∗ j,∗ 1 2 V[Y (x , s )| y,η, Λ] = E[Y (x , s )| y,η, Λ]− E[Y (x , s )| y,η, Λ] j ∗ ∗ j ∗ ∗ j ∗ ∗ 2 2 + (z − z )E[p ˜ (f )] j,∗ ∗ j,∗ where, Φ(·) is the cumulative distribution of the standard Gaussian random variable and 2 2 2 E[p ˜ (f )] = a F (μ |V ) + (1− a) F (μ |V ) + 2a(1− a)F (μ |V ) (1.2) ∗ 2 1,1 2 2,2 2 1,2 j,∗ j,∗ j,∗ F (μ |V ) is the zero-mean 2-dimensional Gaussian cumulative distribution function 2 m,n j,∗ evaluated at μ = [μ μ ] with covariance matrix given by j,∗ j,∗ j,∗ Σ + v Σ j,∗ j,∗ V = . (1.3) m,n Σ Σ + v j,∗ j,∗ In the case of Negative-Binomial model, we use the moment generating function of a Gaussian random variable to obtain the unconditional expectation of a future outcome w.r.t to latent variable, i.e. μ +Σ /2 j,∗ j,∗ E[Y (x , s )| y,η, Λ] = z e (1.4) j ∗ ∗ j,∗ h i μ +Σ /2 2 2μ +Σ Σ /2 j,∗ j,∗ j,∗ j,∗ j,∗ V[Y (x , s )| y,η, Λ] = z e + z e e (r ˆ + 1)/r ˆ − 1 . j ∗ ∗ j,∗ j j j,∗ Department of Mathematics and Statistics and Organismal and Evolutionary Biology Research Program, University of Helsinki, jarno.vanhatalo@helsinki.fi Department of Mathematics and Statistics, University of Helsinki, marcelo.hartmann@helsinki.fi Natural Resources Institute Finland, Finland c 2018 International Society for Bayesian Analysis imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 arXiv:1809.02432v2 [stat.ME] 8 Aug 2019 2 Supplement: JSDMs with additive multivariate Gaussian processes 2 Conditional predictions Recall that, in the main paper, the distribution of Y (x , s )| y conditioned on J ∗ ∗ 1 J hyperparameters (they were ommited from the notation for simplicity) is given by π(y | y ) = π(y | y , f )π(y , f )df /π(y ) (2.1) J ,∗ J ,∗ J ,∗ J ,∗ J J ,∗ J 1 J 1 1 J 1 2 1 2 2 2 The second term in the numerator of the integrand of (2.1) is the same as, π(y , f ) = π(f | f , y )π(f | y )π(y )df . (2.2) J ,∗ J ,∗ J J J J 1 1 2 J 2 J J 2 2 2 2 2 Now, f | f , y also only depends on f . Plugging (2.2) into (2.1) we get J ,∗ J J 1 2 J 2 π(y | y ) = π(y | f )π(f | f )π(f | y )df df (2.3) J ,∗ J ,∗ J J J J ,∗ J ,∗ J J ,∗ 1 1 2 2 J 2 1 1 2 1 2 where last two terms in (2.3) are Gaussians. The first one is the conditional Gaus- sian density function w.r.t. the predictor function values f , that is, π(f | f ) = J J ,∗ J 2 1 2 −1 −1 T N (f |C C f , C − C C C ) and the second one is the J ,∗ J ,J ∗ J J ∗,∗ J ,J ∗ 1 1 2 2 1 1 2 J ,J J ,J J ,J ∗ 2 2 2 2 1 2 Laplace approximation using the scenario species data related to the set J , that is, −1 −1 π(f | y ) ≈ N f |C ∇ log π(y |f ), (C + W ) . Plugging this ap- J J J ,J J J 2 J 2 2 2 J 2 2 2 2 J ,J 2 2 proximate density function in the equation (2.3) gives π(y | y ) = π(y | f )N (f |μ , Σ )df (2.4) J ,∗ J J ,∗ J ,∗ J ,∗ |J |J J ,∗ 1 2 1 1 1 2 2 1 with μ = C ∇ log π(y |f ) J ,J ∗ J |J 1 2 J 2 2 2 −1 −1 T Σ = C − C (C + W ) C (2.5) |J J ∗,∗ J ,J ∗ J ,J 2 1 1 2 2 2 J J ,J ∗ 2 1 2 where C is the covariance between species in the setJ at the point (x , s ) and the J ,J ∗ 1 ∗ ∗ 1 2 species in the setJ in their respective covariates and spatial locations. ∇ log π(y |f ) 2 J J 2 correspond to the gradient of the log-likelihood function for species related to the set J . C is the covariance matrix of f at (x , s ). C is the covariance matrix of J ∗,∗ J ,∗ ∗ ∗ J ,J 1 1 2 2 f and W is the negative Hessian matrix of the negative logarithm of the likelihood J J 2 2 functions related to species in the set J . Now, to calculate the predictive mean vector and predictive covariance matrix we follow the steps used in deriving unconditional expectation and unconditional variances as in the main paper. However we exclude them for brevity. 3 Unconstrained parametrization of covariance matrices Let the matrix Σ be a covariance matrix of dimension J . In terms of variances and cor- 1 1 2 2 2 2 2 2 2 relations, we rewrite Σ = diag(σ , . . . , σ ) P diag(σ , . . . , σ ) , where σ , j = 1, . . . , J 1 J 1 J j imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 3 are variances and these are transformed as σ = exp(δ ). The correlation matrix P is written in terms of its Cholesky decomposition, that is P = LL , where L is the lower triangular Cholesky decomposition. In our case, the upper triangular Cholesk decomposition is given by   1 z z ··· z 1,2 1,3 1,J   Q Q Q   1 1 1 1 1 1 2 2 2  2 2 2 0 (1− z ) z (1− z ) ··· z (1− z ) 2,3 2,J i,2 i,3 i=1 i=1 i=1 i,J     L = (3.1)   . . . .  . . . . . . . . .     J−1 0 0 0 ··· (1− z ) i=1 i,J 0 0 where each z ∈ (−1, 1). Now, since each z can freely vary in the interval (−1, 1) i,i i,i without violate the positive definiteness property of P (see Kurowicka and Cooke, 0 0 2003; Lewandowski et al., 2009), we map each z to the real line as z = 2/[1 + i,i i,i exp(−aδ 0 )]− 1, where δ 0 ∈ R. i,i i,i The derivative of the log π(P ) (recall the prior for the correlation matrix in the main paper) w.r.t. each δ 0 is given by i,i J−1 J X X ∂ ∂ ∂ log π(P|v) = log π(P|v) ρ 0 (3.2) j,j ∂δ 0 ∂ρ 0 ∂δ 0 i,i j,j i,i j=1 j =j+1 and the partial derivatives of log π(P|v) w.r.t to ρ 0 are obtained as, j,j −1 −1 0 0 log π(P|v) = (v − 1)(J − 1)− 1 {P } − v {P } (3.3) j,j c,c rr ∂ρ j,j r=1 : r∈{ / j,j } 0 0 0 0 0 where c = j − 1 and c = j if r < j and, j = 1 or r > j . If r < j and r < j then −1 −1 −1 0 0 0 c = j− 1 and c = j − 1. {P } 0 is the entry (j, j ) of the matrix P and {P } 0 is j,j c,c rr 0 −1 the entry (c, c ) of the inverse matrix P , where P is a matrix obtained by removing rr rr the r’th row and column of P . The partial derivatives of each ρ 0 w.r.t to δ 0 are j,j i,i T T 0 0 0 obtained from ∂P/∂δ = (∂L/∂δ )L + L(∂L/∂δ ) . i,i i,i i,i Lastly, we find the derivatives of the logarithm of the absolute value of the Jacobian determinant w.r.t each δ (see equation (11) in Lewandowski et al. 2009). i,i J−1 J Y Y ∂ ∂(ρ , . . . , ρ ) ∂ dz 0 1,2 J−1,J j,j 2 (J−j−1)/2 log = log (1− z 0 ) j,j 0 0 0 ∂δ ∂(δ , . . . , δ ) ∂δ dδ i,i 1,2 J−1,J i,i j,j j=1 j =j+1 2 2 d z 0/dδ 0 0 0 z dz i,i i,i i,i i,i = −(J − i− 1) + (3.4) 1− z dδ 0 dz 0/dδ 0 i,i i,i i,i i,i imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 4 Supplement: JSDMs with additive multivariate Gaussian processes 4 Gaussian integrals Lemma 1. Let’s denote by Φ(·) the standard-Gaussian cumulative distribution function 2 2 and N (·|μ, σ ) the Gaussian density function with parameters (μ, σ ) ∈ R× R . Then the following holds x− m N (x|μ, σ ) Φ dx = F (μ |m , V ) (4.1) N N N r=1 −∞ where F (·|c,C) is the N -dimensional Gaussian cumulative distribution function with parameters (c,C) ∈ R ×R, with R the space of positive-definite matrices (covariance T N N matrices). Furthermore, m = [m ··· m ] ∈ R , μ = μ1 ∈ R , v > 0 ∀r and N 1 N N r V is a covariance matrix given by,   2 2 2 v + σ . . . σ   . . . . . V = (4.2) N   . . 2 2 2 σ . . . v + σ Proof. To show (4.1), start writing the left-hand side of the equation in full. Rewrite the integrand as the product of non-standard Gaussian density functions as well as the regions of integration, i.e., ∞ x x Z Z Z 2 2 2 ··· N (y |m , v ) . . .N (y |m , v )N (x|μ, σ )dy ··· dy dx. (4.3) 1 1 N N 1 N 1 N −∞−∞ −∞ Rewrite again using the following transformation [x, y ,··· , y ] = [w + μ, z + w + 1 N 1 m ,··· , z + w + m ] and note that |∂(x, y ,··· , y )/∂(w, z ,··· , z )| = 1. After 1 N N 1 N 1 N changing variables, group the different terms in the exponentials together to have ( " #) μ−m μ−m ∞ N 1 Z Z Z 1 2 (z +w) 1 r w ··· exp − + dz . . . dz dw (4.4) 2 2 1 N c v σ r=1 −∞ −∞ −∞ (N +1)/2 where c = σ(2π) v . Now, the expression inside the squared bracket is a r=1 quadratic form which is written with the following matrix form, N N N N X X X X (z +w) r w 2 1 1 z w z r r + = w + + w + z + 2 2 2 2 2 r 2 2 v σ v σ v v v r r r r r r=1 r=1 r=1 r=1     P P N N z 1 1 r   w + + 2 2 2 r=1 v σ r=1 v r r w       w z 1 z   1 2 2     v v = 1 1     .     .   w z N 2 2 v v N N imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 5 P    N   1 1 1 1 + ··· 2 2 2 2 w r=1 v σ w v v 1 N   1 1     ··· 0 z z 1  2 2  1 v v     1 1   = (4.5)     . . . . .  .  . .   . . . .   .  .  . . . . 1 1 z z N 0 ··· N 2 2 v v N N therefore (4.4) is the same as       N   1 1 1 1 + ···  2 2 2 2  w r=1 v σ w  v v  1 N   μ−m μ−m ∞ N 1     Z Z Z 1 1      ··· 0 z z 1  2 2  1 1 v v     1 1 1   ··· exp − dz dw     . . . . . c  .  . .  2   . . . .    .  .  .  . . .   −∞ −∞ −∞    1 1  z z N 0 ··· N 2 2 v v N N (4.6) The integrand in (4.6) has the full form of the multivariate Gaussian density. To identify this we need to find the closed-form covariance matrix from the precision matrix in (4.6) 2 N +1 and show that the determinant of the covariance matrix is given by c /(2π) . Write h i 1 1 1 1 the precision matrix as block matrix such that A = + , B = ··· , 2 2 2 2 r=1 v σ v v r 1 N 1 1 C = B and D = diag ,··· , . Use the partitioned matrix inversion lemma 2 2 v v 1 N −1 −1 2 (Strang and Borre, 1997, equation 17.44) to get the blocks, (A − BD C ) = σ , −1 −1 −1 2 −1 −1 −1 2 T −1 (BD C−A) BD = −σ [1··· 1], D C (BD C−A) = −σ [1··· 1] and D + −1 −1 −1 −1 2 2 2 2 D C (A−BD C ) BD where its main diagonal equals to [v +σ ,··· , v +σ ] and 1 N all off-diagonal elements are given by σ . Put everything together to have the covariance matrix   2 2 2 σ −σ ··· −σ 2 2 2 2   −σ v + σ ··· σ   (4.7)  . .  . . .   −σ . . 2 2 2 2 −σ σ ··· v + σ −1 2 2 2 N +1 whose determinant equals to 1/[det(D) det(A− BD C )] = σ v = c /(2π) r=1 r by the partitioned matrix determinant lemma (e.g., Rasmussen and Williams, 2006). Finally, in (4.6), interchange the order of integration with Fubini-Tonelli theorem (Fol- land, 2013) and integrate w.r.t. w to get       2 2 2 μ−m μ−m z 0 v + σ ··· σ N 1 Z Z 1       . . . . . . . . . ··· N , dz ··· dz (4.8)       1 N . . . . 2 2 2 −∞ −∞ z 0 σ ··· v + σ that equals to       2 2 2 μ m v + σ ··· σ       . . . . . . . . . F     ,  (4.9) . . . . 2 2 2 μ m σ ··· v + σ and therefore the equality (4.1) holds. imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 6 Supplement: JSDMs with additive multivariate Gaussian processes Lemma 2. Let N (·|μ , Σ) be the N -dimensional Gaussian density function with mean parameter μ and covariance matrix Σ. Then the following holds true, F (x|m , V)N (x|μ , Σ)dx = F (μ |m , V + Σ) (4.10) N N N N N N where F (·|c,C) is the N -dimensional Gaussian cumulative distribution function with T N T N parameters (c,C). Furthermore, m = [m ··· m ] ∈ R , μ = [μ ··· μ ] ∈ R , N 1 N 1 N V and Σ are covariance matrices. Proof. Let’s rewrite the left-hand side of (4.10) in full, x x n 1 Z Z Z ··· N (y|m, V)N (x|μ , Σ)dydx (4.11) −∞ −∞ T T where y = [y ,··· , y ] . Let’s use the following transformation [x ,··· , x , y ,··· , y ] 1 n 1 N 1 N = [w + μ ,··· , w + μ , z + w + m ,··· , z + w + m ] . The Jacobian of this 1 1 N N 1 1 1 N N N transformation simplifies to |∂(x ,··· , x , y ,··· , y )/ ∂(w ,··· , w , z ,··· , z )| = 1 N 1 N 1 N 1 N 1. Rewrite the above equation to get, μ −m μ −m N N 1 1 Z Z Z ··· N (w|− z, V)N (w| 0, Σ)dzdw (4.12) −∞ −∞ T T where z = [z ··· z ] and w = [w ··· w ] . Note that the product of two multivariate 1 N 1 N Gaussian density functions is another unnormalized multivariate Gaussian (see, e.g., Rasmussen and Williams, 2006, Appendix A). Therefore we write μ −m μ −m N N 1 1 Z Z Z ··· N (z| 0, V + Σ)N (w|c, C )dzdw (4.13) −∞ −∞ −1 −1 −1 −1 where c = −C V z and C = (V + Σ ) . Interchange the order of integration with Fubini-Tonelli theorem (Folland, 2013) and integrate w.r.t w to get that, μ −m μ −m N N 1 1 Z Z ··· N (z| 0, V + Σ)dz = F (μ |m , V + Σ) (4.14) N N −∞ −∞ which completes the proof. The above closed-form integrals extend many equalities in the table of Owen (1980). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 7 References Demidenko, E. (2004). Mixed Models: Theory and Application . John Wiley & Sons. Folland, G. (2013). Real Analysis: Modern Techniques and Their Applications . Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts. Wiley. Kurowicka, D. and Cooke, R. (2003). “A parameterization of positive definite matrices in terms of partial correlation vines.” Linear Algebra and its Applications , 372(Sup- plement C): 225–251. Lewandowski, D., Kurowicka, D., and Joe, H. (2009). “Generating random correla- tion matrices based on vines and extended onion method.” Journal of Multivariate Analysis , 100(9): 1989–2001. Owen, D. B. (1980). “A table of normal integrals.” Communications in Statistics - Simulation and Computation , 9(4): 389–419. URL http://dx.doi.org/10.1080/03610918008812164 Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning . The MIT Press. Strang, G. and Borre, K. (1997). Linear Algebra, Geodesy and GPS . Wellesley- Cambridge Press. imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 8 Supplement: JSDMs with additive multivariate Gaussian processes Appendix A: Extra results Standard models LOO Cross Validation 5-fold Cross validation 1) Univariate GP h(x), univ. (s) -2.230 (0.082) -2.465 (0.094) diatom.algae -3.942 (0.225) -4.002 (0.225) filame.algae -4.568 (0.141) -4.692 (0.146) macro-veg -1.909 (0.234) -2.092 (0.240) threespine-sb -1.477 (0.157) -1.624 (0.154) ninespine-sb -1.334 (0.151) -1.464 (0.149) white-fish -2.840 (0.200) -3.092 (0.216) vendance -1.634 (0.210) -2.196 (0.314) 2) Univariate GP h(x), multiv. (s) -2.081 (0.080) -2.480 (0.100) diatom.algae -3.578 (0.232) -4.010 (0.217) filame.algae -4.104 (0.159) -4.680 (0.135) macro-veg -1.733 (0.223) -2.148 (0.252) threespine-sb -1.385 (0.149) -1.637 (0.156) ninespine-sb -1.232 (0.139) -1.489 (0.151) white-fish -2.782 (0.203) -3.092 (0.213) vendance -1.537 (0.204) -2.215 (0.316) 3) Multiv. GP h(x), multiv. (s) -1.669 (0.080) -2.316 (0.086) diatom.algae -2.307 (0.112) -3.947 (0.217) filame.algae -1.244 (0.751) -4.623 (0.140) macro-veg -1.289 (0.150) -1.950 (0.248) threespine-sb -1.210 (0.140) -1.509 (0.156) ninespine-sb -1.184 (0.147) -1.389 (0.149) white-fish -3.399 (0.393) -2.964 (0.205) vendance -2.041 (0.451) -1.837 (0.247) 4) Multiv. GP h(x) only -2.228(0.089) -2.590(0.012) diatom.algae -4.033(0.235) -4.545(0.379) filame.algae -4.697(0.156) -4.865(0.187) macro-veg -1.754(0.226) -2.514(0.379) threespine-sb -1.418(0.173) -1.494(0.172) ninespine-sb -1.355(0.183) -1.393(0.168) white-fish -2.866(0.212) -3.192(0.271) vendance -1.608(0.222) -2.458(0.400) 5) Multiv. (s) only -2.351(0.087) -2.500(0.086) diatom.algae -4.129(0.200) -4.231(0.182) filame.algae -4.607(0.119) -4.729(0.109) macro-veg -2.255(0.254) -2.800(0.210) threespine-sb -1.471(0.174) -1.567(0.168) ninespine-sb -1.375(0.178) -1.473(0.152) white-fish -3.147(0.181) -3.226(0.191) vendance -1.675(0.235) -1.864(0.260) Table 1: Model comparison with leave-one-out (LOO) and 5-fold cross validation using the mean point wise log marginal predictive density statistics (and its standard error) for models 1-5 (see Section 5.3 in the main text). The bolded numbers show the overall performance over all species and the indented text shows the species specific predictions. imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 9 Standard models LOO Cross Validation 5-fold Cross validation 6) Univ. quadr. h(x), univ. (s) -2.262 (0.083) -2.517 (0.095) diatom.algae -3.941(0.228) -4.035(0.237) filame.algae -4.608(0.138) -4.741(0.150) macro-veg -1.955(0.239) -2.233(0.300) threespine-sb -1.490(0.158) -1.598(0.156) ninespine-sb -1.337(0.151) -1.501(0.140) white-fish -2.885(0.204) -3.137(0.215) vendance -1.705(0.215) -2.274(0.315) 7) Univ. quadr. h(x), multiv. (s) -2.117 (0.084) -2.484 (0.109) diatom.algae -3.568(0.243) -3.994(0.225) filame.algae -4.104(0.170) -4.717(0.137) macro-veg -1.763(0.229) -2.269(0.294) threespine-sb -1.360(0.165) -1.517(0.171) ninespine-sb -1.233(0.143) -1.484(0.142) white-fish -2.853(0.210) -3.098(0.251) vendance -1.668(0.226) -2.208(0.340) 8) Multiv. quadr. h(x), multiv. (s) -2.105 (0.084) -2.416 (0.099) diatom.algae -3.561(0.242) -4.009(0.242) filame.algae -4.090(0.177) -4.745(0.170) macro-veg -1.742(0.229) -2.197(0.280) threespine-sb -1.346(0.165) -1.476(0.173) ninespine-sb -1.217(0.149) -1.441(0.146) white-fish -2.845(0.201) -3.045(0.241) vendance -1.663(0.226) -2.080(0.311) 9) Multiv. quadr. h(x) only -10.246(1.297) -9.305(1.008) diatom.algae -48.063(10.943) -32.673(6.132) filame.algae -52.309(8.539) -42.283(6.498) macro-veg -7.139(1.439) -19.895(6.421) threespine-sb -1.441(0.178) -1.494(0.177) ninespine-sb -1.375(0.192) -1.400(0.158) white-fish -2.899(0.215) -3.209(0.275) vendance -1.677(0.227) -2.311(0.364) Table 2: Model comparison with leave-one-out (LOO) and 5-fold cross validation using the mean point wise log marginal predictive density statistics (and its standard error) for models 6-9 (see Section 5.3 in the main text). The bolded numbers show the overall performance over all species and the indented text shows the species specific predictions. imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 10 Supplement: JSDMs with additive multivariate Gaussian processes Pairwise model comparison LOO Cross Validation 5-fold Cross validation Model 8 - Model 6 0.156(0.019) 0.101(0.030) diatom.algae 0.379(0.106) 0.025(0.020) filame.algae 0.537(0.097) -0.003(0.037) macro-veg 0.214(0.214) 0.135(0.065) threespine-sb 0.144(0.032) 0.012(0.023) ninespine-sb 0.120(0.026) 0.059(0.017) white-fish 0.036(0.039) 0.091(0.045) vendance 0.043(0.032) 0.194(0.031) Model 8 - Model 7 0.012(0.004) 0.052(0.010) diatom.algae 0.006(0.016) -0.015(0.028) filame.algae 0.014(0.021) -0.027(0.046) macro-veg 0.021(0.012) 0.072(0.028) threespine-sb 0.015(0.008) 0.041(0.015) ninespine-sb 0.016(0.011) 0.043(0.017) white-fish 0.008(0.011) 0.052(0.019) vendance 0.006(0.006) 0.128(0.033) Table 3: Pairwise comparison of leave-one-out (LOO) and 5-fold cross validation point wise log predictive densities of multivariate parametric (quadratic environmental re- sponses) model (model 8) and parametric models without any interspecific correlations (model 6) or with interspecific correlations only in spatial random effect (model 7). We report the mean (and its standard error) of the differences in point wise log predic- tive densities at test locations. The bolded numbers show the overall performance and normal text shows the species specific predictions. Pairwise model comparison LOO Cross Validation 5-fold Cross validation Model 3 - model 1 0.561(0.126) 0.150(0.123) diatom.algae 1.637(0.151) 0.054(0.029) filame.algae 3.974(0.495) 0.070(0.027) macro-veg 0.620(0.129) 0.141(0.029) threespine-sb 0.267(0.097) 0.115(0.017) ninespine-sb 0.151(0.097) 0.075(0.019) white-fish -0.300(0.127) 0.129(0.026) vendance 0.051(0.073) 0.359(0.077) Model 3 - model 2 0.413(0.123) 0.164(0.017) diatom.algae 1.271(0.136) 0.065(0.029) filame.algae 3.496(0.504) 0.057(0.020) macro-veg 0.444(0.011) 0.198(0.031) threespine-sb 0.175(0.087) 0.128(0.198) ninespine-sb 0.049(0.085) 0.100(0.021) white-fish -0.357(0.011) 0.128(0.026) vendance -0.041(0.055) 0.377(0.078) Table 4: Pairwise comparison of leave-one-out (LOO) and 5-fold cross validation point wise log predictive densities of multivariate GP model (model 3) and GP models without any interspecific correlations (model 1) or with interspecific correlations only in spatial random effect (model 2). We report the mean (and its standard error) of the differences in point wise log predictive densities at test locations. The bolded numbers show the overall performance and normal text shows the species specific predictions. imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 11 imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Table 5: Hyperparameters estimates from covariates covariance functions Diatom-algae Filam-algae Macro-veg Threespine-sb Ninespine-sb White-fish Vendance SDM | JSDM SDM | JSDM SDM | JSDM SDM | JSDM SDM | JSDM SDM | JSDM SDM | JSDM σ 1.92 | 2.19 1.96 | 1.87 1.63 | 1.62 1.70 | 1.64 2.18 | 1.78 1.57 | 1.30 2.61 | 2.47 Opennes ` 1.56 | 1.59 1.29 | 1.31 1.51 | 1.26 1.90 | 2.23 2.12 | 1.86 1.98 | 1.61 0.64 | 0.92 Distance to σ 2.24 | 1.84 1.55 | 1.48 2.87 | 2.64 2.54 | 2.23 1.98 | 1.43 1.66 | 1.32 1.79 | 1.37 deep ` 1.09 | 1.18 1.16 | 1.06 1.39 | 1.87 1.04 | 1.39 1.48 | 1.80 2.02 | 1.33 1.32 | 1.09 σ 1.89 | 1.72 1.51 | 1.63 2.14 | 1.85 2.11 | 1.49 1.89 | 1.79 3.37 | 2.26 2.59 | 2.37 Sandy bottom ` 1.49 | 1.43 1.48 | 1.24 1.30 | 0.84 0.71 | 0.46 1.93 | 1.60 1.15 | 0.95 0.91 | 1.06 index σ 2.36 | 2.37 2.16 | 1.74 3.26 | 2.54 1.70 | 1.63 1.99 | 1.86 1.64 | 1.09 6.69 | 7.52 Ice breakup ` 1.11 | 1.18 0.75 | 1.06 0.62 | 1.87 1.32 | 1.39 2.22 | 1.80 1.87 | 1.33 0.70 | 1.09 week σ 1.62 | 1.53 1.55 | 1.64 2.93 | 2.68 1.91 | 1.46 1.72 | 1.42 1.95 | 1.75 2.53 | 2.48 Chlorophyl-a ` 1.11 | 1.46 0.75 | 0.78 0.62 | 0.49 1.32 | 1.76 2.22 | 1.71 1.87 | 1.30 0.70 | 0.66 σ 1.68 | 1.42 1.56 | 1.30 1.69 | 2.26 1.96 | 1.71 1.90 | 2.34 1.67 | 2.67 1.65 | 1.55 Distance to ` 1.59 | 1.76 1.63 | 1.42 1.10 | 0.80 0.76 | 0.65 1.23 | 0.73 0.36 | 0.41 1.45 | 1.63 nearest river σ 0.81 | 1.01 0.69 | 1.07 0.69 | 0.81 1.77 | 0.87 1.08 | 1.00 2.01 | 1.00 0.75 | 1.00 Bottom type | | | | | | | σ | | | | | | 2.87 | 2.17 Salinity ` | | | | | | 1.24 | 1.37 σ 10.9 | 9.35 7.31 | 6.00 4.62 | 4.67 6.87 | 6.42 5.66 | 5.74 8.11 | 7.39 9.32 | 6.85 Spatial ` 0.47 | 0.45 0.74 | 1.06 0.61 | 0.44 0.44 | 0.43 0.78 | 0.95 0.15 | 0.22 2.14 | 2.45 12 Supplement: JSDMs with additive multivariate Gaussian processes Figure 1: Posterior predictive median of intensity and the expected count of individuals in replicate sampling for diatomo algae predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 13 Figure 2: Posterior predictive median of intensity and the expected count of individuals in replicate sampling for filamentous algae predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 14 Supplement: JSDMs with additive multivariate Gaussian processes Figure 3: Posterior predictive median of intensity and the expected count of individuals in replicate sampling for macrovegetation predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 15 Figure 4: Posterior predictive median of intensity and the expected count of individuals in replicate sampling for whitefish predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 16 Supplement: JSDMs with additive multivariate Gaussian processes Figure 5: Posterior predictive median of intensity and the expected count of individuals in replicate sampling for ninespine-stickleback predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 17 Figure 6: Posterior predictive median of intensity and the expected count of individuals in replicate sampling for threespine-stickleback predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 18 Supplement: JSDMs with additive multivariate Gaussian processes Figure 7: Posterior predictive variance for the latent field f (x, s)| y, η ˆ, λ diatomo algae predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 19 Figure 8: Posterior predictive variance for the latent field f (x, s)| y, η ˆ, λ vendace pre- dicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 20 Supplement: JSDMs with additive multivariate Gaussian processes Figure 9: Posterior predictive variance for the latent field f (x, s)| y, η ˆ, λ for filamentous algae predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 21 Figure 10: Posterior predictive variance for the latent field f (x, s)| y, η ˆ, λ for macroveg- etation predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 22 Supplement: JSDMs with additive multivariate Gaussian processes Figure 11: Posterior predictive variance for the latent field f (x, s)| y, η ˆ, λ for whitefish predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 23 Figure 12: Posterior predictive variance for the latent field f (x, s)| y, η ˆ, λ ninespine- stickleback predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 24 Supplement: JSDMs with additive multivariate Gaussian processes Figure 13: Posterior predictive variance for the latent field f (x, s)| y, η ˆ, λ for threespine- stickleback predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Statistics arXiv (Cornell University)

Additive multivariate Gaussian processes for joint species distribution modeling with heterogeneous data

Statistics , Volume 2021 (1809) – Sep 7, 2018

Loading next page...
 
/lp/arxiv-cornell-university/additive-multivariate-gaussian-processes-for-joint-species-f8WYy0qDmt

References (90)

ISSN
1936-0975
eISSN
ARCH-3347
DOI
10.1214/19-BA1158
Publisher site
See Article on Publisher Site

Abstract

Bayesian Analysis (2019) Authors’ accepted version, Number , pp. Additive multivariate Gaussian processes for joint species distribution modeling with heterogeneous data Accepted for publication in Bayesian Analysis † ‡ § Jarno Vanhatalo , Marcelo Hartmann and Lari Veneranta Abstract. Species distribution models (SDM) are a key tool in ecology, conser- vation and management of natural resources. Two key components of the state-of- the-art SDMs are the description for species distribution response along environ- mental covariates and the spatial random effect that captures deviations from the distribution patterns explained by environmental covariates. Joint species distri- bution models (JSDMs) additionally include interspecific correlations which have been shown to improve their descriptive and predictive performance compared to single species models. However, current JSDMs are restricted to hierarchical gen- eralized linear modeling framework. Their limitation is that parametric models have trouble in explaining changes in abundance due, for example, highly non- linear physical tolerance limits which is particularly important when predicting species distribution in new areas or under scenarios of environmental change. On the other hand, semi-parametric response functions have been shown to improve the predictive performance of SDMs in these tasks in single species models. Here, we propose JSDMs where the responses to environmental covariates are modeled with additive multivariate Gaussian processes coded as linear models of coregionalization. These allow inference for wide range of functional forms and interspecific correlations between the responses. We propose also an efficient ap- proach for inference with Laplace approximation and parameterization of the in- terspecific covariance matrices on the euclidean space. We demonstrate the bene- fits of our model with two small scale examples and one real world case study. We use cross-validation to compare the proposed model to analogous semi-parametric single species models and parametric single and joint species models in interpo- lation and extrapolation tasks. The proposed model outperforms the alternative models in all cases. We also show that the proposed model can be seen as an extension of the current state-of-the-art JSDMs to semi-parametric models. MSC 2010 subject classifications: Primary 60G15, 60K35; secondary 62P12. Keywords: linear model of coregionalization, hierarchical model, heterogeneous data, spatial prediction, model comparison, Laplace approximation, covariance transformation. First available in Project Euclid: 3 June 2019: Available in Project Euclid: https://projecteuclid.org/euclid.ba/1559548823 Department of Mathematics and Statistics and Organismal and Evolutionary Biology Research Program, University of Helsinki, jarno.vanhatalo@helsinki.fi Department of Mathematics and Statistics and Department of Computer Science, University of Helsinki, marcelo.hartmann@helsinki.fi Natural Resources Institute Finland, Finland, lari.veneranta@luke.fi c 2019 International Society for Bayesian Analysis DOI: 10.1214/19-BA1158 imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 arXiv:1809.02432v2 [stat.ME] 8 Aug 2019 2 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 1 Introduction Species distribution models (SDMs) are key tools in the ecologists’ toolbox. They have been widely used, among other applications, to study species habitat preferences (La- timer et al., 2006), to improve identification and management of conservation areas and natural resources (Kallasvuo et al., 2017) and to evaluate species responses to environ- mental filtering under climate change scenarios (Clark et al., 2014; Kotta et al., 2019). The main goals of statistical inference in these contexts are to use species observations and information on the associated environment to infer the relationship between these two attributes and to predict over regions of unsampled locations to build thematic species distribution maps (Gelfand et al., 2006; Elith and Leathwich, 2009). Species distribution modeling is, thus, directly related to inferring species’ responses to environmental factors (Latimer et al., 2006). This is traditionally done using gen- eralized linear or additive models as well as an array of machine learning approaches such as regression trees or maximum entropy modeling (Elith and Leathwich, 2009). These approaches model each species separately and cannot account for species interac- tions nor shared responses to the environment. However, species interaction with other species is potentially as important factor as its response to environment. Moreover, in many practical situations, data from species can be patchy or scarce in which case sharing information between species can significantly improve models’ predictive per- formance (Ovaskainen and Soininen, 2011; Clark et al., 2014; Hui et al., 2013; Thorson et al., 2015; Hartmann et al., 2017). For these reasons, joint species distribution models (JSDM) have gained increasing attention in recent years (Warton et al., 2015). Dunstan et al. (2013) and Hui et al. (2013) introduced species archetype modeling where species’ responses to the environment are clustered into few archetype models. Pollock et al. (2014) proposed to use the multivariate probit regression model (Chib and Greenberg, 1998) to describe geographical co-occurrence patterns between frogs and eucalyptus trees in Australia and Clark et al. (2014) built a JSDM to infer richness and loss of species under climate change scenarios. Thorson et al. (2015) introduced a spatial latent factor model to predict spatial distribution of breeding birds and rock fish communities and recently, Ovaskainen et al. (2017) introduced the hierarchical modeling of species communities (HMSC) framework which includes detailed description of interspecific correlations in covariate responses and spatial random effects. Current JSDMs rely on the classical framework of hierarchical generalized linear models (GLMs). Even though this approach allows flexible modeling by describing the randomness of response variables with different probabilistic models (Nelder and Wed- derburn, 1972), it is still limited by its parametric assumptions. Hence, it may fail to accurately describe a species’ response to environmental conditions (Vanhatalo et al., 2012; Kotta et al., 2019). Here, we propose a semi-parametric JSDM model represented with multivariate Gaussian processes (GPs). GPs are flexible semi-parametric regres- sion models where the regression function is estimated without restrictive parametric assumptions about its form (O’Hagan, 1978; Rasmussen and Williams, 2006). Our model integrates the main strengths of semi-parametric single species models (Vanhatalo et al., 2012; Golding and Purse, 2016) and generalized linear model based JSDMs. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 3 First, we model the species responses to environmental covariates with additive mul- tivariate GPs. This allows us to capture wide range of nonlinear responses and share in- formation about these responses between species. Second, due to the hierarchical model structure, the model can simultaneously accommodate several kinds of outcome vari- ables (observations/measurements) by assigning different types of probabilistic models for them. This is important, since it allows us to exploit different types of measure- ments which commonly arise in real-case scenarios of multiple species surveys. Our presentation focuses in the mostly used probabilistic models in practice, the Bernoulli (Binomial) and Poisson with over-dispersion (Negative-Binomial). We also propose an efficient computational approach build around Laplace method (Golding and Purse, 2016; Vanhatalo et al., 2010). Lastly, we present a structured cross-validation scheme to validate and compare models’ performance in different kinds of prediction tasks. This paper is organized as follows. In Section 2 we introduce a motivating case study. In section 3, we introduce the additive multivariate GPs and in Section 4 we discuss its predictive properties and introduce the computational methods for inference. In Section 5, we illustrate the basic properties of the model through two simple examples and introduce the specific case study model. In Section 6 we present the case study results and we end by discussion and conclusions in Section 7. 2 Motivating case study: coastal species distributions in the Gulf of Bothnia As a motivating example, we study spring distribution of four fish species and three types of algae or macro-vegetation on the coastal region of the Gulf of Bothnia in the northern Baltic Sea. The Gulf of Bothnia is a brackish water basin between Sweden and Finland covering an area of approximately 600 × 120 km. Its coastal areas play a central role in the ecosystem and many Baltic fish stocks are dependent on the coastal regions for their reproduction (Veneranta et al., 2013; Kallasvuo et al., 2017). Coastal zones are also the most sensitive regions of the Baltic sea to both natural variation and anthropogenic pressures (Reusch et al., 2018). Hence, these areas are of central importance for conservation and there is need for detailed knowledge on the distribution of coastal species and predictions concerning their response to environmental changes. 2.1 Case study species and data The studied fish species are the juvenile or adult three-spined stickleback (Gasterosteus aculeatus ) and nine-spined stickleback (Pungitius pungitius ) as well as larvae of white- fish (Coregonus lavaretus ) and vendace (Coregonus albula ). Both whitefish and vendace are commercially important species and the sticklebacks are one key fish fauna in the Gulf of Bothnia ecosystem (Bergstr¨ om et al., 2015). We treat the studied vegetation and algae in functional group level comprising of diatomous algae, filamentous and macro- vegetation. These are the dominating groups of benthic vegetation in shores where larvae of Coregonids (whitefish and vendace) occur at early developmental stages. In the scale imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 4 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 Figure 1: The study region, Gulf of Bothnia, and locations for species data. The region is divided into five subregions (I-V) to be used in cross-validation. The environmental conditions (described by the environmental covariates) are heterogeneous between these regions (Veneranta et al., 2013) which allows to test models’ extrapolation performance. of Gulf of Bothnia, their occurrence also reflects the salinity, nutrient balance and wind exposure of studied area (Veneranta et al., 2013). The case study species reflect the large scale environmental gradients and the changes in the Baltic Sea environment in last decades. Sticklebacks in the Baltic Sea use the shallow coastal zone for reproduction (Bergstr¨ om et al., 2015) and high abundances of sticklebacks in the Baltic Sea have been positively correlated with high occurrence of filamentous algae (Eriksson et al., 2009, 2012). Coregonids in coastal areas prefer more oligotrophic waters (Veneranta et al., 2013). Whitefish and vendace reproduce in stony reefs and islets of Baltic Sea in late autumn and the larvae hatch at ice break-up in spring. In sheltered areas the overwintering reeds (Phragmites australis ) dominate the macro-vegetation in spring. Diatomous algae consist of several epiphytic species that have an early spring bloom at ice break up and settle rapidly over bottom when light and water temperature increase (Busse and Snoeijs, 2002). Filamentous algae in this study consist mostly of Pilayella sp. It is a fast growing annual species that dominates the exposed shores at Baltic Sea in early spring (R¨ onnberg and Bonsdorff, 2004). The whole study area was divided into sampling subareas (Figure 1). Within each sub-area, data were collected at several sampling sites distributed so that they covered the range of most important environmental covariates in that sub-area. Sampling was imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 5 Variable Description Resolution [m] Openness The average openness and exposure to winds 300 Distance to deep Distance to 20 m depth curve 200 Sandy bottom index Area weighted distance to the sandy shores 90 Ice breakup week The end date (weeks) of ice cover in 2009 1852 Chlorophyll-a Chlorophyll-a concentration 1852 River Distance to the nearest river mouth 150 Bottom class Bottom classification to 6 categories 200 Winter salinity Length of ice winter 10,000 Table 1: Environmental covariates used in the case study and their original resolution. For this study, the resolution of all the environmental covariates was scaled to 300m. conducted in 2009-2011. At each site sampling was done approximately one week after the ice break. In the scale of Gulf of Bothnia, the ice break-up happens in a period of approximately one month. The species data used in this work comprise of 160 distinct sampling sites. In 70 sites (2010 data) all species were sampled and the rest of the sites comprise samples for the fishes only (Figure 1). Fish samples comprise the number of caught fish together with information on the sampling effort at each site. The effort was measured as the volume of sampled water (m ). A more detailed description of the data collection is provided by Veneranta et al. (2013) For plant data, the bottom was photographed at 5-13 locations at depth of 30 cm within a distance of 2 m and parallel to shore line. The occurrence of a species was recorded at 16 uniformly distributed points in all these photographs. The case study plants can grow over each others so that presence of one plant at a spatial location does not exclude the presence of other plants. The whitefish and vendace data were previously analyzed by Veneranta et al. (2013) and Vanhatalo et al. (2012) but the data on other species are unpublished. 2.2 Environmental covariates Gulf of Bothnia hosts rich variety of environmental conditions. Coastal areas are affected by inflows from land as well as shallow and complex topography. In the scale of Gulf of Bothnia, there is a gradient in river influence, salinity, temperature and length of ice cover period from north to south. We used seven real-valued and one categorical abiotic environmental covariates that were available throughout the study area as raster maps. Each species observation was combined with covariates from the raster cell where the sample location fell in. These are summarized in table 1 and described and motivated in detail by Veneranta et al. (2013). Due its fresh water origin vendace is a priori sensitive to even small changes in salinity levels experienced in the Gulf of Bothnia whereas the other species respond to salinity only in the Baltic Sea scale. Hence, we used all covariates only for vendace but excluded the winter salinity from other species. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 6 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 3 Hierarchical multivariate species distribution model We start our model building following the generic hierarchical structure similar to the one presented by, e.g., Wikle (2003), Cressie and Wikle (2011) and Banerjee et al. (2015) [data|process, parameters] : π (y(x, s)| f (x, s),η) [process|parameters] : π (f (x, s)|θ) [parameters] : π(η,θ) (3.1) where the first layer of hierarchy is the probabilistic model which defines the conditional distribution for multivariate data Y (x, s), at spatial location s with associated covariates x, given the model parameters η and the multivariate latent process f (x, s). The second layer defines the prior distribution for the latent process given the process parameters θ, and the third layer defines the prior distribution for all unknown parameters. Let j ∈ J = {1,··· , J} be the species index set and J the total number of species in the study. Denote by Y (x, s) = [Y ··· Y ] the J -variate random vector with com- 1 J th T ponents Y = Y (x , s) related to the j species at spatial location s = [s s ] (coordi- j j j 1 2 nates) under the influence of environmental covariates x = [x ··· x ] where c is the j,1 j,c j j j number of environmental covariates for the species j. We denote by x the full vector of covariates and X the covariate space. The species specific covariates x are sub-vectors of x such that x ∈ X where X ⊂ X and X is c -dimensional. Here, we assume that j j j j j T 2 J given a multivariate latent process f (x, s) = [f (x , s)··· f (x , s)] : X × R → R , 1 1 J J the probabilistic model for Y = Y (x, s) factorizes as follows π (y(x, s)| f (x, s),η) = π (y | f , η ) (3.2) Y Y j j j j=1 where f = f (x , s) is fixed but an unknown realization of the j’th process with covari- j j j ates x at the spatial location s. η is the vector of parameters of the probabilistic model j j π for the species j and η = [η ··· η ] . In general any probabilistic model for data Y 1 J could be used and the observation models should be chosen according to the assumed sampling process. Here, we consider in detail the Binomial and the Negative-Binomial (over-dispersed Poisson) models used in our case study (Section 2). These models can be seen as observation models resulting from inhomogeneous point process model for species abundance (Gelfand et al., 2006; Warton and Shepherd, 2010). In practice, each sampling site consists of a small area where the observations are made. In our case study, the area covered by a sampling site is so small compared to the whole study region that the latent function is practically constant in each sampling site. The model for the count of species presences at uniformly distributed points in a sampling site (plants in our case study) is then Binomial given by (Gelfand et al., 2006) z (s) y z (s)−y π (y|f (x, s), z (s)) = p(f (x, s)) [1− p(f (x, s))] I (y) (3.3) Y B {0,...,z (s)} B B where z (s) is the total number of observation points in the sampling site at location s and p(f (x, s)) is the success probability, which will be modeled through the logit here. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 7 The natural model for the number of individuals in a closed area or volume in a sampling site (fish in our case study) is Poisson which we extend to over-dispersed Poisson using the Negative-Binomial given by (see Liu and Vanhatalo, 2018, Section 7.1 for details) r y Γ(r + y) r m(x, s) π (y|f (x, s), z (s), r) = I (y) (3.4) Y N N N 0 y!Γ(r) r + m(x, s) r + m(x, s) where m(x, s) = z (s) exp[f (x, s)] = E(Y |f (x, s), z (x), r) and r is the over-dispersion N N N parameter. The latent process f (·) corresponds to the logarithm of a species (relative) density and z (s) is the sampled volume of water in the sampling site at location s. For this parameterization V (Y |f (x, s), z (x), r) = m(x, s) + m(x, s) /r. Increasing r N N decreases the variance and when r → ∞, the model approaches the Poisson distribution. 3.1 Univariate additive latent Gaussian process First, we assume that marginally for any species j, the process model is given by f (x , s) = β + h (x ) + ε (s) (3.5) j j j,0 j j j 2 2 where β is the offset weight with distribution β |σ ∼ N (0, σ ) and h : X → R j,0 j,0 j j j,0 j,0 is a predictor function of environmental covariates. The predictor function is assumed to be additive over the covariates, h (x ) = h (x ), and each additive term is j j j,r j,r r=1 given an independent zero mean GP prior, so that h (x )|θ ∼ GP 0, k (x , x ;θ ) (3.6) j j j h j,r j,r j,r j,r r=1 0 0 where k (x , x ;θ ) = Cov(h (x ), h (x )|θ ) is a covariance function with h j,r j,r j,r j,r j,r j,r j,r j,r j,r T T T 0 parameter θ and θ = [θ ···θ ] . For example, with k (x , x ; θ ) = x j,r j h j,r j,r j,r j,1 j,c j,r j,r 0 2 2 x σ and θ = σ , the predictor function corresponds to linear model h (x ) = j,r j,r j,r j,r j,r j,r x β where β ∼ N (0, σ ) (Rasmussen and Williams, 2006). With other choices j,r j,r j,r j,r of covariance functions we can model non-linear responses in which case the model can be seen as an alternative to the traditional generalized additive models (GAMs, Hastie and Tibishirani, 1986). The general form of an additive GP prior would include also joint effects of covariates (Duvenaud et al., 2011) but this is not considered here. The term ε (s) is a spatial GP, 2 0 2 ε (s)|σ ,` ∼ GP 0, k (s, s ;` , σ ) (3.7) j j  j j j j 0 2 2 where k (s, s ;` , σ ) is a spatial covariance function with variance σ and range param- j j j eter ` . When we consider independent models for each species, that is the traditional single species SDMs (SSDMs), the processes h and  are mutually independent among j,r j all species. The model outlined this far is similar to the GP based species distribution models used by, e.g., Vanhatalo et al. (2012), Kallasvuo et al. (2017) and Golding and Purse (2016). Next, we introduce models which consider dependency. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 8 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 3.2 Additive multivariate Gaussian process priors In order to account for possible species interdependence, we first include spatial species interaction into the model through the linear model of coregionalization (LMC) (Mardia and Goodall, 1993; Gelfand et al., 2004). Write ε(s) = [ε (s)··· ε (s)] and assume that 1 J ε(s) has LMC covariance structure with species specific correlation functions k (·,· ;` ). The covariance structure of the LMC with interspecific spatial dependence is then, 0 0 0 Cov ε (s), ε 0 (s )|Σ ,` = u (j, j )k (s, s ;` ) (3.8) j j  ,l  l l=1 T T T 0 0 T with ` = [` ···` ] and u (j, j ) the entry (j, j ) of U = L L where L is ,l ,l ,l ,l 1 J ,l th the l column of the Cholesky decomposition L of the coregionalization matrix Σ , that is matrix of interspecific correlations between spatial GP. Hence, the vector-valued latent process f (x, s) = [f (x , s)··· f (x , s)] unconditional on β ,. . .,β follows a 1 1 J J 1,0 J,0 multivariate GP which we denote as c J X X 0 0 f (x, s)|Λ ∼ MGP 0, Σ + k (x , x ;θ) + k (s, s ;` )U (3.9) 1 0 h r  j ,j r r j r=1 j=1 T T T 2 2 0 where Λ = (Σ , Σ ,θ,`), Σ = diag(σ , . . . , σ ), θ = [θ ···θ ] and k (x , x ;θ) = 1 0  0 h r 1,0 J,0 1 J r r 0 0 diag k (x , x ;θ ),··· , k (x , x ;θ ) . If the predictor functions, h , were h 1,r 1,r h J,r J,r j,r 1,r 1,r J,r J,r linear, the prior (3.9) would correspond to traditional multivariate spatial model with independent linear effects and spatial LMC (Gelfand et al., 2004). We extend (3.9) to an additive multivariate GP prior where the dependence between the species specific additive predictor functions is modeled with LMC. We demonstrate this with a model where the set of covariates is equal for all species, that is, dim(X ) = c and x = x for all j. Then the model is written as c J J XX X 0 0 ˜ ˜ f (x, s)|Λ ∼ MGP 0, Σ + k (x , x ;θ )U + k (s, s ;` )U 2 0 h r r j,r h ,j j j ,j j,r r=1 j=1 j=1 (3.10) where Λ = (Λ , Σ , . . . , Σ ) and Σ is the interspecific covariance matrix between 2 1 h h h 1 c r the species specific predictor functions of r’th covariate, k (·,·;θ ) is a correlation hj,r j,r T th function related to the predictor function h , U = L L and L is the j j,r h ,j h ,j h ,j r r r h ,j column of the Cholesky decomposition L of Σ . When the set of covariates is not the h h r r same for all species, covariate specific predictive functions are correlated only between species that share those same covariates. A JSDM with multivariate GP prior (3.10) can be interpreted as extension of GP based SSDMs (Vanhatalo et al., 2012; Golding and Purse, 2016) to joint species mod- eling similarly as done with the generalized linear model based JSDMs (Pollock et al., 2014; Ovaskainen et al., 2017). The enhanced inferential ability of the multivariate ad- ditive GP compared to univariate GP models lies in its capability to infer similarity imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 9 (dissimilarity) in species specific responses to covariates and in the spatial random ef- fect. The similarity/dissimilarity of responses of two species j and j along a covariate r is indicated by a positive/negative value for correlation and hence, examining the LMC covariance matrices, Σ , can provide new insight to species to species associations. 3.3 Marginally uniform priors for correlation parameters Here, we define the prior for coregionalization covariance matrices but define the co- variance functions and rest of the priors in Section 5. A standard choice for prior for correlation matrices is the inverse Wishart distribution. However, if there is no prior information on interspecific correlations, we follow Hartmann et al. (2017) and suggest to use marginally uniform priors for the LMC covariance matrices Σ and Σ . These can be achieved by the distribution of Barnard et al. (2000) and Tokuda et al. (2012). Let P be a correlation matrix with elements ρ . A prior distribution that is marginally j,j noninformative for the correlations, that is, the marginal distribution for every ρ is j,j uniform over (−1, 1), is achieved with the distribution v J Γ( ) 1 v 2 (v−1)(J−1)−1 − 2 2 π(P|v) = |P| |P | 1 (detP ) (3.11) jj (0,∞) Γ ( ) j=1 when v = J − 1. Here, Γ (·) is the multivariate gamma function and matrix P is J jj th th obtained by removing the j column and j row ofP . When v increases, the probability density (3.11) becomes increasingly concentrated around the origin. 4 Posterior inference and predictive properties Given a set of species observations at locations s , i = 1, . . . , n , respectively for each i j j species j = 1, . . . , J , the likelihood can be written as J j Y Y π(y| f,η) = π (y | f , η ) (4.1) j j,i j,i j j j j=1 i =1 where y is the i ’th observation of species j at the i ’th spatial location s asso- j,i j j j i T T T ciated with a set of covariates x ∈ X . The observed vector y = [y ··· y ] with i j j 1 J T T T T y = [y ··· y ] collects all the species observations and f = [f . . . f ] , where j,1 j,n j j 1 J f = [f ··· f ], collects the corresponding latent variables. Hence, the likelihood j,1 j,n j j factorizes over the latent variables and the inference can be done similarly as with uni- variate GP models. Markov chain Monte Carlo (MCMC) would provide exact answers in the limit of large sample size but deterministic approximations such as Laplace approx- imation or expectation propagation algorithm have also been shown to provide accurate approximate inference for univariate GP models with much lower computational time (Nickisch and Rasmussen, 2008; Rue and Marino, 2009; Vanhatalo et al., 2010). In order to study the properties of the model, we will examine the Laplace approximation for the posterior of latent variables conditional on the hyperparameters. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 10 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 4.1 Posterior predictive inference conditional on hyperparameters Posterior of latent variables The Laplace’s method approximates the conditional posterior of the latent function values f = f (s , x ) at the spatial location s with covariates x as ∗ ∗ ∗ ∗ ∗ −1 −1 −1 π(f | y,η, Λ) ≈ N f |C C f, C − C (C + W) C (4.2) ∗ ∗ ∗,f ∗ ∗,f f,∗ where f is the (conditional) maximum a posterior (MAP) estimate of latent variables, J j X X f = arg max log π (y | f , η ) + logN (f | 0, C ) (4.3) j j,i j,i j j j j j f∈R j=1 i =1 and W is the Hessian matrix of the negative log-likelihood function evaluated at f . Here, C is the prior covariance matrix of f , C is the (prior) covariance matrix between ∗,f elements of f and f . C is the prior covariance of f . In case of multivariate additive ∗ ∗ ∗ GP (3.10) the prior covariance matrix is given by     ˜ ˜ σ J ··· 0 u (1, 1){K } ··· u (1, J ){K } n h ,j h 1,1 h ,j h 1,J 0 1 j,r j,r c J r r XX  . .   . .  . . . . . . . . C = +     . . . . . . 2 r=1 j=1 ˜ ˜ 0 ··· σ J u (J, 1){K } ··· u (J, J ){K } 0 J h ,j h J,1 h ,j h J,J j,r j,r r r   ˜ ˜ u (1, 1){K } ··· u (1, J ){K } ,j  1,1 ,j  1,J J j j   . . . . . + (4.4)   . . j=1 ˜ ˜ u (J, 1){K } ··· u (J, J ){K } ,j  J,1 ,j  J,J j j 0 0 0 where J is an n×n matrix full of ones u (j, j ) and u (j, j ) are the entry (j, j ) of U n ,l h ,l ,l ˜ ˜ 0 0 and U (see (3.8)), and{K } and{K } are correlation matrices with elements h ,l  j,j h j,j r l l,r ˜ ˜ ˜ ˜ [{K } 0 ] = k (s , s ;` ) and [{K } 0 ] = k (x , x |θ ). The j,j i ,i 0 ,l i i 0 l h j,j i ,i 0 h i ,r i 0,r l,r l j j l,r j l,r j j j j j other correlation matrices are constructed similarly. If data on all species are available at all sampling locations, the covariance matrix reduces to Kronecker product similarly as in LMC models by Mardia and Goodall (1993) and Gelfand et al. (2004), so that P P P c J J ˜ ˜ C = Σ ⊗ J + U ⊗ K + U ⊗ K where n = n for all 0 n h ,j h ,j  j r=1 j=1 j,r j=1 j j,r j = 1, . . . , J . Since, in general, the second and third term of C are full matrices, it can be seen from (4.2) and (4.3) that the posterior of f and the posterior predictive distribution of f are affected by observations of all species at any spatial locations. The marginal posterior of additive terms Typically, we are also interested in species’ responses along covariates encoded by the additive predictor functions. Their marginal posteriors, conditional on hyperparame- ters, are analogous to (4.2). The matrices C and C are only constructed using the ∗ ∗,f In case of Gaussian observation model, this equals to the true posterior density of f | y,η, Λ. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 11 correlation functions corresponding to the latent function of interest. For example, to study the response of species j along covariate x we evaluate the posterior predictive distribution for h (x ) using (4.2) so that we replace C j,r r ∗,f h i ˜ ˜ C = u (j, 1){K } ,··· , u (j, J ){K } , (4.5) h ,f h ,l h j,1 h ,l h j,J j,r r l,r l,r l=1 and similarly for C . However, in case of the traditional LMC (3.9) only the spatial random effects are correlated between species whereas the predictor functions are not. 0 2 0 Hence, Σ is diagonal for all r so that u (j, j ) = σ if l = j = j and zero otherwise. h h ,l r r l The second terms of (4.4) is then block diagonal and only the j’th block column in (4.5) is non-zero for which reason there is no information exchange between species specific predictor functions. In case of univariate GP prior all the terms in (4.4) are block diagonal and (4.2) reduces to J independent Gaussian distributions with no information exchange among species at all. Hence, the predictive performance of the univariate GP and the two multivariate GP models are very different. This is illustrated in section 5. The (marginal) posterior expectation and variance for new observations When predicting species occurrence probability or abundance, we need to marginalize over the posterior of the latent variables. The posterior expectation and variance for a new outcome for species j at a location (x , s ) conditional on hyperparameters are ∗ ∗ E[Y (x , s )| y,η, Λ] =E[E(Y (x , s )|f (x , s ), y,η, Λ)] (4.6) j ∗ ∗ j ∗ ∗ j ∗ ∗ V[Y (x , s )| y,η, Λ] =E[V(Y (x , s )|f (x , s ), y,η, Λ)] + j ∗ ∗ j ∗ ∗ j ∗ ∗ V[E(Y (x , s )|f (x , s ), y,η, Λ)] j ∗ ∗ j ∗ ∗ When the probabilistic model for species j is assumed to be the Negative-Binomial or Binomial model with logistic link function (3.3) we can find either an exact result or an analytical approximation for these expectations and variances. These are given in the Supplementary material. These solutions speed up the predictive calculations considerably compared to numerically integrating f over R. Conditional scenario predictions In some applications, one might be interested in scenario predictions conditional on changes in species composition. For example, one might be interested in how removing from or introducing specific species into an area would affect other species. In our case study setting, we could be interested in, for example, effects of management actions that would clean filamentous algae from the shoreline. Such scenario predictions would be naturally tackled with predictive causal inference (Lindley, 2002; Pearl, 2009). In predictive causal inference the parameters of the model are inferred with the available data so far and predictions made considering alternative scenarios. To illustrate this lets first introduce a short notation y = y (x , s ) and f ∗ ∗ J ,∗ J J ,∗ 1 1 1 = f (x , s ) where J denotes the set of species to be predicted and J the scenario ∗ ∗ 1 2 imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 12 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 species assumed to be “managed”. For brevity, we omit the conditioning on hyperpa- rameters and data. The conditional distribution of Y (x , s )| y is J ∗ ∗ 1 J π(y | y ) = π(y | y , f )π(y , f )df /π(y ) (4.7) J ,∗ J ,∗ J ,∗ J ,∗ J J ,∗ J 1 J 1 1 J 1 2 1 2 2 2 where Y (x , s )| y , f only depends on f . The Laplace approximation for this J ∗ ∗ J ,∗ J ,∗ 1 J 1 1 conditional predictive distribution is shown in Supplementary material. 4.2 Parameter inference We used Laplace approximation (Vanhatalo et al., 2010) to approximate the conditional posterior of latent variables π(f | y,η, Λ) and the marginal likelihood of the hyperparam- eters π(y|η, Λ) = π(y| f,η)π(f |Λ)d f . We searched for the (approximate) maximum a posterior (MAP) estimate for hyperparameters given by (η, Λ) = arg max log q(y|η, Λ) + log π(η, Λ). (4.8) η,Λ where log q(y|η, Λ) is the Laplace approximation for the log marginal likelihood for pa- rameters. This approach produces also good approximation for the posterior predictive distribution for latent variables (Vanhatalo et al., 2010; Vehtari et al., 2016), which are the main interest in this work. Hence, we fixed hyperparameters to their MAP estimate. In order to avoid constrained optimization, all the parameters were transferred to unconstrained space. We used log transformation for covariance function parameters, and for the interspecific correlation matrices we used the transformation presented by Kurowicka and Cooke (2003) and Lewandowski et al. (2009), which is a bijective map- ( ) ping between the space of correlation matrices and R . This is summarized in Sup- plementary material. We used scaled conjugate gradient optimization for locating the MAP and checked carefully that a (local) mode had really been found by verifying that gradients along all hyperparameters were zero. The required gradients of log q(y|η, Λ) were solved analytically as described by Rasmussen and Williams (2006) and Vanhat- alo et al. (2010). The Supplementary material summarizes the derivatives w.r.t. to the covariance parameters in Σ , Σ , r = 1, . . . , c. 5 Experiments Here, we first illustrate the properties of the hierarchical multivariate GP models with two simple examples. These examples highlight particular properties of the proposed model. After this we introduce the model and analysis for our case study (Section 2). We implemented all the models using the GPstuff package (Vanhatalo et al., 2013). 5.1 Demonstration with simulated spatial data Figure 2 presents simulated data and posterior predictions for spatial modelling of two species. The model follows spatial LMC; that is model (3.9), where the covariate terms imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 13 Figure 2: Illustration of spatial multivariate GP prior for JSDM with two species and spatial only component. Crosses and dots are simulated data locations of species 1 (Binomial data) and species 2 (Negative-Binomial data). See Section 5.1 for discussion. are dropped out. The spatial correlation function used in this demo is the Gaussian 0 2 2 0 −|| s− s || /2l 0 k(s, s ) = e , where || s− s || is the Euclidean distance and l is the length- scale. The first row of subplots shows the posterior predictive probability of observing species 1 (E[Y ], Y (s) ∼ Binomial) and the second row shows the expected number of 1 1 species 2 (E[Y ], Y (s) ∼ Negative-Binomial). Plots (a) and (e) show the predictions 2 2 when the species observations are from the same region but not from exactly the same locations. In this case, the prediction of multivariate GP is similar to predictions by univariate GPs. Plots (b) and (f ) show the predictions when data are available on both species from the lower left corner and additionally data on species 1 is available from upper right corner. There is positive correlation between species, which has been inferred from the data in the lower left region, so the prediction for species 2 in upper right corner is informed by data on species 1 in that region. The last two columns illustrate the conditional scenario predictions (4.7). In the third column, the training data is the same as in the first column and the expected values in plots (c) and (g) show joint scenario prediction in new region of same size and shape. In this scenario, species in the new region were observed so that species 1 is known to be in the top right, and species 2 in the bottom left. The fourth column shows the conditional scenario prediction for both of the species separately so that the grey marks show the locations where the other species would be introduced in these scenarios. 5.2 Demonstration with time series of species abundances Next, we consider the classical predator-prey system containing annual population counts of hare and lynx in the northern Canada from 1845 to 1935 (Odum, 1953). The data is not spatially explicit since the observations are total counts over the region so we model it as time series. This illustrates the behavior of individual additive co- variate effect in the multivariate GP model (3.10) with time being the covariate. Let’s denote by Y and Y the population sizes of hare and lynx at time t, and assume that 1,t 2,t imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 14 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 Figure 3: The results from the analysis of hare and lynx interaction (predator-prey system) analyzed with single and joint SDM. Plots a) and c) show the posterior mean and 95% credible interval of the latent functions and plots b) and d) show the training data, test data and posterior expectation of observations. See Section 5.2 for explanation. Y |f (t) ∼ Poisson(exp(f (t)) and Y |f (t) ∼ Poisson(exp(f (t)). We compared two 1,t 1 1 2,t 2 2 alternative priors for the latent functions, one with independent GP priors and another with joint bivariate GP prior with the LMC structure. In order to compare models’ pre- dictive performances when some species have not been observed, we removed parts of the original data from the training set, the period of 1870-1900 for hare and 1850-1870 for lynx. Figure 3 displays the result of the data analysis. The model with bivariate GP prior clearly outperforms the independent model since it predicts better the test data points in periods where training data was removed. Moreover, its predictions have also smaller uncertainty than the predictions of independent GPs in those periods of time. 5.3 The case study on coastal species distribution Case study models The plant data are modeled with the Binomial observation model (3.3) and the fish data are modeled with the Negative-Binomial observation model (3.4). In order to test the effect of different model components and the effect of semi-parametric response functions versus standard quadratic response functions we compare the following models: 1) additive univariate GP predictor functions and univ. spatial random effects (3.5), 2) additive univariate GP predictor functions and LMC spatial random effect (3.9), 3) additive multivariate GP predictor function and LMC spatial random effect (3.10), 4) additive univariate GP predictor functions only (model (3.5) without  (s)), 5) univariate spatial random effects only (model (3.5) without h (x )), j j imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 15 6) additive univariate quadratic predictors and univariate spatial random effects (model (3.5) with h (x ) = x β and x includes the covariates and their squares), j j j j j 7) additive univariate quadratic predictors and LMC spatial random effect (model (3.9) with h (x ) = x β and x includes the covariates and their squares), j j j j j 8) additive multivariate quadratic predictors and LMC spatial random effects (model (3.10) with h (x ) = x β and x includes the covariates and their squares). j j j j j 9) additive multivariate quadratic predictors only (model (3.10) without (s) with h (x ) = x β and x includes the covariates and their squares). j j j j j In each model, the spatial random effects were given the M´ atern covariance function 0 2 2 0 − 3r (s,s ) with ν = 3/2 degrees of freedom k (s, s ; σ ,` ) = σ 1 + 3 r (s, s ) e j j j j j where σ is the variance parameter, ` = [` ` ] is the vector of length-scales j,1 j,2 j j 0 0 T 2 −1 0 1/2 and r (s, s ) = [(s− s ) (diag(` ) ) (s− s )] . The continuous covariate effects in j j the additive GP models (models 1-4) were given the Gaussian correlation function 0 2 2 0 −||x −x || /2l ˜ r r j,r k (x , x ) = e . The additive linear models (models 6-8) were coded h ,r r j r 0 0 2 as additive GPs with k (x , x ) = x x σ . Even though this is not a proper correla- h r r j,r r r j,r tion function, when used in LMC it implies a generalized linear model with interspecific dependencies between weights, β , that are the key components in state-of-the-art para- metric JSDMs. See Discussion for details. For the categorical covariate (Bottom class, 0 2 0 0 table 1) we used a delta function so that k (x , x ;θ) = σ δ (x ), where δ (x ) = 1 h r x x j,r r δ r r r r if x = x and zero otherwise. This corresponds to having an own intercept for each category. We used the marginally uniform priors (section 4.1) for the between species correlations and weakly informative priors for the rest of the parameters. The variance parameters were given Half -Student-t (μ = 0, σ = 4, ν = 4) priors and the length-scale parameters were given Half -Inverse-Student-t (μ = 0, σ = 1, ν = 4). Hence, a priori more weight is given for smooth functions with small variability. Model validation and comparison We aim to provide models that give reliable posterior predictions. Hence, it is natural to compare models with the goodness of their posterior predictive distributions. This can be done efficiently with cross-validation (CV) using the log predictive density diagnostics (Vehtari and Ojanen, 2012). Let D denote the full data-set. Fix K disjoint sub-sets of D, say D , . . . ,D , such that their union is D. The K-fold CV log point-wise marginal 1 K predictive density is then L (D) = log π(y |D ) where D = {∪ D : K i ∼k(i) ∼k(i) r i=1 r=1 r 6= k(i)} and k(i) is such that y ∈ D . We compare the models with the leave-one- k(i) out CV (LOO-CV, K = n) and structured 5-fold CV. We used the MAP estimate for the hyperparameters. In 5-fold CV we found the MAP for each training set separately. The LOO-CV was conducted at the MAP found with full data and we used Laplace approximation to approximate the LOO-CV log predictive densities (Vehtari et al., 2016). Laplace approximation for LOO-CV is shown to work well in GP models and, since our data is rather large, leaving only one data point out of the training set has only negligible effect on the posterior of the hyperparameters (Vehtari et al., 2016). The rationale for calculating both LOO-CV and 5-fold CV comparison is the follow- ing. Since multiple species were sampled at every sampling site and each of the sampling imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 16 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 GP models LOO Cross Validation 5-fold Cross validation 1) Univariate GP h(x), univ. (s) -2.230 (0.082) -2.465 (0.094) 2) Univariate GP h(x), multiv. (s) -2.081 (0.080) -2.480 (0.100) 3) Multiv. GP h(x), multiv. (s) -1.669 (0.080) -2.316 (0.086) 4) Multiv. GP h(x) only -2.228(0.089) -2.590(0.012) 5) Multiv. (s) only -2.351(0.087) -2.500(0.086) 6) Univ. quadr. h(x), univ. (s) -2.262 (0.083) -2.517 (0.095) 7) Univ. quadr. h(x), multiv. (s) -2.117 (0.084) -2.484 (0.109) 8) Multiv. quadr. h(x), multiv. (s) -2.105 (0.084) -2.416 (0.099) 9) Multiv. quadr. h(x) only -10.246(1.297) -9.305(1.008) Table 2: Model comparison with leave-one-out (LOO) and 5-fold cross validation using the mean point wise log marginal predictive density statistics (and its standard error) over all species. see Section 5.3 for model descriptions. sites has other sites spatially nearby it, the LOO-CV log predictive densities are affected significantly by the spatial random effects. The LOO-CV, thus, measures models’ inter- polation performance which can be good even if the models were not able to represent well responses along covariates (predictor functions) (Vanhatalo et al., 2012; Veneranta et al., 2013). For this reason, we structured the 5-fold CV by dividing the data into five spatially distinct groups corresponding to regions I-V in Figure 1. The sampling sites in different groups are spatially so far from each others that the spatial random effects do not affect the posterior predictive distributions for test groups. Hence, the 5-fold CV tests mostly the extrapolation performance of a model, which is governed by the goodness of the predictor functions, whereas the LOO-CV tests the interpolation performance, which is governed also by the spatial random effects. 6 Results 6.1 Predictive performance of models Table 2 summarizes the CV comparisons over all species and tables 1-2 in Supplementary material show the models’ predictive performance for each species separately. Tables 3- 4 in Supplementary material show the pairwise comparison of the CV point wise log predictive densities between the best GP/parametric model (models 3 and 8) and the other GP/parametric models with both the covariate effects and spatial random effect (models 1-2 and 6-7). The results show that model 3, which includes multivariate GP predictors and multivariate spatial random effects (3.10), is the best in both LOO-CV −2 and 5-fold CV with a difference of 10 or more to the other models. Since the mean log predictive density is an average of n = 850 point-wise predictions the difference −2 of 10 corresponds to a difference of 8.5 in log (point-wise) joint predictive densities. Analogously to Bayes factors, which compare log prior predictive distribution, this can be considered a significant difference between two models (Kass and Raftery, 1995). Hence, the multivariate GP (model 3) has significantly better overall predictive performance over all species than the other models. According to posterior predictive imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 17 Figure 4: The maximum a posterior estimate of the correlation matrices in the multi- variate additive GP model with Gaussian covariance function (model 3). checks (Gelman et al., 2013) there are no serious discrepancies between its predictions and observed data. In extrapolation (5-fold CV) model 3 performs best also for all species separately. However, in interpolation (LOO-CV) it is not the best for all species separately (tables 1-2 in Supplementary material). Moreover, the semi-parametric GP models (models 1-4) work better than the corresponding parametric models (quadratic environmental response, models 6-9) in both interpolation and extrapolation in general. Dropping either spatial random effect or covariate effect out from the model decreases its performance clearly. All models work better in interpolation than in extrapolation and compared to univariate models including interspecific correlations either in spatial random effects or also in environmental predictors improves models’ performance in both of these tasks. Moreover, including interspecific correlations between environmen- tal responses, h (x ), improves the extrapolation performance relatively more than j,r r including interspecific correlations only for spatial random effects. Hence, multivariate spatial random effect improved more interpolation whereas multivariate predictors has larger effect for extrapolation as would intuitively be expected. 6.2 Effects of environmental covariates The interspecific correlations (Figure 4) show that the responses to environment are similar among Coregonids and among sticklebacks whereas there are clear differences among these groups. These fish groups respond differently to sandy bottom and distance to deep. Moreover, there is strong positive spatial correlation among Coregonids and among sticklebacks but not between these groups. All species have negative spatial correlation with filamentous algae whereas there is either weak positive or negligible spatial correlation between fish species and macrovegetation and diatomous algae. Figure 5 shows the marginal effects of the predictor functions for both univariate (SSDM) and multivariate (JSDM) GP models calculated as described in section 4.1. In general, the responses in JSDM and SSDM are similar. In most of the cases the uncertainty in response function is smaller (narrower credible regions) in JSDM than in SSDM although the differences are not large. Most of the responses are smooth but imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 18 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 Figure 5: Marginal latent responses along covariates, h . Grey corresponds to single j,r species GP model (model 1) and black to multivariate additive GP model (model 3). show patterns that would be hard to capture with a quadratic function. For example, three and nine spine stickleback, macrovegetation and filamentous algae show logistic style response to either distance to deep or openness. Moreover, the response of vendace on ice break up week is first constant but has very steep increase after week 16. The responses to ice break up or chlorophyll-a show non-linear and non-quadratic responses also for white fish, macrovegetation and filamentous algae. The MAP estimates of other hyper-parameters are summarized in Supplementary material. 6.3 Spatial predictions Figure 6 shows the distribution maps as posterior median of intensity and expected count of individuals in replicate sampling produced by SSDM (model 1) and JSDM (model 3) for vendace. The distribution maps for other species together with maps on posterior predictive variance are shown in Supplementary material. In broad scale the posterior median of the intensity looks similar with SSDM and JSDM whereas SSDM predicts larger species counts than JSDM for all species throughout the study region and the uncertainty related to SSDM predictions is larger than that of JSDM predictions. Both SSDM and JSDM predict that vendace is distributed mostly in the northern Gulf of Bothnia. However, SSDM predicts somewhat higher median intensity than JSDM also in the southern Gulf of Bothnia. In relation to median intensity similar pattern that imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 19 Figure 6: Posterior predictive median of intensity and the expected count of individuals in replicate sampling for vendace predicted by SSDM (model 1) and JSDM (model 3). JSDM predicts more restricted distribution range than the single species model is seen also in the predictions concerning sticklebacks and whitefish. In case of diatomous and filamentous algae SSDM and JSDM predict the distribution pattern similarly whereas for macrovegetation JSDM predicts slightly larger distribution ranges. The posterior distributions of spatial lenght-scale parameters were concentrated near one kilometer or less (see table 5 in Supplementary material). Hence, the differences in distribution predictions cannot be explained by the spatial random effects over large areas but the spatial random effect explains local deviations from the predictions based on covariates. 7 Discussion and concluding remarks 7.1 Case study results The GP based SDMs had better predictive performance than the parametric SDMs and JSDMs had better predictive performance than the corresponding SSDMs. The differences between SSDM and JSDM are most apparent in the distribution maps and predictor functions. The JSDM predicted in general smaller distribution ranges than SSDMs (macrovegetation being the only exception) and the posterior uncertainty in their predictions were smaller than in SSDMs. When interpreting the results, it should be remembered that the sampling was tar- imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 20 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 geted to larvae of sea-spawning Coregonids (whitefish and vendace) and planned to cover their plausible distribution range in terms of spatial (coastal areas of Gulf of Bothnia) and temporal extent (spring). Other species were sampled alongside Coregonids and the sampling area covers only limited portion of their full distribution range in the spring. Moreover, the abundance and distribution of all the studied species vary annually. Fish change their distribution areas seasonally and their larval stages last only few weeks. Vegetation and algal cover vary due changes in temperature and ice effects. For these reasons the results are most representative for larvae of whitefish and vendace. For other species the results describe their spring distribution in the shallow coastal regions only. Our results correspond rather well to the earlier knowledge on the studied species. The responses to the environmental covariates and the interspecific correlations (Fig- ures 5 and 4) indicate that whitefish and vendace larvae are, in general, distributed in different areas than sticklebacks. Most of the literature on Baltic Sea sticklebacks focus on three-spined sticklebacks whereas the nine-spined stickleback is not well stud- ied. In early spring, stickleback abundances have been found to be highest in sheltered archipelago areas, where part of the population overwinter. High abundance of stick- lebacks are typically thought to indicate structural complexity on the bottom; such as the presence of stones and boulders as well as reeds and other macrovegetation that function as shelter and provide food (Peltonen et al., 2004; Sieben et al., 2011). On contrary, highest densities of whitefish and vendace larvae are observed in open sandy shores near deep areas and in structurally simple shores, without macrovegetation, boul- ders and stones (Hudd et al., 1988; Leskel¨ a et al., 1991; Veneranta et al., 2013). The distribution of vendace is highly influenced by ice break up week and salinity so that long ice cover period and low salinity increase their abundance. Ice winter is longer and salinity is lower in the northern than in the southern Gulf of Bothnia and thus it correlates positively to Coregonid presence (Vanhatalo et al., 2012). In general, optimal habitats for Coregonid larvae are mainly located in the northernmost Gulf of Bothnia. Sticklebacks are known to prey mainly on mesozooplankton, but also on grazers (Pel- tonen et al., 2004; Sieben et al., 2011). Sticklebacks feed also on fish larvae if available (Bystr¨ om et al., 2015), but there are no studies on potential predation risk to Corego- nids. Based on our results, in the scale of coastal region of Gulf of Bothnia, sticklebacks are not thread to Coregonid larvae since their high density areas do not overlap. More- over, there was no spatial correlation between sticklebacks and coregonids (Figure 4) which could indicate interspecific interaction of any kind. The presence of sticklebacks has been connected to higher eutrophication status and stickleback reproduction and growth benefit from increasing temperature and eutrophication (Lef´ ebure et al., 2011; Candolin et al., 2008; Meier et al., 2012) On contrary, these environmental characteris- tics are assumed to affect negatively the Coregonid reproduction (M.Cingi et al., 2010; Muller ¨ , 1992; Veneranta et al., 2013; Vanhatalo et al., 2012). In our results whitefish and vendace larvae have positive response to Chlorophyl-a, which is a strong indicator for increasing eutrophication. However, in the scale of Gulf of Bothnia Chlorophyl-a concentration is typically higher near river mouths where salinity is lower and nutrient inflow high. Hence the result more likely reflects the high river influence through low salinity than preference for eutrophicated water. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 21 The vegetation and algae distribution in our results reflect the nutrient status during winter and early spring as well as effect of ice cover and ice scraping in wind exposed shallow areas. Both the SSDMs and JSDMs indicate that filamentous algae occur in all coastal areas in high densities, except in some sheltered inner coastal areas. Macroveg- etation and diatomous algae had highest presence probabilities at areas with lower filament presence probability. This pattern agrees well with the general understand- ing of these species groups. Filamentous algae are typically distributed in wind and wave exposed shores, that epiphytic diatoms and reeds that require shelter cannot tol- erate. Filamentous algae can, however, grow over macrovegetation and it is possible that macrovegetation distribution is underestimated if filamentous algae growth over macrovegetation has hided macrovegetation from the sampling pictures. The higher nu- trient levels in sheltered areas have been found to affect positively reed belt growth, and high abundances of macrovegetation species have been found from archipelago areas, lagoons, bays and river inlets (Pitk¨ anen et al., 2013; Altartouri et al., 2014). The JSDM model reflects these smaller scale occurrences better than SSDM. In our results, diato- mous algae are more common in estuaries and northern parts of the study area (see Supplementary material). This is likely explained by longer ice winter towards northern Gulf of Bothnia and longer distance to deep water. 7.2 Multivariate additive Gaussian process Next we discuss some of the similarities and differences between our model and the existing JSDMs and highlight the novel methodological contributions in this paper. Interspecific correlations From the predictive point of view, the interspecific correlations in predictive functions are attractive for many reasons. In many applications of SDMs the aim is to pre- dict species distribution over regions that include locations spatially far from the data (Record et al., 2013) or to conduct scenario predictions related to, for example, climate change or land use (Guisan et al., 2013). In these applications, predictions are based on the responses to environmental covariates. The inclusion of interspecific dependence allows information flow between species which improves the estimates for predictive functions especially for species with only scarce data (Thorson et al., 2015; Ovaskainen et al., 2017; Hui et al., 2013; Clark et al., 2014). We used the marginally uniform priors of (Barnard et al., 2000; Tokuda et al., 2012) for the interspecific correlations. This is justified by prior ignorance on correlations. Prior information on interspecific correlations could, however, be added into model with, e.g., informative inverse Wishart prior. With many species inferring the full covariance matrix is hard in which case we could use spatially dependent latent factors which induce for the 0 0 0 spatial random effects a covariance structure Cov  (s),  (s ) = λ λ k (s, s ), j qj qj q q=1 where k is the q’th spatial covariance function and λ the corresponding species specific q qj factor loading. Here, M < J so that this covariance structure is a low rank represen- tation of LMC (Lopez, 2000; Lopez et al., 2008). Latent factor representation was first introduced to SDMs by Thorson et al. (2015) and it is used also in HMSC Ovaskainen imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 22 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 et al. (2017). Recently Taylor-Rodr´ ıguez et al. (2017) proposed a clustering scheme where the species are clustered to less than J factor loading vectors. Interspecific correlation between response functions has received less interest. Typi- cally the response functions are assumed to be independent among species. Exception are species archetype models (Dunstan et al., 2013) and the HMSC framework of Ovaskainen et al. (2017). In the latter, the response functions are defined as h (x ) = x θ where j j j j· T T T θ = [θ , . . . , θ ] is a J × p matrix of regression weights with hierarchical prior. The 1· J· species specific weights are given Gaussian prior θ ∼ N (μ , V ) where μ is the ex- j· j· j j· pected response of a species that can be common for all species, common within groups of species or modelled through species traits μ = τ γ where τ is a vector of trait jr r j values of species j and γ ∼ N (0, V ) are the effects of traits to response along covariate r γ 2 2 r. With V = σ I and common prior mean μ = μ ∼ N (0, σ ) for all j ∈ 1, . . . , J , J×J j· the induced covariance between species specific additive response functions is 0 0 0 Cov (h (x ), h 0 (x )) =E[Cov (x θ , x θ 0 )] + Cov (E[x θ ],E[x θ 0 ]) j,r r j ,r r jr j r r jr j r r r r 2 0 2 0 =(σ + δ (j )σ )x x j r μ r 0 T 0 2 0 0 0 and with trait dependent prior mean Cov (h (x ), h (x )) = (τ V τ +δ (j )σ )x x . j,r r j ,r γ j j r r j r Hence, these alternatives have the same covariance structure as an additive multivari- ate GP (3.10) with k = x x for all j = 1, . . . , J , and interspecific covariances hj,r r r 2 0 2 T 0 2 Σ = σ + δ (j )σ and Σ = τ V τ 0 + δ (j )σ . Hence, the hierarchical prior struc- h j h γ j j r μ r j ture of HMSC could easily be extended to semiparametric models as well if we had trait information from our species. The benefit would be that these hierarchical model structures contain ecologically relevant prior information and, hence, using the induced interspecific covariance structure in the additive multivariate GP model would make it ecologically more realistic and potentially improve its predictive performance. The HMSC framework has many other features as well, such as phylogenetic relationships Ovaskainen et al. (2017), which could in principle be incorporated with our approach as well. The structure of interspecific correlations is likely to be especially important in scenario based predictive analyses, such as climate change predictions. Since our model does not make any mechanistic assumptions on species interde- pendencies the results concerning interspecific correlations have to be interpreted with care. For example, a positive correlation between spatial random effects could arize due mutualism, predator prey association or similar response to unobserved environmental covariates. Hence, interpreting these correlations should be done in light of more gen- eral ecological knowledge on the species. This is, however, a common challenge with all current JSDMs (Thorson et al., 2015; Dunstan et al., 2013; Ovaskainen et al., 2017; Taylor-Rodr´ ıguez et al., 2017). Moreover, the property of our model, as well as most other JSDM, is that if the training data shows positive or negative spatial correlation between two species this positive correlation is assumed to hold everywhere in the study domain. In many cases this might be an unrealistic assumption for which reason one in- teresting future development would be to extend our models to allow species-to-species associations to depend on the environmental context(Fox and Dunson, 2015; Tikhonov et al., 2017). However, the colocalization patterns are explained by both environmental imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 23 covariates and spatial random effects and regional differences in colocalization patterns can be explained by different environmental covariates in different regions. The interspecific correlation in our approach are modelled on the latent variable level and, hence, are not directly comparable to interspecific correlations in data. Some authors have argued that this is a conceptual weakness of latent variable models since it makes interpretating their results hard and thus less transparent. An alternative would be to model the correlations in the observation error model (see, e.g., Ovaskainen et al., 2010; Pollock et al., 2014). Clark et al. (2016) proposed generalized joint attribute model (GJAM) to fix this interpretation challenge by aiming to model the correlations between species on the data scale. However, for example, in case of presence absence observations GJAM corresponds to multivariate probit model that is the marginal likelihood of our model in case of Bernoulli observation model. It is also unclear how to disentangle the process underlying the species occurrence and abundance from the observation process leading to the actual data in GJAM approach. Hence, despite the challenge of inter- pretating the correlations, we prefer the hierarchical latent variable modeling since it provides a coherent approach to separate these two processes. Response functions The response functions along environmental covariates provide basis to understand how environment affects species distribution and to predict the species distribution in new areas. Linear and other parametric models are efficient in finding overall trends in re- sponses but have trouble in locating abrupt changes and change points due, for example, physical tolerance limits. Such limits, for example, along temperature and salinity are typical for large variety of taxa in aquatic domains (MacKenzie et al., 2007; Kotta et al., 2019). With SSDMs Vanhatalo et al. (2012), Shelton et al. (2014), Golding and Purse (2016) and Kallasvuo et al. (2017) have demonstrated the benefits of semiparametric models in such situations. The GP approach for modeling environmental responses is similar to generalized additive models (Guisan et al., 2002) but the latter have not been implemented as JSDMs. The flexibility of semiparametric models provides also challenges compared to the more restricted parametric models. The prior for covariance function parameters has to be set with care (Golding and Purse, 2016) and we may also need to pose monotonicity constraints to the response functions Kotta et al. (2019) in order to prevent overfitting. Inferring the responses reliably requires typically more data compared to parametric models. The multivariate additive GP helps in these challenges as well since the interspecific correlations increase the effectively amount of data to be used to infer each response function and hence decreases the uncertainty in them. Additive multivariate GP framework opens also new questions related to the choice of covariance functions and interspecific correlations. A typical choice for general pur- pose covariance function is a radial basis function. However, predictions using these covariance functions revert to prior predictive distributions when predicting beyond covariate range covered by data. Hence, in extrapolation tasks stationary covariance functions may not be the optimal choice (Vanhatalo et al., 2012; Kallasvuo et al., 2017) and combining the multivariate GP models with functional constraints (e.g. Kotta et al., 2019) could improve their predictive performance further. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 24 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 Spatial random effects Our model comparison results show clearly that including spatial random effects into model improves their predictive performance in both inter- and extrapolation. This re- sult is well in line with earlier works demonstrating that environmental conditions alone may not sufficiently explain species distribution and spatial random effects improve their predictive performance (Latimer et al., 2006; Vanhatalo et al., 2012; Clark et al., 2014; Thorson et al., 2015; Kallasvuo et al., 2017). This is reasonable, since the distribution of species is shaped by the interplay of environmental covariates, stochastic processes and species interaction (Ovaskainen et al., 2017). Hence, a justified model should account for random processes and as demonstrated also by our results also to species interac- tions in these processes. However, adding spatial random effects into model may lead to problems in model identifibiality which we discuss in the next section. Posterior inference and predictive performance The JSDM built with multivariate additive GP had clearly the best overall predictive performance. Moreover, the uncertainty in the predictions by JSDMs were smaller than in the SSDMs which, together with the best log predictive density performance, indicates that the predictions were also more accurate. These findings have important implications to practical use of SDMs for management and other purposes. For example Kallasvuo et al. (2017) used SSDMs to classify coastal areas of Finland to unsuitable, suitable and highly productive spawning regions for four commercially important fish species. They based their estimates on the expected numbers of larvae in the coastal region which, as shown by Figure 6 is highly sensitive to the uncertainty of the predictions. Moreover, by using SSDMs we can overestimate the total biomass of all species (Clark et al., 2014). An inherent challenge with the type of models considered here is the model identifib- iality. Spatial random effects are known to affect the fixed effects estimates (Hodges and Reich, 2010) and in some cases the spatial random effect can actually capture variability otherwise explainable with fixed effects (Paciorek, 2010; Bose et al., 2018). Moreover, with uniform (uninformative) priors the length-scale and variance hyperparameters of Mat´ ern class of covariance functions are non-identifiable (Zhang, 2004) in general. In order to alleviate these identifibiality challenges we used weakly informative priors for the variance and length-scale parameters defined through the principles proposed by Gelman (2006), Simpson et al. (2017) and Fuglstad et al. (2018). Our priors for the covariance function parameters favor small variance and large length-scale parameter values. The former penalizes for large magnitudes and the latter penalizes for wiggly responses along covariates or space. In case of covariate responses this is ecologically justified since according to ecological niche theory species’ responses to environmental covariates are typically either monotonic or have only one mode (Ovaskainen et al., 2016). Moreover, Fuglstad et al. (2018) show that these type of priors work well for spa- tial random effects with Mat´ ern covariance functions in general. Favoring longer spatial length-scales is justified also from the model identifibiality point of view. Identifiability between fixed and spatial random effects is of less concern if spatial random effect varies with larger spatial range than the environmental covariates (Paciorek, 2010, Section 3). imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 25 Since the environmental covariates in Finnish coastal region vary with relatively small spatial range, our weakly informative prior effectively favors models that explain this variation with covariate responses instead of with spatial random effects. Our model comparison suggests that the models with both covariate effects and spatial random effects lead to most reliable inference on environmental responses and spatial random effects. A model’s extrapolation performance is governed mainly by the environmental covariates so better extrapolation performance indicates more reliable response func- tions. Similarly more accurate interpolation performance indicates more reliable joint effect of environmental covariates and spatial random effect. An evident challenge with multivariate additive GP is the computation. The core element in the multivariate additive GP is the covariance matrix induced by the GP components. With many species and sampling sites the covariance matrix can become so large that it makes the implementation infeasible. Here we used Laplace method to speed up the computation by decreasing the number of costly posterior density calculations compared to full MCMC. However, in order to scale up the methods for large data sets involving thousands of sampling sites, the model would need to be implemented even more efficiently. One option to reduce the computational time could be to exploit the property that linear LMC can be parameterized through species specific conditional distributions (Gelfand et al., 2004). We could also implement multivariate GPs with sparse GP approximations (Vanhatalo et al., 2010; Alvarez et al., 2010) and replace the full rank LMC model with latent factor model (Thorson et al., 2015; Ovaskainen et al., 2017). However, we leave these considerations for future. 7.3 Summary Our JSDM based on multivariate additive GP combines the key ideas from semi- parametric SSDMs and state-of-the-art parametric JSDMs. In our case study, it showed superior predictive performance compared to existing GP based SSDMs and paramet- ric SSDMs and JSDMs in both interpolation and extrapolation. Hence, the multivariate additive GP can be seen as the first step towards integration of the semi-parametric SSDMs and hierarchical JSDMs. We propose also an efficient approach for inference by utilizing Laplace approximation and gradient based optimization for hyperparame- ters. The multivariate additive GP model is not restricted only to species distribution modeling but can be used in wide variety of other applications as well. Supplementary Material The supplementary material contains additional mathematical formulation of the method- ology proposed in this paper and additional figures and tables for the case study analysis. (http://www.some-url-address.org/dowload/0000.zip). Acknowledgments This work has bee funded by the Academy of Finland (grant 317255) and Research Funds of the University of Helsinki (decision No. 465/51/2014). imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 26 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 References Altartouri, A., Nurminen, L., and Jolma, A. (2014). “Modeling the role of the close- range effect and environmental variables in the occurrence and spread of Phragmites australis in four sites on the Finnish coast of the Gulf of Finland and the Archipelago Sea.” Ecology and Evolution , 4: 987–1005. 21 Alvarez, M. A., Luengo, D., Titsias, M. K., and Lawrence., N. D. (2010). “Efficient multi- output Gaussian processes through variational inducing kernels.” JMLR Workshop and conference proceedings , 9: 25–32. 25 Banerjee, S., Carlin, B. P., and Gelfand, A. E. (2015). Hierarchical Modelling and Analysis for Spatial Data . Chapman Hall/CRC, second edition. 6 Barnard, J., McCulloch, R., and Meng, X.-L. (2000). “Modelling covariance matrices in terms of standard deviations and correlations, with applications to shrinkage.” Statistical Sinica . 9, 21 Bergstr¨ om, U., Olsson, J., Casini, M., Eriksson, B. K., Fredriksson, R., Wennhage, H., and Appelberg, M. (2015). “Stickleback increase in the Baltic Sea – A thorny issue for coastal predatory fish.” Estuarine, Coastal and Shelf Science , 163: 134–142. 3, 4 Bose, M., Hodges, J. S., and Banerjee, S. (2018). “Toward a diagnostic toolkit for linear models with Gaussian-process distributed random effects.” Biometrics , 74(3): 863–873. 24 Busse, S. and Snoeijs, P. (2002). “Gradient responses of diatom communities in the Bothnian Bay, northern Baltic Sea.” Nova Hedwigia , 3-4: 501–525. 4 Bystr¨ om, P., Bergstr¨ om, U., Hj¨ alten, A., St˚ ahl, S., Jonsson, D., and Olsson, J. (2015). “Declining coastal piscivore populations in the Baltic Sea: Where and when do stick- lebacks matter?” Ambio, 44: 462–471. 20 Candolin, U., Engstr¨ omOst, J., and Salesto, T. (2008). “Humaninduced eutrophication enhances reproductive success through effects on parenting ability in sticklebacks.” Oikos , 117: 459–465. 20 Chib, S. and Greenberg, E. (1998). “Analysis of multivariate probit models.” Biometrika , 85(2): 347–361. 2 Clark, J. S., Gelfand, A., Woodall, C. W., and Zhu, K. (2014). “More than the sum of the parts: forest climate response from joint species distribution models.” Ecological Applications , 24(5): 990–999. 2, 21, 24 Clark, J. S., Nemergut, D., Seyednasrollah, B., Turner, P. J., and Zhang, S. (2016). “Generalized joint attribute modeling for biodiversity analysis: median-zero, multi- variate, multifarious data.” Ecological Monographs , 87(1): 34–56. 23 Cressie, N. and Wikle, C. K. (2011). Statistics for Spatial-Temporal Data . Wiley Series in Probability and Statistics. 6 Dunstan, P. K., Foster, S. D., Hui, F. K. C., and Warton, D. I. (2013). “Finite Mixture of Regression Modeling for High-Dimensional Count and Biomass Data in Ecology.” imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 27 Journal of Agricultural, Biological, and Environmental Statistics , 18(3): 357–375. 2, Duvenaud, D., Nickisch, H., and Rasmussen, C. (2011). “Additive Gaussian Processes.” Neural Information Processing Systems . 7 Elith, J. and Leathwich, J. R. (2009). “Species Distributions Models: Ecological Ex- planation and Predictions Across Space and Time.” The Annual Review of Ecology, Evolution and Systematics , 40(677-697). 2 Eriksson, B. K., Ljunggren, L., Sandstrom, A., Johansson, G., Mattila, J., Rubach, A. E., R˚ aberg, S., and Snickars, M. (2009). “Declines in predatory fish promote bloomforming macroalgae.” Ecological Applications , 19: 1975e1988. 4 Eriksson, B. K., Rubach, A., Batsleer, J., and Hillebrand, H. (2012). “Cascading preda- tor control interacts with productivity to determine the trophic level of biomass ac- cumulation in a benthic food web.” Ecological Research , 27: 203e210. 4 Fox, E. B. and Dunson, D. B. (2015). “Bayesian Nonparametric Covariance Regression.” Journal of Machine Learning research , 16(1): 2501–2542. 22 Fuglstad, G.-A., Simpson, D., Lindgren, F., and Rue, H. (2018). “Constructing priors that penalize the complexity of Gaussian random fields.” Journal of the American Statistical Association , 1(1): 1–8. 24 Gelfand, A. E., Jr, J. A. S., Wu, S., Latimer, A., Lewis, P. O., Rebelo, A. G., and Holder, M. (2006). “Explaining Species Distribution Patterns through Hierarchical Modelling.” Bayesian Analysis , 1(1): 41–92. 2, 6 Gelfand, A. E., Schmidt, A. M., Banerjee, S., and Sirmans, C. F. (2004). “Nonstationary multivariate process modeling through spatially varying coregionalization.” Test , 13(2): 263–312. 8, 10, 25 Gelman, A. (2006). “Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper).” Bayesian Analysis , 1(3): 515–534. 24 Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis . Chapman & Hall/CRC, third edition. 17 Golding, N. and Purse, B. V. (2016). “Fast and flexible Bayesian species distribution modelling using Gaussian processes.” Methods in Ecology and Evolution , 7: 598–608. 2, 3, 7, 8, 23 Guisan, A., Edwards, T. C., and Hastie, T. (2002). “Generalized linear and generalized additive models in studies of species distributions: setting the scene.” Ecological Modelling , 157(2-3): 89–100. 23 Guisan, A., Tingley, R., Baumgartner, J. B., Naujokaitis-Lewis, I., Sutcliffe, P. R., Tulloch, A. I. T., Regan, T. J., Brotons, L., McDonald-Madden, E., Mantyka-Pringle, C., Martin, T. G., Rhodes, J. R., Maggini, R., Setterfield, S. A., Elith, J., Schwartz, M. W., Wintle, B. A., Broennimann, O., Austin, M., Ferrier, S., Kearney, M. R., Possingham, H. P., and Buckley, Y. M. (2013). “Predicting species distributions for conservation decisions.” Ecology Letters , 16(12): 1424–1435. 21 imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 28 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 Hartmann, M., Hosack, G. R., Hillary, R. M., and Vanhatalo, J. (2017). “Gaussian process framework for temporal dependence and discrepancy functions in Ricker-type population growth models.” Ann. Appl. Statist., 11(3): 1375–1402. 2, 9 Hastie, T. and Tibishirani, R. (1986). “Generalize additive models.” Statistical Science , 1(3): 297–318. 7 Hodges, J. S. and Reich, B. J. (2010). “Adding Spatially-Correlated Errors Can Mess Up the Fixed Effect You Love.” The American Statistician , 64(4): 325–334. 24 Hudd, R., Lehtonen, H., and Kurttila, I. (1988). “Growth and abundance of fry; factors which influence the year-class strength of whitefish (Coregonus lavaretus widegreni ) in the southern Bothnian Bay (Baltic).” Finnish Fisheries Research , 9: 213–220. 20 Hui, F. K. C., Warton, D. I., Foster, S. D., and Dunstan, P. K. (2013). “To mix or not to mix: Comparing the predictive performance of mixture models vs. separate species distribution models.” Ecology , 94(9): 1913–1919. 2, 21 Kallasvuo, M., Vanhatalo, J., and Veneranta, L. (2017). “Modeling the spatial dis- tribution of larval fish abundance provides essential information for management.” Canadian Journal of Fisheries and Aquatic Sciences , 74: 636–649. 2, 3, 7, 23, 24 Kass, R. E. and Raftery, A. E. (1995). “Bayes Factors.” Journal of the American Statistical Association , 90(430): 773–795. 16 Kotta, J., Vanhatalo, J., J¨ anes, H., Orav-Kotta, H., Rugiu, L., Jormalainen, V., Bobsien, I., Viitasalo, M., Virtanen, E., Sandman, A. N., Isaeus, M., Leidenberger, S., Jonsson, P., and Johannesson, K. (2019). “Integrating experimental and distribution data to predict future species patterns.” Scientific Reports , 9(1): 1821. 2, 23 Kurowicka, D. and Cooke, R. (2003). “A parameterization of positive definite matrices in terms of partial correlation vines.” Linear Algebra and its Applications , 372(Sup- plement C): 225–251. 12 Latimer, A. M., Wu, S., Gelfand, A. E., and Silander, Jr., J. A. (2006). “Building Statistical Models to Analyze Species Distributions.” Ecological Applications , 16(1): 33–50. 2, 24 Lef´ ebure, R., Larsson, S., and Bystr¨ om, P. (2011). “A temperature dependent growth model for the threespined stickleback Gasterosteus aculeatus.” Journal of fish biology , 79: 1815–1827. 20 Leskel¨ a, A., Hudd, R., Lehtonen, H., Huhmarniemi, A., and Sandstr¨ om, O. (1991). “Habitats of whitefish (Coregonus lavaretus (L.) s.l.) larvae in the Gulf of Bothnia.” Aqua Fennica , 21: 145–151. 20 Lewandowski, D., Kurowicka, D., and Joe, H. (2009). “Generating random correla- tion matrices based on vines and extended onion method.” Journal of Multivariate Analysis , 100(9): 1989–2001. 12 Lindley, D. V. (2002). “Seeing and Doing: The Concept of Causation.” International Statistical Review , 70(2): 191–214. 11 imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 29 Liu, J. and Vanhatalo, J. (2018). “Bayesian model-based spatiotemporal survey design for log-Gaussian Cox process.” arXiv e-prints , arXiv:1808.09200. 7 Lopez, H. F. (2000). “Bayesian Analysis in Latent Factor and Longitudinal Models.” Ph.D. thesis, Institute of Statistics and Decision Sciences, Duke University. 21 Lopez, H. F., Salazar, E., and Gamerman, D. (2008). “Spatial Dynamic Factor Analy- sis.” Bayesian Statistics , 3(4): 759–792. 21 MacKenzie, B. R., Gislason, H., M¨ ollmann, C., and K¨ oster, F. W. (2007). “Impact of 21st century climate change on the Baltic Sea fish community and fisheries.” Global Change Biology , 13: 1348–1367. 23 Mardia, K. V. and Goodall, C. R. (1993). “Spatio-Temporal analysis of Multivariate Environmental Monitoring Data.” Multivariate Environmental Statistics , 347–386. 8, 10 M.Cingi, S., Kein¨ anen, and Vuorinen, P. J. (2010). “Elevated water temperature impairs fertilization and embryonic development of whitefish Coregonus lavaretus.” Journal of Fish Biology , 76: 502–521. 20 Meier, H. E. M., Hordoir, R., Andersson, H. C., Dieterich, C., Eilola, K., ..., B. G. G., and Schimanke, S. (2012). “Modeling the combined impact of changing climate and changing nutrient loads on the Baltic Sea environment in an ensemble of transient simulations for 19612099.” Climate Dynamics , 39: 2421–2441. 20 Muller, ¨ R. (1992). “Trophic state and its implications for natural reproduction of salmonid fish.” In Dynamics and Use of Lacustrine Ecosystems . Springer, Dordrecht. Nelder, J. A. and Wedderburn, R. W. M. (1972). “Generalized Linear Models.” Journal of the Royal Statistical Association , 135(3): 370–384. 2 Nickisch, H. and Rasmussen, C. E. (2008). “Approximations for binary Gaussian process classification.” Journal of Machine learning , 9: 2035–2078. 9 Odum, E. P. (1953). Fundamentals of ecology . Wiley Subscription Services, Inc., A Wiley Company. 13 O’Hagan, A. (1978). “Curve Fitting and Optimal Design for Prediction.” Journal of Royal Statistical Society B , 40(1): 1–42. 2 Ovaskainen, O., de Knegt, H. J., and del Mar Delgado, M. (2016). Quantitative Ecology and Evolutionary Biology -Integrating models with data . Oxford University Press. 24 Ovaskainen, O., Hottola, J., and Siitonen, J. (2010). “Modeling species co-occurence by multivariate logistic regression generates new hypotheses on fungal interaction.” Ecology , 91(9): 2514–2521. 23 Ovaskainen, O. and Soininen, J. (2011). “Making more out of sparse data: Hierarchical modelling of species communities.” Ecology , 92(2): 289–295. 2 Ovaskainen, O., Tikhonov, G., Norberg, A., Guillaume Blanchet, F., Duan, L., Dunson, D., Roslin, T., and Abrego, N. (2017). “How to make more out of community data? imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 30 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 A conceptual framework and its implementation as models and software.” Ecology Letters , 20(5): 561–576. 2, 8, 21, 22, 24, 25 Paciorek, C. J. (2010). “The Importance of Scale for Spatial-Confounding Bias and Precision of Spatial Regression Estimators.” Statistical Science , 25(1): 107–125. 24 Pearl, J. (2009). “Causal inference in statistics: An overview.” Statistics Surveys , 3: 96–146. 11 Peltonen, H., Vinni, M., Lappalainen, A., and Pnni, J. (2004). “Spatial feeding patterns of herring (Clupea harengus L.), sprat (Sprattus sprattus L.), and the three-spined stickleback (emphGasterosteus aculeatus L.) in the Gulf of Finland, Baltic Sea.” ICES Journal of Marine Science , 61: 966–971. 20 Pitk¨ anen, H., Peuraniemi, M., Westerbom, M., Kilpi, M., and von Numers, M. (2013). “Longterm changes in distribution and frequency of aquatic vascular plants and charo- phytes in an estuary in the Baltic Sea.” Annales Botanica Fennica , 50: 1e54. 21 Pollock, L. J., Tingley, R., Morris, W. K., Golding, N., Hara, R. B. O., Parris, K. M., Vesk, P. A., and McCarthy, M. A. (2014). “Understanding co-occurence by modelling species simultaneously with joint species distribution model (JSDM).” Methods in Ecology and Evolution , 5: 397–406. 2, 8, 23 Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning . The MIT Press. 2, 7, 12 Record, S., Fitzpatrick, M. C., Finley, A. O., Veloz, S., and Ellison, A. M. (2013). “Should species distribution models account for spatial autocorrelation? A test of model projections across eight millennia of climate change.” Global Ecology and Biogeography , 22(6): 760–771. 21 Reusch, T. B. H., Dierking, J., Andersson, H. C., Bonsdorff, E., Carstensen, J., Casini, M., Czajkowski, M., Hasler, B., Hinsby, K., Hyyti¨ ainen, K., Johannesson, K., Jomaa, S., Jormalainen, V., Kuosa, H., Kurland, S., Laikre, L., MacKenzie, B. R., Margon- ski, P., Melzner, F., Oesterwind, D., Ojaveer, H., Refsgaard, J. C., Sandstr¨ om, A., Schwarz, G., Tonderski, K., Winder, M., and Zandersen, M. (2018). “The Baltic Sea as a time machine for the future coastal ocean.” Science Advances , 4(5). 3 R¨ onnberg, C. and Bonsdorff, E. (2004). “Baltic Sea eutrophication: area-specific eco- logical consequences.” Hydrobiologia , 514: 227–241. 4 Rue, H. and Marino, S. (2009). “Approximate Bayesian Inference for latent Gaussian models by using integrated Laplace approximantions.” Journal of the Royal Statistical Society , 71(2): 319–392. 9 Shelton, A. O., Thorson, J. T., Ward, E. J., and Feist, B. E. (2014). “Spatial semipara- metric models improve estimates of species abundance and distribution.” Canadian Journal of Fisheries and Aquatic Sciences , 71(July): 1655–1666. 23 Sieben, K., Ljunggren, L., Bergstr¨ om, U., and Eriksson, B. K. (2011). “A meso-predator release of stickleback promotes recruitment of macroalgae in the Baltic Sea.” Journal of Experimental Marine Biology and Ecology , 397: 79e84. 20 imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 31 Simpson, D. P., Rue, H. v., Riebler, A., Martins, T. G., and Sørbye, S. H. (2017). “Penal- ising model component complexity: A principled, practical approach to constructing priors.” Statistical Science , 32(1): 1–28. 24 Taylor-Rodr´ ıguez, D., Kaufeld, K., Schliep, E. M., Clark, J. S., and Gelfand, A. E. (2017). “Joint Species Distribution Modeling: Dimension Reduction Using Dirichlet Processes.” Bayesian Analysis , 12(4): 939–967. 22 Thorson, J. T., Scheuerell, M. D., Shelton, A. O., See, K. E., Skaug, H. J., and Kris- tensen, K. (2015). “Spatial factor analysis: a new tool for estimating joint species distributions and correlations in species range.” Methods in Ecology and Evolution , 6(6): 627–637. 2, 21, 22, 24, 25 Tikhonov, G., Abrego, N., Dunson, D., and Ovaskainen, O. (2017). “Using joint species distribution models for evaluating how species-to-species associations depend on the environmental context.” Methods in Ecology and Evolution , 8(4): 443–452. 22 Tokuda, T., Goodrich, B., Mechelen, I. V., and Gelman, A. (2012). “Visualizing Distri- butions of Covariance Matrices.” 9, 21 Vanhatalo, J., Pietil¨ ainen, V., and Vehtari, A. (2010). “Approximate inference for disease mapping with sparse Gaussian processes.” Statistics in Medicine , 29(15): 1580–1607. 3, 9, 12, 25 Vanhatalo, J., Riihim¨ aki, J., Hartikainen, J., Jyl¨ anki, P., Tolvanen, V., and Vehtari, A. (2013). “GPstuff : Bayesian Modeling with Gaussian Processes.” Journal of Machine Learning Research , 14: 1175–1179. 12 Vanhatalo, J., Veneranta, L., and Hudd, R. (2012). “Species distribution modelling with Gaussian processes: A case study with youngest stages of sea spawning whitefish (Coregonus lavatus L. s.l.) larvae.” Ecological Modelling , (228): 49–58. 2, 5, 7, 8, 16, 20, 23, 24 Vehtari, A., Mononen, T., Tolvanen, V., Sivula, T., and Winther, O. (2016). “Bayesian Leave-One-Out Cross-Validation Approximations for Gaussian Latent Variable Mod- els.” Journal of Machine Learning Research , 17: 1–38. 12, 15 Vehtari, A. and Ojanen, J. (2012). “A survey of Bayesian predictive methods for model assessment, selection and comparison.” Statistics Surveys , 6: 141–228. 15 Veneranta, L., Hudd, R., and Vanhatalo, J. (2013). “Reproduction areas of sea-spawning coregionids reflect the environment in shallow coastal areas.” Marine Ecology Progress Series , 477: 231–250. 3, 4, 5, 16, 20 Warton, D. I., Blanchet, F. G., O’Hara, R. B., Ovaskainen, O., Taskinen, S., Walker, S. C., and Hui, F. K. (2015). “So Many Variables: Joint Modeling in Community Ecology.” Trends in Ecology and Evolution , 30(12): 766–779. 2 Warton, D. I. and Shepherd, L. C. (2010). “Poisson point process models solve the ”pseudo-absence problem” for presence-only data in ecology.” Annals of Applied Statistics , 4(3): 1383–1402. 6 imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 32 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 Wikle, C. K. (2003). “Hierarchical Models in Environmental Science.” International Statistical Review , 71(2): 181–199. 6 Zhang, H. (2004). “Inconsistent Estimation and Asymptotically Equal Interpolations in Model-Based Geostatistics.” Journal of the American Statistical Association , 99(465): 250–261. 24 imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Bayesian Analysis (2018) , Number , pp. 1 Supplementary Material: Additive multivariate Gaussian process for joint species distribution modeling with heterogeneous data ∗ † ‡ Jarno Vanhatalo , Marcelo Hartmann and Lari Veneranta 1 The posterior expectation and variance for new observations First we approximate the logistic function with p(f ) ≈ p ˜(f ) = aΦ(f/v )+(1−a)Φ(f/v ) 1 2 where a ∈ (0, 1) and v , v ∈ (0,∞) are parameters of the approximation (see Demi- 1 2 −4 denko, 2004). This approximation is used since it has small error bound (≤ |10 |, Demidenko (2004)) and can considerably speed-up marginalization over f for large datasets. Using Lemma 2 in the Section 4, we obtain, μ μ j,∗ j,∗ √ √ E[Y (x , s )| y,η, Λ] = z aΦ + z (1− a)Φ (1.1) j ∗ ∗ j,∗ j,∗ 2 2 Σ +v Σ +v j,∗ j,∗ 1 2 V[Y (x , s )| y,η, Λ] = E[Y (x , s )| y,η, Λ]− E[Y (x , s )| y,η, Λ] j ∗ ∗ j ∗ ∗ j ∗ ∗ 2 2 + (z − z )E[p ˜ (f )] j,∗ ∗ j,∗ where, Φ(·) is the cumulative distribution of the standard Gaussian random variable and 2 2 2 E[p ˜ (f )] = a F (μ |V ) + (1− a) F (μ |V ) + 2a(1− a)F (μ |V ) (1.2) ∗ 2 1,1 2 2,2 2 1,2 j,∗ j,∗ j,∗ F (μ |V ) is the zero-mean 2-dimensional Gaussian cumulative distribution function 2 m,n j,∗ evaluated at μ = [μ μ ] with covariance matrix given by j,∗ j,∗ j,∗ Σ + v Σ j,∗ j,∗ V = . (1.3) m,n Σ Σ + v j,∗ j,∗ In the case of Negative-Binomial model, we use the moment generating function of a Gaussian random variable to obtain the unconditional expectation of a future outcome w.r.t to latent variable, i.e. μ +Σ /2 j,∗ j,∗ E[Y (x , s )| y,η, Λ] = z e (1.4) j ∗ ∗ j,∗ h i μ +Σ /2 2 2μ +Σ Σ /2 j,∗ j,∗ j,∗ j,∗ j,∗ V[Y (x , s )| y,η, Λ] = z e + z e e (r ˆ + 1)/r ˆ − 1 . j ∗ ∗ j,∗ j j j,∗ Department of Mathematics and Statistics and Organismal and Evolutionary Biology Research Program, University of Helsinki, jarno.vanhatalo@helsinki.fi Department of Mathematics and Statistics, University of Helsinki, marcelo.hartmann@helsinki.fi Natural Resources Institute Finland, Finland c 2018 International Society for Bayesian Analysis imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 arXiv:1809.02432v2 [stat.ME] 8 Aug 2019 2 Supplement: JSDMs with additive multivariate Gaussian processes 2 Conditional predictions Recall that, in the main paper, the distribution of Y (x , s )| y conditioned on J ∗ ∗ 1 J hyperparameters (they were ommited from the notation for simplicity) is given by π(y | y ) = π(y | y , f )π(y , f )df /π(y ) (2.1) J ,∗ J ,∗ J ,∗ J ,∗ J J ,∗ J 1 J 1 1 J 1 2 1 2 2 2 The second term in the numerator of the integrand of (2.1) is the same as, π(y , f ) = π(f | f , y )π(f | y )π(y )df . (2.2) J ,∗ J ,∗ J J J J 1 1 2 J 2 J J 2 2 2 2 2 Now, f | f , y also only depends on f . Plugging (2.2) into (2.1) we get J ,∗ J J 1 2 J 2 π(y | y ) = π(y | f )π(f | f )π(f | y )df df (2.3) J ,∗ J ,∗ J J J J ,∗ J ,∗ J J ,∗ 1 1 2 2 J 2 1 1 2 1 2 where last two terms in (2.3) are Gaussians. The first one is the conditional Gaus- sian density function w.r.t. the predictor function values f , that is, π(f | f ) = J J ,∗ J 2 1 2 −1 −1 T N (f |C C f , C − C C C ) and the second one is the J ,∗ J ,J ∗ J J ∗,∗ J ,J ∗ 1 1 2 2 1 1 2 J ,J J ,J J ,J ∗ 2 2 2 2 1 2 Laplace approximation using the scenario species data related to the set J , that is, −1 −1 π(f | y ) ≈ N f |C ∇ log π(y |f ), (C + W ) . Plugging this ap- J J J ,J J J 2 J 2 2 2 J 2 2 2 2 J ,J 2 2 proximate density function in the equation (2.3) gives π(y | y ) = π(y | f )N (f |μ , Σ )df (2.4) J ,∗ J J ,∗ J ,∗ J ,∗ |J |J J ,∗ 1 2 1 1 1 2 2 1 with μ = C ∇ log π(y |f ) J ,J ∗ J |J 1 2 J 2 2 2 −1 −1 T Σ = C − C (C + W ) C (2.5) |J J ∗,∗ J ,J ∗ J ,J 2 1 1 2 2 2 J J ,J ∗ 2 1 2 where C is the covariance between species in the setJ at the point (x , s ) and the J ,J ∗ 1 ∗ ∗ 1 2 species in the setJ in their respective covariates and spatial locations. ∇ log π(y |f ) 2 J J 2 correspond to the gradient of the log-likelihood function for species related to the set J . C is the covariance matrix of f at (x , s ). C is the covariance matrix of J ∗,∗ J ,∗ ∗ ∗ J ,J 1 1 2 2 f and W is the negative Hessian matrix of the negative logarithm of the likelihood J J 2 2 functions related to species in the set J . Now, to calculate the predictive mean vector and predictive covariance matrix we follow the steps used in deriving unconditional expectation and unconditional variances as in the main paper. However we exclude them for brevity. 3 Unconstrained parametrization of covariance matrices Let the matrix Σ be a covariance matrix of dimension J . In terms of variances and cor- 1 1 2 2 2 2 2 2 2 relations, we rewrite Σ = diag(σ , . . . , σ ) P diag(σ , . . . , σ ) , where σ , j = 1, . . . , J 1 J 1 J j imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 3 are variances and these are transformed as σ = exp(δ ). The correlation matrix P is written in terms of its Cholesky decomposition, that is P = LL , where L is the lower triangular Cholesky decomposition. In our case, the upper triangular Cholesk decomposition is given by   1 z z ··· z 1,2 1,3 1,J   Q Q Q   1 1 1 1 1 1 2 2 2  2 2 2 0 (1− z ) z (1− z ) ··· z (1− z ) 2,3 2,J i,2 i,3 i=1 i=1 i=1 i,J     L = (3.1)   . . . .  . . . . . . . . .     J−1 0 0 0 ··· (1− z ) i=1 i,J 0 0 where each z ∈ (−1, 1). Now, since each z can freely vary in the interval (−1, 1) i,i i,i without violate the positive definiteness property of P (see Kurowicka and Cooke, 0 0 2003; Lewandowski et al., 2009), we map each z to the real line as z = 2/[1 + i,i i,i exp(−aδ 0 )]− 1, where δ 0 ∈ R. i,i i,i The derivative of the log π(P ) (recall the prior for the correlation matrix in the main paper) w.r.t. each δ 0 is given by i,i J−1 J X X ∂ ∂ ∂ log π(P|v) = log π(P|v) ρ 0 (3.2) j,j ∂δ 0 ∂ρ 0 ∂δ 0 i,i j,j i,i j=1 j =j+1 and the partial derivatives of log π(P|v) w.r.t to ρ 0 are obtained as, j,j −1 −1 0 0 log π(P|v) = (v − 1)(J − 1)− 1 {P } − v {P } (3.3) j,j c,c rr ∂ρ j,j r=1 : r∈{ / j,j } 0 0 0 0 0 where c = j − 1 and c = j if r < j and, j = 1 or r > j . If r < j and r < j then −1 −1 −1 0 0 0 c = j− 1 and c = j − 1. {P } 0 is the entry (j, j ) of the matrix P and {P } 0 is j,j c,c rr 0 −1 the entry (c, c ) of the inverse matrix P , where P is a matrix obtained by removing rr rr the r’th row and column of P . The partial derivatives of each ρ 0 w.r.t to δ 0 are j,j i,i T T 0 0 0 obtained from ∂P/∂δ = (∂L/∂δ )L + L(∂L/∂δ ) . i,i i,i i,i Lastly, we find the derivatives of the logarithm of the absolute value of the Jacobian determinant w.r.t each δ (see equation (11) in Lewandowski et al. 2009). i,i J−1 J Y Y ∂ ∂(ρ , . . . , ρ ) ∂ dz 0 1,2 J−1,J j,j 2 (J−j−1)/2 log = log (1− z 0 ) j,j 0 0 0 ∂δ ∂(δ , . . . , δ ) ∂δ dδ i,i 1,2 J−1,J i,i j,j j=1 j =j+1 2 2 d z 0/dδ 0 0 0 z dz i,i i,i i,i i,i = −(J − i− 1) + (3.4) 1− z dδ 0 dz 0/dδ 0 i,i i,i i,i i,i imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 4 Supplement: JSDMs with additive multivariate Gaussian processes 4 Gaussian integrals Lemma 1. Let’s denote by Φ(·) the standard-Gaussian cumulative distribution function 2 2 and N (·|μ, σ ) the Gaussian density function with parameters (μ, σ ) ∈ R× R . Then the following holds x− m N (x|μ, σ ) Φ dx = F (μ |m , V ) (4.1) N N N r=1 −∞ where F (·|c,C) is the N -dimensional Gaussian cumulative distribution function with parameters (c,C) ∈ R ×R, with R the space of positive-definite matrices (covariance T N N matrices). Furthermore, m = [m ··· m ] ∈ R , μ = μ1 ∈ R , v > 0 ∀r and N 1 N N r V is a covariance matrix given by,   2 2 2 v + σ . . . σ   . . . . . V = (4.2) N   . . 2 2 2 σ . . . v + σ Proof. To show (4.1), start writing the left-hand side of the equation in full. Rewrite the integrand as the product of non-standard Gaussian density functions as well as the regions of integration, i.e., ∞ x x Z Z Z 2 2 2 ··· N (y |m , v ) . . .N (y |m , v )N (x|μ, σ )dy ··· dy dx. (4.3) 1 1 N N 1 N 1 N −∞−∞ −∞ Rewrite again using the following transformation [x, y ,··· , y ] = [w + μ, z + w + 1 N 1 m ,··· , z + w + m ] and note that |∂(x, y ,··· , y )/∂(w, z ,··· , z )| = 1. After 1 N N 1 N 1 N changing variables, group the different terms in the exponentials together to have ( " #) μ−m μ−m ∞ N 1 Z Z Z 1 2 (z +w) 1 r w ··· exp − + dz . . . dz dw (4.4) 2 2 1 N c v σ r=1 −∞ −∞ −∞ (N +1)/2 where c = σ(2π) v . Now, the expression inside the squared bracket is a r=1 quadratic form which is written with the following matrix form, N N N N X X X X (z +w) r w 2 1 1 z w z r r + = w + + w + z + 2 2 2 2 2 r 2 2 v σ v σ v v v r r r r r r=1 r=1 r=1 r=1     P P N N z 1 1 r   w + + 2 2 2 r=1 v σ r=1 v r r w       w z 1 z   1 2 2     v v = 1 1     .     .   w z N 2 2 v v N N imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 5 P    N   1 1 1 1 + ··· 2 2 2 2 w r=1 v σ w v v 1 N   1 1     ··· 0 z z 1  2 2  1 v v     1 1   = (4.5)     . . . . .  .  . .   . . . .   .  .  . . . . 1 1 z z N 0 ··· N 2 2 v v N N therefore (4.4) is the same as       N   1 1 1 1 + ···  2 2 2 2  w r=1 v σ w  v v  1 N   μ−m μ−m ∞ N 1     Z Z Z 1 1      ··· 0 z z 1  2 2  1 1 v v     1 1 1   ··· exp − dz dw     . . . . . c  .  . .  2   . . . .    .  .  .  . . .   −∞ −∞ −∞    1 1  z z N 0 ··· N 2 2 v v N N (4.6) The integrand in (4.6) has the full form of the multivariate Gaussian density. To identify this we need to find the closed-form covariance matrix from the precision matrix in (4.6) 2 N +1 and show that the determinant of the covariance matrix is given by c /(2π) . Write h i 1 1 1 1 the precision matrix as block matrix such that A = + , B = ··· , 2 2 2 2 r=1 v σ v v r 1 N 1 1 C = B and D = diag ,··· , . Use the partitioned matrix inversion lemma 2 2 v v 1 N −1 −1 2 (Strang and Borre, 1997, equation 17.44) to get the blocks, (A − BD C ) = σ , −1 −1 −1 2 −1 −1 −1 2 T −1 (BD C−A) BD = −σ [1··· 1], D C (BD C−A) = −σ [1··· 1] and D + −1 −1 −1 −1 2 2 2 2 D C (A−BD C ) BD where its main diagonal equals to [v +σ ,··· , v +σ ] and 1 N all off-diagonal elements are given by σ . Put everything together to have the covariance matrix   2 2 2 σ −σ ··· −σ 2 2 2 2   −σ v + σ ··· σ   (4.7)  . .  . . .   −σ . . 2 2 2 2 −σ σ ··· v + σ −1 2 2 2 N +1 whose determinant equals to 1/[det(D) det(A− BD C )] = σ v = c /(2π) r=1 r by the partitioned matrix determinant lemma (e.g., Rasmussen and Williams, 2006). Finally, in (4.6), interchange the order of integration with Fubini-Tonelli theorem (Fol- land, 2013) and integrate w.r.t. w to get       2 2 2 μ−m μ−m z 0 v + σ ··· σ N 1 Z Z 1       . . . . . . . . . ··· N , dz ··· dz (4.8)       1 N . . . . 2 2 2 −∞ −∞ z 0 σ ··· v + σ that equals to       2 2 2 μ m v + σ ··· σ       . . . . . . . . . F     ,  (4.9) . . . . 2 2 2 μ m σ ··· v + σ and therefore the equality (4.1) holds. imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 6 Supplement: JSDMs with additive multivariate Gaussian processes Lemma 2. Let N (·|μ , Σ) be the N -dimensional Gaussian density function with mean parameter μ and covariance matrix Σ. Then the following holds true, F (x|m , V)N (x|μ , Σ)dx = F (μ |m , V + Σ) (4.10) N N N N N N where F (·|c,C) is the N -dimensional Gaussian cumulative distribution function with T N T N parameters (c,C). Furthermore, m = [m ··· m ] ∈ R , μ = [μ ··· μ ] ∈ R , N 1 N 1 N V and Σ are covariance matrices. Proof. Let’s rewrite the left-hand side of (4.10) in full, x x n 1 Z Z Z ··· N (y|m, V)N (x|μ , Σ)dydx (4.11) −∞ −∞ T T where y = [y ,··· , y ] . Let’s use the following transformation [x ,··· , x , y ,··· , y ] 1 n 1 N 1 N = [w + μ ,··· , w + μ , z + w + m ,··· , z + w + m ] . The Jacobian of this 1 1 N N 1 1 1 N N N transformation simplifies to |∂(x ,··· , x , y ,··· , y )/ ∂(w ,··· , w , z ,··· , z )| = 1 N 1 N 1 N 1 N 1. Rewrite the above equation to get, μ −m μ −m N N 1 1 Z Z Z ··· N (w|− z, V)N (w| 0, Σ)dzdw (4.12) −∞ −∞ T T where z = [z ··· z ] and w = [w ··· w ] . Note that the product of two multivariate 1 N 1 N Gaussian density functions is another unnormalized multivariate Gaussian (see, e.g., Rasmussen and Williams, 2006, Appendix A). Therefore we write μ −m μ −m N N 1 1 Z Z Z ··· N (z| 0, V + Σ)N (w|c, C )dzdw (4.13) −∞ −∞ −1 −1 −1 −1 where c = −C V z and C = (V + Σ ) . Interchange the order of integration with Fubini-Tonelli theorem (Folland, 2013) and integrate w.r.t w to get that, μ −m μ −m N N 1 1 Z Z ··· N (z| 0, V + Σ)dz = F (μ |m , V + Σ) (4.14) N N −∞ −∞ which completes the proof. The above closed-form integrals extend many equalities in the table of Owen (1980). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 7 References Demidenko, E. (2004). Mixed Models: Theory and Application . John Wiley & Sons. Folland, G. (2013). Real Analysis: Modern Techniques and Their Applications . Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts. Wiley. Kurowicka, D. and Cooke, R. (2003). “A parameterization of positive definite matrices in terms of partial correlation vines.” Linear Algebra and its Applications , 372(Sup- plement C): 225–251. Lewandowski, D., Kurowicka, D., and Joe, H. (2009). “Generating random correla- tion matrices based on vines and extended onion method.” Journal of Multivariate Analysis , 100(9): 1989–2001. Owen, D. B. (1980). “A table of normal integrals.” Communications in Statistics - Simulation and Computation , 9(4): 389–419. URL http://dx.doi.org/10.1080/03610918008812164 Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning . The MIT Press. Strang, G. and Borre, K. (1997). Linear Algebra, Geodesy and GPS . Wellesley- Cambridge Press. imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 8 Supplement: JSDMs with additive multivariate Gaussian processes Appendix A: Extra results Standard models LOO Cross Validation 5-fold Cross validation 1) Univariate GP h(x), univ. (s) -2.230 (0.082) -2.465 (0.094) diatom.algae -3.942 (0.225) -4.002 (0.225) filame.algae -4.568 (0.141) -4.692 (0.146) macro-veg -1.909 (0.234) -2.092 (0.240) threespine-sb -1.477 (0.157) -1.624 (0.154) ninespine-sb -1.334 (0.151) -1.464 (0.149) white-fish -2.840 (0.200) -3.092 (0.216) vendance -1.634 (0.210) -2.196 (0.314) 2) Univariate GP h(x), multiv. (s) -2.081 (0.080) -2.480 (0.100) diatom.algae -3.578 (0.232) -4.010 (0.217) filame.algae -4.104 (0.159) -4.680 (0.135) macro-veg -1.733 (0.223) -2.148 (0.252) threespine-sb -1.385 (0.149) -1.637 (0.156) ninespine-sb -1.232 (0.139) -1.489 (0.151) white-fish -2.782 (0.203) -3.092 (0.213) vendance -1.537 (0.204) -2.215 (0.316) 3) Multiv. GP h(x), multiv. (s) -1.669 (0.080) -2.316 (0.086) diatom.algae -2.307 (0.112) -3.947 (0.217) filame.algae -1.244 (0.751) -4.623 (0.140) macro-veg -1.289 (0.150) -1.950 (0.248) threespine-sb -1.210 (0.140) -1.509 (0.156) ninespine-sb -1.184 (0.147) -1.389 (0.149) white-fish -3.399 (0.393) -2.964 (0.205) vendance -2.041 (0.451) -1.837 (0.247) 4) Multiv. GP h(x) only -2.228(0.089) -2.590(0.012) diatom.algae -4.033(0.235) -4.545(0.379) filame.algae -4.697(0.156) -4.865(0.187) macro-veg -1.754(0.226) -2.514(0.379) threespine-sb -1.418(0.173) -1.494(0.172) ninespine-sb -1.355(0.183) -1.393(0.168) white-fish -2.866(0.212) -3.192(0.271) vendance -1.608(0.222) -2.458(0.400) 5) Multiv. (s) only -2.351(0.087) -2.500(0.086) diatom.algae -4.129(0.200) -4.231(0.182) filame.algae -4.607(0.119) -4.729(0.109) macro-veg -2.255(0.254) -2.800(0.210) threespine-sb -1.471(0.174) -1.567(0.168) ninespine-sb -1.375(0.178) -1.473(0.152) white-fish -3.147(0.181) -3.226(0.191) vendance -1.675(0.235) -1.864(0.260) Table 1: Model comparison with leave-one-out (LOO) and 5-fold cross validation using the mean point wise log marginal predictive density statistics (and its standard error) for models 1-5 (see Section 5.3 in the main text). The bolded numbers show the overall performance over all species and the indented text shows the species specific predictions. imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 9 Standard models LOO Cross Validation 5-fold Cross validation 6) Univ. quadr. h(x), univ. (s) -2.262 (0.083) -2.517 (0.095) diatom.algae -3.941(0.228) -4.035(0.237) filame.algae -4.608(0.138) -4.741(0.150) macro-veg -1.955(0.239) -2.233(0.300) threespine-sb -1.490(0.158) -1.598(0.156) ninespine-sb -1.337(0.151) -1.501(0.140) white-fish -2.885(0.204) -3.137(0.215) vendance -1.705(0.215) -2.274(0.315) 7) Univ. quadr. h(x), multiv. (s) -2.117 (0.084) -2.484 (0.109) diatom.algae -3.568(0.243) -3.994(0.225) filame.algae -4.104(0.170) -4.717(0.137) macro-veg -1.763(0.229) -2.269(0.294) threespine-sb -1.360(0.165) -1.517(0.171) ninespine-sb -1.233(0.143) -1.484(0.142) white-fish -2.853(0.210) -3.098(0.251) vendance -1.668(0.226) -2.208(0.340) 8) Multiv. quadr. h(x), multiv. (s) -2.105 (0.084) -2.416 (0.099) diatom.algae -3.561(0.242) -4.009(0.242) filame.algae -4.090(0.177) -4.745(0.170) macro-veg -1.742(0.229) -2.197(0.280) threespine-sb -1.346(0.165) -1.476(0.173) ninespine-sb -1.217(0.149) -1.441(0.146) white-fish -2.845(0.201) -3.045(0.241) vendance -1.663(0.226) -2.080(0.311) 9) Multiv. quadr. h(x) only -10.246(1.297) -9.305(1.008) diatom.algae -48.063(10.943) -32.673(6.132) filame.algae -52.309(8.539) -42.283(6.498) macro-veg -7.139(1.439) -19.895(6.421) threespine-sb -1.441(0.178) -1.494(0.177) ninespine-sb -1.375(0.192) -1.400(0.158) white-fish -2.899(0.215) -3.209(0.275) vendance -1.677(0.227) -2.311(0.364) Table 2: Model comparison with leave-one-out (LOO) and 5-fold cross validation using the mean point wise log marginal predictive density statistics (and its standard error) for models 6-9 (see Section 5.3 in the main text). The bolded numbers show the overall performance over all species and the indented text shows the species specific predictions. imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 10 Supplement: JSDMs with additive multivariate Gaussian processes Pairwise model comparison LOO Cross Validation 5-fold Cross validation Model 8 - Model 6 0.156(0.019) 0.101(0.030) diatom.algae 0.379(0.106) 0.025(0.020) filame.algae 0.537(0.097) -0.003(0.037) macro-veg 0.214(0.214) 0.135(0.065) threespine-sb 0.144(0.032) 0.012(0.023) ninespine-sb 0.120(0.026) 0.059(0.017) white-fish 0.036(0.039) 0.091(0.045) vendance 0.043(0.032) 0.194(0.031) Model 8 - Model 7 0.012(0.004) 0.052(0.010) diatom.algae 0.006(0.016) -0.015(0.028) filame.algae 0.014(0.021) -0.027(0.046) macro-veg 0.021(0.012) 0.072(0.028) threespine-sb 0.015(0.008) 0.041(0.015) ninespine-sb 0.016(0.011) 0.043(0.017) white-fish 0.008(0.011) 0.052(0.019) vendance 0.006(0.006) 0.128(0.033) Table 3: Pairwise comparison of leave-one-out (LOO) and 5-fold cross validation point wise log predictive densities of multivariate parametric (quadratic environmental re- sponses) model (model 8) and parametric models without any interspecific correlations (model 6) or with interspecific correlations only in spatial random effect (model 7). We report the mean (and its standard error) of the differences in point wise log predic- tive densities at test locations. The bolded numbers show the overall performance and normal text shows the species specific predictions. Pairwise model comparison LOO Cross Validation 5-fold Cross validation Model 3 - model 1 0.561(0.126) 0.150(0.123) diatom.algae 1.637(0.151) 0.054(0.029) filame.algae 3.974(0.495) 0.070(0.027) macro-veg 0.620(0.129) 0.141(0.029) threespine-sb 0.267(0.097) 0.115(0.017) ninespine-sb 0.151(0.097) 0.075(0.019) white-fish -0.300(0.127) 0.129(0.026) vendance 0.051(0.073) 0.359(0.077) Model 3 - model 2 0.413(0.123) 0.164(0.017) diatom.algae 1.271(0.136) 0.065(0.029) filame.algae 3.496(0.504) 0.057(0.020) macro-veg 0.444(0.011) 0.198(0.031) threespine-sb 0.175(0.087) 0.128(0.198) ninespine-sb 0.049(0.085) 0.100(0.021) white-fish -0.357(0.011) 0.128(0.026) vendance -0.041(0.055) 0.377(0.078) Table 4: Pairwise comparison of leave-one-out (LOO) and 5-fold cross validation point wise log predictive densities of multivariate GP model (model 3) and GP models without any interspecific correlations (model 1) or with interspecific correlations only in spatial random effect (model 2). We report the mean (and its standard error) of the differences in point wise log predictive densities at test locations. The bolded numbers show the overall performance and normal text shows the species specific predictions. imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 11 imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Table 5: Hyperparameters estimates from covariates covariance functions Diatom-algae Filam-algae Macro-veg Threespine-sb Ninespine-sb White-fish Vendance SDM | JSDM SDM | JSDM SDM | JSDM SDM | JSDM SDM | JSDM SDM | JSDM SDM | JSDM σ 1.92 | 2.19 1.96 | 1.87 1.63 | 1.62 1.70 | 1.64 2.18 | 1.78 1.57 | 1.30 2.61 | 2.47 Opennes ` 1.56 | 1.59 1.29 | 1.31 1.51 | 1.26 1.90 | 2.23 2.12 | 1.86 1.98 | 1.61 0.64 | 0.92 Distance to σ 2.24 | 1.84 1.55 | 1.48 2.87 | 2.64 2.54 | 2.23 1.98 | 1.43 1.66 | 1.32 1.79 | 1.37 deep ` 1.09 | 1.18 1.16 | 1.06 1.39 | 1.87 1.04 | 1.39 1.48 | 1.80 2.02 | 1.33 1.32 | 1.09 σ 1.89 | 1.72 1.51 | 1.63 2.14 | 1.85 2.11 | 1.49 1.89 | 1.79 3.37 | 2.26 2.59 | 2.37 Sandy bottom ` 1.49 | 1.43 1.48 | 1.24 1.30 | 0.84 0.71 | 0.46 1.93 | 1.60 1.15 | 0.95 0.91 | 1.06 index σ 2.36 | 2.37 2.16 | 1.74 3.26 | 2.54 1.70 | 1.63 1.99 | 1.86 1.64 | 1.09 6.69 | 7.52 Ice breakup ` 1.11 | 1.18 0.75 | 1.06 0.62 | 1.87 1.32 | 1.39 2.22 | 1.80 1.87 | 1.33 0.70 | 1.09 week σ 1.62 | 1.53 1.55 | 1.64 2.93 | 2.68 1.91 | 1.46 1.72 | 1.42 1.95 | 1.75 2.53 | 2.48 Chlorophyl-a ` 1.11 | 1.46 0.75 | 0.78 0.62 | 0.49 1.32 | 1.76 2.22 | 1.71 1.87 | 1.30 0.70 | 0.66 σ 1.68 | 1.42 1.56 | 1.30 1.69 | 2.26 1.96 | 1.71 1.90 | 2.34 1.67 | 2.67 1.65 | 1.55 Distance to ` 1.59 | 1.76 1.63 | 1.42 1.10 | 0.80 0.76 | 0.65 1.23 | 0.73 0.36 | 0.41 1.45 | 1.63 nearest river σ 0.81 | 1.01 0.69 | 1.07 0.69 | 0.81 1.77 | 0.87 1.08 | 1.00 2.01 | 1.00 0.75 | 1.00 Bottom type | | | | | | | σ | | | | | | 2.87 | 2.17 Salinity ` | | | | | | 1.24 | 1.37 σ 10.9 | 9.35 7.31 | 6.00 4.62 | 4.67 6.87 | 6.42 5.66 | 5.74 8.11 | 7.39 9.32 | 6.85 Spatial ` 0.47 | 0.45 0.74 | 1.06 0.61 | 0.44 0.44 | 0.43 0.78 | 0.95 0.15 | 0.22 2.14 | 2.45 12 Supplement: JSDMs with additive multivariate Gaussian processes Figure 1: Posterior predictive median of intensity and the expected count of individuals in replicate sampling for diatomo algae predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 13 Figure 2: Posterior predictive median of intensity and the expected count of individuals in replicate sampling for filamentous algae predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 14 Supplement: JSDMs with additive multivariate Gaussian processes Figure 3: Posterior predictive median of intensity and the expected count of individuals in replicate sampling for macrovegetation predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 15 Figure 4: Posterior predictive median of intensity and the expected count of individuals in replicate sampling for whitefish predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 16 Supplement: JSDMs with additive multivariate Gaussian processes Figure 5: Posterior predictive median of intensity and the expected count of individuals in replicate sampling for ninespine-stickleback predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 17 Figure 6: Posterior predictive median of intensity and the expected count of individuals in replicate sampling for threespine-stickleback predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 18 Supplement: JSDMs with additive multivariate Gaussian processes Figure 7: Posterior predictive variance for the latent field f (x, s)| y, η ˆ, λ diatomo algae predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 19 Figure 8: Posterior predictive variance for the latent field f (x, s)| y, η ˆ, λ vendace pre- dicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 20 Supplement: JSDMs with additive multivariate Gaussian processes Figure 9: Posterior predictive variance for the latent field f (x, s)| y, η ˆ, λ for filamentous algae predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 21 Figure 10: Posterior predictive variance for the latent field f (x, s)| y, η ˆ, λ for macroveg- etation predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 22 Supplement: JSDMs with additive multivariate Gaussian processes Figure 11: Posterior predictive variance for the latent field f (x, s)| y, η ˆ, λ for whitefish predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 23 Figure 12: Posterior predictive variance for the latent field f (x, s)| y, η ˆ, λ ninespine- stickleback predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 24 Supplement: JSDMs with additive multivariate Gaussian processes Figure 13: Posterior predictive variance for the latent field f (x, s)| y, η ˆ, λ for threespine- stickleback predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019

Journal

StatisticsarXiv (Cornell University)

Published: Sep 7, 2018

There are no references for this article.