Access the full text.

Sign up today, get DeepDyve free for 14 days.

Statistics
, Volume 2020 (2005) – May 1, 2020

/lp/arxiv-cornell-university/rejoinder-for-the-discussion-of-the-paper-a-novel-algorithmic-approach-X5EXsz3SuX

- ISSN
- 1936-0975
- eISSN
- ARCH-3347
- DOI
- 10.1214/18-BA1141
- Publisher site
- See Article on Publisher Site

Rejoinder for the discussion of the paper "A novel algorithmic approach to Bayesian Logic Regression" ∗ † ‡ Aliaksandr Hubin , Geir Storvik and Florian Frommlet 1 Introduction We would like to begin this rejoinder with expressing our sincere gratitude to all of the discussants for their interesting and thought-provoking comments and remarks. We also feel heartily thankful to the editorial board of Bayesian Analysis for giving us the opportunity to publish our paper entitled "A novel algorithmic approach to Bayesian logic regression" (Hubin et al., 2020a) as a discussion article. Logic regression is a tool to model non-linear relationships between binary covariates and some response variable by constructing predictors as Boolean combinations. The number of possible logic expressions grows exponentially with the number of binary variables involved, making the model search signi cantly harder with the increasing complexity of Boolean combinations. Due to Boolean equivalence, it is in fact almost impossible to specify the full model space a priori even for a relatively small number of covariates. Our primary goal is to identify those logic expressions which are associated with the response variable. To this end, we want to estimate posterior probabilities of logic expressions within the framework of generalized linear models. The major contributions of our paper are two-fold: Firstly, we have introduced novel model priors for Bayesian logic regression (BLR), which yield good power to detect important logic expressions while controlling the number of false positive discoveries. Secondly, we have introduced a novel genetically modi ed mode jumping Markov chain Monte Carlo (GMJMCMC) algorithm to eciently explore the space of logic regression models. The main idea of GMJMCMC is to embed the mode jumping Markov chain Monte Carlo (MJMCMC) algorithm (Hubin and Storvik, 2018) into the iterative setting of a genetic algorithm. Populations for the genetic algorithm consist of relatively small sets of logic expressions. Any such subset forms a well de ned model space which allows to run MJMCMC. The population is then regularly updated in such a way that the algo- rithm is guaranteed to be irreducible in the model space of all logic regression models. This is required for asymptotic unbiasedness of the estimated posterior probabilities, as we will discuss in more detail below. Although GMJMCMC is not a proper MCMC algorithm (in the sense that its stationary distribution does not coincide with the target distribution of interest), renormalized estimates of the posterior probabilities are readily available. Norwegian Computing Center, aliaksandr.hubin@nr.no Department of mathematics, University of Oslo, geirs@math.uio.no Department of Medical Statistics (CEMSIIS), Medical University of Vienna, o- rian.frommlet@meduniwien.ac.at ba-preprint ver. 0000/00/00 file: output.tex date: 2 Rejoinder for the discussion The discussants have pointed out several interesting extensions and open problems. We have structured the rejoinder according to dierent topics while trying to address all the points raised by the discussants. We also provide several interesting extensions of the model. Finally, we give a brief tutorial on the relevant part of our R-package EMJM- CMC http://aliaksah.github.io/EMJMCMC2016/ dealing with BLR. This should fa- cilitate the practical application of our new methodology developed in Hubin et al. (2020a). 2 Applications of Bayesian logic regression We very much appreciate that Ruczinski et al. (2020) have pointed out important ap- plications of logic regression outside of genetics. Our emphasis on genetic applications was not meant to indicate limitations of the usefulness of logic regression in other areas but rather re ects our own previous research interests. Also, the applications mentioned by Bogdan et al. (2020) in the context of multicolored graphical models sound quite interesting. We are however more sceptical whether logic regression, in whichever form, will ever be applicable directly to association studies of outbred populations, where the number of genetic variants is much larger than in controlled populations. For large numbers of binary covariates, already the number of pairwise logic expressions becomes prohibitively large to apply logic regression, both in terms of algorithmic feasibility and in terms of having sucient power while controlling type I error. Realistic applications of logic regression (with the aim of identifying true logic expressions) will most likely be restricted to applications with a few hundred binary covariates unless technologic ad- vances allow one day to eciently resolve the NP hard combinatorial problem of model search. However, one might consider bagging and boosting to obtain scalable versions of logic regression for prediction. In this rejoinder, we also discuss extensions of Bayesian logic regression, allowing for non-binary predictors and latent Gaussian variables to be included into the model. This could further extend the applications of Bayesian logic regression methodology to such elds as epidemiology, spatio-temporal statistics, environmetrics and econometrics. For example, in Hubin et al. (2020c), a model with latent Gaussian processes (where a subset of predictors are binary) was used for the analysis of DNA methylation. The paper discusses the potential of using logic expressions of the binary predictors as a direction for further research. With the extensions of BLR provided in this rejoinder, it would become feasible to perform logic regression in the settings of Hubin et al. (2020c). Both Ruczinski et al. (2020) and Bogdan et al. (2020) commented on the lack of correlated regressors in our simulation studies. This was mainly due to the fact that for the sake of comparison we wanted to use the scenarios from Fritsch (2006). For the more complex scenarios, we simply extended these scenarios by adding logic expressions of a higher order. In our sensitivity analysis, though, we considered one scenario with correlations (but only for one true leaf ), where reasonably good results were obtained when the correlation of a mis-speci ed leaf r was being varied from 0.1 to 1. However, speci cally to respond to the remarks from Bogdan et al. (2020), who hypothesised that our approach would only work under independence, we will provide some additional simulation results, where we consider regressors with dierent correlation structures. ba-preprint ver. 0000/00/00 file: output.tex date: A. Hubin et al. 3 Figure 1: Correlation structure of the simulated covariates with a general correlation structure (left) and from QTL back-cross (right). 2.1 Simulation study with correlated regressors In this study, we simulate the data using p = 50 regressors with two dierent types of correlation structure: The rst one is rather general and uses fairly weak correlations, whereas the second one is typical for QTL mapping and gives very strong correlations. For the rst scenario, we consider covariates which are marginally distributed according to X Bernoulli(0:5); j 2 f1; :::; 50g. The correlation matrix is obtained using the approach from Joe (2006), which allows to generate positive de nite matrices where all pairwise correlations are i.i.d. from a Beta distribution B(a; a) linearly transformed to the interval (-1, 1). The parameter of the Beta distribution equals a = alphad+(p2)=2, where alphad > 0 can be chosen. In our case for p = 50 and alphad = 5=2 it holds that the pairwise correlation lies between -0.2 and 0.2 with probability 0.85 and between -0.3 and 0.3 with probability 0.97. Correlations with an absolute value larger than 0.4 are extremely unlikely. Multivariate binary random variables X ; j 2 f1; :::; pg with such correlation structures are then simulated by thresholding normal distributions as described by Leisch et al. (1998). A typical correlation structure of covariates generated by this approach is shown in the left panel of Figure 1. The second scenario is based on the classical back-cross design for QTL mapping. We used the R/QTL package (Broman et al., 2003) to generate a map of 5 chromosomes of dierent lengths ranging from 100cM to 40cM with 10 equidistant markers per chromo- some, see Figure 2. For experimental populations, there is a direct relationship between the genetic distance between markers on the same chromosome and their correlation as described in any textbook on QTL mapping (Chen, 2016). The corresponding cor- relation structure from simulated genotypes of n = 1000 individuals from a back-cross design is illustrated by the heatmap in the right panel of Figure 1. One can see that the correlations between markers on the same chromosome are very strong, getting close to 0.9 for neighbouring markers. Response variables Y are simulated from the data generating model of Scenario 5 from Hubin et al. (2020a), where Y N (; 1), with = 1 + 1:5L + 3:5L + 9L + 7L : (1) 1 2 3 4 ba-preprint ver. 0000/00/00 file: output.tex date: 4 Rejoinder for the discussion Figure 2: Genetic map of markers on ve chromosomes of dierent length (given in centiMorgan). For the second scenario of our simulation study, these marker positions are used to simulate genotype data from a back-cross design. The closer markers on the same chromosome are lying the stronger will be the correlation of the corresponding genotype data. The exact de nition of the trees L L is given in Table 1 below and is equivalent 1 4 to the de nition in Table 2 of our original article. For the QTL mapping scenario, the responses were generated for each simulation replicate after randomly permuting the order of the genetic markers. In this way, we considered dierent patterns of correlations between the leaves of the data generating model. For both correlation structures, we generated N = 100 datasets with n = 1000 observations. Every data set was analysed with the Jereys prior and with the robust g-prior using GMJMCMC with the same tuning parameters as in Scenario 5 of the original article. We appreciate the comment of Schwender and Ickstadt (2020) that for the higher order interactions L and L the eect sizes are unrealistically large. However, as il- 3 4 lustrated by our sensitivity analysis, if one wants to have sucient power do detect more complex logic expressions with realistic eect sizes then one will need a much ba-preprint ver. 0000/00/00 file: output.tex date: A. Hubin et al. 5 larger sample size. This would be potentially feasible for real data analysis (by means of simply collecting more observations) but not for a simulation study with hundreds of simulation runs. In any case, our goal here is to show that correlated regressors are not an impediment for our approach. General QTL Jef. R. g Jef. R. g L = X 1.00 1.00 0.83 0.85 1 37 L = X ^ X 0.98 0.99 0.82 0.81 2 2 9 L = X ^ X ^ X 0.96 0.99 0.92 0.92 3 7 12 20 L = X ^ X ^ X ^ X 0.66 0.66 0.20 0.24 4 4 10 17 30 Overall Power 0.90 0.91 0.69 0.71 FP 0.78 0.73 2.02 2.01 FDR 0.13 0.13 0.39 0.38 WL 9 6 108 98 Table 1: Results for the additional simulation scenarios with correlated binary covari- ates. Power for individual trees, overall power, the expected number of false positives (FP) and FDR are compared for GMJMCMC using either Jereys prior or the robust g-prior under the general correlation structure and the correlation structure from QTL mapping with back-cross design. Table 1 summarizes the results of our simulations with correlated regressors. For the rst scenario with the general structure, correlations are ranging between 0 and 0.5 in absolute values. Comparing the results with Table 2 from the original manuscript which was based on independent regressors the dierences are relatively small. Only for L , there is a decrease in power, for Jereys prior from 0.89 to 0.66 and for the robust g-prior from 0.9 to 0.66. On the other hand, the number of false positives increases for Jereys prior from 37 to 78 and for the robust g-prior from 28 to 73. It has to be expected that the performance of logic regression becomes a little worse under correlation but GMJMCMC is still behaving very well for the scenario with a general correlation structure. Jereys prior and robust g-prior perform almost equally well with only a very slight advantage of the latter. The results in Table 1 for our second correlation structure from QTL mapping are based on the strict de nition that only discoveries of trees from the data-generating model itself are counted as true positives. While there is some loss of power, the results for the rst three logic expressions are still quite satisfactory. Only for L , the estimated power becomes unacceptably low. At the same time, the number of false positives, as well as the number of wrongly detected leaves, increases substantially. For QTL mapping, the correlation between neighboring markers often is so strong, that it becomes extremely dicult to distinguish between them. For that reason, in simulation studies for QTL-mapping, one often takes the approach that the detection of a marker strongly correlated with a QTL is still counted as a true positive. If we take such an approach and consider markers within a range of 15cM as correct representatives of a leaf from ba-preprint ver. 0000/00/00 file: output.tex date: 6 Rejoinder for the discussion the data generating model then we get slightly better results. In particular, the number of wrong leaves goes down from 108 to 50 for Jereys prior and from 98 to 58 for the robust g-prior. Extending the window for de ning true positives would further reduce the number of wrongly detected leaves. 3 Prior related aspects Clarte and Robert (2020) criticize several aspects of our choice of priors. We fully agree that the use of improper priors is debatable and should be done with great care. We had stated this explicitly already in the original article. From a theoretical point of view, our preference would be mixtures of g-priors. As a representative, the robust g-prior is implemented within our package. However, given the strong popularity of the BIC criterion, we wanted to study the performance of this choice as well. Our description of BIC as an approximation of the marginal likelihood under Jereys prior could indeed have included a discussion of its weak points. As Clarte and Robert (2020) remark, the good performance of the BIC choice is most likely connected with applying the Laplace approximation of the marginal likelihood. However, in the case of the Gaussian linear model, the approximation is exact. The main empirical point, though, is that in all our examples from the original manuscript, the BIC measure as an approximation for the marginal density performed better than the analytical expression under the robust g-prior, both in terms of eval- uation metrics and speed. Under the correlated designs provided in this rejoinder, the robust g-prior slightly outperforms Jereys prior in terms of evaluation metrics but the BIC choice still performs rather well. Moreover, the running time of GMJMCMC under Jereys prior (having all of the tuning parameters of the algorithm xed) is still signi cantly shorter. Note that the main contributions of our approach are: a) introducing novel model priors and b) a new search algorithm, whilst for the choice of the parameter priors and the calculation of the marginal densities we are using already established procedures. For example, our approach is fully compatible with integrated nested Laplace approx- imations (INLA) (Rue et al., 2009) and all of the parameter priors available there can be used. More generally, the R-package we have developed allows the users to easily specify their own method of calculating the marginal likelihood (whatever they prefer and/or what is available for their speci c model: analytical integration, Monte Carlo based approximation, or other approximations) for their own choice of parameter priors. This exibility allows extending the method easily to broader classes of logic regression models. In Section 5, for instance, we describe an extension to latent Gaussian models with both logic and non-logic covariates, where alternative types of parameter priors are possible and the marginal likelihood is computed via integrated nested Laplace approximations (INLA) (Rue et al., 2009). Also note, that in both Bayarri et al. (2012) and Li and Clyde (2018), priors on models are indirectly obtained through priors on the regression parameters. In our approach, we include speci c priors on model complexity as well. This is done via equations (3) and (4) in the main paper. The theoretical properties of combining model ba-preprint ver. 0000/00/00 file: output.tex date: A. Hubin et al. 7 and parameter priors de nitely require further distinguished research, which, we feel, lies slightly outside the scope of this rejoinder. 4 Algorithmic aspects Given that one of the main contributions of this manuscript was the development of the GMJMCMC algorithm, it is no surprise that many comments of the discussants were concerned with the algorithm. We will start with replying to some questions which are simple to answer, then give a more detailed recap of the MJMCMC algorithm (Hubin and Storvik, 2018) and nally discuss some questions on the parameter settings of GMJMCMC. Ruczinski et al. (2020) wondered whether covariates which are not logic can be easily combined with Boolean combinations in the model. The answer is yes. We will discuss this extension in Section 5.2 and provide an example in the tutorial in Section 6.3. Ruczinski et al. (2020) also suggested a simple two-stage approach where one rst checks whether the covariates have any association with the response variable at all and one only then applies logic regression. This is, of course, a viable approach which can be easily adopted. In practice, this could save resources by avoiding to run computationally costly inference on BLR. Whilst Bogdan et al. (2020) are rather reserved with respect to the proposed algo- rithm, we believe that most of their concerns are actually based on misunderstandings of the algorithm and we are glad to have the opportunity to clarify some of these points. The question of correlated regressors has been addressed in Section 2.1 of this rejoinder, where we have seen that GMJMCMC works reasonably well even when regressors are heavily dependent. Furthermore, Bogdan et al. (2020) were wondering about a lack of treatment for tautologies. This can be easily addressed because, in fact, our implemen- tation of GMJMCMC is taking care of Boolean equivalence already when generating new trees. In particular, as we discuss in Section 2.3 of the paper, "for all three op- erators it holds that if the newly generated tree is already present in S then it is not considered for S but rather a new replacement tree is proposed instead." What we do t+1 in practice is to check whether newly generated trees have correlation 1 with any tree within S , which for suciently large sample size will correspond to logic equivalence. Consequently, tautologies within a GMJMCMC chain are simply not allowed. Bogdan et al. (2020) also wonder why the sum in the denominator of (0.2) contains only M = 10000 models based on d trees from the nal stage of the algorithm. This is fin indeed one of the implemented options in our package (though M does not have to be fin 10000). The reason for this choice is to avoid having either too large and/or too densely lled hash tables (as a data structure), both of which become quite slow to handle. Whilst this introduces some undesired limitations, it remains an important pragmatic decision to make. The number of logic trees grows exponentially with the number of leaves involved and the number of models grows exponentially with the number of logic trees. Hence, even for the small examples with p = 50, the size of a hash table including all visited models and their statistics can become prohibitively large to be used in practice. That would be even more acute for larger p's. As an alternative, one could use ba-preprint ver. 0000/00/00 file: output.tex date: 8 Rejoinder for the discussion the best N models from all T generations, where N is nite and of reasonable size. H H But in this case, when the hash table is lled, the worst models must be deleted to allow new ones to be included. In practice, this strategy would become extremely slow. One has to read from, write to and delete from the almost full hash table, which will be also very large. One would either have to create some novel hashing/dehashing functions which make this approach ecient or devise an alternative data structure which is especially designed for the problem at hand. Given the complexity of enumerating logic expressions due to logic equivalence and due to the super-exponential growth of the number of models with respect to the number of leaves involved. We would expect this to be a ground breaking task in the eld of algorithms and data structures. Bogdan et al. (2020) raised the question of why we need MJMCMC for the nal population of GMJMCMC when for d = 15 full enumeration is feasible. The simple answer is that for many applications one needs much larger d to obtain reliable results, see for example the remark after Theorem 1 of Hubin et al. (2020a) and also Figure 1, panel 3, from the sensitivity analysis of Hubin et al. (2020a). For larger d, a full enu- meration will no longer be possible, whilst we would like to oer a generally functioning algorithm. Bogdan et al. (2020) additionally say: "Also, a huge random reduction of the nal model space leads to substantially dierent results for dierent parallel runs of the algorithm. Therefore the authors aggregate results from dierent runs using a weighting scheme speci ed in equation (15) of their paper. In our opinion it seems more reasonable to estimate the posterior probabilities of dierent models simply by including all models visited in dierent runs in denominator of (0.2)." We agree that in principle this is a reasonable approach, which we, in fact, suggested in Section 2.3 of our paper. There, however, we also discussed the drawback that this approach is computationally more costly because one has to transfer a large amount of information from dierent models between the cores. Finally, Bogdan et al. (2020) promote using synchronization between the cores via priority queues. Whilst we nd the idea interesting, we are a little bit sceptical whether it would actually work. When compared to embarrassing parallelization, synchronization between the processes in practice often slows down the inference instead of speeding it up (Chai and Bose, 1993; Kukanov, 2008). There, of course, a lot depends on the back-end used for implementation. We currently do not have the capacity to try this approach ourselves, but we would like, by all means, to encourage Bogdan et al. (2020) or other future researchers of BLR to test this idea. We would be very happy if using synchronization via priority queues could lead to an objectively better and faster inference algorithm for BLR than GMJMCMC. 4.1 Mode jumping Markov Chain Monte Carlo Both Bogdan et al. (2020) and Clarte and Robert (2020) seem to be slightly confused with respect to the MJMCMC algorithm (Hubin and Storvik, 2018), which we did not describe in detail in Hubin et al. (2020a). We thus brie y discuss the main ideas of MJMCMC to clarify certain misunderstandings. In Hubin and Storvik (2018), a proper MCMC algorithm for the search through a xed limited model space was proposed. The algorithm deals with the multimodality in the space of models through mode jumping proposals. The mode jumping MCMC ba-preprint ver. 0000/00/00 file: output.tex date: A. Hubin et al. 9 (MJMCMC) algorithm relies upon the idea of making smart moves between local ex- trema with a reasonable frequency. Local MCMC is performed in the absolute majority of steps. For the rest, a large move in the model space (which is likely to hit a model with very low posterior probability) is made, followed by local optimization. The goal of the latter step is to reach a local optimum in a dierent part of the model space. Then the proposal is randomized around this optimum and the transition to the proposed model is either accepted or rejected according to a Metropolis-Hastings acceptance probability. The convergence properties of the suggested Markov chain is proven through a re ne- ment of the results of Tjelmeland and Hegstad (2001). Its limiting distribution is shown to correspond to the marginal model posterior probabilities. Further extensions of the algorithm allowing for parallel computing and using mixtures of proposals were also suggested. MJMCMC is described in more detail in Algorithm 1 below, where we consider M = ( ; ::: ) to be associated with models in the given discrete model space (here 1 p 2 f0; 1g indicates whether covariate x is included in the model). We assume that j j marginal likelihoods p(YjM ) are available for a given M , and then use MJMCMC to explore p(MjY ). By Bayes formula p(YjM )p(M ) p(MjY ) = : (2) 0 0 p(YjM )p(M ) M 2 In order to calculate p(MjY ) we have to iterate through the whole model space , which becomes computationally infeasible for large p. The ordinary Monte Carlo estimate is (i) based on a number of MJMCMC samples M ; i = 1; :::; W : (i) pe(MjY ) = 1(M = M ) ! p(MjY ); (3) W!1 i=1 where 1() is the indicator function. An alternative named the renormalized model (RM) estimate by Clyde et al. (2011), is p(YjM )p(M ) pb(MjY ) = P 1(M 2 V); (4) 0 0 p(YjM )p(M ) M 2V where now V is the set of all models visited at least once during the MJMCMC run. Assuming the Markov chain eventually will visit all possible models, also pb(MjY ) will converge to p(MjY ). Note that this estimate also can utilize all models that are visited, not only those that have been accepted. This answers the comment of Bogdan et al. (2020), who presumed that we include only models accepted by MJMCMC into V. Although both (4) and (3) are asymptotically consistent, (4) will often be the preferable estimator since the convergence of the MCMC based approximation (3) is typically much slower, see Clyde et al. (2011). We now describe the MJMCMC algorithm in more detail. We aim at approximating p(MjY ) by means of searching for some subspace V of which makes the approxima- tion (4) as precise as possible. Models with high values of p(YjM ) are important to be included. This means that modes and near modal values of marginal likelihoods are ba-preprint ver. 0000/00/00 file: output.tex date: 10 Rejoinder for the discussion particularly important for the construction of V and missing them can dramatically in uence our estimates. Note that these considerations are equally important for the standard MCMC estimate (3). The main dierence is that when using (3) the number of times a speci c model is visited is important, for (4) it is enough that a model is visited at least once. In this context, the denominator of (4) becomes an extremely relevant measure for the quality of the search. It should be as large as possible in order to capture the probability mass from all the local optima of the posterior distribution, whilst at the same time the size of V should be low in order to save computational time. Algorithm 1 Mode jumping MCMC 1: Generate a large jump M according to a proposal distribution q (M jM ). 0 0 2: Perform a local optimization, de ned through M q (M jM ). k k 3: Perform a small randomization to generate the proposal M q (M jM ). 4: Generate backwards auxiliary variables M q (M jM ), M q (M jM ). 0 l 0 k o k 0 5: Put M with probability r (M; M ; M ; M ); mh k 0 k M = M otherwise, where p(M jy)q (MjM ) r k r (M; M ; M ; M ) = min 1; : (5) mh k p(Mjy)q (M jM ) Algorithm 1 describes in detail the mode jumping step within the MJMCMC algo- rithm. In the rst step, a large change in the model space is made through the proposal distribution q . This will typically lead to a model with little support in the data, so in step 2 a local optimization is performed in order to obtain a better model. Due to the need for a proper Metropolis-Hastings probability derived through a backwards move (step 4), a randomization, through q , of the local optima is needed for the reverse move back to the original model to be possible. Step 5 speci es the acceptance probability which is shown in Hubin and Storvik (2018) to satisfy the detailed balance equation with respect to p(MjY ). Hopefully, this detailed discussion of MJMCMC fully resolves the confusion of Clarte and Robert (2020), who, in their discussion, presume the following: "While we have not read the refered article on MJMCMC in detail, a rst comment is that the name itself is somewhat unsuitable, as indeed the algorithm does not sample from a distribution but only explores its surface." We would like to emphasize that the MJMCMC is not incorporating any of the ideas of Tabu search algorithms (Glover et al., 1995), which are not allowing to return to the previously visited models. This should also clarify another misleading presumption by Clarte and Robert (2020): "In a self-avoiding mode, keeping track of all the previous states visited by the chain ensures that those states will never be visited again. As we are in a discrete setting, this implies that once a mode has been visited the algorithm is constrained to eventually visit another mode, even if the potential between the modes is almost zero." ba-preprint ver. 0000/00/00 file: output.tex date: A. Hubin et al. 11 Convergence of GMJMCMC The MJMCMC algorithm, in the setting of BLR, only gives convergence within each of the restricted search spaces (populations) that it considers. We apply the MJMCMC as an inner iteration within the GMJMCMC algorithm where the space of models is dynamically modi ed. Given that the movement within and between the search spaces is irreducible with respect to the whole model space, which is shown in Theorem 1 of Hubin et al. (2020a), the GMJMCMC provides the estimates equivalent to (4). They also converge towards the right model probabilities. This fully resolves another concern from Clarte and Robert (2020) who stated that the renormalized estimator of the marginal posterior model probabilities "does not provide the theoretical security of (asymptotic) unbiasedness that is attained with MCMC method." 4.2 Parameter settings The choice of the tuning parameters for the algorithm is de nitely an important problem as indicated by Ruczinski et al. (2020) and Clarte and Robert (2020). Whilst there is not (and cannot be) any uniformly best choice of tuning parameters of GMJMCMC, we will try to brie y indicate some strategies allowing to manually choose reasonable values of the most important tuning parameters of the algorithm. Regarding the choice of the population size d and the maximal number of variables in a model k , we give some max guidance in Remark 1 after Theorem 1 in Hubin et al. (2020a): "When d > 0 (which is the N covariates with largest marginal inclusion probability in S ), some restrictions init 1 on the possible search spaces are introduced. However, when d d k , any model 1 max of maximum size k will eventually be visited. If d d < k , then every model of max 1 max size up to d d plus some of the larger models will eventually be visited, although the model space will get some additional constraints. In practice, it is more important that dd k , where k is the size of the true model. Unfortunately, neither k nor d are 1 1 known in advance, and one has to make reasonable choices of k and d depending on max the problem one analyses." Also, note that we provide some sensitivity analysis of d in Section 3.1 of the main article. Regarding the maximal depth of logic expressions C , one should use some prior max knowledge on the complexity of logic expressions. It also depends upon the individual hypotheses the researcher has. At the same time, using unreasonably large C is max prohibitive computationally and also unrealistic in terms of power to detect too complex trees. When combining two Boolean expressions, rst a decision is made whether it will be combined through an and or an or operation (with P specifying the probability and for and ) and thereafter a decision is made whether the logic not is applied to it (with probability P ). In our experience, the actual values of these tuning parameters will not not in uence the result very much with respect to nding the right expressions within the equivalence classes. However, simpler expressions (within the equivalence classes) are usually obtained when choosing somewhat larger P and somewhat smaller P . and not We recommend the choice P = 0:9 and P = 0:1. and not ba-preprint ver. 0000/00/00 file: output.tex date: 12 Rejoinder for the discussion The tuning parameter is used to determine which variables should be removed min from the current population with probability one minus the current approximation of the marginal inclusion probability of these variables. should be chosen in such a min way that it is on the one hand possible to get rid of unimportant trees, while at the same time avoiding the deletion of potentially important trees. Concerning the question of Ruczinski et al. (2020) on the choice of M and T and the resulting chain length, fin max we provided some guidance in Theorem A.1 in Section A.2 of Hubin et al. (2020b). There, we proved convergence guarantees also for xed T and M when increasing max fin the number of parallel chains of GMJMCMC. Thus, apparently, there exists a natural trade o: the more chains one can aord running in parallel the fewer resources could be used within each chain and vice versa - the less parallel chains one runs - the larger T and M are required. max fin The choice of the tuning parameters for the examples from Hubin et al. (2020a) are provided in Section A.1 of Hubin et al. (2020b). These values might be considered for problems of similar dimensionality, eect sizes and correlations between covariates. At the same time, we cannot provide a strict stopping criterion for GMJMCMC or a general rule for the choice of its parameters. Experimental tuning for dierent applications might be bene cial. If one has enough computational resources, grid search or an adaptation of Bayesian optimization for the tuning parameters of GMJMCMC (Snoek et al., 2012) can be considered. Alternatively, one might consider some kind of adaptive learning of the algorithm's tuning parameters similarly to Hubin (2019). More details on these possibilities are beyond the scope of this rejoinder. 5 Various extensions of BLR and GMJMCMC In this section, we brie y present extensions of the logic regression model. Some of these extensions are further discussed in the tutorial of Section 6 of the rejoinder. A more detailed description of the proposed extensions, including theoretical support and real applications, are material for a future publication. 5.1 Predictions with BLR As mentioned in the discussion section of Hubin et al. (2020a), our method is directly applicable to prediction as well. In particular, the standard Bayesian model averaging can be easily applied. Thus, one can approximate the posterior probability of some parameter/variable via model averaging by p ^( j Y ) = p( j M; Y )p ^(M j Y ) ; (6) M2V where might be, for example, the predictor of unobserved data based on a speci c set of covariates. Given estimates of model posterior probabilities, other prediction pro- cedures such as the median probability model (Barbieri and Berger, 2004) or the most probable model can be also easily adopted, yielding: p ^( j Y ) = p( j M ; Y ) ; (7) ba-preprint ver. 0000/00/00 file: output.tex date: A. Hubin et al. 13 where M is the selected median probability or the most probable a posteriori model. 5.2 BLR with non-binary covariates Responding to a question from Ruczinski et al. (2020), we can allow non-binary xed eects to be included in the model. For this extension, we simply replace equation (2) in Hubin et al. (2020a) with: q q+q X X h ( (X )) = + L + z ; (8) j j j j j jq j=1 j=q+1 where z ; l 2 f1; :::; q g are non-binary covariates which are not allowed to form logic expressions. In this formulation of the Bayesian logic regression, the model includes q +q possible components. The priors on the additional components ; j 2 fq + 1; :::; q + q g are of form (4) from Hubin et al. (2020a) with c(z ) = 1; j 2 fq + 1; :::; q + q g. This jq results in the following joint model prior: q+q Y c(L ) 0 j j j j =q+1 p(M ) / a a : (9) j=1 In terms of model inference, the GMJMCMC is adopted, where modi cations, mutations and reductions are only allowed for the Boolean terms. 5.3 BLR with non-binary covariates and latent Gaussian variables We also mentioned in Hubin et al. (2020a) that it is straight-forward to extend our approach for generalized linear mixed models. Here, we will formally describe this ex- tension by including both xed eects for non-binary covariates and latent Gaussian variables, which can be used to model correlation structures between observations (in space and time) and over-dispersion. For this extension, we further update equation (8), q q+q X X X h ( (X )) = + L + z + ; (10) j j j j j jq ik j=1 j=q+1 k=1 where z ; l 2 f1; :::; q g are non-binary covariates which are not allowed to form logic expressions and = ( ; :::; ) N (0; ) are latent Gaussian variables. The la- k 1k nk n k tent Gaussian variables with covariance matrices allow to model dierent correlation structures between individual observations (e.g. auto-regressive models or various other spatio-temporal models). The matrices typically depend only on a few parameters, so that in practice one has = ( ). Whilst the model priors (9) are still valid, k k parameter priors here need to be adjusted as j N (0; I e ); (11) p p ( ): (12) k k ba-preprint ver. 0000/00/00 file: output.tex date: 14 Rejoinder for the discussion Here, all kind of hyper-parameters of priors compatible with INLA (Rue et al., 2009) can in principle be chosen. This allows to eciently compute the marginal likelihoods of individual models using the INLA approach (Rue et al., 2009; Hubin and Storvik, 2016). 6 A tutorial on GMJMCMC for BLR Finally, we provide a brief tutorial on how to apply our approach in practice. Our code should be run under Linux. One would need to incorporate some sort of extra hacks (see https://bit.ly/37tf3cm) to be able to run the code under Windows (due to the limitations of the standard parallel::mclapply R function which is applied within the library). 6.1 Installing the packages We start by preparing the R environment for running our approach to BLR. The R- script below will install all packages that are needed to run the code. Depending on which R packages you have already installed, running this script might take a while. Then we install the EMJMCMC package from GitHub. 1 #******************************************************************** 2 # install all packages which will be needed for the EMJMCMC package 3 source("https://raw.githubusercontent.com/aliaksah/EMJMCMC2016/master/ 4 R/load_dependencies/loaddeps.R") 5 #******************************************************************** 6 # (currently works only under Linux) 7 install.packages("https://github.com/aliaksah/EMJMCMC2016/blob/master/ 8 EMJMCMC_1.4.2_R_x86_64-pc-linux-gnu.tar.gz?raw=true", 9 repos = NULL, type="source") 10 #******************************************************************** One might want to restart R before proceeding to have a clean environment. After having the package installed we can load EMJMCMC. 1 #******************************************************************** 2 # load the EMJMCMC package 3 library(EMJMCMC) 4 #******************************************************************** Additionally, we will need the following three packages for the tutorial, which you might have to install from CRAN. ba-preprint ver. 0000/00/00 file: output.tex date: A. Hubin et al. 15 1 #******************************************************************** 2 # load other packages needed to simulate and illustrate data 3 # if necessary these packages first have to be installed from CRAN 4 library(clusterGeneration) 5 library(bindata) 6 library(ggplot2) 7 #******************************************************************** 6.2 Running BLR with weakly correlated covariates We rst generate some binary data with the general correlation structure from the rst scenario of the simulation study above. 1 #******************************************************************** 2 # set the seed 3 set.seed(040590) 4 # construct a correlation matrix for M = 50 variables 5 M = 50 6 m = clusterGeneration::rcorrmatrix(M,alphad=2.5) 7 # simulate 1000 binary variables with this correlation matrix 8 X = bindata::rmvbin(1000, margprob = rep(0.5,M), bincorr = m) 9 #******************************************************************** The following code generates the heat-map of Figure 1 which illustrates the non-trivial correlations of the simulated binary variables. 1 #******************************************************************** 2 # prepare the correlation matrix in the melted format 3 melted_cormat = reshape2::melt(cor(X)) 4 # plot the heat-map of the correlations 5 ggplot2::ggplot(data = melted_cormat, 6 ggplot2::aes(x=Var1, y=Var2, fill=value)) + 7 ggplot2::geom_tile() + 8 ggplot2::theme(axis.title.x = ggplot2::element_blank(), 9 axis.title.y = ggplot2::element_blank(), 10 axis.text.x = ggplot2::element_blank()) 11 #******************************************************************** Next, we simulate the responses according to Scenario 4 from Hubin et al. (2020a), but with correlated binary covariates. ba-preprint ver. 0000/00/00 file: output.tex date: 16 Rejoinder for the discussion 1 #******************************************************************** 2 # simulate Gaussian responses from a model with two-way interactions 3 # which is considered in S.4 of the paper 4 df = data.frame(X) 5 df$Y=rnorm(n = 1000,mean = 1+1.43*(df$X5*df$X9)+ 6 0.89*(df$X8*df$X11)+0.7*(df$X1*df$X4),sd = 1) 7 #******************************************************************** Before performing logic regression with GMJMCMC one might like to have a look at the documentation of the R function LogicRegr : 1 #******************************************************************** 2 help("LogicRegr") 3 #******************************************************************** The following code runs inference on BLR with 32 parallel threads of GMJMCMC, where we are rst using the robust g-prior and then Jereys prior. Depending on the cluster each of these might run for some time from several minutes to more than half an hour. If you are running the code on a home PC or a laptop, please reduce ncores parameter to something reasonable for your machine (e.g. set ncores = 3). 1 #******************************************************************** 2 # specify the initial formula 3 formula1 = as.formula(paste(colnames(df)[M+1],"~ 1 + ", 4 paste0(colnames(df)[-c(M+1)],collapse = "+"))) 5 #******************************************************************** 6 # Bayesian logic regression with the robust-g-prior 7 res4G = LogicRegr(formula = formula1, data = df, 8 family = "Gaussian", prior = "G", report.level = 0.5, 9 d = 15,cmax = 2,kmax = 15, p.and = 0.9, p.not = 0.1, p.surv = 0.2, 10 ncores = 32) 11 #******************************************************************** 12 # Bayesian logic regression with the Jeffreys prior 13 res4J = LogicRegr(formula = formula1, data = df, 14 family = "Gaussian", prior = "J", report.level = 0.5, 15 d = 15, cmax = 2,kmax = 15, p.and = 0.9, p.not = 0.1, p.surv = 0.2, 16 ncores = 32) 17 #******************************************************************** We obtain the following results using the robust g-prior: ba-preprint ver. 0000/00/00 file: output.tex date: A. Hubin et al. 17 1 #******************************************************************** 2 # print the results for the robust g-prior 3 print(base::rbind(c("expressions","probabilities"),res4G$feat.stat)) 4 [,1] [,2] 5 [1,] "expressions" "probabilities" 6 [2,] "I(((X5))&((X9)))" "1" 7 [3,] "I(((X1))&((X4)))" "1" 8 [4,] "I(((X11))&((X8)))" "0.999999645314492" 9 #******************************************************************** and rather similar results with the Jereys prior: 1 #******************************************************************** 2 #print the results for the Jeffreys prior 3 print(base::rbind(c("expressions","probabilities"),res4J$feat.stat)) 4 [,1] [,2] 5 [1,] "expressions" "probabilities" 6 [2,] "I(((X11))&((X8)))" "0.999999774980675" 7 [3,] "I(((X1))&((X4)))" "0.999999520871822" 8 [4,] "I(((X5))&((X9)))" "0.999873046960372" 9 #******************************************************************** 6.3 Additional non-binary xed eects and predictions Ruczinski et al. (2020) asked whether it would be possible to include covariates in the model which are not a part of the logic expressions. Furthermore, Schwender and Ickstadt (2020) are interested in whether the model can be easily used for predictions. These options are currently not implemented in the LogicRegr function, which we would like to keep as simple as possible. At the same time, these tasks can be easily performed by a general call of the EMJMCMC::pinferunemjmcmc function which is available in our package. This routine is however much more advanced and requires, at this time, expert knowledge to be used. First, we will generate an additional Poisson distributed covariate age which is then used as an additional additive eect in the data generating logic regression model. For the sake of brevity we perform the analysis here only with Jereys prior. 1 #******************************************************************** 2 # simulate Gaussian responses from a model with two-way interactions 3 # and an age effect which is an extension of S.4 of the paper 4 Xp = data.frame(X) ba-preprint ver. 0000/00/00 file: output.tex date: 18 Rejoinder for the discussion 5 Xp$age = rpois(1000,lambda = 34) 6 Xp$Y=rnorm(n = 1000,mean = 1+0.7*(Xp$X1*Xp$X4) + 7 0.89*(Xp$X8*Xp$X11)+1.43*(Xp$X5*Xp$X9) + 2*Xp$age, sd = 1) 8 #******************************************************************** We will not only perform model inference but also show how to make predictions with the EMJMCMC package. To this end we will randomly divide the data into a training set (900 observations) and a testing set (100 observations). 1 #******************************************************************** 2 teid = sample.int(size =100,n = 1000,replace = F) 3 test = Xp[teid,] 4 train = Xp[-teid,] 5 #******************************************************************** The function pinferunemjmcmc has more capabilities than performing logic regres- sion. First, one might want to see its arguments: 1 #******************************************************************** 2 help("pinferunemjmcmc") 3 #******************************************************************** The following call of pinferunemjmcmc performs logic regression using 30 cores. Note that the non-binary covariate is not a part of the formula passed to the function, but is rather speci ed through runemjmcmc:params$latnames = "I (age)". Also, one might expect this to run slightly longer than previous examples, particularly because keeping track of the coecients for prediction takes some additional time. Further, many of the input options used are explained in the help pages of pinferunemjmcmc. If one is not interested in predictions, runemjmcmc:params$save:beta = F , predict = F and test:data = NULL should be set (this will decrease inference time for the same training data sample and other tuning parameters xed). 1 #******************************************************************** 2 # specify the link function 3 g = function(x) x 4 #******************************************************************** 5 # specify the parameters of the custom estimator function 6 estimator.args = list(data = train, n = dim(train)[1], 7 m =stri_count_fixed(as.character(formula1)[3],"+"),k.max = 15) 8 #******************************************************************** ba-preprint ver. 0000/00/00 file: output.tex date: A. Hubin et al. 19 9 # specify the parameters of gmjmcmc algorithm 10 gmjmcmc.params = list(allow_offsprings=1,mutation_rate = 250, 11 last.mutation=10000, max.tree.size = 5, Nvars.max =15, 12 p.allow.replace=0.9,p.allow.tree=0.01,p.nor=0.01,p.and = 0.9) 13 #******************************************************************** 14 # specify some advenced parameters of mjmcmc 15 mjmcmc.params = list(max.N.glob=10, min.N.glob=5, max.N=3, min.N=1, 16 printable = F) 17 #******************************************************************** 18 # run the inference of BLR with a non-binary covariate and predicions 19 res.alt = pinferunemjmcmc(n.cores = 30, report.level = 0.2, 20 num.mod.best = 100,simplify = T,predict = T,test.data = test, 21 link.function = g, 22 runemjmcmc.params = list(formula = formula1,latnames = c("I(age)"), 23 data = train,estimator = estimate.logic.lm.jef, 24 estimator.args =estimator.args, 25 recalc_margin = 249, save.beta = T,interact = T,outgraphs=F, 26 interact.param = gmjmcmc.params, 27 n.models = 10000,unique = T,max.cpu = 4,max.cpu.glob = 4, 28 create.table = F,create.hash = T,pseudo.paral = T,burn.in = 100, 29 print.freq = 1000, 30 advanced.param = mjmcmc.params)) 31 #******************************************************************** Below a list of the logic expressions and non-logic covariates that were found to be of importance is listed. There, we clearly see that all features from the data-generative model are detected without any false positive discoveries. 1 #******************************************************************** 2 print(base::rbind(c("expressions","probabilities"),res.alt$feat.stat)) 3 [,1] [,2] 4 [1,] "expressions" "probabilities" 5 [2,] "I(((X5))&((X9)))" "1" 6 [3,] "I(age)" "0.999999999999998" 7 [4,] "I(((X11))&((X8)))" "0.999999990458405" 8 [5,] "I(((X1))&((X4)))" "0.99999997999928" 9 #******************************************************************** 1 p To assess the quality of prediction we use two criteria, RMSE = n (Y Y ) i=1 i i 1 p ^ ^ and MAE = n jY Y j, where Y are responses in the test data, Y are model i=1 i i i i averaged predictions of them, and n is the size of the test data set. ba-preprint ver. 0000/00/00 file: output.tex date: 20 Rejoinder for the discussion 1 #******************************************************************** 2 print(sqrt(mean((res.alt$predictions-test$Y)^2))) 3 [1] "0.8835489" 4 print(mean(abs((res.alt$predictions-test$Y)))) 5 [1] "0.6904736" 6 #******************************************************************** We want to compare the performance of BLR in this example with a simple standard ap- proach, namely ridge regression (Zou and Hastie, 2005), combined with model selection according to AIC. In the script below we run ridge regression and perform prediction on the test data set. 1 #******************************************************************** 2 library(HDeconometrics) 3 ridge = ic.glmnet(x = train[,-51],y=train$Y,family = "gaussian", 4 alpha = 0) 5 predict.ridge = predict(ridge$glmnet,newx = as.matrix(test[,-51]), 6 type = "response")[,which(ridge$glmnet$lambda == ridge$lambda)] 7 print(sqrt(mean((predict.ridge-test$Y)^2))) 8 [1] "1.061406" 9 print(mean(abs((predict.ridge-test$Y)))) 10 [1] "0.865467" 11 #******************************************************************** We nally compute the evaluation metrics for prediction based on the expectations of the data-generative (true) model for the test data: 1 #******************************************************************** 2 tmean = 1+2*test$age+0.7*(test$X1*test$X4) + 3 0.89*(test$X8*test$X11)+1.43*(test$X5*test$X9) 4 print(sqrt(mean((tmean - test$Y)^2))) 5 [1] "0.8671786" 6 print(mean(abs((tmean - test$Y)))) 7 [1] "0.6850737" 8 #******************************************************************** We clearly see that for this speci c example logic regression signi cantly outperforms the ridge regression baseline with respect to both RMSE and MAE. This is not surprising given that the data generative process has multiple non-linear eects. Moreover, the predictions obtained by the BLR model are extremely close to the predictions from the means of the data generative model. ba-preprint ver. 0000/00/00 file: output.tex date: A. Hubin et al. 21 7 Comparison with other approaches Several other approaches were mentioned by the discussants. Ruczinski et al. (2020) mentioned that simulated annealing for logic regression could be equipped with a pe- nalized likelihood criterion following from the priors used in our setting. Schwender and Ickstadt (2020) pointed out certain similarities of GMJMCMC with Genetic Program- ming for Association Studies as well as logic Feature Selection. Bogdan et al. (2020) mentioned the recently developed non-reversible MCMC algorithms as well as parallel tempering MCMC algorithms. It would be most interesting to compare all these dier- ent algorithms with GMJMCMC but we believe this would need substantial additional eort and goes far beyond the scope of this rejoinder. We leave these possibilities open as topics for further research. 8 Conclusions We would like to thank once again all of the discussants for their valuable and insightful feedback. We are happy to have provoked so many questions, comments and remarks. We hope that we managed to shed light on the majority of them in this rejoinder. Moreover, we provided some useful extension of Bayesian logic regression method here. The discussions also motivate multiple directions for further research, which are outside the scope of this rejoinder. However, we hope this research will be in future performed in close collaboration with the discussants. References Barbieri, M. M. and Berger, J. O. (2004). \Optimal predictive model selection." The annals of statistics , 32(3): 870{897. Bayarri, M. J., Berger, J. O., Forte, A., Garc a-Donato, G., et al. (2012). \Criteria for Bayesian model choice with application to variable selection." The Annals of statistics , 40(3): 1550{1577. Bogdan, M., Miasojedow, B., and Wallin, J. (2020). \Discussion of "A Novel Algorithmic Approach to Bayesian Logic Regression" by A., Hubin, G. Storvik and F. Frommlet." Bayesian Analysis . Broman, K. W., Wu, H., Sen, S., and Churchill, G. A. (2003). \R/qtl: QTL mapping in experimental crosses." Bioinformatics , 19(7): 889{890. Chai, J. S. and Bose, A. (1993). \Bottlenecks in parallel algorithms for power system stability analysis." IEEE Transactions on Power Systems , 8(1): 9{15. Chen, Z. (2016). Statistical methods for QTL mapping . Chapman and Hall/CRC. Clarte, G. and Robert, C. (2020). \A discussion on "A Novel Algorithmic Approach to Bayesian Logic Regression"." Bayesian Analysis . Clyde, M. A., Ghosh, J., and Littman, M. L. (2011). \Bayesian Adaptive Sampling for ba-preprint ver. 0000/00/00 file: output.tex date: 22 Rejoinder for the discussion Variable Selection and Model Averaging." Journal of Computational and Graphical Statistics , 20(1): 80{101. Fritsch, A. (2006). \A Full Bayesian Version of Logic regression for SNP Data." Ph.D. thesis, Diploma Thesis. Glover, F., Kelly, J. P., and Laguna, M. (1995). \Genetic algorithms and tabu search: hybrids for optimization." Computers & Operations Research , 22(1): 111{134. Hubin, A. (2019). \An adaptive simulated annealing EM algorithm for inference on non- homogeneous hidden Markov models." In Proceedings of the International Conference on Arti cial Intelligence, Information Processing and Cloud Computing , 1{9. Hubin, A. and Storvik, G. (2016). \Estimating the marginal likelihood with Integrated nested Laplace approximation (INLA)." ArXiv:1611.01450v1. | (2018). \Mode jumping MCMC for Bayesian variable selection in GLMM." Compu- tational Statistics & Data Analysis , 127: 281{297. Hubin, A., Storvik, G., and Frommlet, F. (2020a). \A novel algorithmic approach to Bayesian Logic Regression." Bayesian Analysis . | (2020b). \Supplementary material for "A novel algorithmic approach to Bayesian Logic Regression"." URL https://projecteuclid.org/download/suppdf_1/euclid.ba/1545296448 Hubin, A., Storvik, G., Grini, P., and Butenko, M. (2020c). \A Bayesian binomial regression model with latent Gaussian processes for modelling DNA methylation." Austrian Journal of Statistics , 49-50. Joe, H. (2006). \Generating random correlation matrices based on partial correlations." Journal of Multivariate Analysis , 97(10): 2177{2189. Kukanov, A. (2008). \Why a simple test can get parallel slowdown." URL https://software.intel.com/en-us/blogs/2008/03/04/ why-a-simple-test-can-get-parallel-slowdown Leisch, F., Weingessel, A., and Hornik, K. (1998). \On the generation of correlated arti cial binary data." Li, Y. and Clyde, M. A. (2018). \Mixtures of g-priors in generalized linear models." Journal of the American Statistical Association , 113(524): 1828{1845. Ruczinski, I., Kooperberg, C., and LeBlanc, M. (2020). \Invited comment on Article by Hubin, Storvik and Frommlet." Bayesian Analysis . Rue, H., Martino, S., and Chopin, N. (2009). \Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations." Journal of the Royal Statistical Sosciety , 71(2): 319{392. Schwender, H. and Ickstadt, K. (2020). \Discussion of A Novel Algorithmic Approach to Bayesian Logic Regression." Bayesian Analysis . Snoek, J., Larochelle, H., and Adams, R. P. (2012). \Practical Bayesian optimization of ba-preprint ver. 0000/00/00 file: output.tex date: A. Hubin et al. 23 machine learning algorithms." In Advances in neural information processing systems , 2951{2959. Tjelmeland, H. and Hegstad, B. K. (2001). \Mode jumping proposals in MCMC." Scandinavian Journal of Statistics , 28(1): 205{223. Zou, H. and Hastie, T. (2005). \Regularization and variable selection via the elastic net." Journal of the royal statistical society: series B (statistical methodology), 67(2): 301{320. ba-preprint ver. 0000/00/00 file: output.tex date:

Statistics – arXiv (Cornell University)

**Published: ** May 1, 2020

Loading...

You can share this free article with as many people as you like with the url below! We hope you enjoy this feature!

Read and print from thousands of top scholarly journals.

System error. Please try again!

Already have an account? Log in

Bookmark this article. You can see your Bookmarks on your DeepDyve Library.

To save an article, **log in** first, or **sign up** for a DeepDyve account if you don’t already have one.

Copy and paste the desired citation format or use the link below to download a file formatted for EndNote

Access the full text.

Sign up today, get DeepDyve free for 14 days.

All DeepDyve websites use cookies to improve your online experience. They were placed on your computer when you launched this website. You can change your cookie settings through your browser.