Access the full text.
Sign up today, get DeepDyve free for 14 days.
References for this paper are not available at this time. We will be adding them shortly, thank you for your patience.
THE AMERICAN STATISTICIAN 2023, VOL. 00, NO. 0, 1–12: General https://doi.org/10.1080/00031305.2023.2203177 Sandra Siegfried , Lucas Kook , and Torsten Hothorn Institut für Epidemiologie, Biostatistik und Prävention, Universität Zürich, Zürich, Switzerland ABSTRACT ARTICLE HISTORY We introduce a generalized additive model for location, scale, and shape (GAMLSS) next of kin aiming Received August 2022 Accepted April 2023 at distribution-free and parsimonious regression modeling for arbitrary outcomes. We replace the strict parametric distribution formulating such a model by a transformation function, which in turn is estimated KEYWORDS from data. Doing so not only makes the model distribution-free but also allows to limit the number of linear Additive models; Conditional or smooth model terms to a pair of location-scale predictor functions. We derive the likelihood for contin- distribution function; Model uous, discrete, and randomly censored observations, along with corresponding score functions. A plethora selection; Regression trees; of existing algorithms is leveraged for model estimation, including constrained maximum-likelihood, the Smoothing; Transformation original GAMLSS algorithm, and transformation trees. Parameter interpretability in the resulting models models is closely connected to model selection. We propose the application of a novel best subset selection procedure to achieve especially simple ways of interpretation. All techniques are motivated and illustrated by a collection of applications from different domains, including crossing and partial proportional hazards, complex count regression, nonlinear ordinal regression, and growth curves. All analyses are reproducible with the help of the tram add-on package to the R system for statistical computing and graphics. 1. Introduction standard deviation σ(x) as Location-scale regression has its roots in two-sample compar- y − μ(x) P(Y ≤ y | X = x) = , isons, where one extends the location model for some distribu- σ(x) tion function under treatment F(y−μ) by adding a scale param- with being the standard normal cumulative distribution func- eter σ to the location shift μ,that is F (y − μ)/σ ,in compari- tion. Without relying on such a prior assumption of a parametric son to the distribution function F(y) under no treatment. One of distribution D,Tostesonand Begg (1988,eq. 1) introduced a the earliest contributions is Lepage’s test (Lepage 1971), which is distribution-free location-scale ordinal regression model in the essentially a combination of the Wilcoxon and Ansary-Bradley context of receiver operating characteristic (ROC) analysis for statistics. Generalized additive models for location, scale, and ordinal responses Y ∈{y < y < ··· < y } formulated as a 1 2 K shape (GAMLSS, Rigby and Stasinopoulos 2005;Stasinopou- conditional cumulative distribution function los and Rigby 2007) can be motivated as a generalization of the two-sample location-scale model to the regression setup, ϑ − μ(x) P(Y ≤ y | X = x) = F , that is, with covariate-dependent location and scale parameters, σ(x) μ(x) and σ(x), and also potentially other parameters ν(x) and k = 1, ... , K −1(1) τ(x) describing skewness and kurtosis. Thus, GAMLSS allow explanatory variables to aeff ct multiple moments of a variety with intercept thresholds ϑ depending on the kth response of parametric distributions and can be understood as an early category, parameters ϑ ≤ ϑ being monotonically nonde- k k+1 forerunner of “distributional” regression models (Kneib et al. creasing. The model features two model terms, μ(x) and σ(x), 2023). and is den fi ed by a cumulative distribution function F.Tosteson For a continuous response variable Y with explanatory vari- and Begg (1988)discuss normal (F = )and logit models −1 ables X = x, GAMLSS are characterized by a parametric (F = logit ) in more detail. The latter corresponds to the distributionD with typically no more than four parameters μ(x) “non-linear” odds model discussed in McCullagh (1980,sec. for location, σ(x) for scale, ν(x) for skewness, and τ(x) for 6.1), which was later extended in terms of “partial proportional kurtosis.For thesimplest case assuming anormal distribution odds models” (Peterson and Harrell 1990). A very attractive D = N(μ(x), σ(x) ) for the conditional response of Y ∈ R,the feature of such models is their distribution-free nature and model can be written in terms of the conditional mean μ(x) and easily comprehensible covariate-dependence through location- CONTACT Torsten Hothorn Torsten.Hothorn@R-project.org Institut für Epidemiologie, Biostatistik und Prävention, Universität Zürich, Hirschengraben 84, CH-8001 Zürich, Switzerland. Supplementary materials for this article are available online. Please go to www.tandfonline.com/r/TAS. © 2023 The Author(s). Published with license by Taylor & Francis Group, LLC. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The terms on which this article has been published allow the posting of the Accepted Manuscript in a repository by the author(s) or with their consent. 2 S. SIEGFRIED, L. KOOK, AND T. HOTHORN scale parameters μ(x) and σ(x).However,they lackthe broad 2.1. Interpretation applicability of the GAMLSS family, for example to censored, In all models (2), positive values of the location parameter μ(x) bounded or mixed discrete-continuous responses. Inspired by correspond to larger values of the response and smaller values of location-scale ordinal regression our primary aim is to develop the scale parameter σ(x) are associated with smaller variability a distribution-free and parsimonious flavor of GAMLSS. of the response and thus result in more “concentrated” condi- We propose a generalization of location-scale ordinal regres- tional distributions (Figure 1). Fitted models can conveniently sion by introducing a smooth parsimonious parameterization of be inspected on the scale of the conditional distribution, sur- the intercept thresholds in terms of a transformation function, vival, density, (cumulative) hazard, odds, or quantile functions. allowing to estimate distribution-free location-scale models for Statements beyond these general facts and interpretation of continuous, discrete, and potentially censored or truncated out- μ(x) and σ(x) in particular depend on the specific choice of comes in a uniefi d maximum likelihood framework (Hothorn F.Suitable choices for F include inverses of common link func- et al. 2018). This framework of location-scale transformation −1 −1 −1 tions, such as F = = probit , F = logit , F = cloglog , models allows one to model the impact of explanatory variables −1 or F = loglog .For σ(x) ≡ 1, the model reduces to well- on the location and the dispersion of the response distribution, −1 established regression models. For F = cloglog ,one obtains without relying on distributional assumptions. We demonstrate a proportional hazards model, a proportional reverse-time haz- the practical merits of such an approach by applications of −1 ards model is defined by F = loglog ,and a proportional odds (a) maximum-likelihood estimation in stratiefi d models and −1 model given by the choice F = logit .For F = , μ(x) = models forcrossing orpartially proportional hazards, (b) novel E(h(Y | ϑ ) | x) is the conditional mean of the h-transformed location-scale regression trees, (c) transformation models with response. Anoverviewonthese models andinterpretation of smooth nonlinear location-scale parameters for growth-curve μ(x) is available from Hothorn et al. (2018, Table 1). analysis, and discuss (d) model selection issues arising in these Under certain circumstances, these simple ways of interpre- contexts. tation carry over to location-scale models of form (2). Con- −1 sider Cox’ proportional hazards model (F = cloglog )for acontinuous survival time. Achangefrom x to x ˜ is reflected 2. Model by the dier ff ence μ(x ˜) − μ(x) on the scale of the log-hazard For univariate and at least ordered responses variables Y ∈ we functions conditional on x and x ˜, respectively. The introduction propose to study regression models describing the conditional of a scale parameter σ(x) to this model does not aeff ct this distribution function of Y given explanatory variables X = x as form of interpretation as long as σ(x) = σ(x ˜),owing to the −1 fact that in model (2) μ(x) is not multiplied with σ(x) .For a −1 P(Y ≤ y | X = x) = F σ(x) h(y | ϑ ) − μ(x) , proportional odds model, μ(x) − μ(x) is the vertical dier ff ence between the two conditional log-odds functions. Therefore, if y ∈ .(2) model interpretation on these scales is important for certain explanatory variables, one should try to omit these variables The model is characterized by (i) a monotonically increasing from the scale term. transformation function h : → R depending on parameters Another form of model interpretation can be motivated from ϑ ∈ R , (ii) a cumulative distribution function F : R →[0, 1] probabilistic index models (Thas et al. 2012), which describe of some random variable with log-concave Lebesgue density theimpactofa transition from x to x ˜ by the probabilistic index on the real line, (iii) a covariate-dependent location parameter P(Y ≤ Y | x, x ˜). This probability can be derived from trans- μ(x) ∈ R, and (iv) a covariate-dependent scale parameter formation models (Sewak and Hothorn 2023). For the probit σ(x) ∈ R . The model is distribution-free in the sense that −1 location-scale transformation model (σ (x) h(y | ϑ )−μ(x)), a unique transformation function h exists for every baseline for example, the probabilistic index has a simple form, distribution P(Y ≤ y | X = x ),that is, a distribution conditional on explanatory variables x with μ(x ) = 0and 0 0 ˜ ˜ P Y ≤ Y | x, x ˜ = P h(Y | ϑ ) ≤ h(Y | ϑ ) | x, x ˜ σ(x ) = 1. In this case, the transformation function is given by −1 h(y | ϑ ) = F (P(Y ≤ y | X = x )). Conditional distributions σ(x ˜)μ(x ˜) − σ(x)μ(x) = , arising from changing x to x are linear in h on the scale of the 0 2 2 σ(x ˜) + σ(x) −1 −1 link function F (P(Y ≤ y | X = x)) = σ(x) h(y | ϑ )−μ(x). with two independent draws from this model, the rfi st, Y,con- Unknowns to be estimated are the parameters ϑ defining the ditional on x and the second, Y, conditional on x ˜.Especially transformation function, the location function μ(x),and the in cases where an explanatory variable affects both the loca- scale function σ(x),whereas F is chosen a priori. The applica- tion term μ(x) but also the scale term σ(x),the probabilis- bility to ordered, count, or continuous outcomes possibly under tic index may serve as a comprehensive measure to describe random censoring, its distribution-free nature, and the location- the impact of changes in the covariate congfi uration on the scale formulation allowing simple interpretation of the impact response’s distribution. explanatory variables have on the response’ distribution shall be discussed in the following. Figure 1 illustrates the flexibility of the location-scale transformation model on the scale of the link 2.2. Parameterization −1 function, that is F (P(Y ≤ y | X = x)), and of the conditional distribution function P(Y ≤ y | X = x) for dier ff ent values of We in general express the transformation function in terms of μ(x) and σ(x). P basis functions h(y | ϑ ) = a(y) ϑ.For absolute continuous THE AMERICAN STATISTICIAN 3 Figure 1. Location-scale transformation model. The transformation (left) and cumulative distribution function (right) are shown for the baseline configura tion (i.e., μ(x) = 0and σ(x) = 1) and different values of the location parameter μ(x) and of the scale parameter σ(x). responses Y ∈ ⊆ R, the transformation function h can be that a(y ) ϑ = ϑ depending on the category k = 1, ... , K −1. k k conveniently parameterized in terms of a polynomial in Bern- A nonparametric version assigns one parameter to each unique stein form h(y | ϑ ) = a (y) ϑ (McLain and Ghosh 2013; value of the outcome in the same way. In all cases, monotonicity Bs,P−1 Hothorn et al. 2018). The basis functions a (y) ∈ R are of h can be implemented by the constraints ϑ ≤ ϑ , p ∈ Bs,P−1 p p+1 specicfi beta densities (Farouki 2012) and it is straightforward to 1, ... , P − 1(Hothorn etal. 2018). obtain derivatives and integrals of h(y | ϑ ) = a (y) ϑ with Bs,P−1 respect to y and, under suitable constraints, a monotonically increasing transformation function h (Hothorn et al. 2018). For 2.3. Likelihood count responses Y ∈{0, 1, 2, ... } this transformation function is evaluated for integer values only, that is, h(y | ϑ ) (Siegfried From model (2), the log-likelihood contribution and Hothorn 2020). For ordered categorical responses Y ∈ (ϑ, μ(x ), σ(x )) of an observation (y , x ) with y ∈ R i i i i i i {y < ··· < y } the transformation function is den fi ed such given as a function of the unknown parameters ϑ, μ(x ),and 1 K i 4 S. SIEGFRIED, L. KOOK, AND T. HOTHORN σ(x ) is i Algorithm 1 Best subset selection for location-scale transforma- tion models. −1 N J N log f σ(x ) h(y | ϑ ) − μ(x ) (3) Require: Data {(y , x )} ∈ ( ×R ) , max. support size s ∈{1, .. . ,2J}, i i i max i i i=1 max. splicing size k ≤ s , tuning threshold τ ∈ R for s = max max s + −1 + log σ(x ) + log h (y | ϑ ) . i i 1, ... , s max 1: Fit unconditional model: ϑ ← arg max (ϑ,0, 1) ϑ ∈R i=1 2: Compute bivariate score residuals: Evaluating this expression requires the Lebesgue density f = F ∂ −1 and the derivative h (y | ϑ ) = a (y) ϑ of the transformation (r , r ) ← ϑ, β, exp(γ ) loc sc i i ∂(β, γ) ϑ =ϑ,β=0,γ =0 function with respect to y. For a discrete, left-, right- or interval- censored observation (y , y ¯ ] the exact log-likelihood contribu- i i 3: for s = 1, 2, ... , s do max ¯ 4: Initialize active set: tion (ϑ, μ(x ), σ(x )) = log{P(Y ∈ (y , y ¯ ]| x )} is i i i i i i A = i : 1 |cor(x , r )|≥ |cor(x , r )| + s j loc i loc −1 j=1 log F σ(x ) h(y ¯ | ϑ ) − μ(x ) i i i 2J 1 |cor(x , r )|≥ |cor(x , r )| ≤ s , k sc i sc −1 −F σ(x ) h(y | ϑ ) − μ(x ) .(4) i i i k=J+1 0 0 I ={1, .. . ,2J}\A s s For observed categories y , the datum is specified by (y , y ] 5: for m = 0, 1, 2, ... do k k−1 k 6: Run Algorithm 2 in Zhu et al. (2020): and for counts y ∈ N it is (y , y ¯ ]= (y − 1, y ]. For random i i i i i right-censoring at time t it is given by (y , y ¯ ]= (t , ∞) and for m+1 m+1 m+1 m+1 m+1 i i i i ˆ ˆ ϑ , β , γ ˆ , A , I ← s s s s s left-censoring at time t by (y , y ¯ ]= (0, t ]. i i i i m m m m m ˆ ˆ Splicing(ϑ , β , γ ˆ , A , I , k , τ ) ¯ max s s s s s s For the important special case of i = 1, ... , N indepen- dent realizations from model (2) with linear location term m+1 m+1 m m 7: if A , I = A , I then s s s s −1 μ(x) = x β and log-linear form for the scale term σ(x) = 8: stop 9: end if exp(x γ ), the unknown parameters ϑ, β,and γ can be esti- 10: end for mated simultaneously by maximizing the corresponding log- m+1 m+1 m+1 m+1 ˆ ˆ ˆ ˆ ˆ 11: ϑ , β , γ ˆ , A ← ϑ , β , γ ˆ , A s s s s s s s s likelihood under suitable constraints 12: end for 13: Choose optimal support size based on SIC: −1 ˆ ˆ −1 (ϑ , β , γ ˆ ) = arg max ϑ, x β, exp(x γ ) N i N N i i ˆ ˆ s ˆ = arg min − ϑ , x β , exp(x γ ˆ ) + i s i s s ϑ,β,γ i=1 s i=1 subject to ϑ ≤ ϑ , p ∈ 1, ... , P − 1. p p+1 β , γ (log 2J)(log log N) s s Score functions and Hessians as well as conditions for ˆ ˆ 14: return (ϑ , β , γ ˆ , A ) ˆ ˆ ˆ ˆ s s s s likelihood-based inference can be derived from the expressions given in Hothorn et al. (2018) for models defined in terms of ϑ and β. 3. Inference for Applications Motivated by applications from dier ff ent domains we detail the estimation of location-scale transformation models, including 2.4. Model Selection important aspects of model evaluation, interpretation and test- Model selection in this framework can be performed by includ- ing. The wide range of applications of the model framework ing an L penalty in the likelihood implied by model (2) is further exempliefi d by contrasting it to established location- scale models. −1 In Section 3.1.1 we outline the estimation of location- max ϑ, x β, exp(x γ ) , i i scale transformation models from the perspective of a strati- P J J ϑ ∈R ,β∈R ,γ ∈R i=1 fied model. Section 3.1.2 presents the application of location- scale models to survival data in the presence of crossing hazards, subject to β , γ ≤ s, furtherintroducingalocation-scalealternativetothecommonly used log-rank test. Interpretability of location-scale transforma- using an adaptation of the sequencing-and-splicing technique tion models is exempliefi d in Section 3.1.3 assessing seasonal suggested by Zhu et al. (2020). Here, s ∈{1, ... ,2J} denotes a and annual patterns of deer-vehicle collisions. The estimation fixed support size and · denotes the L norm. The parameters of nonlinear, tree-based location-scale transformation models of the transformation function ϑ remain unpenalized. When the is discussed for self-reported orgasm frequencies of Chinese support size s is unknown, s is tuned by minimizing a high- women in Section 3.2.Inspired by the GAMLSS framework, dimensional information criterion (SIC). The procedure is sum- Section 3.3 describes the estimation of a distribution-free ver- marized in Algorithm 1. Further information on the choice of sion of additive models featuring smooth covariate-dependent the initial active set and the inclusion of unpenalized parameters location and scale terms. The application of model selection in is given inthe supplementarymaterialC. this framework is exemplified in Section 3.4. THE AMERICAN STATISTICIAN 5 All analyses can be replicated using the tram R package the location-scale model does not make assumptions about the (Hothorn et al. 2023)with transformation function h and thus the distribution of blood loss for vaginal deliveries. However, the location and scale terms gov- R> demo("stram", package = "tram") ern the discrepancy between blood loss distributions for the two modes of delivery, and thus this approach can be characterized 3.1. Maximum Likelihood as being distribution-free but not model-free. Due tothe practical challengesinmeasuring bloodlossin 3.1.1. Stratification the hectic environment of a delivery ward, interval-censored Haslinger et al. (2020) reported data from measuring post- observations were reported and the corresponding interval- partum blood loss Y ∈ R during 676 vaginal deliveries censored negative log-likelihood (4) is minimized by Aug- and 632 caesarean sections at the University Hospital Zurich, mented Lagrangian Minimization (Madsen et al. 2004)toesti- Switzerland. Aiming to contrast blood loss during vaginal deliv- mate the parameters ϑ, β,and γ simultaneously. Visual inspec- eries or cesarean sections the conditional distributions can be tion of distribution and density functions as well as the in- estimated by straticfi ation, for example. sample log-likelihoods in Figure 2 shows that the two models are In the following we estimate such a stratiefi d model with practically identical. The two estimated conditional distribution two separate transformation functions h(y | delivery = functions cross around 1000 ml, which is only possible due vaginal) = a (y) ϑ and h(y | delivery = cesarean) = Bs,15 vaginal to estimation of two separate transformation functions h(y | a (y) ϑ as polynomials inBernstein form. Inasim- Bs,15 cesarean delivery) or via the inclusion of the delivery mode dependent ilar spirit, a location-scale transformation model with transfor- scale term exp(γ ). mation function a (y) ϑ for vaginal deliveries and transfor- Bs,15 We can also compute the probabilistic index here, which indi- mation function exp(γ )a (y) ϑ − β for cesarean sections Bs,15 cates that a randomly selected woman having a vaginal delivery can be den fi ed. Transformation functions of both models were has a probability of 0.71 (95% conden fi ce interval: 0.68–0.74) denfi edon the probit scale. Thestratiefi d, distribution- and for a lower blood loss compared to a randomly selected woman model-free approach estimates 2 × P = 32 parameters whereas undergoing a cesarean section. the distribution-free location-scale model consists of only P + 2 = 18 parameters, providing a lower-dimensional alternative 3.1.2. Crossing Hazards to stratification. Owing to Weierstrass’ theorem, polynomials in In the following we reanalyze a two-arm randomized con- Bernstein form with suci ffi ently large order P − 1can approx- trolled trial of 90 patients with gastric cancer (Schein and Gas- imate any continuous function on an interval and therefore trointestinal Tumor Study Group 1982). Trial patients received Figure 2. Stratification. Distribution (top) and density ( bottom) of postpartum blood loss conditional on delivery mode estimated by the stratified transform ation model (left) and location-scale transformation model (right). In addition, the empirical cumulative distribution function is shown in the top row, in-sample log-likelihoods are given in the bottom row. 6 S. SIEGFRIED, L. KOOK, AND T. HOTHORN Figure 3. Crossing hazards. The survivor functions of the two groups estimated by the nonparametric Kaplan-Meier method (step function) are shown along the estimates from the location-scale Weibull model (left) and the distribution-free location-scale transformation model (right). either chemotherapy and radiotherapy (intervention group) MacKenzie (2017) for crossing-hazards problems, also leads to or chemotherapy alone (control group). The nonparametric arejection with p-value = 0.032. Kaplan-Meier estimates of the survivor functions of both groups An alternative location-scale test can be motivated in analogy in Figure 3 reveal crossing of the curves at approximately to the log-rank test. The bivariate permutation score test for γ 1000 days, and thus nonproportional hazards. and β for testing the null γ = β = 0 is defined based on the Nonproportional hazards are a common violation of a stan- unconditional model P(T > t) = exp[− exp{h(t | ϑ )}],that is, dard model assumption in survival analysis necessitating tai- themodel tfi tedunder theconstraint γ = β = 0. Thus, the like- −1 lored models to express dieff rences in survival times T ∈ R lihood contribution of the ith subject is (ϑ, β, exp(γ ) ) = by interpretable parameters. We suggest a location-scale trans- (ϑ,0,1). The maximum-likelihood estimator is formation model of the form ϑ = arg max (ϑ,0,1), P(T > t | control) = exp[− exp{h(t | ϑ )}] i=1 subject to ϑ ≤ ϑ , p ∈ 1, ... , P − 1. P(T > t | intervention) = exp − exp exp(γ )h(t | ϑ ) − β . p p+1 The individual score contributions are defined as For γ = 0, themodel reducestoaproportional hazards −1 model with log-hazard ratio β. A distribution-free version can ∂ ϑ, β, exp(γ ) be implemented by choosing a polynomial in Bernstein form for r = ∈ R . ∂(β, γ) the transformation function h(t | ϑ ) = a (t) ϑ.AWeibull Bs,6 ϑ =ϑ,β=0,γ =0 model corresponds to a log-linear transformation function h(t | Note that the rfi st element of r is the log-rank score for the ith ϑ ) = ϑ + ϑ log(t), which was introduced as a special case 1 2 individual and a bivariate linear test statistic is simply the sum in the GAMLSS framework (Rigby and Stasinopoulos 2005, of thescoresinthe intervention group. Aeft r appropriate stan- Table 1) and later investigated in more detail by Burke and dardization, maximum-type statistics or quadratic forms can be MacKenzie (2017)and Peng et al. (2020)using the equivalent used to obtain p-values from the asymptotic or approximate parameterization ϑ + exp(ϑ ) log(t) for the control and ϑ + 1 2 1 permutation distribution (Hothorn et al. 2006). The log-rank exp(ϑ + γ) log(t) + β for the intervention group. Further test alone does not lead to a rejection at 5% (p-value = 0.638) extensions of the Weibull location-scale model were studied in but the bivariate test does (p-value = 0.002 for the maximum- Burke et al. (2020b) and a nonparametric approach, leaving h type and p-value = 0.001 for the quadratic form). completely unspecied fi , was theoretically discussed by Zeng and Lin (2007)and Burke et al. (2020a). 3.1.3. Partial Proportional Hazards Model parameters for both models were estimated by max- We analyze a time series of daily deer-vehicle collisions (DVCs) imizing the likelihood den fi ed by ( 3) for death times and like- involving roe deer that were documented over a period of 10 lihood (4) for right-censored observations. The nonparametric years (2002–2011) in Bavaria, Germany (Hothorn et al. 2015). Kaplan-Meier estimates in Figure 3 are overlaid with survivor In total, 341,655 DVCs were reported over 3652 days, with daily functions obtained from the Weibull model (log-linear h,left counts 16 ≤ Y ≤ 210. panel) and the distribution-free location-scale model (h being As a benchmark, we tfi ted a location transformation model a polynomial in Bernstein form, right panel). Both models show crossing survivor curves and the more flexible model appears to P(Y > y | day = d,year) have better tfi . However, in the context of a randomized trial, = exp − exp h(y | ϑ ) − β − β − s(d | β) year weekday(d) a test for the null hypothesis of equal survivor curves is more important than model tfi . The likelihood ratio tests lead to a featuring log-hazard ratios for the year (baseline 2002) and day rejection of the null at 5% in either model (p-value = 0.034 for of week (baseline Monday) and a seasonal eeff ct s(d | β) mod- the Weibull model and p-value = 0.011 for the distribution- eled as a superposition of sinusoidal waves of dier ff ent frequen- free model). The bivariate Wald-test, proposed by Burke and cies (this is a simplification of a model discussed in Siegfried and THE AMERICAN STATISTICIAN 7 Hothorn 2020). Two location-scale models expressing P(Y > and explanatory variables, (iii) find an appropriate binary split y | day = d,year) by in the explanatory variable strongest correlated to the scores, (iv) proceed recursively. The novelty here is that location- scale trees pay attention to bivariate location-scale scores exclu- exp − exp exp(s(d | γ ))h(y | ϑ ) − β − β − s(d | β) year weekday(d) sively, instead of the P scores for the transformation parameters were additionally estimated. Because the year does not aeff ct ϑ (as in Hothorn and Zeileis 2021). As in Section 3.1.2,the the scale term, parameters β are interpretable as log-hazard year unconditional model ratios common to all days 1, ... , 365 within a year. In this P(Y ≤ y) = F exp(γ )a(y) ϑ − β , sense, the model is a partial proportional hazards model. As in Section 3.1.2,westudyadistribution-freeversion(h in Bernstein subject to β = γ = 0 form) and a more restrictive Weibull model (log-linear h)which, is tfi ted in each node of the tree by optimizing the likelihood for counts, is appliedtothe greatest integer y less than or equal to the cut-off point y. The correct interval-censored like- lihood (4) for count data was used in the three cases to estimate ϑ = arg max (ϑ,0,1), the unknown model parameters ϑ, γ , β , β ,and β year weekday(d) ϑ i=1 simultaneously (Siegfried and Hothorn 2020)by Augmented subject to ϑ ≤ ϑ , p ∈ 1, ... , P − 1. p p+1 Lagrangian Minimization (Madsen et al. 2004). The bivariate score contributions are den fi ed by Assessing the temporal changes in DVC risk across a year, all three models are displayed on the quantile scale in Figure 4 −1 ∂ ϑ, β, exp(γ ) for a hypothetical Monday in 2002. The curves reveal well- r = ∈ R . known seasonal patterns of increased DVC risk in April, July, ∂(β, γ) and August due to increased animal activity. The plots further ϑ =ϑ,β=0,γ =0 indicate that for the location transformation model large median Permutation tests are then applied to assess the association values are associated with larger dispersion. This is not the between the jth explanatory variable based on a quadratic form case for the other two models, indicating a certain degree of Q(j)×2 collapsing the linear test statistic g (x )r ∈ R , j i i=1 underdispersion. The median annual risk pattern is very similar where g (x ) is a Q(j)-dimensional vector representing the jth j i in all three models, however, the distribution-free location- explanatory variable of the ith subject. The bivariate score allows scale transformation model reveals smaller variance compared the tree to detect location and scale eeff cts, for the model in to the other two models. Figure 5 on the logit scale. A p-value is computed for all j = The partial proportional hazards location-scale transforma- 1, ... , J explanatory variables and thevariablewithminimum tion model further allows investigation of the general trend p-value is selected for splitting. of DVCs over a decade. From the log-hazard ratios β we year The location-scale transformation tree (Figure 5)indicates computed multiple comparisons of hazard ratios comparing that higher orgasm frequencies were in general reported from subsequent years, with multiplicity control. Table 1 is in line higher educated, happier, and younger females. In this subgroup, with an increasing DVC risk from 2002 to 2004, followed by a the coastal south region was associated with a tendency to higher plateau in 2005 and 2006, a further risk increase in 2007, and reported orgasm frequencies compared to the rest of China. then plateauing in the remaining years. 3.3. Transformation Additive Models for Location and 3.2. Location-Scale Transformation Trees Scale Pollet and Nettle (2009) analyzed the self-reported orgasm fre- The head-circumference growth chart obtained from the Dutch quency of 1533 Chinese women with current male partners. growth study (Fredriks et al. 2000) is one of the standard exam- The ordinal outcome Y was reported in terms of ordered cat- ples in the GAMLSS literature. The top panel of Figure 6 shows egories: never < rarely < sometimes < often < always. To the head-circumference quantiles for boys conditional on age assess the eeff ct of explanatory variables on the distribution of obtained from tfi ting a GAMLSS with Box-Cox- t distribution, reported orgasm frequencies, we reanalyze this data using a tree- featuring four model terms μ(age), σ(age), ν(age),and τ(age) structured location-scale transformation model. Explanatory (reproducing Figure 16 in Stasinopoulos and Rigby 2007). In our variables included in the model are: partner income, partner reanalysis, we replace the four parameter Box-Cox-t GAMLSS height, duration of the relationship, respondents age (Rage), with a distribution-free transformation additive model for loca- dieff rence between education and wealth between both partners, tion and scale (TAMLS) featuring a conditional distribution the respondents education (Reducation: no school < primary function school < lower-middle school < upper-middle school < junior −1 college < university), health, happiness (Rhappy: very unhappy P(Y ≤ y | Age = age) = σ(age) a (y) ϑ − μ(age) Bs,6 < not too unhappy < relatively happy < very happy) and place of living (Rregion). for head-circumference Y ∈ R . We apply a modification of the transformation tree induc- In contrast to the GAMLSS, there is no need to assume a tion algorithm by Hothorn and Zeileis (2021)toestimate specicfi parametric distribution in the TAMLS and only two the location-scale transformation tree: (i) Fit an unconditional instead offoursmooth terms have to be estimated. In this transformation model, (ii) assess the correlation of model scores model, the transformation parameters ϑ can be understood as 8 S. SIEGFRIED, L. KOOK, AND T. HOTHORN Figure 4. Partial proportional hazards. Three annual quantile functions (0.25, 0.50, and 0.75th quantile) for DVCs (for a hypothetical Monday in 2002) estimated by three transformation models of increasing complexity. The in-sample log-likelihoods of the corresponding models are given in the panels. nuisance parameters. We employ the Rigby and Stasinopoulos parameters ϑ given μ and σ controlled by the RS algorithm. (RS) algorithm (Rigby and Stasinopoulos 2005)developed for More specicfi ally, for candidate functions μ and σ,the profile GAMLSS to estimate the two smooth terms μ(age) and σ(age) likelihood over ϑ is given by in our TAMLS. For a given likelihood depending on a loca- tion and scale term, this algorithm allows estimation of these two (μ(·), σ(·)) = arg max (ϑ, μ(x ), σ(x )) i i i terms in a structured additive way. We shield the more complex i=1 formulation of our model from the RS algorithm by setting-up a subject to ϑ ≤ ϑ , p ∈ 1, ... , P − 1. p p+1 profile likelihood which, under the hood, estimates the nuisance THE AMERICAN STATISTICIAN 9 Figure 5. Location-scale transformation tree. Female orgasm frequency in heterosexual relationships as a function of questionnaire variables reported by the female respondent. Table 1. Partial proportional hazards. 3.4. Model Selection Year Hazard ratio 95% CI In the following we aim to assess the effect of explanatory 2003–2002 0.66 0.53–0.81 variables on the medical demand by the elderly, that is, number 2004–2003 0.74 0.60–0.91 of physician visits Y = 0, 1, 2, ... forpatients aged 66 orolder, 2005–2004 0.92 0.75–1.13 using a sample from the United States National Medical Expen- 2006–2005 1.12 0.91–1.38 2007–2006 0.58 0.47–0.72 diture Survey conducted in 1987 and 1988 (Deb and Trivedi 2008–2007 0.88 0.72–1.09 1997). 2009–2008 0.99 0.80–1.21 For such applications, location-scale transformation mod- 2010–2009 0.99 0.80–1.21 2011–2010 1.08 0.88–1.32 els (2) are especially attractive for parameter interpretation when linear location and scale terms are considered, and variables NOTE: Estimates and corresponding simultaneous 95% confidence intervals (CI) of multiplicative changes in hazards by year. Hazard ratios smaller one indicate of special interest are present in the location term only (Sec- increasing DVCs when comparing two subsequent years. tion 2.1). If continuous explanatory variables x are present in themodel,aparameteridenticfi ation issue ariseswhich has previously been discussed in the GAMLSS context (Rigby et al. We used log-likelihood contributions (3) in this specific applica- 2019, p. 60). In a location-scale model, tion. Augmented Lagrangian Minimization (Madsen et al. 2004) was used to estimate ϑ given μ(·) and σ(·).Thepenalizedprofile P(Y ≤ y | X = x) = F exp(x γ )h(y | ϑ ) − x β likelihood wasoptimized with respecttothe twofunctions μ(·) and σ(·) in Step 2a(i) of the RS algorithm (Appendix B, Rigby the intercept, which is implicit in the transformation function and Stasinopoulos 2005). The in-sample log-likelihood of the h(y | ϑ ) = h(y | ϑ ) − β ,mustnot be multiplied with four term Box-Cox-t GAMLSS is slightly larger than the one 0 the scale term and an explicit intercept must be added to the of the distribution-free TAMLS, but the conditional quantile location term, changing the model to sheets obtained from thetwo models arevery close andhardly distinguishable for boys older than 2.5 years (Figure 6). ¯ ¯ Models assuming additivity of multiple smooth terms for P(Y ≤ y | X = x) = F exp(x γ )h(y | ϑ ) − β − x β . the location eeff ct μ(x) = m (x) and the scale effect j=1 (5) −2log(σ (x)) = s (x) can be tfi ted by maximizing the l=1 same profile likelihood using the RS algorithm or L boosting The two models are not equivalent, but adding β to h leads to an 2 0 (for GAMLSS, Mayr et al. 2012). In this sense, transformation unidentified parameter when γ is close to zero and omitting β models introduce a novel distribution-free member to the oth- leads to dier ff ent model tfi s when a constant is added to a contin- erwise strictly parametric GAMLSS family. uous explanatory variable (e.g., when defining a suitable baseline 10 S. SIEGFRIED, L. KOOK, AND T. HOTHORN Figure 6. Transformation additive models for location and scale (TAMLS). Conditional quantiles of head circumference along age estimated by the Box-Cox-t GAMLSS (BCT GAMLSS, top panel) and the TAMLS (bottom panel). The former model comprises four and the latter model two smooth terms. Table 2. Model selection. distribution). For discrete parameterizations the expression for ¯ ¯ h simplifies to h(y | ϑ ) = a(y) ϑ with ϑ ≡ 0. For polynomials ML BSS in Bernstein form we have the following expression for h(y | ϑ ): ˆ ˆ Variable Level β γˆ β γˆ Health poor 0.3315 −0.0912 0.3997 — a (y) ϑ = a (y)ϑ excellent −0.3722 0.0417 −0.2214 — Bs,P−1 p p Sex male −0.1585 −0.2669 −0.1251 — p=1 Insurance yes 0.2675 0.2541 0.2800 0.2272 P−1 P Chronic 0.2542 0.1799 0.2729 0.2260 −1 School 0.0234 −0.0175 0.0233 — = (a (y) − a (y))ϑ + P ϑ p P p p p=1 p=1 NOTE: Location and scale parameter estimates, β and γˆ , from applying the two ¯ ¯ estimation procedures, maximum likelihood (ML) or best subset selection (BSS), = h(y | ϑ ) − β to a location-scale transformation model including the following explanatory ¯ ¯ variables: Health (poor < average (baseline) < excellent), Sex (female (baseline), and h(y | ϑ ) dy = 0because a (·) are densities. The model male), Insurance (no (baseline), yes), Chronic (number of chronic conditions) and parameters ϑ, γ,and β can be estimated by maximizing the School (number of years of education). Variables which were dropped when likelihood (aeft r suitable adjustment to the constraints). How- applying best subset selection are indicated by —. ever, for the sake of interpretability we aim to drop variables from the scale term whenever possible and therefore apply the L penalty (detailed in Section 2.4,implemented in pack- Applying the two estimation procedures, maximum age tramvs (Kook 2023)) on γ to the likelihood of model (5). likelihood and best subset selection, to a location-scale THE AMERICAN STATISTICIAN 11 −1 All models discussed here are “distributional” in the sense transformation model (with F = cloglog ) estimating that they formulate a proper distribution function. Via appro- the eeff ct of self-perceived health status (Health), sex (Sex), priate constraints, our software implementation ensures that t- fi insurance coverage (Insurance), and the number of chronic ted models also directly correspond to conditional distribution conditions (Chronic) and years of education of patients (School) functions. This feature allows straightforward parametric boot- on the frequency of physician visits, allows for a head-to-head strap implementations. Alternative suggestions for location- comparison of the parameter estimates (Table 2). In the best scale ordinal regression do not necessarily lead to estimates subset location-scale transformation model the variables Health, which can be interpreted on the probability scale (Cox 1995; Sex, and School are dropped from the scale term allowing to Tutz and Berger 2017, 2020). interpret their eeff cts in terms of (log-)hazard ratios. For the ˆ Algorithmically, we stand on the shoulders of giants, variable Sex, for example, the corresponding exp(−β) = 1.1333 because only minor modifications to well-established algo- can be interpreted as hazard ratio comparing the hazards of male rithms were necessary for enabling parameter estimation. We patients to the hazards of female patients, all other variables didn’t fully explore all possibilities here, and for example being equal. location-scale transformation forests building on Hothorn and Zeileis (2021) or functional gradient boosting for this class 4. Discussion (Hothorn 2020) are interesting algorithms for smooth interac- Tosteson and Begg (1988) introduced the notion of distribution- tion modeling in potentially high-dimensional covariate spaces. free location-scale regression in the context of ROC analysis. While they wereableto estimate a corresponding model for Supplementary Materials ordinal responses, they contemplated that for models (2), “there is, as yet, no software for accommodating continuous test results, The supplementary material for this article includes the following: (A) computational details, (B) a re-analysis of the example from Tosteson & which are common outcomes for laboratory tests” (Tosteson Begg (1988), (C) details on the best subset selection algorithm, and (D) and Begg 1988). With the introduction of a smooth transfor- simulation experiments. mation function and corresponding software implementation in the tram add-on package (Hothorn et al. 2023), we address Acknowledgments this long-standing issue. We derive likelihood and score con- tributions for all response types and discuss suitable inference The authors would like to thank Christoph Blapp and Klaus Steigmiller for procedures for various functional forms. embedding location-scale transformation models into the literature and for In a broader context, we contribute a new distribution-free a proof-of-concept implementation. member to the rich family of location-scale models. The model is unique in the sense that data analysts do not have to com- Author’s contributions mit themselves to a parametric family of distributions before tfi ting the model. The flexibility of our approach comes from SS andTHdeveloped the model andits parameterization. SS wrote the the pair of location and scale terms allowing interpretability of manuscript, analyzed all applications, and performed the simulation study. LK implemented best subset regression for linear location-scale transforma- conditional distributions on various scales, including propor- tion models. TH implemented the likelihood and score function in package tional odds or hazards. Despite the distribution-free nature, we mlt. All authors revised and approved the n fi al version of the manuscript. parameterize the model such that simple maximum-likelihood estimation for all types of responses becomes feasible. There- Disclosure Statement fore, our implementation handles arbitrary responses, including bounded, mixed discrete-continuous, and randomly censored The authors report there are no competing interests to declare. outcomes, in a native way. Among other diverse applications, our flexible approach can help to generalize Weibull location- Funding scale models previously studied as a model for crossing haz- ards using GAMLSS, allows for over- or underdispersion to be SS and TH acknowledge financial support by the Swiss National Science explained by covariates in complex count regression models, Foundation, Grant No. 200021_184603. adds a notion of dispersion to regression trees for complex responses, provides means to reduce the complexity of growth- ORCID curve models, and has important applications in ROC analysis (Sewak and Hothorn 2023). Sandra Siegfried https://orcid.org/0000-0002-7312-1001 Special care with respect to parameter interpretability is Lucas Kook https://orcid.org/0000-0002-7546-7356 Torsten Hothorn https://orcid.org/0000-0001-8301-0471 needed when formulating the model. Parameters in linear loca- tion terms are interpretable as log-odds or log-hazard ratios as long as there is no corresponding scale parameter. Thus, model References selection becomes vitally important should the data analyst be Burke, K., and MacKenzie, G. (2017), “Multi-Parameter Regression Survival interested in direct parameter interpretation. A novel approach Modeling: An Alternative to Proportional Hazards,” Biometrics, 73, 678– to best subset selection was presented and empirically evaluated. 686. DOI:10.1111/biom.12625.[6] Model interpretation is possible on other scales (e.g., probabilis- Burke, K., Eriksson, F., and Pipper, C. B. (2020), “Semiparametric Multipa- tic indices or conditional quantiles), yet constitutes probably the rameter Regression Survival Modeling,” Scandinavian Journal of Statis- biggest challenge of location-scale transformation models. tics, 47, 555–571. DOI:10.1111/sjos.12416.[6] 12 S. SIEGFRIED, L. KOOK, AND T. HOTHORN Burke, K., Jones, M. C., and Noufaily, A. (2020), “A Flexible Para- McCullagh, P. (1980), “Regression Models for Ordinal Data,” metric Modelling Framework for Survival Analysis,” Journal of the Journalofthe RoyalStatistical Society, Series B, 42, 109–127. Royal Statistical Society, Series C, 69, 429–457. DOI:10.1111/rssc.12398. DOI:10.1111/j.2517-6161.1980.tb01109.x.[1] [6] McLain, A. C., and Ghosh, S. K. (2013), “Efficient Sieve Maximum Likeli- Cox, C. (1995), “Location-Scale Cumulative Odds Models for Ordinal Data: hood Estimation of Time-Transformation Models,” Journal of Statistical A General Non-linear Model Approach,” Statistics in Medicine, 14, 1191– Theory and Practice, 7, 285–303. DOI:10.1080/15598608.2013.772835. 1203. DOI:10.1002/sim.4780141105.[11] [3] Deb, P., and Trivedi, P. K. (1997), “Demand for Medical Care by Peng, D., MacKenzie, G., and Burke, K. (2020), “A Multiparameter Regres- the Elderly: A Finite Mixture Approach,” Journal of Applied sion Model for Interval-Censored Survival Data” Statistics in Medicine, Econometrics, 12, 313–336. DOI:10.1002/(sici)1099-1255(199705) 39, 1903–1918. DOI:10.1002/sim.8508.[6] 12:3<313::aid-jae440>3.0.co;2-g.[9] Peterson, B., and Harrell, F. E. (1990), “Partial Proportional Odds Models Farouki, R. T. (2012), “The Bernstein Polynomial Basis: A Centen- for Ordinal Response Variables,” Journalof the RoyalStatistical Society, nial Retrospective,” Computer Aided Geometric Design, 29, 379–419. Series C, 39, 205–217. DOI:10.2307/2347760.[1] DOI:10.1016/j.cagd.2012.03.001.[3] Pollet, T. V., and Nettle, D. (2009), “Partner Wealth Predicts Fredriks,A.M., vanBuuren, S.,Burgmeijer, R. J. F.,Meulmeester,J. F., Self-Reported Orgasm Frequency in a Sample of Chinese Beuker, R.J., Brugman, E.,Roede, M.J., Verloove-Vanhorick,S.P., Women,” Evolution and Human Behavior, 30, 146–151. and Wit, J. (2000), “Continuing Positive Secular Growth Change DOI:10.1016/j.evolhumbehav.2008.11.002.[7] in The Netherlands 1955–1997,” Pediatric Research, 47, 316–323. Rigby, R., Stasinopoulos, D. M., Heller, G., and De Bastiani, F. (2019), DOI:10.1203/00006450-200003000-00006.[7] Distributions for Modeling Location, Scale, and Shape: Using Haslinger, C., Korte, W., Hothorn, T., Brun, R., Greenberg, C., and Zim- GAMLSS in R, Boca Raton, FL: Chapman & Hall/CRC Press. mermann, R. (2020), “The Impact of Prepartum Factor XIII Activity DOI:10.1201/9780429298547.[9] on Postpartum Blood Loss,” Journal of Thrombosis and Haemostasis, 18, Rigby, R. A., and Stasinopoulos, D. M. (2005), “Generalized Additive Mod- 1310–1319. DOI:10.1111/jth.14795.[5] els for Location, Scale and Shape,” Journal of the Royal Statistical Society, Hothorn, T. (2020) , “Transformation Boosting Machines,” Statistics and Series C, 54, 507–554. DOI:10.1111/j.1467-9876.2005.00510.x.[1,6,8,9] Computing, 30, 141–152. DOI:10.1007/s11222-019-09870-4.[11] Schein, P. S., and Gastrointestinal Tumor Study Group. (1982), “A Compar- Hothorn, T., and Zeileis, A. (2021), “Predictive Distribution Mod- ison of Combination Chemotherapy and Combined Modality Therapy elling Using Transformation Forests,” Journal of Computational and for Locally Advanced Gastric Carcinoma,” Cancer, 49, 1771–1777. Graphical Statistics, 30, 144–148. DOI:10.1080/10618600.2021.1872581. DOI:10.1002/1097-0142(19820501)49:9<1771::aid-cncr2820490907> [7,11] 3.0.co;2-m.[5] Hothorn, T., Hornik, K., van de Wiel, M. A., and Zeileis, A. (2006), “A Lego Sewak, A., and Hothorn, T. (2023), “Estimating Transformations System for Conditional Inference,” The American Statistician, 60, 257– for Evaluating Diagnostic Tests with Covariate Adjustment,” 263. DOI:10.1198/000313006x118430.[6] Statistical Methods in Medical Research. Accepted for publication. Hothorn, T., Müller, J., Held, L., Möst, L., and Mysterud, A. (2015), DOI:10.1177/09622802231176030. [2,11] “Temporal PatternsofDeer-Vehicle CollisionsConsistent with Siegfried, S., and Hothorn, T. (2020), “Count Transformation Deer Activity Pattern and Density Increase but not General Models,” Methods in Ecology and Evolution, 11, 818–827. Accident Risk,” Accident Analysis & Prevention, 81, 143–152. DOI:10.1111/2041-210x.13383.[3,7] DOI:10.1016/j.aap.2015.04.037.[6] Stasinopoulos, D. M., and Rigby, R. A. (2007), “Generalized Additive Mod- Hothorn, T., Möst, L., and Bühlmann, P. (2018), “Most Likely els for Location Scale and Shape (GAMLSS) in R,” Journal of Statistical Transformations,” Scandinavian Journal of Statistics, 45, 110–134. Sow ft are , 23, 1–46. DOI:10.18637/jss.v023.i07.[1,7] DOI:10.1111/sjos.12291.[2,3,4] Thas, O., De Neve, J., Clement, L., and Ottoy, J.-P. (2012), “Probabilistic Hothorn, T., Barbanti, L., and Siegfried, S. (2023), tram:Transformation Index Models,” Journalofthe RoyalStatistical Society, Series B, 74, 623– Models. R package version 0.8-3, https://CRAN.R-project.org/package= 671. DOI:10.1111/j.1467-9868.2011.01020.x.[2] tram.[5,11] Tosteson, A. N. A., and Begg, C. B. (1988), “A General Regression Method- Kneib, T., Silbersdor,ff A., and Säfken, B. (2023), “Rage against the Mean ology for ROC Curve Estimation,” Medical Decision Making, 8, 204–215. – A Review of Distributional Regression Approaches,” Econometrics and DOI:10.1177/0272989x8800800309.[1,11] Statistics, 26, 99–123. DOI:10.1016/j.ecosta.2021.07.006.[1] Tutz, G., and Berger, M. (2017), “Separating Location and Dispersion in Kook, L. (2023), tramvs: Optimal Subset Selection for Transformation Mod- Ordinal Regression Models,” Econometrics and Statistics, 2, 131–148. els. R package version 0.0-4, available at https://CRAN.R-project.org/ DOI:10.1016/j.ecosta.2016.10.002.[11] package=tramvs.[10] (2020), “Non Proportional Odds Models are Widely Dispensable Lepage, Y. (1971), “A Combination of Wilcoxon’s and Ansari-Bradley’s – Sparser Modeling based on Parametric and Additive Location-Shift Statistics,” Biometrika, 58, 213–217. DOI:10.2307/2334333.[1] Approaches,” arXiv 2006.03914, arXiv.org E-Print Archive. [11] Madsen, K., Nielsen, H. B., and Tingleff, O. (2004), Optimization with Zeng, D., and Lin, D. Y. (2007), “Maximum Likelihood Estimation Constraints (2nd ed.), Technical University of Denmark. Available at in Semiparametric Regression Models with Censored Data,” http://www2.imm.dtu.dk/pubdb/p.php?4213.[5,7,9] Journalofthe RoyalStatistical Society, Series B, 69, 507–564. Mayr, A., Fenske, N., Hofner, B., Kneib, T., and Schmid, M. (2012), DOI:10.1111/j.1369-7412.2007.00606.x.[6] “Generalized Additive Models for Location, Scale and Shape for Zhu, J., Wen, C., Zhu, J., Zhang, H., and Wang, X. (2020), “A High Dimensional Data – A Flexible Approach based on Boost- Polynomial Algorithm for Best-Subset Selection Problem,” ing,” Journalofthe RoyalStatistical Society, Series C, 61, 403–427. Proceedings of the National Academy of Sciences, 117, 33117–33123. DOI:10.1111/j.1467-9876.2011.01033.x.[9] DOI:10.1073/pnas.2014241117.[4]
The American Statistician – Taylor & Francis
Published: Oct 2, 2023
Keywords: Additive models; Conditional distribution function; Model selection; Regression trees; Smoothing; Transformation models
You can share this free article with as many people as you like with the url below! We hope you enjoy this feature!
Read and print from thousands of top scholarly journals.
Already have an account? Log in
Bookmark this article. You can see your Bookmarks on your DeepDyve Library.
To save an article, log in first, or sign up for a DeepDyve account if you don’t already have one.
Copy and paste the desired citation format or use the link below to download a file formatted for EndNote
Access the full text.
Sign up today, get DeepDyve free for 14 days.
All DeepDyve websites use cookies to improve your online experience. They were placed on your computer when you launched this website. You can change your cookie settings through your browser.