
Adaptive Regression by Mixing


Journal of the American Statistical Association, Volume 96 (454): 15 – Jun 1, 2001
15 pages



Publisher
Taylor & Francis
Copyright
© American Statistical Association
ISSN
0162-1459
eISSN
1537-274X
DOI
10.1198/016214501753168262

Abstract

Adaptation over different procedures is of practical importance. Different procedures perform well under different conditions. In many practical situations, it is rather hard to assess which conditions are (approximately) satisfied so as to identify the best procedure for the data at hand. Thus automatic adaptation over various scenarios is desirable. A practically feasible method, named adaptive regression by mixing (ARM), is proposed to convexly combine general candidate regression procedures. Under mild conditions, the resulting estimator is theoretically shown to perform optimally in rates of convergence without knowing which of the original procedures works best. Simulations are conducted in several settings, including comparing a parametric model with nonparametric alternatives, comparing a neural network with a projection pursuit in multidimensional regression, and combining bandwidths in kernel regression. The results clearly support the theoretical property of ARM. The ARM algorithm assigns weights to the candidate models or procedures via a proper assessment of the performance of the estimators. The data are split into two parts, one for estimation and the other for measuring prediction performance. Although there are many plausible ways to assign the weights, ARM has a connection with information theory, which ensures the desired adaptation capability. Indeed, under mild conditions, we show that the squared L2 risk of the estimator based on ARM is essentially bounded above by the risk of each candidate procedure plus a small penalty term of order 1/n. Minimizing over the procedures gives the automatically optimal rate of convergence for ARM. Model selection often induces unnecessarily large variability in estimation. In contrast, a proper weighting of the candidate models can be more stable, resulting in a smaller risk. Simulations suggest that ARM works better than model selection using the Akaike or Bayesian information criteria when the error variance is not very small.
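The abstract sketches how ARM operates: the data are split in two, each candidate procedure is fitted on the first part, and its predictive performance on the held-out part is converted into convex (non-negative, summing to one) weights for the combined estimator. The Python sketch below illustrates that idea under stated assumptions; the Gaussian-likelihood-style scoring, the single data split, and the names arm_weights and arm_combine are illustrative choices, not the paper's exact specification.

```python
import numpy as np

def arm_weights(holdout_preds, y_holdout, sigma_hats):
    """Turn hold-out prediction performance into convex combination weights.

    Assumed inputs (illustrative, not taken from the paper):
      holdout_preds : (J, m) array, predictions of the J candidate estimators
                      (each fitted on the first half of the data) at the m
                      held-out design points
      y_holdout     : (m,) array of held-out responses
      sigma_hats    : (J,) array of error-s.d. estimates from the first half
    """
    m = y_holdout.shape[0]
    # Squared prediction error of each candidate on the hold-out set.
    sq_err = ((y_holdout - holdout_preds) ** 2).sum(axis=1)
    # Gaussian-style log score: candidates that predict the hold-out responses
    # well, relative to their estimated noise level, get exponentially more weight.
    log_w = -m * np.log(sigma_hats) - sq_err / (2.0 * sigma_hats ** 2)
    log_w -= log_w.max()              # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()                # convex weights: non-negative, sum to 1

def arm_combine(predict_fns, weights, x_new):
    """Convexly combine the candidate estimators at new design points."""
    preds = np.stack([f(x_new) for f in predict_fns])   # shape (J, n_new)
    return weights @ preds
```

In practice the split-and-weight step would typically be repeated over several random splits and the resulting weights averaged, so that the final estimator does not depend on any single split; the sketch shows one split for clarity.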

Journal

Journal of the American Statistical Association, Taylor & Francis

Published: Jun 1, 2001

Keywords: Adaptive estimation; Combining procedures; Nonparametric regression
