Penalising model component complexity: a principled, practical approach to constructing priors. (English) Zbl 1442.62060

Summary: In this paper, we introduce a new concept for constructing prior distributions. We exploit the natural nested structure inherent to many model components, which defines the model component to be a flexible extension of a base model. Proper priors are defined to penalise the complexity induced by deviating from the simpler base model and are formulated after the input of a user-defined scaling parameter for that model component, both in the univariate and the multivariate case. These priors are invariant to reparameterisations, have a natural connection to Jeffreys' priors, are designed to support Occam's razor and seem to have excellent robustness properties, all of which are highly desirable and allow us to use this approach to define default prior distributions. Through examples and theoretical results, we demonstrate the appropriateness of this approach and how it can be applied in various situations.
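
The summary gives no formulas, so the following is only an illustrative sketch of how a user-defined scaling parameter can fix such a prior. It assumes one specific case, the precision tau of a Gaussian random effect with base model sigma = 0, where the penalised-complexity prior is commonly written as an exponential distribution on the standard deviation sigma = tau^(-1/2). The function names, and the choice of scaling statement P(sigma > u) = alpha, are assumptions made for this example and are not taken from the record above.

```python
import math


def pc_prior_rate(u: float, alpha: float) -> float:
    """Rate lam of an exponential prior on sigma chosen so that
    P(sigma > u) = alpha (the assumed user-defined scaling statement)."""
    if u <= 0.0 or not (0.0 < alpha < 1.0):
        raise ValueError("need u > 0 and 0 < alpha < 1")
    return -math.log(alpha) / u


def sigma_tail_probability(u: float, lam: float) -> float:
    """P(sigma > u) when sigma ~ Exponential(lam)."""
    return math.exp(-lam * u)


def precision_cdf(tau: float, lam: float) -> float:
    """P(precision <= tau) = P(sigma >= tau**-0.5) under the same prior."""
    return math.exp(-lam / math.sqrt(tau))


def precision_density(tau: float, lam: float) -> float:
    """Density of the precision tau = sigma**-2, obtained from the
    exponential prior on sigma by a change of variables."""
    return 0.5 * lam * tau ** -1.5 * math.exp(-lam / math.sqrt(tau))


if __name__ == "__main__":
    # Hypothetical scaling choice: sigma exceeds 1 with prior probability 0.01.
    lam = pc_prior_rate(u=1.0, alpha=0.01)
    print("rate:", lam)
    print("P(sigma > 1):", sigma_tail_probability(1.0, lam))
    print("density at tau = 10:", precision_density(10.0, lam))
```

In this sketch the single pair (u, alpha) determines the whole prior, which is one way to read the summary's "user-defined scaling parameter"; the density on the precision follows mechanically from the exponential form on sigma, so the same prior is obtained in either parameterisation.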

MSC:

62F15 Bayesian inference
62P10 Applications of statistics to biology and medical sciences; meta analysis
