Abbott, M. C., & Machta, B. B. (2022). Far from Asymptopia (arXiv:2205.03343). arXiv. http://arxiv.org/abs/2205.03343
Abramovich, F., Sapatinas, T., & Silverman, B. W. (1998). Wavelet thresholding via a Bayesian approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 60(4), 725–749. https://doi.org/10.1111/1467-9868.00151
Adar, E. (2015). On the value of command-line “bullshittery”. In Medium. https://medium.com/@eytanadar/on-the-value-of-command-line-bullshittery-94dc19ec8c61
Agrawal, R., Huggins, J. H., Trippe, B., & Broderick, T. (2019). The Kernel Interaction Trick: Fast Bayesian Discovery of Pairwise Interactions in High Dimensions. arXiv:1905.06501 [Cs, Stat]. http://arxiv.org/abs/1905.06501
Alam, M. A., & Fukumizu, K. (2014). Hyperparameter selection in kernel principal component analysis. Journal of Computer Science, 10(7), 1139.
Anderson, E. (1935). The irises of the Gaspé Peninsula. Bulletin of the American Iris Society, 59, 2–5.
Anderson, E. (1936). The Species Problem in Iris. Annals of the Missouri Botanical Garden, 23(3), 457–509. https://doi.org/10.2307/2394164
Atlas. (2013). QR factorization for ridge regression. In Mathematics Stack Exchange. https://math.stackexchange.com/questions/299481/qr-factorization-for-ridge-regression
Banerjee, S. (2017). High-Dimensional Bayesian Geostatistics. Bayesian Analysis, 12(2), 583–614. https://doi.org/10.1214/17-BA1056R
Banerjee, S., & Gelfand, A. E. (2003). On smoothness properties of spatial processes. Journal of Multivariate Analysis, 84(1), 85–100. https://doi.org/10.1016/S0047-259X(02)00016-7
Barber, R. F., Candès, E. J., Ramdas, A., & Tibshirani, R. J. (2021). Predictive inference with the jackknife+. The Annals of Statistics, 49(1), 486–507. https://doi.org/10.1214/20-AOS1965
Barber, S., & Nason, G. P. (2004). Real nonparametric regression using complex wavelets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66(4), 927–939. https://doi.org/10.1111/j.1467-9868.2004.B5604.x
Barillec, R., Ingram, B., Cornford, D., & Csató, L. (2011). Projected sequential Gaussian processes: A C++ tool for interpolation of large datasets with heterogeneous noise. Computers & Geosciences, 37(3), 295–309. https://doi.org/10.1016/j.cageo.2010.05.008
Bates, S., Hastie, T., & Tibshirani, R. (n.d.). Cross-validation: What does it estimate and how well does it do it?
Bezdek, J. C., Keller, J. M., Krishnapuram, R., Kuncheva, L. I., & Pal, N. R. (1999). Will the real iris data please stand up? IEEE Transactions on Fuzzy Systems, 7(3), 368–369. https://doi.org/10.1109/91.771092
Bien, J., Taylor, J., & Tibshirani, R. (2013). A lasso for hierarchical interactions. The Annals of Statistics, 41(3), 1111–1141. https://doi.org/10.1214/13-AOS1096
Blanchet, F. G., Legendre, P., & Borcard, D. (2008). Forward Selection of Explanatory Variables. Ecology, 89(9), 2623–2632. https://doi.org/10.1890/07-0986.1
Bodin, E., Campbell, N. D. F., & Ek, C. H. (2017). Latent Gaussian Process Regression. arXiv:1707.05534 [Cs, Stat]. http://arxiv.org/abs/1707.05534
Bodmer, W., Bailey, R. A., Charlesworth, B., Eyre-Walker, A., Farewell, V., Mead, A., & Senn, S. (2021). The outstanding scientist, R.A. Fisher: His views on eugenics and race. Heredity, 126(4), 565–576. https://doi.org/10.1038/s41437-020-00394-6
Bourotte, M., Allard, D., & Porcu, E. (2016). A flexible class of non-separable cross-covariance functions for multivariate space–time data. Spatial Statistics, 18, 125–146. https://doi.org/10.1016/j.spasta.2016.02.004
Breiman, L. (1996). Heuristics of instability and stabilization in model selection. The Annals of Statistics, 24(6), 2350–2383. https://doi.org/10.1214/aos/1032181158
Breiman, L. (2001). Statistical Modeling: The Two Cultures. Statistical Science, 16(3), 199–215. http://www.jstor.org/stable/2676681
Breiman, L., & Friedman, J. H. (1985). Estimating Optimal Transformations for Multiple Regression and Correlation. Journal of the American Statistical Association, 80(391), 580–598. https://doi.org/10.1080/01621459.1985.10478157
Breiman, L., & Friedman, J. H. (1988). Tree-Structured Classification Via Generalized Discriminant Analysis: Comment. Journal of the American Statistical Association, 83(403), 725–727. https://doi.org/10.2307/2289296
Breiman, L., & Spector, P. (1992). Submodel Selection and Evaluation in Regression. The X-Random Case. International Statistical Review / Revue Internationale de Statistique, 60(3), 291–319. https://doi.org/10.2307/1403680
bremen79. (2020). Neural Networks (Maybe) Evolved to Make Adam The Best Optimizer. In Parameter-free Learning and Optimization Algorithms.
Bryan, J. (2017). Project-oriented workflow. In Tidyverse. https://www.tidyverse.org/blog/2017/12/workflow-vs-script/
Buckingham-Jeffery, E., Isham, V., & House, T. (2018). Gaussian process approximations for fast inference from infectious disease data. Mathematical Biosciences, 301, 111–120. https://doi.org/10.1016/j.mbs.2018.02.003
Bujokas, E. (2022). Gradient Boosting in Python from Scratch. In Medium. https://towardsdatascience.com/gradient-boosting-in-python-from-scratch-788d1cf1ca7
Burden, S., Cressie, N., & Steel, D. G. (2015). The SAR Model for Very Large Datasets: A Reduced Rank Approach. Econometrics, 3(2), 317–338. https://doi.org/10.3390/econometrics3020317
Biecek, P., & Burzykowski, T. (n.d.). Introduction to Instance-level Exploration. In Explanatory Model Analysis. https://ema.drwhy.ai/
Biecek, P., & Burzykowski, T. (2020). Explanatory Model Analysis: Explore, Explain, and Examine Predictive Models.
Bussola, N., Marcolini, A., Maggio, V., Jurman, G., & Furlanello, C. (2020). AI slipping on tiles: Data leakage in digital pathology. arXiv. https://doi.org/10.48550/arXiv.1909.06539
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953
Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794. https://doi.org/10.1145/2939672.2939785
Chen, Y., & Yang, Y. (2021). The One Standard Error Rule for Model Selection: Does It Work? Stats, 4(4), 868–892. https://doi.org/10.3390/stats4040051
Chipman, H. A., George, E. I., & McCulloch, R. E. (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1). https://doi.org/10.1214/09-AOAS285
Cho, P. H. (2018). Does Xgboost do Newton boosting? In GitHub. https://github.com/dmlc/xgboost/issues/3227
Clarke, B., Clarke, J., & Yu, C. W. (2014). Statistical Problem Classes and Their Links to Information Theory. Econometric Reviews, 33(1-4), 337–371. https://doi.org/10.1080/07474938.2013.807190
Cygu, S., Seow, H., Dushoff, J., & Bolker, B. M. (2023). Comparing machine learning approaches to incorporate time-varying covariates in predicting cancer survival time. Scientific Reports, 13(1), 1370. https://doi.org/10.1038/s41598-023-28393-7
Dahlgren, J. P. (2010). Alternative regression methods are not considered in Murtaugh (2009) or by ecologists in general. Ecology Letters, 13(5), E7–E9. https://doi.org/10.1111/j.1461-0248.2010.01460.x
Datta, A., Banerjee, S., Finley, A. O., & Gelfand, A. E. (2016). Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets. Journal of the American Statistical Association, 111(514), 800–812. https://doi.org/10.1080/01621459.2015.1044091
Datta, A., Banerjee, S., Finley, A. O., Hamm, N. A. S., & Schaap, M. (2016). Nonseparable dynamic nearest neighbor Gaussian process models for large spatio-temporal data with an application to particulate matter analysis. The Annals of Applied Statistics, 10(3). https://doi.org/10.1214/16-AOAS931
De Oliveira, V., & Han, Z. (2022). On Information About Covariance Parameters in Gaussian Matérn Random Fields. Journal of Agricultural, Biological and Environmental Statistics. https://doi.org/10.1007/s13253-022-00510-5
Dezeure, R., Bühlmann, P., Meier, L., & Meinshausen, N. (2015). High-dimensional inference: Confidence intervals, p-values and R software hdi. Statistical Science, 30(4), 533–558. https://doi.org/10.1214/15-STS527
Donoho, D. L., Johnstone, I. M., Kerkyacharian, G., & Picard, D. (1995). Wavelet Shrinkage: Asymptopia? Journal of the Royal Statistical Society: Series B (Methodological), 57(2), 301–337. https://doi.org/10.1111/j.2517-6161.1995.tb02032.x
Dyson, F. (2005). Wise Man. New York Review of Books. https://www.nybooks.com/articles/2005/10/20/wise-man/
Efron, B., & Gong, G. (1983). A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation. The American Statistician, 37(1), 36–48. https://doi.org/10.1080/00031305.1983.10483087
Eilers, P. H. C., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2), 89–121. https://doi.org/10.1214/ss/1038425655
El-Bachir, Y., & Davison, A. C. (2019). Fast Automatic Smoothing for Generalized Additive Models. Journal of Machine Learning Research, 20(173), 1–27.
Elith, J., Leathwick, J. R., & Hastie, T. (2008). A working guide to boosted regression trees. Journal of Animal Ecology, 77(4), 802–813. https://doi.org/10.1111/j.1365-2656.2008.01390.x
Fisher, R. A. (1936). The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 7(2), 179–188. https://doi.org/10.1111/j.1469-1809.1936.tb02137.x
Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29(5), 1189–1232. https://www.jstor.org/stable/2699986
Friedman, J., Hastie, T., Höfling, H., & Tibshirani, R. (2007). Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2), 302–332. https://doi.org/10.1214/07-AOAS131
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2929880/
Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics (Oxford, England), 9(3), 432–441. https://doi.org/10.1093/biostatistics/kxm045
Garrido-Merchán, E. C., & Hernández-Lobato, D. (2017). Dealing with Integer-valued Variables in Bayesian Optimization with Gaussian Processes. arXiv:1706.03673 [Stat]. http://arxiv.org/abs/1706.03673
Gelman, A. (2020). The typical set and its relevance to Bayesian computation. In Statistical Modeling, Causal Inference, and Social Science. https://statmodeling.stat.columbia.edu/2020/08/02/the-typical-set-and-its-relevance-to-bayesian-computation/
Gelman, A. (2021). Reflections on Breiman’s Two Cultures of Statistical Modeling. Observational Studies, 7(1), 95–98. https://doi.org/10.1353/obs.2021.0025
Giraud-Carrier, C., & Provost, F. (2005). Toward a justification of meta-learning: Is the no free lunch theorem a show-stopper? Proceedings of the ICML-2005 Workshop on Meta-Learning.
Girolami, M., Calderhead, B., & Chin, S. A. (2009). Riemannian Manifold Hamiltonian Monte Carlo. arXiv:0907.1100 [Cs, Math, Stat]. http://arxiv.org/abs/0907.1100
Golub, G. H., Heath, M., & Wahba, G. (1979). Generalized Cross-Validation as a Method for Choosing a Good Ridge Parameter. Technometrics, 21(2), 215–223. https://doi.org/10.1080/00401706.1979.10489751
Görtler, J., Kehlbeck, R., & Deussen, O. (2019). A Visual Exploration of Gaussian Processes. Distill, 4(4), e17. https://doi.org/10.23915/distill.00017
Gramacy, R. B. (2020). Surrogates: Gaussian Process Modeling, Design, and Optimization for the Applied Sciences. Chapman & Hall/CRC. https://bobby.gramacy.com/surrogates/
Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On Calibration of Modern Neural Networks. Proceedings of the 34th International Conference on Machine Learning, 1321–1330. https://proceedings.mlr.press/v70/guo17a.html
Hand, D. J. (2009). Measuring classifier performance: A coherent alternative to the area under the ROC curve. Machine Learning, 77(1), 103–123. https://doi.org/10.1007/s10994-009-5119-5
Harris, D. J. (2015). Generating realistic assemblages with a joint species distribution model. Methods in Ecology and Evolution, 6(4), 465–473. https://doi.org/10.1111/2041-210X.12332
Hastie, T. (2020). Ridge Regularization: An Essential Concept in Data Science. Technometrics, 62(4), 426–433. https://doi.org/10.1080/00401706.2020.1791959
Hastie, T., & Tibshirani, R. (1987). Generalized Additive Models: Some Applications. Journal of the American Statistical Association, 82(398), 371–386. https://doi.org/10.1080/01621459.1987.10478440
Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). Springer.
Hensman, J., Matthews, A. G. de G., & Ghahramani, Z. (2015). Scalable Variational Gaussian Process Classification. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics (AISTATS).
Irfan, M. O., & Bull, P. (2021). Cleaning foregrounds from single-dish 21 cm intensity maps with Kernel principal component analysis. Monthly Notices of the Royal Astronomical Society, 508(3), 3551–3568. https://doi.org/10.1093/mnras/stab2855
Jakkala, K. (2021). Deep Gaussian Processes: A Survey. arXiv:2106.12135 [Cs, Stat]. http://arxiv.org/abs/2106.12135
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112). Springer.
Janson, L., Fithian, W., & Hastie, T. (2015). Effective Degrees of Freedom: A Flawed Metaphor. Biometrika, 102(2), 479–485. https://doi.org/10.1093/biomet/asv019
Jones, A. (2021). The Matérn class of covariance functions. In Andy Jones. https://andrewcharlesjones.github.io/journal/matern-kernels.html
Jović, A., Brkić, K., & Bogunović, N. (2015). A review of feature selection methods with applications. 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 1200–1205. https://doi.org/10.1109/MIPRO.2015.7160458
Jurek, M., & Katzfuss, M. (2022). Scalable Spatio-Temporal Smoothing via Hierarchical Sparse Cholesky Decomposition. arXiv. https://doi.org/10.48550/arXiv.2207.09384
Kammann, E. E., & Wand, M. P. (2003). Geoadditive models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 52(1), 1–18. https://doi.org/10.1111/1467-9876.00385
Krasser, M. (2018). Gaussian processes. http://krasserm.github.io/2018/03/19/gaussian-processes/
Krasser, M. (2020). Sparse Gaussian processes. http://krasserm.github.io/2020/12/12/gaussian-processes-sparse/
Kuhn, M. (2017). Nested resampling with rsample. In Applied Predictive Modeling. http://appliedpredictivemodeling.com/blog/2017/9/2/njdc83d01pzysvvlgik02t5qnaljnd
Kumar, I. E., Venkatasubramanian, S., Scheidegger, C., & Friedler, S. (2020). Problems with Shapley-value-based explanations as feature importance measures. Proceedings of the 37th International Conference on Machine Learning, 5491–5500. http://proceedings.mlr.press/v119/kumar20e/kumar20e.pdf
Lambert, B., & Vehtari, A. (2022). R*: A Robust MCMC Convergence Diagnostic with Uncertainty Using Decision Tree Classifiers. Bayesian Analysis, 17(2), 353–379. https://doi.org/10.1214/20-BA1252
Larsen, K. (2015). GAM: The Predictive Modeling Silver Bullet | Stitch Fix Technology Multithreaded. In MultiThreaded (StitchFix). https://multithreaded.stitchfix.com/blog/2015/07/30/gam/
Lee, J. D., Sun, Y., & Saunders, M. A. (2014). Proximal Newton-Type Methods for Minimizing Composite Functions. SIAM Journal on Optimization, 24(3), 1420–1443. https://doi.org/10.1137/130921428
Lindeløv, J. K. (2019). Common statistical tests are linear models (or: How to teach stats). https://lindeloev.github.io/tests-as-linear/
Loh, W.-Y., & Vanichsetakul, N. (1988). Tree-Structured Classification via Generalized Discriminant Analysis. Journal of the American Statistical Association, 83(403), 715–725. https://doi.org/10.1080/01621459.1988.10478652
Marra, G., & Wood, S. N. (2011). Practical variable selection for generalized additive models. Computational Statistics & Data Analysis, 55(7), 2372–2387. https://doi.org/10.1016/j.csda.2011.02.004
McCormick, T. (2021). The "given data" paradigm undermines both cultures. arXiv:2105.12478 [Cs, Stat]. http://arxiv.org/abs/2105.12478
Meinshausen, N. (2006). Quantile Regression Forests. Journal of Machine Learning Research, 7, 983–999.
Milà, C., Mateu, J., Pebesma, E., & Meyer, H. (2022). Nearest neighbour distance matching Leave-One-Out Cross-Validation for map validation. Methods in Ecology and Evolution, 13(6), 1304–1316. https://doi.org/10.1111/2041-210X.13851
Miller, A. C., Foti, N. J., & Fox, E. B. (2021). Breiman’s two cultures: You don’t have to choose sides. arXiv:2104.12219 [Cs, Stat]. http://arxiv.org/abs/2104.12219
Minderer, M., Djolonga, J., Romijnders, R., Hubis, F., Zhai, X., Houlsby, N., Tran, D., & Lucic, M. (2021). Revisiting the Calibration of Modern Neural Networks. Advances in Neural Information Processing Systems, 34, 15682–15694. https://proceedings.neurips.cc/paper/2021/hash/8420d359404024567b5aefda1231af24-Abstract.html
Mount, J. (2012). How robust is logistic regression? In Win Vector LLC. https://win-vector.com/2012/08/23/how-robust-is-logistic-regression/
Murtaugh, P. A. (2009). Performance of several variable-selection methods applied to real ecological data. Ecology Letters, 12(10), 1061–1068. https://doi.org/10.1111/j.1461-0248.2009.01361.x
Nazarathy, Y., & Klok, H. (2021). Statistics with Julia: Fundamentals for data science, machine learning and artificial intelligence. Springer International Publishing. https://doi.org/10.1007/978-3-030-70901-3
Neal, R. M. (2012). Bayesian learning for neural networks (Vol. 118). Springer Science & Business Media.
Paciorek, C., & Schervish, M. (2003). Nonstationary Covariance Functions for Gaussian Process Regression. Advances in Neural Information Processing Systems, 16. https://proceedings.neurips.cc/paper/2003/hash/326a8c055c0d04f5b06544665d8bb3ea-Abstract.html
Peng, H., Long, F., & Ding, C. (2005). Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1226–1238. https://doi.org/10.1109/TPAMI.2005.159
Perperoglou, A., Sauerbrei, W., Abrahamowicz, M., & Schmid, M. (2019). A review of spline function procedures in R. BMC Medical Research Methodology, 19(1), 46. https://doi.org/10.1186/s12874-019-0666-3
Poynor, V., & Munch, S. (2017). Combining functional data with hierarchical Gaussian process models. Environmental and Ecological Statistics, 24(2), 175–199. https://doi.org/10.1007/s10651-017-0366-2
Prechelt, L. (2012). Early Stopping - but when? In G. Montavon & K.-R. Müller (Eds.), Neural Networks: Tricks of the Trade (pp. 53–67). Springer. http://page.mi.fu-berlin.de/~prechelt/Biblio/stop_tricks1997.pdf
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (2007). Numerical Recipes 3rd Edition: The Art of Scientific Computing (3rd ed.). Cambridge University Press.
Raper, S. (2020). Leo Breiman’s "Two Cultures". Significance, 17(1), 34–37. https://doi.org/10.1111/j.1740-9713.2020.01357.x
Rasmussen, C. E., & Williams, C. K. I. (2005). Gaussian Processes for Machine Learning. The MIT Press.
Ratz, A. V. (2021). Can QR Decomposition Be Actually Faster? Schwarz-Rutishauser Algorithm. In Medium. https://towardsdatascience.com/can-qr-decomposition-be-actually-faster-schwarz-rutishauser-algorithm-a32c0cde8b9b
Reiss, P. T., & Ogden, R. T. (2009). Smoothing parameter selection for a class of semiparametric linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 505–523. https://doi.org/10.1111/j.1467-9868.2008.00695.x
Riihimäki, J., & Vehtari, A. (2010). Gaussian processes with monotonicity information. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 645–652. https://proceedings.mlr.press/v9/riihimaki10a.html
Roberts, D. R., Bahn, V., Ciuti, S., Boyce, M. S., Elith, J., Guillera-Arroita, G., Hauenstein, S., Lahoz-Monfort, J. J., Schröder, B., Thuiller, W., Warton, D. I., Wintle, B. A., Hartig, F., & Dormann, C. F. (2017). Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography, 40(8), 913–929. https://doi.org/10.1111/ecog.02881
Rue, H., Martino, S., & Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 319–392. https://doi.org/10.1111/j.1467-9868.2008.00700.x
Sansó, B., Schmidt, A. M., & Nobre, A. A. (2008). Bayesian Spatio-Temporal Models Based on Discrete Convolutions. The Canadian Journal of Statistics / La Revue Canadienne de Statistique, 36(2), 239–258. http://www.jstor.org/stable/20445307
Schölkopf, B., Smola, A., & Müller, K.-R. (1997). Kernel principal component analysis. In W. Gerstner, A. Germond, M. Hasler, & J.-D. Nicoud (Eds.), Artificial Neural Networks ICANN’97 (pp. 583–588). Springer. https://doi.org/10.1007/BFb0020217
Shafer, G., & Vovk, V. (2008). A Tutorial on Conformal Prediction. Journal of Machine Learning Research, 9, 371–421. https://jmlr.csail.mit.edu/papers/volume9/shafer08a/shafer08a.pdf
Shalizi, C. R. (2022). Advanced data analysis from an elementary point of view. https://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/
Sigrist, F. (2018). Gradient and Newton Boosting for Classification and Regression. arXiv. https://doi.org/10.48550/arXiv.1808.03064
Kuhn, M., & Silge, J. (n.d.). Explaining Models and Predictions. In Tidy Modeling with R. https://www.tmwr.org/
Simon, N., Friedman, J., Hastie, T., & Tibshirani, R. (2011). Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent. Journal of Statistical Software, 39(5), 1–13. https://doi.org/10.18637/jss.v039.i05
Stone, M. (1977). An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike’s Criterion. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 44–47. https://www.jstor.org/stable/2984877
Taquet, V. (2021). With MAPIE, uncertainties are back in machine learning! In Medium. https://towardsdatascience.com/with-mapie-uncertainties-are-back-in-machine-learning-882d5c17fdc3
Valavi, R., Elith, J., Lahoz-Monfort, J. J., & Guillera-Arroita, G. (2019). blockCV: An r package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models. Methods in Ecology and Evolution, 10(2), 225–232. https://doi.org/10.1111/2041-210X.13107
van den Goorbergh, R., van Smeden, M., Timmerman, D., & Van Calster, B. (2022). The harm of class imbalance corrections for risk prediction models: Illustration and simulation using logistic regression. Journal of the American Medical Informatics Association, ocac093. https://doi.org/10.1093/jamia/ocac093
van Houwelingen, J. C. (2001). Shrinkage and Penalized Likelihood as Methods to Improve Predictive Accuracy. Statistica Neerlandica, 55(1), 17–34. https://doi.org/10.1111/1467-9574.00154
Vehtari, A. (2021). Gaussian process demonstration with Stan. https://avehtari.github.io/casestudies/Motorcycle/motorcycle_gpcourse.html
Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P.-C. (2021). Rank-Normalization, Folding, and Localization: An Improved R̂ for Assessing Convergence of MCMC (with Discussion). Bayesian Analysis, 16(2), 667–718. https://doi.org/10.1214/20-BA1221
Venables, W. N. (1998). Exegeses on Linear Models. http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf
Wager, S., Hastie, T., & Efron, B. (2014). Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife. Journal of Machine Learning Research, 15, 1625–1651.
Wainer, J., & Cawley, G. (2021). Nested cross-validation when selecting classifiers is overzealous for most practical applications. Expert Systems with Applications, 182, 115222. https://doi.org/10.1016/j.eswa.2021.115222
Walters, C. J., & Ludwig, D. (1981). Effects of Measurement Errors on the Assessment of Stock–Recruitment Relationships. Canadian Journal of Fisheries and Aquatic Sciences, 38(6), 704–710. https://doi.org/10.1139/f81-093
Wand, M. P., & Ormerod, J. T. (2011). Penalized wavelets: Embedding wavelets into semiparametric regression. Electronic Journal of Statistics, 5(none). https://doi.org/10.1214/11-EJS652
Warnes, J. J., & Ripley, B. D. (1987). Problems with Likelihood Estimation of Covariance Functions of Spatial Gaussian Processes. Biometrika, 74(3), 640–642. http://www.jstor.org/stable/2336705
Wenger, J., Pleiss, G., Hennig, P., Cunningham, J. P., & Gardner, J. R. (2022). Preconditioning for Scalable Gaussian Process Hyperparameter Optimization. arXiv. https://doi.org/10.48550/arXiv.2107.00243
Wenger, S. J., & Olden, J. D. (2012). Assessing transferability of ecological models: An underappreciated aspect of statistical validation. Methods in Ecology and Evolution, 3(2), 260–267. https://doi.org/10.1111/j.2041-210X.2011.00170.x
Witten, D. M., Tibshirani, R., & Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3), 515–534. https://doi.org/10.1093/biostatistics/kxp008
Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67–82. https://doi.org/10.1109/4235.585893
Wood, S. N. (2003). Thin plate regression splines. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(1), 95–114. https://doi.org/10.1111/1467-9868.00374
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(1), 3–36. https://doi.org/10.1111/j.1467-9868.2010.00749.x
Wood, S. N. (2017a). Generalized Additive Models: An Introduction with R (2nd ed.). Chapman & Hall/CRC.
Wood, S. N. (2017b). P-splines with derivative based penalties and tensor product smoothing of unevenly distributed data. Statistics and Computing, 27(4), 985–989. https://doi.org/10.1007/s11222-016-9666-x
Yang, Y. (2005). Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation. Biometrika, 92(4), 937–950. https://doi.org/10.1093/biomet/92.4.937
Yuan, M., & Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1), 49–67. https://doi.org/10.1111/j.1467-9868.2005.00532.x
Zhang, L. (2018). Nearest Neighbor Gaussian Processes (NNGP) based models in Stan. In Stan Case Studies. https://mc-stan.org/users/documentation/case-studies/nngp.html
Zhao, S., Witten, D., & Shojaie, A. (2021). In defense of the indefensible: A very naïve approach to high-dimensional inference. Statistical Science, 36(4), 562–577. https://doi.org/10.1214/20-STS815
Zou, H., Hastie, T., & Tibshirani, R. (2007). On the “degrees of freedom” of the lasso. The Annals of Statistics, 35(5), 2173–2192. https://doi.org/10.1214/009053607000000127