Abbott, M. C., & Machta, B. B. (2022). Far from Asymptopia (arXiv:2205.03343). arXiv.

Abramovich, F., Sapatinas, T., & Silverman, B. W. (1998). Wavelet thresholding via a Bayesian approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 60(4), 725–749.

Adar, E. (2015). On the value of command-line “bullshittery”. In Medium.

Agrawal, R., Huggins, J. H., Trippe, B., & Broderick, T. (2019). The Kernel Interaction Trick: Fast Bayesian Discovery of Pairwise Interactions in High Dimensions. arXiv:1905.06501 [Cs, Stat].

Alam, M. A., & Fukumizu, K. (2014). Hyperparameter selection in kernel principal component analysis. Journal of Computer Science, 10(7), 1139.

Anderson, E. (1935). The irises of the Gaspé Peninsula. Bulletin of the American Iris Society, 59, 2–5.

Anderson, E. (1936). The Species Problem in Iris. Annals of the Missouri Botanical Garden, 23(3), 457–509.

Atlas. (2013). QR factorization for ridge regression. In Mathematics Stack Exchange.

Banerjee, S. (2017). High-Dimensional Bayesian Geostatistics. Bayesian Analysis, 12(2), 583–614.

Banerjee, S., & Gelfand, A. E. (2003). On smoothness properties of spatial processes. Journal of Multivariate Analysis, 84(1), 85–100.

Barber, R. F., Candès, E. J., Ramdas, A., & Tibshirani, R. J. (2021). Predictive inference with the jackknife+. The Annals of Statistics, 49(1), 486–507.

Barber, S., & Nason, G. P. (2004). Real nonparametric regression using complex wavelets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66(4), 927–939.

Barillec, R., Ingram, B., Cornford, D., & Csató, L. (2011). Projected sequential Gaussian processes: A C++ tool for interpolation of large datasets with heterogeneous noise. Computers & Geosciences, 37(3), 295–309.

Bates, S., Hastie, T., & Tibshirani, R. (n.d.). Cross-validation: What does it estimate and how well does it do it?

Bezdek, J. C., Keller, J. M., Krishnapuram, R., Kuncheva, L. I., & Pal, N. R. (1999). Will the real iris data please stand up? IEEE Transactions on Fuzzy Systems, 7(3), 368–369.

Bien, J., Taylor, J., & Tibshirani, R. (2013). A lasso for hierarchical interactions. The Annals of Statistics, 41(3), 1111–1141.

Blanchet, F. G., Legendre, P., & Borcard, D. (2008). Forward Selection of Explanatory Variables. Ecology, 89(9), 2623–2632.

Bodin, E., Campbell, N. D. F., & Ek, C. H. (2017). Latent Gaussian Process Regression. arXiv:1707.05534 [Cs, Stat].

Bodmer, W., Bailey, R. A., Charlesworth, B., Eyre-Walker, A., Farewell, V., Mead, A., & Senn, S. (2021). The outstanding scientist, R.A. Fisher: His views on eugenics and race. Heredity, 126(4), 565–576.

Bourotte, M., Allard, D., & Porcu, E. (2016). A flexible class of non-separable cross-covariance functions for multivariate space–time data. Spatial Statistics, 18, 125–146.

Breiman, L. (1996). Heuristics of instability and stabilization in model selection. The Annals of Statistics, 24(6), 2350–2383.

Breiman, L. (2001). Statistical Modeling: The Two Cultures. Statistical Science, 16(3), 199–215.

Breiman, L., & Friedman, J. H. (1985). Estimating Optimal Transformations for Multiple Regression and Correlation. Journal of the American Statistical Association, 80(391), 580–598.

Breiman, L., & Friedman, J. H. (1988). Tree-Structured Classification Via Generalized Discriminant Analysis: Comment. Journal of the American Statistical Association, 83(403), 725–727.

Breiman, L., & Spector, P. (1992). Submodel Selection and Evaluation in Regression. The X-Random Case. International Statistical Review / Revue Internationale de Statistique, 60(3), 291–319.

bremen79. (2020). Neural Networks (Maybe) Evolved to Make Adam The Best Optimizer. In Parameter-free Learning and Optimization Algorithms.

Bryan, J. (2017). Project-oriented workflow. In Tidyverse.

Buckingham-Jeffery, E., Isham, V., & House, T. (2018). Gaussian process approximations for fast inference from infectious disease data. Mathematical Biosciences, 301, 111–120.

Bujokas, E. (2022). Gradient Boosting in Python from Scratch. In Medium.

Burden, S., Cressie, N., & Steel, D. G. (2015). The SAR Model for Very Large Datasets: A Reduced Rank Approach. Econometrics, 3(2), 317–338.

Biecek, P., & Burzykowski, T. (n.d.). Introduction to Instance-level Exploration. In Explanatory Model Analysis (Chapter 5).

Biecek, P., & Burzykowski, T. (2020). Explanatory Model Analysis.

Bussola, N., Marcolini, A., Maggio, V., Jurman, G., & Furlanello, C. (2020). AI slipping on tiles: Data leakage in digital pathology. arXiv.

Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16, 321–357.

Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794.

Chen, Y., & Yang, Y. (2021). The One Standard Error Rule for Model Selection: Does It Work? Stats, 4(4), 868–892.

Chipman, H. A., George, E. I., & McCulloch, R. E. (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1).

Cho, P. H. (2018). Does Xgboost do Newton boosting? In GitHub.

Clarke, B., Clarke, J., & Yu, C. W. (2014). Statistical Problem Classes and Their Links to Information Theory. Econometric Reviews, 33(1–4), 337–371.

Cygu, S., Seow, H., Dushoff, J., & Bolker, B. M. (2023). Comparing machine learning approaches to incorporate time-varying covariates in predicting cancer survival time. Scientific Reports, 13(1), 1370.

Dahlgren, J. P. (2010). Alternative regression methods are not considered in Murtaugh (2009) or by ecologists in general. Ecology Letters, 13(5), E7–E9.

Datta, A., Banerjee, S., Finley, A. O., & Gelfand, A. E. (2016). Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets. Journal of the American Statistical Association, 111(514), 800–812.

Datta, A., Banerjee, S., Finley, A. O., Hamm, N. A. S., & Schaap, M. (2016). Nonseparable dynamic nearest neighbor Gaussian process models for large spatio-temporal data with an application to particulate matter analysis. The Annals of Applied Statistics, 10(3).

De Oliveira, V., & Han, Z. (2022). On Information About Covariance Parameters in Gaussian Matérn Random Fields. Journal of Agricultural, Biological and Environmental Statistics.

Dezeure, R., Bühlmann, P., Meier, L., & Meinshausen, N. (2015). High-dimensional inference: Confidence intervals, p-values and R software hdi. Statistical Science, 30(4), 533–558.

Donoho, D. L., Johnstone, I. M., Kerkyacharian, G., & Picard, D. (1995). Wavelet Shrinkage: Asymptopia? Journal of the Royal Statistical Society: Series B (Methodological), 57(2), 301–337.

Dyson, F. (2005). Wise Man. New York Review of Books.

Efron, B., & Gong, G. (1983). A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation. The American Statistician, 37(1), 36–48.

Eilers, P. H. C., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2), 89–121.

El-Bachir, Y., & Davison, A. C. (n.d.). Fast Automatic Smoothing for Generalized Additive Models.

Elith, J., Leathwick, J. R., & Hastie, T. (2008). A working guide to boosted regression trees. Journal of Animal Ecology, 77(4), 802–813.

Fisher, R. A. (1936). The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 7(2), 179–188.

Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29(5), 1189–1232.

Friedman, J., Hastie, T., Höfling, H., & Tibshirani, R. (2007). Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2), 302–332.

Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432–441.

Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22.

Garrido-Merchán, E. C., & Hernández-Lobato, D. (2017). Dealing with Integer-valued Variables in Bayesian Optimization with Gaussian Processes. arXiv:1706.03673 [Stat].

Gelman, A. (2020). The typical set and its relevance to Bayesian computation. In Statistical Modeling, Causal Inference, and Social Science.

Gelman, A. (2021). Reflections on Breiman’s Two Cultures of Statistical Modeling. Observational Studies, 7(1), 95–98.

Giraud-Carrier, C., & Provost, F. (2005). Toward a justification of meta-learning: Is the no free lunch theorem a show-stopper? Proceedings of the ICML-2005 Workshop on Meta-Learning.

Girolami, M., Calderhead, B., & Chin, S. A. (2019). Riemannian Manifold Hamiltonian Monte Carlo. arXiv:0907.1100 [Cs, Math, Stat].

Golub, G. H., Heath, M., & Wahba, G. (1979). Generalized Cross-Validation as a Method for Choosing a Good Ridge Parameter. Technometrics, 21(2), 215–223.

Görtler, J., Kehlbeck, R., & Deussen, O. (2019). A Visual Exploration of Gaussian Processes. Distill, 4(4), e17.

Gramacy, R. B. (n.d.). Surrogates.

Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On Calibration of Modern Neural Networks. Proceedings of the 34th International Conference on Machine Learning, 1321–1330.

Hand, D. J. (2009). Measuring classifier performance: A coherent alternative to the area under the ROC curve. Machine Learning, 77(1), 103–123.

Harris, D. J. (2015). Generating realistic assemblages with a joint species distribution model. Methods in Ecology and Evolution, 6(4), 465–473.

Hastie, T. (2020). Ridge Regularization: An Essential Concept in Data Science. Technometrics, 62(4), 426–433.

Hastie, T., & Tibshirani, R. (1987). Generalized Additive Models: Some Applications. Journal of the American Statistical Association, 82(398), 371–386.

Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: Data mining, inference, and prediction. Springer.

Hensman, J., & Ghahramani, Z. (n.d.). Scalable Variational Gaussian Process Classification.

Irfan, M. O., & Bull, P. (2021). Cleaning foregrounds from single-dish 21 cm intensity maps with kernel principal component analysis. Monthly Notices of the Royal Astronomical Society, 508(3), 3551–3568.

Jakkala, K. (2021). Deep Gaussian Processes: A Survey. arXiv:2106.12135 [Cs, Stat].

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112). Springer.

Janson, L., Fithian, W., & Hastie, T. (2015). Effective Degrees of Freedom: A Flawed Metaphor. Biometrika, 102(2), 479–485.

Jones, A. (2021). The Matérn class of covariance functions. In Andy Jones.

Jović, A., Brkić, K., & Bogunović, N. (2015). A review of feature selection methods with applications. 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 1200–1205.

Jurek, M., & Katzfuss, M. (2022). Scalable Spatio-Temporal Smoothing via Hierarchical Sparse Cholesky Decomposition. arXiv.

Kammann, E. E., & Wand, M. P. (2003). Geoadditive models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 52(1), 1–18.

Krasser, M. (2018). Gaussian processes.

Krasser, M. (2020). Sparse Gaussian processes.

Kuhn, M. (2017). Nested resampling with rsample. In Applied Predictive Modeling.

Kumar, I. E., Venkatasubramanian, S., Scheidegger, C., & Friedler, S. (2020). Problems with Shapley-value-based explanations as feature importance measures. Proceedings of the 37th International Conference on Machine Learning, 5491–5500.

Lambert, B., & Vehtari, A. (2022). R*: A Robust MCMC Convergence Diagnostic with Uncertainty Using Decision Tree Classifiers. Bayesian Analysis, 17(2), 353–379.

Larsen, K. (2015). GAM: The Predictive Modeling Silver Bullet. In MultiThreaded (StitchFix).

Lee, J. D., Sun, Y., & Saunders, M. A. (2014). Proximal Newton-Type Methods for Minimizing Composite Functions. SIAM Journal on Optimization, 24(3), 1420–1443.

Lindeløv, J. K. (2019). Common statistical tests are linear models (or: How to teach stats).

Loh, W.-Y., & Vanichsetakul, N. (1988). Tree-Structured Classification via Generalized Discriminant Analysis. Journal of the American Statistical Association, 83(403), 715–725.

Marra, G., & Wood, S. N. (2011). Practical variable selection for generalized additive models. Computational Statistics & Data Analysis, 55(7), 2372–2387.

McCormick, T. (2021). The "given data" paradigm undermines both cultures. arXiv:2105.12478 [Cs, Stat].

Meinshausen, N. (n.d.). Quantile Regression Forests.

Milà, C., Mateu, J., Pebesma, E., & Meyer, H. (2022). Nearest neighbour distance matching Leave-One-Out Cross-Validation for map validation. Methods in Ecology and Evolution, 13(6), 1304–1316.

Miller, A. C., Foti, N. J., & Fox, E. B. (2021). Breiman’s two cultures: You don’t have to choose sides. arXiv:2104.12219 [Cs, Stat].

Minderer, M., Djolonga, J., Romijnders, R., Hubis, F., Zhai, X., Houlsby, N., Tran, D., & Lucic, M. (2021). Revisiting the Calibration of Modern Neural Networks. Advances in Neural Information Processing Systems, 34, 15682–15694.

Mount, J. (2012). How robust is logistic regression? In Win Vector LLC.

Murtaugh, P. A. (2009). Performance of several variable-selection methods applied to real ecological data. Ecology Letters, 12(10), 1061–1068.

Navarro, D. (2019). Science and statistics.

Nazarathy, Y., & Klok, H. (2021). Statistics with Julia: Fundamentals for data science, machine learning and artificial intelligence. Springer International Publishing.

Neal, R. M. (2012). Bayesian learning for neural networks (Vol. 118). Springer Science & Business Media.

Paciorek, C., & Schervish, M. (2003). Nonstationary Covariance Functions for Gaussian Process Regression. Advances in Neural Information Processing Systems, 16.

Peng, H., Long, F., & Ding, C. (2005). Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1226–1238.

Perperoglou, A., Sauerbrei, W., Abrahamowicz, M., & Schmid, M. (2019). A review of spline function procedures in R. BMC Medical Research Methodology, 19(1), 46.

Poynor, V., & Munch, S. (2017). Combining functional data with hierarchical Gaussian process models. Environmental and Ecological Statistics, 24(2), 175–199.

Prechelt, L. (2012). Early Stopping - but when? In G. Montavon & K.-R. Müller (Eds.), Neural Networks: Tricks of the Trade (pp. 53–67).

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (2007). Numerical Recipes: The Art of Scientific Computing (3rd ed.). Cambridge University Press.

Raper, S. (2020). Leo Breiman’s "Two Cultures". Significance, 17(1), 34–37.

Rasmussen, C. E., & Williams, C. K. I. (2005). Gaussian Processes for Machine Learning. The MIT Press.

Ratz, A. V. (2021). Can QR Decomposition Be Actually Faster? Schwarz-Rutishauser Algorithm. In Medium.

Reiss, P. T., & Ogden, R. T. (2009). Smoothing parameter selection for a class of semiparametric linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 505–523.

Riihimäki, J., & Vehtari, A. (2010). Gaussian processes with monotonicity information. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 645–652.

Roberts, D. R., Bahn, V., Ciuti, S., Boyce, M. S., Elith, J., Guillera-Arroita, G., Hauenstein, S., Lahoz-Monfort, J. J., Schröder, B., Thuiller, W., Warton, D. I., Wintle, B. A., Hartig, F., & Dormann, C. F. (2017). Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography, 40(8), 913–929.

Rue, H., Martino, S., & Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 319–392.

Sansó, B., Schmidt, A. M., & Nobre, A. A. (2008). Bayesian Spatio-Temporal Models Based on Discrete Convolutions. The Canadian Journal of Statistics / La Revue Canadienne de Statistique, 36(2), 239–258.

Schölkopf, B., Smola, A., & Müller, K.-R. (1997). Kernel principal component analysis. In W. Gerstner, A. Germond, M. Hasler, & J.-D. Nicoud (Eds.), Artificial Neural Networks ICANN’97 (pp. 583–588). Springer.

Shafer, G., & Vovk, V. (2008). A Tutorial on Conformal Prediction. Journal of Machine Learning Research, 9, 371–421.

Shalizi, C. R. (2022). Advanced data analysis from an elementary point of view.

Sigrist, F. (2018). Gradient and Newton Boosting for Classification and Regression.

Kuhn, M., & Silge, J. (n.d.). Explaining Models and Predictions. In Tidy Modeling with R (Chapter 18).

Simon, N., Friedman, J., Hastie, T., & Tibshirani, R. (2011). Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent. Journal of Statistical Software, 39(5), 1–13.

Stone, M. (1977). An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike’s Criterion. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 44–47.

Taquet, V. (2021). With MAPIE, uncertainties are back in machine learning! In Medium.

Valavi, R., Elith, J., Lahoz-Monfort, J. J., & Guillera-Arroita, G. (2019). blockCV: An R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models. Methods in Ecology and Evolution, 10(2), 225–232.

van den Goorbergh, R., van Smeden, M., Timmerman, D., & Van Calster, B. (2022). The harm of class imbalance corrections for risk prediction models: Illustration and simulation using logistic regression. Journal of the American Medical Informatics Association, ocac093.

van Houwelingen, J. C. (2001). Shrinkage and Penalized Likelihood as Methods to Improve Predictive Accuracy. Statistica Neerlandica, 55(1), 17–34.

Vehtari, A. (2021). Gaussian process demonstration with Stan.

Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P.-C. (2021). Rank-Normalization, Folding, and Localization: An Improved R̂ for Assessing Convergence of MCMC (with Discussion). Bayesian Analysis, 16(2), 667–718.

Venables, W. N. (1998). Exegeses on Linear Models.

Wager, S., Hastie, T., & Efron, B. (n.d.). Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife.

Wainer, J., & Cawley, G. (2021). Nested cross-validation when selecting classifiers is overzealous for most practical applications. Expert Systems with Applications, 182, 115222.

Walters, C. J., & Ludwig, D. (1981). Effects of Measurement Errors on the Assessment of Stock–Recruitment Relationships. Canadian Journal of Fisheries and Aquatic Sciences, 38(6), 704–710.

Wand, M. P., & Ormerod, J. T. (2011). Penalized wavelets: Embedding wavelets into semiparametric regression. Electronic Journal of Statistics, 5.

Warnes, J. J., & Ripley, B. D. (1987). Problems with Likelihood Estimation of Covariance Functions of Spatial Gaussian Processes. Biometrika, 74(3), 640–642.

Wenger, J., Pleiss, G., Hennig, P., Cunningham, J. P., & Gardner, J. R. (2022). Preconditioning for Scalable Gaussian Process Hyperparameter Optimization. arXiv.

Wenger, S. J., & Olden, J. D. (2012). Assessing transferability of ecological models: An underappreciated aspect of statistical validation. Methods in Ecology and Evolution, 3(2), 260–267.

Witten, D. M., Tibshirani, R., & Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, kxp008.

Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67–82.

Wood, S. N. (2003). Thin plate regression splines. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(1), 95–114.

Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(1), 3–36.

Wood, S. N. (2017b). P-splines with derivative based penalties and tensor product smoothing of unevenly distributed data. Statistics and Computing, 27(4), 985–989.

Yang, Y. (2005). Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation. Biometrika, 92(4), 937–950.

Yuan, M., & Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1), 49–67.

Zhang, L. (2018). Nearest Neighbor Gaussian Processes (NNGP) based models in Stan. In Stan Case Studies.

Zhao, S., Witten, D., & Shojaie, A. (2021). In defense of the indefensible: A very naïve approach to high-dimensional inference. Statistical Science, 36(4), 562–577.

Zou, H., Hastie, T., & Tibshirani, R. (2007). On the “degrees of freedom” of the lasso. The Annals of Statistics, 35(5), 2173–2192.