Abbott, M. C., & Machta, B. B. (2022). Far from Asymptopia (arXiv:2205.03343). arXiv. http://arxiv.org/abs/2205.03343
Abramovich, F., Sapatinas, T., & Silverman, B. W. (1998). Wavelet thresholding via a Bayesian approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 60(4), 725–749. https://doi.org/10.1111/1467-9868.00151
Adar, E. (2015). On the value of command-line “bullshittery”. In Medium. https://medium.com/@eytanadar/on-the-value-of-command-line-bullshittery-94dc19ec8c61
Agrawal, R., Huggins, J. H., Trippe, B., & Broderick, T. (2019). The Kernel Interaction Trick: Fast Bayesian Discovery of Pairwise Interactions in High Dimensions. arXiv:1905.06501 [Cs, Stat]. http://arxiv.org/abs/1905.06501
Alam, M. A., & Fukumizu, K. (2014). Hyperparameter selection in kernel principal component analysis. Journal of Computer Science, 10(7), 1139.
Anderson, E. (1935). The irises of the Gaspé Peninsula. Bulletin of the American Iris Society, 59, 2–5.
Anderson, E. (1936). The Species Problem in Iris. Annals of the Missouri Botanical Garden, 23(3), 457–509. https://doi.org/10.2307/2394164
Atlas. (2013). QR factorization for ridge regression. In Mathematics Stack Exchange. https://math.stackexchange.com/questions/299481/qr-factorization-for-ridge-regression
Banerjee, S. (2017). High-Dimensional Bayesian Geostatistics. Bayesian Analysis, 12(2), 583–614. https://doi.org/10.1214/17-BA1056R
Banerjee, S., & Gelfand, A. E. (2003). On smoothness properties of spatial processes. Journal of Multivariate Analysis, 84(1), 85–100. https://doi.org/10.1016/S0047-259X(02)00016-7
Barber, R. F., Candès, E. J., Ramdas, A., & Tibshirani, R. J. (2021). Predictive inference with the jackknife+. The Annals of Statistics, 49(1), 486–507. https://doi.org/10.1214/20-AOS1965
Barber, S., & Nason, G. P. (2004). Real nonparametric regression using complex wavelets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66(4), 927–939. https://doi.org/10.1111/j.1467-9868.2004.B5604.x
Barillec, R., Ingram, B., Cornford, D., & Csató, L. (2011). Projected sequential Gaussian processes: A C++ tool for interpolation of large datasets with heterogeneous noise. Computers & Geosciences, 37(3), 295–309. https://doi.org/10.1016/j.cageo.2010.05.008
Bates, S., Hastie, T., & Tibshirani, R. (n.d.). Cross-validation: What does it estimate and how well does it do it?
Bezdek, J. C., Keller, J. M., Krishnapuram, R., Kuncheva, L. I., & Pal, N. R. (1999). Will the real iris data please stand up? IEEE Transactions on Fuzzy Systems, 7(3), 368–369. https://doi.org/10.1109/91.771092
Bien, J., Taylor, J., & Tibshirani, R. (2013). A lasso for hierarchical interactions. The Annals of Statistics, 41(3), 1111–1141. https://doi.org/10.1214/13-AOS1096
Blanchet, F. G., Legendre, P., & Borcard, D. (2008). Forward Selection of Explanatory Variables. Ecology, 89(9), 2623–2632. https://doi.org/10.1890/07-0986.1
Bodin, E., Campbell, N. D. F., & Ek, C. H. (2017). Latent Gaussian Process Regression. arXiv:1707.05534 [Cs, Stat]. http://arxiv.org/abs/1707.05534
Bodmer, W., Bailey, R. A., Charlesworth, B., Eyre-Walker, A., Farewell, V., Mead, A., & Senn, S. (2021). The outstanding scientist, R.A. Fisher: His views on eugenics and race. Heredity, 126(4), 565–576. https://doi.org/10.1038/s41437-020-00394-6
Bourotte, M., Allard, D., & Porcu, E. (2016). A flexible class of non-separable cross-covariance functions for multivariate space–time data. Spatial Statistics, 18, 125–146. https://doi.org/10.1016/j.spasta.2016.02.004
Breiman, L. (1996). Heuristics of instability and stabilization in model selection. The Annals of Statistics, 24(6), 2350–2383. https://doi.org/10.1214/aos/1032181158
Breiman, L. (2001). Statistical Modeling: The Two Cultures. Statistical Science, 16(3), 199–215. http://www.jstor.org/stable/2676681
Breiman, L., & Friedman, J. H. (1985). Estimating Optimal Transformations for Multiple Regression and Correlation. Journal of the American Statistical Association, 80(391), 580–598. https://doi.org/10.1080/01621459.1985.10478157
Breiman, L., & Friedman, J. H. (1988). Tree-Structured Classification Via Generalized Discriminant Analysis: Comment. Journal of the American Statistical Association, 83(403), 725–727. https://doi.org/10.2307/2289296
Breiman, L., & Spector, P. (1992). Submodel Selection and Evaluation in Regression. The X-Random Case. International Statistical Review / Revue Internationale de Statistique, 60(3), 291–319. https://doi.org/10.2307/1403680
bremen79. (2020). Neural Networks (Maybe) Evolved to Make Adam The Best Optimizer. In Parameter-free Learning and Optimization Algorithms.
Bryan, J. (2017). Project-oriented workflow. In Tidyverse. https://www.tidyverse.org/blog/2017/12/workflow-vs-script/
Buckingham-Jeffery, E., Isham, V., & House, T. (2018). Gaussian process approximations for fast inference from infectious disease data. Mathematical Biosciences, 301, 111–120. https://doi.org/10.1016/j.mbs.2018.02.003
Bujokas, E. (2022). Gradient Boosting in Python from Scratch. In Medium. https://towardsdatascience.com/gradient-boosting-in-python-from-scratch-788d1cf1ca7
Burden, S., Cressie, N., & Steel, D. G. (2015). The SAR Model for Very Large Datasets: A Reduced Rank Approach. Econometrics, 3(2), 317–338. https://doi.org/10.3390/econometrics3020317
Biecek, P., & Burzykowski, T. (n.d.). Introduction to Instance-level Exploration. In Explanatory Model Analysis. https://ema.drwhy.ai/
Biecek, P., & Burzykowski, T. (2020). Explanatory Model Analysis: Explore, Explain, and Examine Predictive Models.
Bussola, N., Marcolini, A., Maggio, V., Jurman, G., & Furlanello, C. (2020). AI slipping on tiles: Data leakage in digital pathology. arXiv. https://doi.org/10.48550/arXiv.1909.06539
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953
Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794. https://doi.org/10.1145/2939672.2939785
Chen, Y., & Yang, Y. (2021). The One Standard Error Rule for Model Selection: Does It Work? Stats, 4(4), 868–892. https://doi.org/10.3390/stats4040051
Chipman, H. A., George, E. I., & McCulloch, R. E. (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1). https://doi.org/10.1214/09-AOAS285
Cho, P. H. (2018). Does Xgboost do Newton boosting? In GitHub. https://github.com/dmlc/xgboost/issues/3227
Clarke, B., Clarke, J., & Yu, C. W. (2014). Statistical Problem Classes and Their Links to Information Theory. Econometric Reviews, 33(1-4), 337–371. https://doi.org/10.1080/07474938.2013.807190
Cygu, S., Seow, H., Dushoff, J., & Bolker, B. M. (2023). Comparing machine learning approaches to incorporate time-varying covariates in predicting cancer survival time. Scientific Reports, 13(1), 1370. https://doi.org/10.1038/s41598-023-28393-7
Dahlgren, J. P. (2010). Alternative regression methods are not considered in Murtaugh (2009) or by ecologists in general. Ecology Letters, 13(5), E7–E9. https://doi.org/10.1111/j.1461-0248.2010.01460.x
Datta, A., Banerjee, S., Finley, A. O., & Gelfand, A. E. (2016). Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets. Journal of the American Statistical Association, 111(514), 800–812. https://doi.org/10.1080/01621459.2015.1044091
Datta, A., Banerjee, S., Finley, A. O., Hamm, N. A. S., & Schaap, M. (2016). Nonseparable dynamic nearest neighbor Gaussian process models for large spatio-temporal data with an application to particulate matter analysis. The Annals of Applied Statistics, 10(3). https://doi.org/10.1214/16-AOAS931
De Oliveira, V., & Han, Z. (2022). On Information About Covariance Parameters in Gaussian Matérn Random Fields. Journal of Agricultural, Biological and Environmental Statistics. https://doi.org/10.1007/s13253-022-00510-5
Dezeure, R., Bühlmann, P., Meier, L., & Meinshausen, N. (2015). High-dimensional inference: Confidence intervals, p-values and R software hdi. Statistical Science, 30(4), 533–558. https://doi.org/10.1214/15-STS527
Donoho, D. L., Johnstone, I. M., Kerkyacharian, G., & Picard, D. (1995). Wavelet Shrinkage: Asymptopia? Journal of the Royal Statistical Society: Series B (Methodological), 57(2), 301–337. https://doi.org/10.1111/j.2517-6161.1995.tb02032.x
Dyson, F. (2005). Wise Man. New York Review of Books. https://www.nybooks.com/articles/2005/10/20/wise-man/
Efron, B., & Gong, G. (1983). A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation. The American Statistician, 37(1), 36–48. https://doi.org/10.1080/00031305.1983.10483087
Eilers, P. H. C., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2), 89–121. https://doi.org/10.1214/ss/1038425655
El-Bachir, Y., & Davison, A. C. (2019). Fast Automatic Smoothing for Generalized Additive Models. Journal of Machine Learning Research, 20(173), 1–27.
Elith, J., Leathwick, J. R., & Hastie, T. (2008). A working guide to boosted regression trees. Journal of Animal Ecology, 77(4), 802–813. https://doi.org/10.1111/j.1365-2656.2008.01390.x
Fisher, R. A. (1936). The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 7(2), 179–188. https://doi.org/10.1111/j.1469-1809.1936.tb02137.x
Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29(5), 1189–1232. https://www.jstor.org/stable/2699986
Friedman, J., Hastie, T., Höfling, H., & Tibshirani, R. (2007). Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2), 302–332. https://doi.org/10.1214/07-AOAS131
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2929880/
Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics (Oxford, England), 9(3), 432–441. https://doi.org/10.1093/biostatistics/kxm045
Garrido-Merchán, E. C., & Hernández-Lobato, D. (2017). Dealing with Integer-valued Variables in Bayesian Optimization with Gaussian Processes. arXiv:1706.03673 [Stat]. http://arxiv.org/abs/1706.03673
Gelman, A. (2020). The typical set and its relevance to Bayesian computation. In Statistical Modeling, Causal Inference, and Social Science. https://statmodeling.stat.columbia.edu/2020/08/02/the-typical-set-and-its-relevance-to-bayesian-computation/
Gelman, A. (2021). Reflections on Breiman’s Two Cultures of Statistical Modeling. Observational Studies, 7(1), 95–98. https://doi.org/10.1353/obs.2021.0025
Giraud-Carrier, C., & Provost, F. (2005). Toward a justification of meta-learning: Is the no free lunch theorem a show-stopper? Proceedings of the ICML-2005 Workshop on Meta-Learning.
Girolami, M., Calderhead, B., & Chin, S. A. (2009). Riemannian Manifold Hamiltonian Monte Carlo. arXiv:0907.1100 [Cs, Math, Stat]. http://arxiv.org/abs/0907.1100
Golub, G. H., Heath, M., & Wahba, G. (1979). Generalized Cross-Validation as a Method for Choosing a Good Ridge Parameter. Technometrics, 21(2), 215–223. https://doi.org/10.1080/00401706.1979.10489751
Görtler, J., Kehlbeck, R., & Deussen, O. (2019). A Visual Exploration of Gaussian Processes. Distill, 4(4), e17. https://doi.org/10.23915/distill.00017
Gramacy, R. B. (2020). Surrogates: Gaussian Process Modeling, Design, and Optimization for the Applied Sciences. Chapman & Hall/CRC. https://bobby.gramacy.com/surrogates/
Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On Calibration of Modern Neural Networks. Proceedings of the 34th International Conference on Machine Learning, 1321–1330. https://proceedings.mlr.press/v70/guo17a.html
Hand, D. J. (2009). Measuring classifier performance: A coherent alternative to the area under the ROC curve. Machine Learning, 77(1), 103–123. https://doi.org/10.1007/s10994-009-5119-5
Harris, D. J. (2015). Generating realistic assemblages with a joint species distribution model. Methods in Ecology and Evolution, 6(4), 465–473. https://doi.org/10.1111/2041-210X.12332
Hastie, T. (2020). Ridge Regularization: An Essential Concept in Data Science. Technometrics, 62(4), 426–433. https://doi.org/10.1080/00401706.2020.1791959
Hastie, T., & Tibshirani, R. (1987). Generalized Additive Models: Some Applications. Journal of the American Statistical Association, 82(398), 371–386. https://doi.org/10.1080/01621459.1987.10478440
Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). Springer.
Hensman, J., Matthews, A. G. de G., & Ghahramani, Z. (2015). Scalable Variational Gaussian Process Classification. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics (AISTATS).
Irfan, M. O., & Bull, P. (2021). Cleaning foregrounds from single-dish 21 cm intensity maps with Kernel principal component analysis. Monthly Notices of the Royal Astronomical Society, 508(3), 3551–3568. https://doi.org/10.1093/mnras/stab2855
Jakkala, K. (2021). Deep Gaussian Processes: A Survey. arXiv:2106.12135 [Cs, Stat]. http://arxiv.org/abs/2106.12135
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112). Springer.
Janson, L., Fithian, W., & Hastie, T. (2015). Effective Degrees of Freedom: A Flawed Metaphor. Biometrika, 102(2), 479–485. https://doi.org/10.1093/biomet/asv019
Jones, A. (2021). The Matérn class of covariance functions. In Andy Jones. https://andrewcharlesjones.github.io/journal/matern-kernels.html
Jović, A., Brkić, K., & Bogunović, N. (2015). A review of feature selection methods with applications. 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 1200–1205. https://doi.org/10.1109/MIPRO.2015.7160458
Jurek, M., & Katzfuss, M. (2022). Scalable Spatio-Temporal Smoothing via Hierarchical Sparse Cholesky Decomposition. arXiv. https://doi.org/10.48550/arXiv.2207.09384
Kammann, E. E., & Wand, M. P. (2003). Geoadditive models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 52(1), 1–18. https://doi.org/10.1111/1467-9876.00385
Krasser, M. (2018). Gaussian processes. http://krasserm.github.io/2018/03/19/gaussian-processes/
Krasser, M. (2020). Sparse Gaussian processes. http://krasserm.github.io/2020/12/12/gaussian-processes-sparse/
Kuhn, M. (2017). Nested resampling with rsample. In Applied Predictive Modeling. http://appliedpredictivemodeling.com/blog/2017/9/2/njdc83d01pzysvvlgik02t5qnaljnd
Kumar, I. E., Venkatasubramanian, S., Scheidegger, C., & Friedler, S. (2020). Problems with Shapley-value-based explanations as feature importance measures. Proceedings of the 37th International Conference on Machine Learning, 5491–5500. http://proceedings.mlr.press/v119/kumar20e/kumar20e.pdf
Lambert, B., & Vehtari, A. (2022). R*: A Robust MCMC Convergence Diagnostic with Uncertainty Using Decision Tree Classifiers. Bayesian Analysis, 17(2), 353–379. https://doi.org/10.1214/20-BA1252
Larsen, K. (2015). GAM: The Predictive Modeling Silver Bullet | Stitch Fix Technology Multithreaded. In MultiThreaded (StitchFix). https://multithreaded.stitchfix.com/blog/2015/07/30/gam/
Lee, J. D., Sun, Y., & Saunders, M. A. (2014). Proximal Newton-Type Methods for Minimizing Composite Functions. SIAM Journal on Optimization, 24(3), 1420–1443. https://doi.org/10.1137/130921428
Lindeløv, J. K. (2019). Common statistical tests are linear models (or: How to teach stats). https://lindeloev.github.io/tests-as-linear/
Loh, W.-Y., & Vanichsetakul, N. (1988). Tree-Structured Classification via Generalized Discriminant Analysis. Journal of the American Statistical Association, 83(403), 715–725. https://doi.org/10.1080/01621459.1988.10478652
Marra, G., & Wood, S. N. (2011). Practical variable selection for generalized additive models. Computational Statistics & Data Analysis, 55(7), 2372–2387. https://doi.org/10.1016/j.csda.2011.02.004
McCormick, T. (2021). The "given data" paradigm undermines both cultures. arXiv:2105.12478 [Cs, Stat]. http://arxiv.org/abs/2105.12478
Meinshausen, N. (2006). Quantile Regression Forests. Journal of Machine Learning Research, 7, 983–999.
Milà, C., Mateu, J., Pebesma, E., & Meyer, H. (2022). Nearest neighbour distance matching Leave-One-Out Cross-Validation for map validation. Methods in Ecology and Evolution, 13(6), 1304–1316. https://doi.org/10.1111/2041-210X.13851
Miller, A. C., Foti, N. J., & Fox, E. B. (2021). Breiman’s two cultures: You don’t have to choose sides. arXiv:2104.12219 [Cs, Stat]. http://arxiv.org/abs/2104.12219
Minderer, M., Djolonga, J., Romijnders, R., Hubis, F., Zhai, X., Houlsby, N., Tran, D., & Lucic, M. (2021). Revisiting the Calibration of Modern Neural Networks. Advances in Neural Information Processing Systems, 34, 15682–15694. https://proceedings.neurips.cc/paper/2021/hash/8420d359404024567b5aefda1231af24-Abstract.html
Mount, J. (2012). How robust is logistic regression? In Win Vector LLC. https://win-vector.com/2012/08/23/how-robust-is-logistic-regression/
Murtaugh, P. A. (2009). Performance of several variable-selection methods applied to real ecological data. Ecology Letters, 12(10), 1061–1068. https://doi.org/10.1111/j.1461-0248.2009.01361.x
Nazarathy, Y., & Klok, H. (2021). Statistics with Julia: Fundamentals for data science, machine learning and artificial intelligence. Springer International Publishing. https://doi.org/10.1007/978-3-030-70901-3
Neal, R. M. (2012). Bayesian learning for neural networks (Vol. 118). Springer Science & Business Media.
Paciorek, C., & Schervish, M. (2003). Nonstationary Covariance Functions for Gaussian Process Regression. Advances in Neural Information Processing Systems, 16. https://proceedings.neurips.cc/paper/2003/hash/326a8c055c0d04f5b06544665d8bb3ea-Abstract.html
Peng, H., Long, F., & Ding, C. (2005). Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1226–1238. https://doi.org/10.1109/TPAMI.2005.159
Perperoglou, A., Sauerbrei, W., Abrahamowicz, M., & Schmid, M. (2019). A review of spline function procedures in R. BMC Medical Research Methodology, 19(1), 46. https://doi.org/10.1186/s12874-019-0666-3
Poynor, V., & Munch, S. (2017). Combining functional data with hierarchical Gaussian process models. Environmental and Ecological Statistics, 24(2), 175–199. https://doi.org/10.1007/s10651-017-0366-2
Prechelt, L. (2012). Early Stopping - but when? In G. Montavon & K.-R. Müller (Eds.), Neural Networks: Tricks of the Trade (pp. 53–67). Springer. http://page.mi.fu-berlin.de/~prechelt/Biblio/stop_tricks1997.pdf
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (2007). Numerical Recipes 3rd Edition: The Art of Scientific Computing (3rd ed.). Cambridge University Press.
Raper, S. (2020). Leo Breiman’s "Two Cultures". Significance, 17(1), 34–37. https://doi.org/10.1111/j.1740-9713.2020.01357.x
Rasmussen, C. E., & Williams, C. K. I. (2005). Gaussian Processes for Machine Learning. The MIT Press.
Ratz, A. V. (2021). Can QR Decomposition Be Actually Faster? Schwarz-Rutishauser Algorithm. In Medium. https://towardsdatascience.com/can-qr-decomposition-be-actually-faster-schwarz-rutishauser-algorithm-a32c0cde8b9b
Reiss, P. T., & Ogden, R. T. (2009). Smoothing parameter selection for a class of semiparametric linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 505–523. https://doi.org/10.1111/j.1467-9868.2008.00695.x
Riihimäki, J., & Vehtari, A. (2010). Gaussian processes with monotonicity information. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 645–652. https://proceedings.mlr.press/v9/riihimaki10a.html
Roberts, D. R., Bahn, V., Ciuti, S., Boyce, M. S., Elith, J., Guillera-Arroita, G., Hauenstein, S., Lahoz-Monfort, J. J., Schröder, B., Thuiller, W., Warton, D. I., Wintle, B. A., Hartig, F., & Dormann, C. F. (2017). Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography, 40(8), 913–929. https://doi.org/10.1111/ecog.02881
Rue, H., Martino, S., & Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2), 319–392. https://doi.org/10.1111/j.1467-9868.2008.00700.x
Sansó, B., Schmidt, A. M., & Nobre, A. A. (2008). Bayesian Spatio-Temporal Models Based on Discrete Convolutions. The Canadian Journal of Statistics / La Revue Canadienne de Statistique, 36(2), 239–258. http://www.jstor.org/stable/20445307
Schölkopf, B., Smola, A., & Müller, K.-R. (1997). Kernel principal component analysis. In W. Gerstner, A. Germond, M. Hasler, & J.-D. Nicoud (Eds.), Artificial Neural Networks ICANN’97 (pp. 583–588). Springer. https://doi.org/10.1007/BFb0020217
Shafer, G., & Vovk, V. (2008). A Tutorial on Conformal Prediction. Journal of Machine Learning Research, 9, 371–421. https://jmlr.csail.mit.edu/papers/volume9/shafer08a/shafer08a.pdf
Shalizi, C. R. (2022). Advanced data analysis from an elementary point of view. https://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/
Sigrist, F. (2018). Gradient and Newton Boosting for Classification and Regression. arXiv. https://doi.org/10.48550/arXiv.1808.03064
Kuhn, M., & Silge, J. (n.d.). Explaining Models and Predictions. In Tidy Modeling with R. https://www.tmwr.org/
Simon, N., Friedman, J., Hastie, T., & Tibshirani, R. (2011). Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent. Journal of Statistical Software, 39(5), 1–13. https://doi.org/10.18637/jss.v039.i05
Stone, M. (1977). An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike’s Criterion. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 44–47. https://www.jstor.org/stable/2984877
Taquet, V. (2021). With MAPIE, uncertainties are back in machine learning! In Medium. https://towardsdatascience.com/with-mapie-uncertainties-are-back-in-machine-learning-882d5c17fdc3
Valavi, R., Elith, J., Lahoz-Monfort, J. J., & Guillera-Arroita, G. (2019). blockCV: An r package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models. Methods in Ecology and Evolution, 10(2), 225–232. https://doi.org/10.1111/2041-210X.13107
van den Goorbergh, R., van Smeden, M., Timmerman, D., & Van Calster, B. (2022). The harm of class imbalance corrections for risk prediction models: Illustration and simulation using logistic regression. Journal of the American Medical Informatics Association, ocac093. https://doi.org/10.1093/jamia/ocac093
van Houwelingen, J. C. (2001). Shrinkage and Penalized Likelihood as Methods to Improve Predictive Accuracy. Statistica Neerlandica, 55(1), 17–34. https://doi.org/10.1111/1467-9574.00154
Vehtari, A. (2021). Gaussian process demonstration with Stan. https://avehtari.github.io/casestudies/Motorcycle/motorcycle_gpcourse.html
Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P.-C. (2021). Rank-Normalization, Folding, and Localization: An Improved R̂ for Assessing Convergence of MCMC (with Discussion). Bayesian Analysis, 16(2), 667–718. https://doi.org/10.1214/20-BA1221
Venables, W. N. (1998). Exegeses on Linear Models. http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf
Wager, S., Hastie, T., & Efron, B. (2014). Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife. Journal of Machine Learning Research, 15, 1625–1651.
Wainer, J., & Cawley, G. (2021). Nested cross-validation when selecting classifiers is overzealous for most practical applications. Expert Systems with Applications, 182, 115222. https://doi.org/10.1016/j.eswa.2021.115222
Walters, C. J., & Ludwig, D. (1981). Effects of Measurement Errors on the Assessment of Stock–Recruitment Relationships. Canadian Journal of Fisheries and Aquatic Sciences, 38(6), 704–710. https://doi.org/10.1139/f81-093
Wand, M. P., & Ormerod, J. T. (2011). Penalized wavelets: Embedding wavelets into semiparametric regression. Electronic Journal of Statistics, 5(none). https://doi.org/10.1214/11-EJS652
Warnes, J. J., & Ripley, B. D. (1987). Problems with Likelihood Estimation of Covariance Functions of Spatial Gaussian Processes. Biometrika, 74(3), 640–642. http://www.jstor.org/stable/2336705
Wenger, J., Pleiss, G., Hennig, P., Cunningham, J. P., & Gardner, J. R. (2022). Preconditioning for Scalable Gaussian Process Hyperparameter Optimization. arXiv. https://doi.org/10.48550/arXiv.2107.00243
Wenger, S. J., & Olden, J. D. (2012). Assessing transferability of ecological models: An underappreciated aspect of statistical validation. Methods in Ecology and Evolution, 3(2), 260–267. https://doi.org/10.1111/j.2041-210X.2011.00170.x
Witten, D. M., Tibshirani, R., & Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3), 515–534. https://doi.org/10.1093/biostatistics/kxp008
Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67–82. https://doi.org/10.1109/4235.585893
Wood, S. N. (2003). Thin plate regression splines. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65(1), 95–114. https://doi.org/10.1111/1467-9868.00374
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(1), 3–36. https://doi.org/10.1111/j.1467-9868.2010.00749.x
Wood, S. N. (2017a). Generalized Additive Models: An Introduction with R (2nd ed.). Chapman & Hall/CRC.
Wood, S. N. (2017b). P-splines with derivative based penalties and tensor product smoothing of unevenly distributed data. Statistics and Computing, 27(4), 985–989. https://doi.org/10.1007/s11222-016-9666-x
Yang, Y. (2005). Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation. Biometrika, 92(4), 937–950. https://doi.org/10.1093/biomet/92.4.937
Yuan, M., & Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1), 49–67. https://doi.org/10.1111/j.1467-9868.2005.00532.x
Zhang, L. (2018). Nearest Neighbor Gaussian Processes (NNGP) based models in Stan. In Stan Case Studies. https://mc-stan.org/users/documentation/case-studies/nngp.html
Zhao, S., Witten, D., & Shojaie, A. (2021). In defense of the indefensible: A very naïve approach to high-dimensional inference. Statistical Science, 36(4), 562–577. https://doi.org/10.1214/20-STS815
Zou, H., Hastie, T., & Tibshirani, R. (2007). On the “degrees of freedom” of the lasso. The Annals of Statistics, 35(5), 2173–2192. https://doi.org/10.1214/009053607000000127