Licensed under the Creative Commons Attribution-NonCommercial license. Please share & remix noncommercially, mentioning its origin.

challenges & solutions

errors/problems

optimization simply crashes due to:

  • predictions outside the feasible range (counts < 0, probabilities outside (0,1))
  • underflow/overflow
  • convergence to extreme/flat ranges

(L-BFGS-B is particularly non-robust)

improve objective function

  • check for bugs!
  • use more robust expressions
    • compute on log scale if possible
    • 1/(1+exp(-x)) (or plogis(x)) rather than exp(x)/(1+exp(x))
    • use lower.tail= for tail probabilities
  • “clamping”: impose minima/maxima to avoid over/underflow (e.g. see make.link("cloglog")$linkinv)
  • evaluate expressions in limits
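
A minimal sketch of these points (the numbers are my toy examples, not from the notes): the naive inverse-logit overflows, the robust forms don't, and densities computed on the log scale survive where `log(dnorm(...))` underflows first.

```r
## robust vs. naive inverse-logit
x <- 800
exp(x) / (1 + exp(x))    ## naive: Inf/Inf gives NaN
1 / (1 + exp(-x))        ## robust: returns 1
plogis(x)                ## built-in inverse-logit, also robust

## compute on the log scale if possible
dnorm(40, log = TRUE)    ## finite: log-density computed directly
log(dnorm(40))           ## -Inf: density underflowed to 0 first

## clamping in the style of make.link("cloglog")$linkinv:
eps <- .Machine$double.eps
clamp <- function(p) pmin(pmax(p, eps), 1 - eps)
clamp(c(0, 0.5, 1))      ## bounded away from 0 and 1
```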

better starting conditions

  • better understanding of problem
  • neutral values (e.g. \(\beta=0\) for linear-like models)
  • build up fit from less complex/reduced models
  • heuristic self-starting algorithms to find starting values
    apropos("^SS[a-z]", ignore.case = FALSE)
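
For example (simulated logistic data; `SSlogis` and `getInitial()` ship with base R's stats package), a self-starting model computes its own starting values:

```r
## list the self-starting models available in the search path
apropos("^SS[a-z]", ignore.case = FALSE)

## simulated logistic-growth data: Asym = 10, xmid = 5, scal = 2
set.seed(101)
x <- 1:10
y <- 10 / (1 + exp((5 - x) / 2)) + rnorm(10, sd = 0.1)
d <- data.frame(x, y)

## run the self-start routine to get starting values
getInitial(y ~ SSlogis(x, Asym, xmid, scal), data = d)
```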

convergence warnings

KKT (Karush-Kuhn-Tucker) conditions

  • first- and second-order conditions for “nonlinear programming”, i.e. nonlinear optimization with constraints (Wikipedia)
  • optextras::kktchk()
  • unconstrained: simplifies to gradient=0, Hessian=positive definite (for minimization)
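
A hand-rolled version of the unconstrained check (`optextras::kktchk()` automates this; the quadratic objective here is a made-up example, base R only):

```r
f <- function(p) (p[1] - 1)^2 + 2 * (p[2] + 3)^2
fit <- optim(c(0, 0), f, method = "BFGS", hessian = TRUE)

## first-order condition: gradient ~ 0 (central finite differences)
fd_grad <- function(f, p, h = 1e-6)
  vapply(seq_along(p), function(i) {
    e <- replace(numeric(length(p)), i, h)
    (f(p + e) - f(p - e)) / (2 * h)
  }, numeric(1))
max(abs(fd_grad(f, fit$par)))                  ## should be small

## second-order condition: Hessian positive definite
eigen(fit$hessian, only.values = TRUE)$values  ## should all be > 0
```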

lme4 warnings

  • “toast-scraping”
  • generally identifies less-stable problems (a scaled gradient > 0.1 often signals trouble)
  • not many solutions other than scaling, centering, trying other optimizers

lots of parameters (high dimensionality)

  • not really an intrinsic problem
  • hard to visualize
  • slow

visualization

  • slices
  • expand.grid() plus ggplot (geom_tile() + facet_grid() works up to 4D: maybe ggforce::facet_grid_paginate() if you’re crazy)
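
Sketching the recipe on a made-up 2-parameter surface (ggplot2 is assumed installed, so the plotting step is guarded):

```r
## evaluate a (toy) negative log-likelihood on a grid
nll <- function(a, b) (a - 1)^2 + (b - 2)^2 + a * b / 10
dd <- expand.grid(a = seq(0, 2, length.out = 41),
                  b = seq(1, 3, length.out = 41))
dd$nll <- with(dd, nll(a, b))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot(dd, aes(a, b, fill = nll)) + geom_tile())
  ## add facet_grid(c ~ d) after discretizing two more parameters
}
```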

slowness

  • Ross (2013):
    • rewrite in C++
    • parallelize (hard in this case)

discontinuities and thresholds

  • Nelder-Mead
  • profiling across discrete values
    a general approach for any “difficult” parameter, e.g. the \(t\) degrees of freedom
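
For the \(t\) degrees-of-freedom example, a sketch (simulated data): fix df on a grid and optimize the remaining parameters at each value, then pick the df with the lowest profiled negative log-likelihood.

```r
set.seed(101)
x <- 1 + 2 * rt(200, df = 5)      ## location-scale t data, true df = 5
nll <- function(p, df)            ## p = (location, log scale)
  -sum(dt((x - p[1]) / exp(p[2]), df = df, log = TRUE) - p[2])

dfvec <- 1:15
prof <- vapply(dfvec,
               function(df) optim(c(0, 0), nll, df = df)$value,
               numeric(1))
dfvec[which.min(prof)]            ## profiled estimate of df
```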

multi-modality

  • can be very hard to tell (Simon Wood examples)
  • check convergence
  • KKT criteria, locally
  • multi-start algorithms
    • random
    • Latin hypercube
    • Sobol sequences
  • cumulative distribution of neg log likelihoods (Raue et al. 2013)
  • stochastic global optimization
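
A minimal random multi-start, on a deliberately bimodal 1-D objective of my own invention (Latin hypercube and Sobol starts would come from packages such as lhs or randtoolbox):

```r
f <- function(x) (x^2 - 4)^2 + x  ## local minima near x = -2 and x = 2

set.seed(101)
starts <- runif(20, -5, 5)        ## random starting points
fits <- lapply(starts, function(s) optim(s, f, method = "BFGS"))
pars <- vapply(fits, `[[`, numeric(1), "par")
vals <- vapply(fits, `[[`, numeric(1), "value")

table(round(pars))                ## how often each basin was found
pars[which.min(vals)]             ## global minimum, near x = -2
```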

constraints (box)

  • independent inequality constraints
  • built into some optimizers (optim/L-BFGS-B, nloptr tools)
  • transformations
    • convenient; can also improve Wald approximation, improve scaling
    • bad if parameter is actually on the boundary
  • e.g. negative binomial parameter: var=\(\mu(1+\mu/k)\). Often use \(\log(k)\) to keep it positive, but what if equi/underdispersed? Use \(1/k\) with lower bound at 0?
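
A sketch of the \(\log(k)\) transformation (simulated data, base R only): the optimizer works on unconstrained parameters, and we back-transform at the end.

```r
set.seed(101)
y <- rnbinom(500, mu = 4, size = 2)   ## true mu = 4, k = 2

nll <- function(p)                    ## p = (log(mu), log(k))
  -sum(dnbinom(y, mu = exp(p[1]), size = exp(p[2]), log = TRUE))

fit <- optim(c(0, 0), nll)
exp(fit$par)                          ## back-transformed (mu, k), near (4, 2)
```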

parameterization

  • mathematically pretty: \(ax/(1+bx)\)
  • simpler units: \(ax/(1+x/c)\)
  • traditional: \(ax/(1+ahx)\)
  • separate features: \(ax/(1+(a/d)x)\), \(d>0\)
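
The four forms describe the same curve, related by \(b = 1/c = ah = a/d\); a quick numerical check (my choice of \(a\) and \(b\)):

```r
a <- 2; b <- 0.5
x <- seq(0, 10, by = 0.1)
f1 <- a * x / (1 + b * x)              ## mathematically pretty
f2 <- a * x / (1 + x / (1 / b))        ## c = 1/b: half-saturation constant, units of x
f3 <- a * x / (1 + a * (b / a) * x)    ## h = b/a: the traditional "handling time"
f4 <- a * x / (1 + (a / (a / b)) * x)  ## d = a/b: asymptote, separated from initial slope a
stopifnot(all.equal(f1, f2), all.equal(f1, f3), all.equal(f1, f4))
```

Which form to prefer depends on which feature (slope, half-saturation, asymptote) you want as a directly interpretable, well-scaled parameter.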

references

Raue, Andreas, Marcel Schilling, Julie Bachmann, Andrew Matteson, Max Schelker, Daniel Kaschek, Sabine Hug, et al. 2013. “Lessons Learned from Quantitative Dynamical Modeling in Systems Biology.” PLOS ONE 8 (9): e74335. doi:10.1371/journal.pone.0074335.

Ross, Noam. 2013. “FasteR! HigheR! StrongeR! - A Guide to Speeding Up R Code for Busy People.” Noam Ross. http://www.noamross.net/blog/2013/4/25/faster-talk.html.