In general, R scripts can be run just like any other kind of program on an HPC (high-performance computing) system. However, there are a few peculiarities. This document compiles some helpful practices; it should be useful to people who are familiar with R but new to HPC, and vice versa.

Some of these instructions are specific to Compute Canada ca. 2022, and particularly to the Graham cluster. There is also useful information at the Digital Research Alliance of Canada wiki (also available in French).
I assume that you’re slightly familiar with HPC machinery (i.e. you’ve taken the Compute Canada orientation session and know how to use `sbatch`/`squeue`/etc. to work with the batch scheduler).

Below, “batch mode” means running R code from an R script rather than starting R and typing commands at the prompt (i.e. “interactive mode”); “on a worker” means running a batch-mode script via the SLURM scheduler (i.e. using `sbatch`) rather than in a terminal session on the head node. Commands to be run within R will use an `R>` prompt; those to be run in the shell will use `sh>`.
Given an R script stored in a `.R` file, there are a few ways to run it in batch mode:

- `r`: an improved batch-R version by Dirk Eddelbuettel. You can install it by installing the `littler` package from CRAN (see “Installing Packages” below) and running

  ```sh
  mkdir ~/bin
  cd ~/bin
  ln -s ~/R/x86_64-pc-linux-gnu-library/4.1/littler/bin/r
  ```

  in the shell. (You may need to adjust the path name for your R version.)
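  Once the symlink is in place (and assuming `~/bin` is on your `PATH`, which may require a fresh login), you can run a script, here a hypothetical `myscript.R`, with:

  ```sh
  sh> r myscript.R
  ```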
- `Rscript <filename>`: by default, output will be printed to the standard output (which will end up in your `.log` file). One thing to watch out for with `Rscript` is that it does not load the `methods` package by default, which may occasionally surprise you; if your script directly or indirectly uses stuff from `methods`, you need to load it explicitly with `library("methods")`.
- `R CMD BATCH <filename>`: this is similar, but automatically sends output to `<filename>.out`.
This StackOverflow question says that `r` > `Rscript` > `R CMD BATCH` (according to the author of `r`…).
R is often missing from the set of programs that is available by default on HPC systems. Most HPC systems use the `module` command to make different programs, and different versions of those programs, available for your use. If you try to run `R` in interactive mode on Graham, the system will pop up a long list of possible modules to load and ask you which one you want. The first choice (currently `r/4.1.2`) is the default, and generally the best option. Alternatively, you can run `module load r/4.1.2` at the shell prompt, or add `module load r/4.1.2` to your batch script (see the sketch below).
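Here is a minimal sketch of a batch script that loads the R module and runs a script on a worker; `myscript.R` and `myscript.sh` are hypothetical file names, and depending on your cluster setup you may also need options such as `--account`:

```sh
#!/bin/bash
#SBATCH --time=00:30:00        # walltime (HH:MM:SS)
#SBATCH --ntasks=1             # one single-threaded R process
#SBATCH --mem=4G               # total memory for the job
module load r/4.1.2            # make R available on the worker
Rscript myscript.R             # run the script in batch mode
```

Submit it with `sh> sbatch myscript.sh`.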
To tell R where to download packages from, set `options(repos = c(CRAN = "https://cloud.r-project.org"))` (this is a safe default value). The first time you install packages, R will ask whether to create a personal library directory in your home directory (something like `~/R/x86_64-pc-linux-gnu-library/<R-version>`). (If you are in batch mode you’ll get an error.) You can install packages by running `install.packages("<pkg>")` in an interactive R session, or by running an R script that defines and installs a long list of packages, e.g.

```r
pkgs <- c("broom", "mvtnorm", <...>)
install.packages(pkgs)
```

It’s generally OK to run short (<10 minutes) jobs like this interactively, on the head node. If you have a package tarball such as `mypkg_0.1.5.tar.gz`, use `install.packages("mypkg_0.1.5.tar.gz", repos = NULL)` from within R, or `R CMD INSTALL mypkg_0.1.5.tar.gz` from the shell, to install it.
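Since setting the repository by hand in every session is tedious, one option (a sketch, not part of the original instructions) is to put the `options()` call in your `~/.Rprofile`, which R reads at startup; this also avoids the interactive mirror prompt in batch mode:

```r
## contents of ~/.Rprofile: set a default CRAN mirror at startup
options(repos = c(CRAN = "https://cloud.r-project.org"))
```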
Job arrays are limited by the SLURM configuration parameter `MaxArraySize`. This means that even if you use steps in your array indices, e.g. `--array=0-20000:10000`, where the number of jobs is only 3, the job array will not run, because the maximum index (20000) is larger than `MaxArraySize`-1. Run something like `scontrol show config | grep -E 'MaxArraySize|MaxJobCount'` in the shell to determine the SLURM configuration (see SLURM job array support).
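For example, a small job array might look like the following sketch (`batch.R` is the script from the next section; times and memory are placeholders):

```sh
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --mem=2G
#SBATCH --array=1-100          # maximum index must stay below MaxArraySize
module load r/4.1.2
## pass the array index to the R script as a command-line argument
Rscript batch.R "$SLURM_ARRAY_TASK_ID" hello
```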
You can use `commandArgs()` to read command-line arguments from within an R script. For example, if `batch.R` contains:

```r
cc <- commandArgs(trailingOnly = TRUE)  # drop R's own arguments
intarg <- as.integer(cc[1])             # first argument, converted to integer
chararg <- cc[2]                        # second argument, kept as character
cat(sprintf("int arg = %d, char arg = %s\n", intarg, chararg))
```
then running `Rscript batch.R 1234 hello` will produce

```
int arg = 1234, char arg = hello
```
(Note that all command-line arguments are passed as character strings, and must be converted to numeric as necessary.) If you want fancier argument processing than base R provides (e.g. default argument values, named rather than positional arguments), see this Stack Overflow question for some options; a rough base-R sketch follows.
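As one illustration of what “fancier” processing might look like in base R (a sketch only; `parse_args()` is a hypothetical helper, not a standard function), named `key=value` arguments with defaults could be handled like this:

```r
## parse key=value command-line arguments, falling back to defaults
parse_args <- function(defaults) {
  for (arg in commandArgs(trailingOnly = TRUE)) {
    kv <- strsplit(arg, "=", fixed = TRUE)[[1]]
    if (length(kv) == 2 && kv[1] %in% names(defaults)) {
      defaults[[kv[1]]] <- kv[2]
    }
  }
  defaults
}

args <- parse_args(list(n = "10", label = "none"))
cat(sprintf("n = %s, label = %s\n", args$n, args$label))
```

Saved as (say) `args.R`, running `Rscript args.R n=25` would print `n = 25, label = none`.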
While Compute Canada clusters are generally meant to be used in batch mode, it is sometimes convenient to do some development/debugging in short (<3 hour) interactive sessions. There is also an `rstudio-server-...` module for running RStudio interactively.

There is useful information about parallel computing in R at the Alliance wiki, although (1) some of its advice is idiosyncratic (e.g. rather than parallelizing within R over a `for` loop, they prefer job arrays), and (2) much of the document focuses on general performance tips for high-performance computing in R (using vectorization, packages for out-of-memory computation, etc.) that are not specific to running on HPC clusters.

Parallelization can happen at several levels. Implicit, low-level parallelization means multithreading: either multithreaded linear algebra (BLAS) or shared-memory parallelism (the `OpenMP` system) within C++ code (e.g. `glmmTMB`). (The `--cpus-per-task` SLURM argument is what matters for multithreaded jobs; it can be left at its default of 1 for single-threaded computing.)
`OpenMP` threading is usually controlled by setting a shell environment variable (`export OMP_NUM_THREADS=1`), but there may be controls within an R package as well. BLAS threading can be controlled with `RhpcBLASctl::blas_set_num_threads()` (see here).
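Putting these together, here is a sketch of how you might pin implicit multithreading to one thread per process from within R, so that it doesn’t oversubscribe the cores SLURM gave you (this should run before any threaded code):

```r
## equivalent to `export OMP_NUM_THREADS=1` in the shell, if set early enough
Sys.setenv(OMP_NUM_THREADS = "1")
## explicit controls from the RhpcBLASctl package
RhpcBLASctl::blas_set_num_threads(1)   # limit BLAS (linear algebra) threads
RhpcBLASctl::omp_set_num_threads(1)    # limit OpenMP threads
```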
Explicit parallelization typically uses the `Rmpi` package (see below) or the `parallel` package, often via higher-level interfaces: `foreach`, `doParallel`, `future`, `furrr`, …
One simple approach is to pick a number of worker processes `N`, define a virtual cluster with `N` cores within R (e.g. `parallel::makeCluster(N)`), and set `--ntasks=N` in your submission script (and let the scheduler pick the number of CPUs, nodes, etc.). Determining the number of nodes/cores/processes to request via SLURM will depend on which R package is used for parallelization. The `foreach` package supports both multi-core (multiple cores on a single node/computer) and multiprocessing (multiple processes within a single node, or across multiple nodes in a cluster) parallelization. There is an example of how to run both using `foreach`, including how to ensure that R and SLURM are communicating via the shell script, at https://docs.alliancecan.ca/wiki/R#Exploiting_parallelism_in_R
When running on an HPC system, you should not let the R package you are using detect and try to use the number of available cores; you should instead always specify the number to use explicitly (e.g. from SLURM’s environment variables, as in the sketch below).
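Here is a minimal sketch of that idea, assuming the job was submitted with `--ntasks=N`; `SLURM_NTASKS` is set by SLURM in the job’s environment:

```r
## size the cluster from the SLURM allocation rather than from
## parallel::detectCores(), which sees every core on the node
nworkers <- as.integer(Sys.getenv("SLURM_NTASKS", unset = "1"))
cl <- parallel::makeCluster(nworkers)
res <- parallel::parLapply(cl, 1:100, function(i) i^2)
parallel::stopCluster(cl)
```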
When setting SLURM `#SBATCH` arguments, here are some helpful notes:

- A *task* in SLURM is a process; a process uses one CPU core if it is single-threaded.
- How tasks are allocated across cores and nodes can be specified using the arguments `--nodes`, `--ntasks`, and `--ntasks-per-node` (`--cpus-per-task` is specific to multi-threading). Some helpful task allocation examples: https://support.ceci-hpc.be/doc/_contents/SubmittingJobs/SlurmFAQ.html#q05-how-do-i-create-a-parallel-environment
- The task allocation you choose will affect job scheduling. Requesting multiple tasks without specifying the number of nodes (if you don’t require all tasks to be on the same node) puts fewer constraints on the system. Requesting a full node (`--nodes=1 --ntasks-per-node=32` on the Graham cluster) has a scheduling advantage, but can be seen as abuse if it is not actually required: https://docs.alliancecan.ca/wiki/Job_scheduling_policies#Whole_nodes_versus_cores
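To make the last point concrete, here is a sketch of the two styles of request (times, memory, and the hypothetical `parallel_script.R` are placeholders):

```sh
#!/bin/bash
## flexible request: 16 single-threaded tasks, placed anywhere in the cluster
#SBATCH --ntasks=16
#SBATCH --mem-per-cpu=2G
#SBATCH --time=01:00:00
## whole-node alternative on Graham (schedules faster, but only use it
## if you really need the full node):
##   #SBATCH --nodes=1
##   #SBATCH --ntasks-per-node=32
module load r/4.1.2
Rscript parallel_script.R
```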