Parallel and remote execution

Phoenix NLME jobs can be executed remotely using the NLME Job Control System (JCS) or the Phoenix Job Management System (JMS). A comparison of the two options follows:

Remote Windows submission
  JCS: No
  JMS: Yes

Remote RHEL 8 or Ubuntu 22 submission
  JCS: Yes
  JMS: No

Disconnect/reconnect/stop
  JCS: Yes
  JMS: Yes

Progress reporting
  JCS: Yes
  JMS: Yes

Remote software required
  JCS:
    GCC
    R ≥ 4.0
    Certara.NLME8 R package and dependencies
    ssh
    MPI (Open MPI for RHEL 8 or Ubuntu 22 platforms, MS-MPI for Windows)
  JMS: Full Phoenix Installation

Parallelization method
  JCS:
    Local MPI
    MPI for within-job parallelization
    RHEL 8 or Ubuntu 22 grid (TORQUE, SGE, LSF, SLURM)* or MultiCore for between-job parallelization
  JMS: Local MPI


*TORQUE = Terascale Open source Resource and QUEue Manager, SGE = Sun Grid Engine, LSF = Platform Load Sharing Facility, SLURM = Simple Linux Utility for Resource Management.

This section focuses on the Job Control System setup; for information on JMS, refer to the “Job Management System” documentation. For parallel processing on a remote host, or for using multicore parallelization locally, additional files must be installed. Refer to the “Installing job control files” section for more details.

Phoenix NLME jobs can be executed on several platform setups, enabling the program to take full advantage of available computing resources. All run modes can be executed locally as well as remotely.

One NLME job can be executed using:

Single core on the local host

The default configuration profile is as follows:

Single_Local_Config

Single core on a remote host

An example of the configuration profile is as follows:

Single_Remote_Config

Using By Subject (first order engines, IT2S-EM)/By Sample (QRPEM) MPI parallelization on the host

The Subject/Sample method of MPI parallelization is chosen automatically based on the engine selected. MPI software is required (MS-MPI is distributed with Phoenix for Windows runs) and, for a remote host, the Certara.NLME8 R package must be installed. Open MPI can be installed from the official site (https://www.open-mpi.org/software/ompi/). Note that an additional environment variable pointing to the root of the Open MPI installation (the default is /lib64/openmpi/) must be available in the ssh session of the current user.

Each model and dataset is unique, and the analyst needs to explore the best solution for the current project. However, some general guidelines apply to most projects. The speed of model execution depends on the number of computational cores, the speed of those cores, and the speed of writing to disk. In general, increasing the number of cores decreases computation time, so one might expect that parallelizing across 8 MPI threads would be twice as fast as using 4 threads. In practice, the relationship is not linear, because of the overhead of unparallelized segments and the overhead of collecting results from the different threads.
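As a rough illustration of why the speedup is not linear, Amdahl's law can be used to estimate the expected gain for an assumed parallel fraction of the work. The fraction p below is purely illustrative; the actual value depends on the engine, model, and dataset:

    # Illustrative only: expected speedup under Amdahl's law for an assumed
    # parallel fraction p of the total work (p = 0.9 is an arbitrary example).
    amdahl_speedup <- function(n_threads, p = 0.9) {
      1 / ((1 - p) + p / n_threads)
    }
    amdahl_speedup(4)   # ~3.1x speedup when 90% of the work parallelizes
    amdahl_speedup(8)   # ~4.7x, i.e., well short of twice the 4-thread case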

For Windows platforms, the default profile with MPI parallelization is as follows:
Parallel_MPI_Windows_Config

For RHEL 8 or Ubuntu 22 remote runs, an example profile for MPI parallelization is as follows:
Parallel_MPI_Linux_Config

Execution can be parallelized at the job level for the following run modes:

Simple (Sorted datasets)

Scenarios

Bootstrap

Stepwise Covariate search

Shotgun Covariate search

Profile

The implemented methods of “by job” parallelization are:

Multicore: multiple jobs are executed in parallel on the local or remote host

The Certara.NLME8 R package must be installed on the chosen host. Note that the Multicore method cannot be combined with MPI (by Subject/by Sample MPI parallelization within each job).

The number of processes run in parallel can be controlled with the Number of cores field (a conceptual sketch of by-job parallelization follows the configuration examples below).

An example of a Windows local configuration is as follows:
Parallel_Multicore_Windows_Config

An example of RHEL 8 remote configuration is as follows:
Parallel_Multicore_Linux_Config
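Conceptually, the Multicore method behaves like running independent jobs through a local pool of worker processes. The sketch below is purely illustrative and is not how Certara.NLME8 launches jobs internally; run_one_job() is a hypothetical stand-in for a single NLME job:

    # Conceptual sketch only; run_one_job() is a hypothetical placeholder.
    library(parallel)

    # Stand-in for a single NLME job (real jobs are launched by Certara.NLME8).
    run_one_job <- function(i) { Sys.sleep(1); i^2 }

    jobs <- 1:8                  # e.g., bootstrap replicates
    cl <- makeCluster(4)         # corresponds to the "Number of cores" field
    results <- parLapply(cl, jobs, run_one_job)
    stopCluster(cl)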

Submission to a supported remote RHEL 8 grid

Supported grids are SGE, LSF, TORQUE, and SLURM. The number of cores in the configuration for the grid means the number of nodes to be used.

If necessary, the job-script configuration templates used by the batchtools R package can be modified for each grid (e.g., C:\Program Files (x86)\Certara\Phoenix\application\lib\NLME\Executables\batchtools.slurm.tmpl). For details, refer to the batchtools documentation on the CRAN website or the documentation for the chosen grid.
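A minimal sketch of how a modified template can be tested with batchtools outside of Phoenix follows. The registry directory, template copy name, and resource values below are illustrative assumptions, not values used by Phoenix NLME:

    # Illustrative only: exercise a locally modified SLURM template with batchtools.
    library(batchtools)

    # Create a throwaway registry for the test.
    reg <- makeRegistry(file.dir = "nlme_tmpl_test", seed = 1)

    # Point batchtools at the modified template copy.
    reg$cluster.functions <- makeClusterFunctionsSlurm(
      template = "batchtools.slurm.tmpl"
    )

    # Submit a trivial job to confirm the template renders and the grid accepts it.
    # Resource names must match those referenced in the template.
    batchMap(fun = function(x) x^2, x = 1:4, reg = reg)
    submitJobs(reg = reg, resources = list(walltime = 60, memory = 512))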

The Number of cores in a grid configuration is the number of cores to be used. Each host can be configured with multiple cores, and each core can handle a separate job.

An example profile for submission to the TORQUE grid is as follows:

TORQUE_Linux_Config

The startup script is executed before R is run; it can be used for grid/MPI initialization.

NLME jobs submitted to the grid can also be parallelized using MPI if the system has the appropriate MPI service installed and the Parallel mode is set to one of the *_MPI options (LSF_MPI, SGE_MPI, TORQUE_MPI, or SLURM_MPI), which parallelizes the runs by job as well as by Sample/Subject within each job.

For any of the *_MPI modes, the number of cores used for each job is calculated as the smaller of the following two numbers (see the worked examples and the sketch below):

The number of cores in the configuration divided by the number of jobs.

The number of unique subjects in a specific job divided by 3. If the number of unique subjects differs between replicates, the smallest number of subjects is used for the calculation.

Example 1: There are 300 cores available according to the configuration profile, 4 jobs requested (replicates), and 200 subjects in each replicate. Each of the 4 replicates would parallelize across 66 cores (300/4 = 75; 200/3 ≈ 66; 66 < 75). Total cores used = 264.

Example 2: There are 100 cores available according to the configuration profile, 3 jobs requested (replicates), and 300 subjects in each replicate. Each of the 3 replicates would parallelize across 33 cores (100/3 ≈ 33; 300/3 = 100; 33 < 100). Total cores used = 99.
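The rule can also be written out as a small helper for checking the arithmetic; the function below is an illustrative sketch, not Phoenix NLME code:

    # Illustrative helper reproducing the rule above (not Phoenix NLME code).
    # min_subjects is the smallest number of unique subjects across replicates.
    cores_per_job <- function(total_cores, n_jobs, min_subjects) {
      min(total_cores %/% n_jobs, min_subjects %/% 3)
    }
    cores_per_job(300, 4, 200)   # 66, matching Example 1 (total used = 4 * 66 = 264)
    cores_per_job(100, 3, 300)   # 33, matching Example 2 (total used = 3 * 33 = 99)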

An example of the configuration profile is as follows:

LSF_Linux_Config

Note:    For some grid configurations, the calculated number of MPI cores for a job must not exceed the total number of hosts available on the grid; otherwise, the software may request more hosts for the computation than are available, causing the job to freeze or exit with an error. In such cases, it is advised to switch to the grid mode without MPI.

