Job control for parallel and remote execution

The following topics are discussed:

Installing job control files

MPI configuration

Users have the option to execute Phoenix NLME jobs remotely using NLME Job Control System or Phoenix JMS. Below is a comparison of the two options:

 

NLME Job Control System

Phoenix JMS

Remote Windows submis­sion

No

Yes

Remote Linux submission

Yes

No

Disconnect/reconnect/stop

Yes

Yes

Progress reporting

Yes

Yes

Remote software required

GCC

Full Phoenix installation

R (batchtools, XML, reshape, Rmpi, Cer­tara.NLME8)

Parallelization method

Local MPI

Local MPI

MPI Cluster

Linux Grid

MultiCore

LSF

Caution:The number of available cores specified for a grid cannot exceed the total number of cores for which the grid is configured. This can cause the software to ask for more cores to do the computa­tion than are available and result in the job remaining in the queue.

This section focuses on the Job Control Setup, for information on JMS, refer to “Job Management System (JMS)”. Phoenix NLME jobs can be executed on a number of different platform setups, enabling the program to take full advantage of available computing resources:

Single core on local host

By Subject MPI parallelization on local host

By Sample multiple core parallelization on local or MPI cluster host

By Sample MPI parallelization on local host

Remote Linux execution

Remote Linux execution with by Sample multiple core parallelization

Remote GRID parallel submission of supported scenarios

Installing job control files

Additional software and libraries are required for certain platform setups.

R 3.2 (or earlier) can be obtained from http://cran.r-project.org.

R batchtools (v.0.9.2) handles submission of NLME replicates to a Linux/Windows GRID. Install from the R user interface using the Packages >> Install Packages menu option.

R reshape (v 0.8.6) is used for summarizing NLME spreadsheet. Install on all Linux machines either from the R user interface using the Packages >> Install Packages menu option.

R XML (v 3.98) is used to manage reading/writing progress.xml Install from the R user interface using the Packages >> Install Packages menu option.

Certara.NLME8_0.0.1.0000.tar.gz is the R package for NLME distributed with Phoenix, located in <Phoenix_install_dir>\application\lib\NLME\Executable. From the R user interface (as administrator), type the following command:
install.packages(“Cer­tara.NLME8_0.0.1.0000.tar.gz”,repos=NULL,type=”source”).

SSH/Sftp needs to be installed on all Linux machines that will be used with the MultiCore par­allelization method or act as a submission host to a GRID. SSH is installed by default on Red Hat Enterprise Linux (RHEL) version 6.x. To facilitate job submission from Windows to Linux, NLME in Phoenix (on Windows) contains an SSH library.

For grid execution (SGE/LSF/TORQUE), installation of Openmpi (v 1.10.6) as a parallel plat­form on the grid is recommended. Please refer to https://www.open-mpi.org/faq/ for informa­tion. Standard grids have Openmpi installed by default.

R 3.1.1 (or earlier). R can be obtained from http://cran.r-project.org.

Rmpi (Rmpi_0.6-3.zip) provides support for executing NLME replicates in parallel on an MPI cluster. This file is distributed with Phoenix and is located in <Phoenix_in­stall_dir>\Redistributables. Install on all Windows hosts by unzipping it into the following directory: <R_install_dir>\R\R-3.1.1\library.

R reshape (v 0.8.6) is used for summarizing NLME spreadsheet. Install on all Linux machines either from the R user interface using the Packages >> Install Packages menu option.

R XML (v 3.98) is used to manage reading/writing progress.xml Install from the R user interface using the Packages >> Install Packages menu option.

Certara.NLME8_0.0.1.0000.tar.gz: R package for NLME distributed with Phoenix, located in <Phoenix_install_dir>\application\lib\NLME\Executable. From the R user interface (as administrator), type the following command:
install.packages(“Cer­tara.NLME8_0.0.1.0000.tar.gz”,repos=NULL,type=”source”) 

Note:R 3.1.1 was the last version of R which worked with Rmpi and Rmpi is no longer being updated. Users can upgrade to more recent versions of R and run any of the other Job Control modes, just not Remote MPI Clusters.

The R packages mentioned above are also available at https://cran.r-project.org/web/packages.

MPI configuration

MPI configuration is done by the Phoenix installer at the time of installation.

Note:If one of the MPI processes should crash, other MPI processes and the parent process mpiexec.exe may be left running, putting Phoenix NLME in an unpredictable state. Use the Task Manager to check if mpiexec.exe is still running and stop it. This will stop any other MPI processes.

Phoenix Model job status

The NLME Job Status window is automatically displayed when a Phoenix Model object is executed or by selecting Window > View NLME Jobs from the menu. The window provides easy monitoring of all NLME jobs that are executing as well as a history of jobs that have completed during the current Phoenix session.

During local execution of a Phoenix Model, the status window displays the parameter values and gra­dients of each major iteration of the algorithm, with the most recent iteration at the top. The model name, protocol, stage, time that the execution started and stopped, the status of the job, number of runs, and the number of runs that completed or failed are listed at the top. (See “Job control for paral­lel and remote execution” for more execution options.)

The job can be canceled (no results are saved) or stopped early (results obtained up to the time when the job is ended are saved) by right-clicking the Status menu. Specifically, if the Stop early button is clicked during an iteration, the run is ended once the current iteration is completed. If the Stop early button is pushed during the standard error calculation step, then Phoenix stops the run and prepares outputs without standard error results.

Status_Window.png 

When Stop Early is executed, the Overall results worksheet will show the Return Code of 6, indicat­ing that the fitting was not allowed to run to convergence.

Next to the name of the model that is executed in a run, several pieces of information are shown pro­viding a quick summary of the job’s current state. The Status of a job can be In Progress, Finished, or Canceled.

If a job involves multiple iterations, information for each completed iteration is added as a row. Use the button at the beginning of the job row to expand/collapse the iteration rows.

Right-clicking a row displays a menu from which the job can be stopped early (the results up to the last completed iteration are saved) or canceled (no results are saved).

 


Last modified date:6/26/19
Certara USA, Inc.
Legal Notice | Contact Certara
© 2019 Certara USA, Inc. All rights reserved.