Modeling Project Files

Most modeling projects will use three ASCII files: a data file in *.dat, *.csv or *.txt format, a *.txt file that maps the model data columns to Phoenix model columns, and a Phoenix model file in *.mdl or *.txt format. The *.mdl extension is used as a convention to identify PML model files.

Data files
Model files
Column mappings

Data files

The ASCII model data files (*.dat, *.csv, or *.txt) are used for model fitting. The data can be delimited by spaces, commas, or tabs. The first row must identify the column names and must be preceded by ##. For example, the first row of the example Theophylline dataset listed below looks like this:

## xid wt dose time yobs

Only the period “.” character is an acceptable decimal separator.

Caution: The column header line in a PML dataset must be preceded by ## or Phoenix will not recognize the column headers.

Each subsequent row must contain data for each field. Use a period “.” to represent a null value. The data for the first subject in the example Theophylline dataset thbates.dat are shown below.

thbates.dat
## xid wt dose time   yobs
     1 79.6 4.02    0    .74
     1 79.6 4.02   .25 2.84
     1 79.6 4.02   .57 6.57
     1 79.6 4.02 1.12 10.5
     1 79.6 4.02 2.02 9.66
     1 79.6 4.02 3.82 8.58
     1 79.6 4.02 5.1   8.36
     1 79.6 4.02 7.03 7.47
     1 79.6 4.02 9.05 6.89
     1 79.6 4.02 12.12 5.94
     1 79.6 4.02 24.37 3.28
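
As a rough illustration of the format rules above (a ## header line, space, comma, or tab delimiters, and "." as the null value), the following Python sketch reads such a file into column names and numeric rows. The helper name read_pml_data is hypothetical and is not part of Phoenix; the sketch also assumes that every field is numeric and that no field is left empty between delimiters.

```python
import re

def read_pml_data(text):
    """Parse a '##'-headed data file into (column_names, rows).

    Hypothetical helper for illustration only; not part of Phoenix.
    Fields may be separated by spaces, commas, or tabs, and a lone '.'
    is read as a null value (None).
    """
    names, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("##"):
            # Header row: drop the '##' prefix, then split on delimiters.
            names = [f for f in re.split(r"[,\s]+", line[2:].strip()) if f]
            continue
        fields = [f for f in re.split(r"[,\s]+", line) if f]
        rows.append([None if f == "." else float(f) for f in fields])
    return names, rows

sample = """## xid wt dose time yobs
1 79.6 4.02 0 .74
1 79.6 4.02 .25 2.84
1 79.6 4.02 .57 .
"""
names, rows = read_pml_data(sample)
```

In this sample the last row ends with ".", so its yobs entry is read as a null value rather than as a number.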

Dataset row limitations

The vast majority of memory is allocated and de-allocated dynamically as needed. In most cases, peak total memory demands for the Phoenix engines are easily accommodated within the memory available on modern computers (typically at least one gigabyte of memory per processor). However, there are still a number of static limits on model parameters as follows.

Maximum number of subjects=120,000 (This limit applies to all engines except the nonparametric engine, where the maximum number of subjects is 1000.)
Maximum number of observations per subject=unlimited (See limit on total number of observations below.)
Maximum total number of observations=480,000
Maximum number of thetas (fixed effects)=1000 (This includes both fixed effects that are frozen to a user-specified value and free fixed effects that are included in the likelihood optimization; the latter also count against the limit of 401 free parameters given below.)
Maximum number of etas (random effects)=101 (This is also the maximum dimension of the Omega matrix in the diagonal case. If Omega has a full or partial block structure, the maximum dimension is smaller; see the remarks on free parameters below.)
Maximum number of free parameters to be optimized=401 (This includes free fixed effects, residual error model parameters, and Omega parameters. Only non-zero Omega parameters on and below the diagonal are counted against this limit. Thus a full block matrix with Neta random effects consumes Neta*(Neta+1)/2 of these parameters, while a diagonal matrix consumes only Neta parameters. Omega matrices with a block diagonal structure fall somewhere in between, as determined by the particular block structure.)
Maximum number of QRPEM samples=no limit (Large values (e.g., > 100,000) may cause difficulties with total static+dynamic peak memory demands. Typical QRPEM sample sizes range between 300 and 10,000 and should cause no problems.)
Maximum number of iterations=10,000.
Maximum number of covariate categories or occasions=40.

Depending on the available memory and actual combination of model and run parameters, it is possible for very large models technically within the above static limits to require more dynamically allocated memory than is available. However, this should be an extremely rare occurrence and the overwhelming majority of population NLME models should easily fit.
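
The Omega parameter counts described in the limits above can be checked with a few lines of arithmetic. The following Python sketch is only an illustration (the function names are hypothetical and not part of Phoenix): it treats Omega as a list of block sizes, where a diagonal matrix is a list of 1x1 blocks, and it ignores the fixed effect and residual error parameters that also count against the limit of 401.

```python
MAX_FREE_PARAMETERS = 401  # static limit quoted in the list above

def omega_free_params(block_sizes):
    """Count Omega parameters on and below the diagonal.

    Each block of size n contributes n*(n+1)/2 parameters, so a full
    block of size Neta gives Neta*(Neta+1)/2 and a diagonal matrix
    (all blocks of size 1) gives Neta.
    """
    return sum(n * (n + 1) // 2 for n in block_sizes)

def largest_full_block(limit=MAX_FREE_PARAMETERS):
    """Largest Neta whose full block fits within `limit` parameters,
    assuming no other free parameters are present."""
    n = 0
    while (n + 1) * (n + 2) // 2 <= limit:
        n += 1
    return n

full_block = omega_free_params([10])         # one 10x10 full block
diagonal = omega_free_params([1] * 10)       # 10 etas, diagonal Omega
mixed = omega_free_params([3, 2, 1, 1])      # block diagonal structure
```

For ten random effects this gives 55 parameters for a full block, 10 for a diagonal matrix, and 11 for the 3+2+1+1 block diagonal case.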

Model files

The Phoenix model file is an ASCII text file that contains the model definition statements. It must follow the general form:

  mdl(variables){
     statements
  }

where mdl is the assigned model name. All models are called test by default, but users can rename them. The (variables) parentheses are normally empty (), but they can contain a list of variables when the model is used in trial simulation. See “Modeling Syntax” for details of the available statements.

The model file for the Theophylline example is shown below. See “Theophylline PML example” for an annotated example.

   fm3theophx.mdl
  fm1theo(){
  # Theophylline model example coding
  # One compartment model with first order absorption
  # Single dose at time=0, explicit concentration prediction formula
     covariate(dose,time)
     fixef(
        tvlKa=c(, 0.5,)
        tvlKe=c(, -2.5,)
        tvlCl=c(, -3.0,)
        )
     ranef(
        diag(nlKa, nlCl)=c(1.0, 1.0)
        )
     stparm(
        Ka=exp(tvlKa+nlKa)
        Ke=exp(tvlKe)
        Cl=exp(tvlCl+nlCl)
        )
     V=Cl/Ke
     cpred=dose
        *Ka
        /(V*(Ka-Ke))
        *(exp(-Ke*time)-exp(-Ka*time))
     error(eps1=c(.5))
     observe(cObs=cpred+eps1)
  }

This is an example model file of a well-known model, and it shows how to code an explicit closed-form single-dose solution. Note that any text string that is initiated by '#' is treated as a comment and does not affect model execution. Later examples show how to create models using differential equations, which are more adaptable to multiple dosing regimens and to trial simulation.

The example above shows a style in which indentation is used consistently throughout. This makes the model self-outlining for readability, and indicates a careful discipline, which makes errors less likely.

Including files in the generated C code

If one or more statements of the following form appear in the PML outside of the model definition:

  include("MyIncludeFile.h")
  test(){ # model definition
     …
  }

where MyIncludeFile.h is the name of any C-style include file (enclosed in double quotes), it results in the following code being inserted near the top of the generated C file Model.c:

#include "MyIncludeFile.h"

This can be useful for purposes such as allowing access to additional functions that a user might include in the compile-and-link step for models.

Column mappings

The column mapping file is an ASCII text file (*.txt) that contains a series of statements that define the association between model concepts and columns in a dataset:

id(subject_id_column_name)

Example: id(SubjectID) says that “SubjectID” is the data column signifying the subject identifier. SubjectID is not used by Phoenix, but the column mapping is still required. It is acceptable to map to a nonexistent column, such as: id(zzzDummyID).

time(time_column_name)

Example: time(T) says that T is the data column signifying time. The time values can be either simple decimal numbers, or they can be in hh:mm[:sec] format, with an optional “AM” or “PM”. Note that 12:06AM=0:06=0.1, and 1:30PM=13:30=13.5. This format is written in terms of hours, but it does not imply that any particular time unit is being used. The AM and PM suffixes can be either lower or upper case.
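
The conversion from hh:mm[:sec] notation to a decimal value can be sketched in Python as follows. This is an illustration of the rule stated above, not the code Phoenix uses; the function name time_to_decimal is hypothetical, and the handling of 12PM as noon follows the usual 12-hour clock convention, which is an assumption here.

```python
import re

_CLOCK = re.compile(r"(\d{1,2}):(\d{2})(?::(\d{2}))?\s*([AaPp][Mm])?")

def time_to_decimal(value):
    """Convert a time field to a decimal number.

    Plain decimal numbers are returned unchanged; hh:mm[:sec] values,
    with an optional AM/PM suffix in either case, are converted to
    hours plus fractions of an hour.
    """
    text = value.strip()
    match = _CLOCK.fullmatch(text)
    if match is None:
        return float(text)            # simple decimal number
    hours = int(match.group(1))
    minutes = int(match.group(2))
    seconds = int(match.group(3) or 0)
    suffix = (match.group(4) or "").upper()
    if suffix == "AM" and hours == 12:
        hours = 0                     # 12:06AM is 0:06
    elif suffix == "PM" and hours != 12:
        hours += 12                   # 1:30PM is 13:30
    return hours + minutes / 60 + seconds / 3600

midnight_example = time_to_decimal("12:06AM")
afternoon_example = time_to_decimal("1:30pm")
```

The two examples reproduce the values quoted above, 0.1 and 13.5.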

Normally, time increases monotonically from one row to the next within each subject. If it does not, an error message is generated. However, if there is a reset column, time is allowed to be reset when that occurs. Also, if the /sort option is sent to the engine, data is automatically sorted by subject ID and time, so data does not have to be initially ordered.

reset(reset_column_name=c(lowvalue, highvalue))

Example: reset(RESET=c(3, 4)) says that RESET is the data column signifying a resetting of subject time. If the value in the reset column is between three and four inclusive, time is allowed to be reset on that row. Also, all compartments in the model are reset to their initial values.

date(date_column_name[, format string [, century base]])

Examples: date(DATE)
date(DATE, mdy)
date(DATE, mdy, 1980)
Each of these says that DATE is the data column signifying the date. The default date format is month-day-year with arbitrary separators. Two-digit years are assumed to fall between 1980 and 2079 by default.
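
One plausible reading of the century base is that a two-digit year is placed in the 100-year window that starts at the base, which gives 1980 to 2079 for a base of 1980. The Python sketch below illustrates that reading only; the windowing rule for other base values is an assumption, and the function name expand_year is hypothetical.

```python
def expand_year(year, century_base=1980):
    """Map a two-digit year into [century_base, century_base + 99].

    Assumed windowing rule for illustration; years that already have
    three or more digits are returned unchanged.
    """
    if year >= 100:
        return year
    candidate = century_base - century_base % 100 + year
    if candidate < century_base:
        candidate += 100              # falls before the base: next century
    return candidate

examples = [expand_year(y) for y in (80, 99, 5, 79)]
```

With the default base, the years 80, 99, 05, and 79 expand to 1980, 1999, 2005, and 2079.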

covr(covariate_name <- column_name)

Example: covr(W <- BW) says that BW is the data column signifying the model covariate W. If the model contains covariate variables, then every covariate must be mapped in this way, or else an error message is generated.

fcovr(covariate_name <- column_name)

Example: fcovr(W <- BW)

fcovr is identical to covr, except for the handling of covariate value changes. A covariate is set whenever it has any non-null value in a data record. Normally, if a covariate such as bodyweight (BW) is set to value BW1 at time T1, and to another value BW2 at a subsequent time point T2, the second value BW2 holds during the interval (T1,T2), so it is carried back in time. Similarly, BW1 holds at time T1 and during the period extending back from T1 to T0, the closest previous time where the covariate is set.

If fcovr is used, the first value BW1 holds during the forward interval (T1,T2), and the covariate is reset to BW2 at time T2. However, if the covariate is interpolated, it does not matter whether covr or fcovr is used, because the value is linearly interpolated.
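
The difference between the two mappings can be illustrated with a small Python sketch that looks up a non-interpolated covariate at an arbitrary time. The function names are hypothetical and this is not Phoenix code. The behavior before the first set point and after the last one is not described above, so the sketch assumes that the nearest set value is used there.

```python
import bisect

def covr_value(points, t):
    """covr behavior: a value set at T2 is carried back over (T1, T2].

    `points` is a list of (time, value) pairs sorted by time.
    """
    times = [p[0] for p in points]
    i = bisect.bisect_left(times, t)      # first set point with time >= t
    if i == len(points):
        i = len(points) - 1               # assumption: last value persists
    return points[i][1]

def fcovr_value(points, t):
    """fcovr behavior: a value set at T1 is carried forward over [T1, T2)."""
    times = [p[0] for p in points]
    i = bisect.bisect_right(times, t) - 1  # last set point with time <= t
    if i < 0:
        i = 0                              # assumption: first value applies
    return points[i][1]

bw = [(0.0, 70.0), (10.0, 72.0)]           # BW1=70 at T1=0, BW2=72 at T2=10
backward = covr_value(bw, 5.0)
forward = fcovr_value(bw, 5.0)
```

At time 5, between the two set points, covr yields 72 (BW2 carried back) while fcovr yields 70 (BW1 carried forward). Both agree at the set points themselves.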

obs(observation_variable_name <- column_name)

Example: obs(cObs <- Conc) says that Conc is the data column signifying the model observation variable cObs. Use the obs mapping for all observation types such as observe, multi, count, event, and LL.

obs(cObs <- Conc, bql=BQL) also says that the data column BQL contains the flag specifying that the observed value is less than or equal to the value in column Conc. To use this feature, the bql option must also be used in the corresponding observe statement in the model.

mdv(mdv_column_name)

Example: mdv(MDV) says that MDV is the data column signifying “missing data values” for any observation. If this column is present, then on any given row it specifies if there are any missing observations on that row. If the MDV value is 0 (zero) or “.”, then the observation on that row is present, otherwise it is missing.

dose(dosepoint_name <- column_name)

Examples: dose(A1 <- Dose) says that Dose is the data column signifying the amount of drug administered to dosepoint A1.

dose(A1 <- Dose, Rate) also says that data column Rate specifies the infusion rate associated with the dose. If the rate is zero or unspecified, then the dose is a bolus. The concepts “bolus” and “infusion” are not limited to the central compartment, but can apply to a dosepoint on any compartment, including an absorption compartment.

There are also the statements dose1 and dose2, whose syntax is identical to dose. These match up with the dosepoint1 and dosepoint2 statements in the model. This is because there can be more than one dosepoint with the same name, so multiple dosepoints are referred to by sequential numbers, such as dosepoint 1 and dosepoint 2. dose can be used as a synonym for dose1, and dosepoint can be used as a synonym for dosepoint1.

ss(ss_column_name, dose_cycle_description)

Examples: ss(SS, 10 bolus(A1) 24 dt)
ss(SS, Dose bolus(A1) II dt)
ss(SS, 10 bolus(A1) 16 dt 10 bolus(A1) 8 dt)
says that SS is the data column that brings the model to steady-state. On a given row, if the value in the SS column is other than 0 (zero) or “.”, the model is brought to steady state by running the dose cycle description as many times as necessary.

ss(SS, 10 bolus(A1) 24 dt) says the dose cycle is “administer 10 units of drug in a bolus to dosepoint A1, and then wait 24 time units.” The dose cycle description has a very simple syntax in reverse Polish notation (RPN). The syntax is:

Option	Definition
number	Provide a number for an ensuing operation.
column-name	Provide a column name for an ensuing operation.
bolus(dosepoint)	Give the previous number as a bolus to a dosepoint.
dt	Sleep for the length of time of the preceding number.
inf(dosepoint)	Take the previous two numbers as an amount and a rate and give an infusion to a dosepoint.
bolus2, inf2	Same as bolus and inf, but for dosepoint2.
value op value	Simple arithmetic operators. op=+, -, *, /, ^.

When defining a dose cycle, there must be at least one dt statement. In general, a dt statement should come at the end of the cycle, so that any infusions or time lags in the cycle finish before the start of the next cycle. If a dose occurs on the same data row as an ss statement, then the model is first brought to steady state, and then the dose is administered.
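
To make the reverse Polish reading of a dose cycle concrete, the following Python sketch walks through a description token by token and records what would be given and when. It is a simplified illustration, not the parser Phoenix uses: the function name run_dose_cycle is hypothetical, tokens are assumed to be separated by whitespace, column names are looked up in a dictionary that stands in for the current data row, and the arithmetic operators are left out.

```python
import re

_DOSE = re.compile(r"(bolus2?|inf2?)\((\w+)\)")

def run_dose_cycle(description, row=None):
    """Interpret one pass of a dose cycle description.

    Returns (events, cycle_length). Each event is a tuple
    (time_in_cycle, kind, dosepoint, amount, rate), with rate None
    for a bolus.
    """
    row = row or {}
    stack, events, clock = [], [], 0.0
    for token in description.split():
        match = _DOSE.fullmatch(token)
        if match:
            kind, dosepoint = match.groups()
            if kind.startswith("inf"):
                rate = stack.pop()        # pushed second
                amount = stack.pop()      # pushed first
                events.append((clock, kind, dosepoint, amount, rate))
            else:
                events.append((clock, kind, dosepoint, stack.pop(), None))
        elif token == "dt":
            clock += stack.pop()          # wait for the preceding number
        else:
            try:
                stack.append(float(token))       # literal number
            except ValueError:
                stack.append(float(row[token]))  # value from a data column
    return events, clock

single = run_dose_cycle("10 bolus(A1) 24 dt")
split = run_dose_cycle("10 bolus(A1) 16 dt 10 bolus(A1) 8 dt")
from_columns = run_dose_cycle("Dose bolus(A1) II dt", {"Dose": 5, "II": 12})
```

In the split example the two boluses fall at cycle times 0 and 16, and the two dt statements add up to a cycle length of 24 time units.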

addl(addl_column_name, dose_cycle_description)

Examples: addl(ADDL, 24 dt 10 bolus(A1))
addl(ADDL, II dt Dose bolus(A1))
says that ADDL is the data column signifying additional doses. On a given row, if the value in the ADDL column is other than 0 (zero) or “.”, then additional dose cycles are given according to the dose cycle description.

The syntax of the dose cycle description is the same as for ss. The dt statement should come first in the dose cycle, since ADDL is usually specified on the same row as a dose, and it indicates follow-on doses.

table(
  [optional_file]
  [optional_dosepoints]
  [optional_covariates]
  [optional_observations]
  [optional_times]
  variable_list
  )

Example: table(file="foo.csv"
           time(0,10,seq(2,8,0.1))
           dose(A1)
           covr(BW)
           obs(Conc)
           BW, C, cObs, V, Ke
           )
says a table is generated in file foo.csv, which consists of the variables BW, C, cObs, V, and Ke, whose values are generated at times 0, 2, 2.1,… 7.9, 8, and 10. (Note that the seq operator specifies a sequence of numbers, so seq(60,80,5) is shorthand for “60, 65, 70, 75, 80”). Values are also generated at the times of observations of Conc, when BW changes, and when a dose is given to A1. The times do not need to be specified in order, because they are automatically sorted. If multiple table statements are used, then multiple tables are generated.
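
The way the time(...) arguments expand into a sorted list of output times can be sketched in Python as follows. The functions seq and table_times are written here for illustration and are not Phoenix functions; the sketch assumes that the step divides the range evenly, so that the end point is included.

```python
def seq(start, stop, step):
    """Expand seq(start, stop, step) into an explicit list of values."""
    count = int(round((stop - start) / step))
    # Round each value to suppress floating-point noise such as 2.3000000000000003.
    return [round(start + i * step, 10) for i in range(count + 1)]

def table_times(*items):
    """Merge single times and sequences, drop duplicates, and sort."""
    times = set()
    for item in items:
        if isinstance(item, (list, tuple)):
            times.update(item)
        else:
            times.add(item)
    return sorted(times)

weights = seq(60, 80, 5)
times = table_times(0, 10, seq(2, 8, 0.1))
```

Here weights is the list 60, 65, 70, 75, 80, and times holds 63 values: 0, the 61 values from 2 to 8 in steps of 0.1, and 10. The order in which the arguments are given does not matter, because the result is sorted.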

The following are the contents of a column mapping file for the Theophylline model example:

   colstheo.txt
  id(xid)
  covr(dose <- dose)
  covr(time <- time)
  obs(cObs <- yobs)