$ESTIMATION
Instructions for the NONMEM Estimation Step.
|
|
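A typical record consists of $ESTIMATION (or $EST) followed by options, for instance (a sketch only; the method and option values here are illustrative, not the manual's original example):

```
$ESTIMATION METHOD=CONDITIONAL INTERACTION MAXEVAL=9999 PRINT=5
```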
Discussion
Optional. Requests that the NONMEM Estimation Step be implemented.
May also be coded $ESTM or $ESTIMATE. The Estimation Step obtains
parameter estimates.
In NONMEM v7+, multiple Estimation Steps can be implemented in a
single problem. A sequence of two or more Estimation Steps will
result in the sequential execution of each. Options specified in an
$ESTIMATION record will carry over to the next $ESTIMATION record
unless a new option is specified. If a particular option is not
used by the method then the option will be ignored. The final
parameter estimates from an Estimation Step will be passed on as the
initial estimates for the next Estimation Step. For example,
$EST METHOD=0 PRINT=5 MAXEVAL=9999
$EST METHOD=CONDITIONAL NSIG=4
will first result in estimation of the problem by the first order
method, using as initial parameters those defined by the $THETA,
$OMEGA, and $SIGMA statements. Next, the first order conditional
estimation method will be implemented, using as initial parameters the
final estimates of THETA, OMEGA, and SIGMA from the previous analysis.
Up to 20 estimations may be performed within a problem. For all
intermediate estimation steps, their final parameter values and
objective function will be printed to the raw output file. Settings of
options specified in a $EST method will by default carry over to the
next $EST method, until a new option setting is specified. Thus, in the
example above, PRINT will remain 5 and MAXEVAL will remain 9999 for
the second $EST statement, whereas NSIG will be changed to 4 and
METHOD becomes conditional. (NOTHETABOUND, NOOMEGABOUND, and
NOSIGMABOUND are exceptions to this rule. They pertain to all of the
estimations in the series within a $PROB. In NM710, NM712, and NM720,
these options must be given with the very first $EST record in the
problem. With NM73, these options may be placed with any of the $EST
records, but will still apply to all $EST records in the problem.)
We discuss the METHOD option first, since many other options depend on the estimation method used.
METHOD=kind
- kind=0 or ZERO: always set ETAs to 0 during the computation of the objective function. Also called the "first order (FO) method." This is the default.
- kind=1 or CONDITIONAL: use conditional estimates for the ETAs during the computation of the objective function. METHOD=1 (without the LAPLACIAN option) is also called the "first order conditional estimation (FOCE) method." The conditional estimates of the ETAs are referred to as Conditional Parametric ETAs (CPE).
  METH=COND NOLAPLACIAN           ;# the FOCE method
  METH=COND LAPLACE               ;# the Laplace method
  METH=COND NOLAPLACE CENTERING   ;# the Centering FOCE method
  METH=COND LAPLACE CENTERING     ;# the Centering Laplace method
- HYBRID: use conditional estimates for the ETAs during the computation of the objective function, with the exception of those ETAs listed in the ZERO option. Cannot be used with LAPLACIAN or CENTERING.
The following methods are new to NONMEM 7. When any of these methods are used, the data are inferred to be population, and METHOD=1 is supplied if it is not already present. The first four methods are referred to as Expectation-Maximization (EM) Methods.
METHOD=ITS
ITS stands for the iterative two-stage method. This method evaluates the conditional mode and first order (expected) or second order (Laplace) approximation of the conditional variance of parameters of individuals by maximizing the posterior density. This integration step is the same as is used in FOCE or Laplace. Population parameters are updated from subjects’ conditional mode parameters and their approximate variances by single iteration maximization steps that are very stable (usually converging in 50-100 iterations). Because of the approximations used, population parameters almost, but not quite, converge towards the linearized objective function of FOCE. The iterative two-stage method is about as fast as FOCE with simple one or two compartment models, and when set up with MU referencing (described below) can be severalfold faster than FOCE with more complex problems, such as three-compartment models and differential equation problems.
|
|
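The elided specification was presumably a record of this shape (a sketch; option values are illustrative, not the manual's original example):

```
$EST METHOD=ITS INTERACTION NITER=50 PRINT=5
```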
NITER (default 50) sets the maximum number of iterations. For all of the new methods, it is essential to set INTERACTION if the residual error is heteroscedastic.
METHOD=IMP
The Monte Carlo Importance Sampling EM evaluates the conditional (posterior) mean and variance of each subject's ETAs by Monte Carlo sampling (integration, expectation step). It uses the posterior density, which incorporates the likelihood of parameters relative to population means (THETAs) and variances (OMEGAs) with the individual’s observed data. By default, for the first iteration, the mode and first order approximation of the variance are estimated (called mode a posteriori, or MAP, estimation) as is done in ITS or FOCE, and are used as the parameters of a normal distribution proposal (sampling) density. From this proposal density Monte Carlo samples are generated, then weighted according to the posterior density as a correction, since the posterior density itself is generally not truly normally distributed, and conditional means and their conditional variances are evaluated. For subsequent iterations, the normal density near the mean of the posterior (obtained from the previous iteration) is used as a proposal density. Population parameters (THETAs, SIGMAs, and OMEGAs) are then updated from subjects’ conditional mean parameters, gradients, and their variances by single iteration maximization steps that are very stable and improve the objective function. The population parameters converge towards the minimum of the objective function, which is an accurate marginal density based likelihood.
|
|
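The elided specification was presumably a record of this shape (a sketch; option values are illustrative):

```
$EST METHOD=IMP INTERACTION NITER=100 ISAMPLE=300 IACCEPT=0.4 PRINT=1
```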
METHOD=IMPMAP
Sometimes for highly dimensioned PK/PD problems with very rich data the importance sampling method does not advance the objective function well or even diverges. For this the IMPMAP method may be used. At each iteration, conditional modes and conditional first order variances are evaluated as in the ITS or FOCE method, not just on the first iteration as is done with IMP method. These are then used as parameters to the multivariate normal proposal density for the Monte Carlo importance sampling step.
|
|
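A sketch of such a record (option values illustrative):

```
$EST METHOD=IMPMAP INTERACTION NITER=100 ISAMPLE=300 PRINT=1
```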
METHOD=SAEM
As in importance sampling, random samples are generated from normal distribution proposal densities. However, instead of always centered at the mean or mode of the posterior density, the proposal density is centered at the previous sample position. New samples are accepted with a certain probability. The variance of the proposal density is adjusted to maintain a certain average acceptance rate (IACCEPT). This method requires more elaborate sampling strategy, but is useful for highly non-normally distributed posterior densities, such as in the case of very sparse data (few data points per subject), or when there is categorical data.
In the first phase, called the burn-in or stochastic mode, SAEM evaluates an unbiased but highly stochastic approximation of individual parameters (semi-integration, usually 2 samples per individual). Population parameters are updated from individual parameters by single iteration maximization steps that are very stable and improve the objective function (usually in 300-5000 iterations). In the second mode, called the accumulation mode, individual parameter samples from previous iterations are averaged together, converging towards the true conditional individual parameter means and variances. The algorithm leads to population parameters converging towards the maximum of the exact likelihood.
The SAEM method is specified by
|
|
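A sketch of such a record (NBURN sets the burn-in iterations and NITER the accumulation iterations, per the phases described above; values illustrative):

```
$EST METHOD=SAEM INTERACTION NBURN=2000 NITER=1000 PRINT=100
```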
The following table shows the mapping of SAEM parameters between Monolix and NONMEM.
| Monolix | NONMEM |
|---|---|
| Number of Chains | ISAMPLE |
| K0 | CONSTRAINT subroutine may be user modified to provide any constraining pattern on any population parameters |
| K1 | NBURN |
| K2 | NITER |
| Auto K1 | CTYPE=1,2,3 |
| rho | IACCEPT |
| m1 | ISAMPLE_M1 |
| m2 | ISAMPLE_M1A |
| m3 | ISAMPLE_M2 |
| m4 | ISAMPLE_M3 |
| No simulated annealing | CONSTRAIN=0 |
| Simulated Annealing | CONSTRAIN=1,2,3,or user-defined |
| SEED | SEED |
After the analysis, suitable objective functions for hypothesis testing and second order standard errors can be obtained by importance sampling at the final population parameter values. Thus, one could issue this sequence of commands:
|
|
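The elided sequence was presumably of this shape (option values illustrative; EONLY=1 evaluates the objective function without updating the main population parameters):

```
$EST METHOD=SAEM INTERACTION NBURN=2000 NITER=1000
$EST METHOD=IMP INTERACTION EONLY=1 NITER=5 ISAMPLE=1000
```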
Here, after SAEM, an importance sampling expectation step, with MAP estimation done on its first iteration, is performed, but without updating the main population parameters. Sometimes the MAP estimation is problematic, or the user wishes to use SAEM’s last conditional means and variances as the parameters of the importance sampler’s sampling density for the first iteration, so one may try:
|
|
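A sketch of this variant (values illustrative); MAPITER=0 skips the MAP estimation, so SAEM's last conditional means and variances seed the importance sampler's first iteration:

```
$EST METHOD=SAEM INTERACTION NBURN=2000 NITER=1000
$EST METHOD=IMP INTERACTION EONLY=1 MAPITER=0 NITER=5 ISAMPLE=1000
```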
For very large dimensioned problems (large OMEGA matrix), the IMP evaluated objective function can have a lot of stochastic variability (more than plus or minus 10 units), or continually increase with each iteration even though the population parameters are kept fixed. One way to reduce this volatility is to use IMPMAP instead of IMP, if the MAP estimation is not an issue:
|
|
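A sketch (values illustrative):

```
$EST METHOD=SAEM INTERACTION NBURN=2000 NITER=1000
$EST METHOD=IMPMAP INTERACTION EONLY=1 NITER=5 ISAMPLE=1000
```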
Alternatively one can increase ISAMPLE (or combine it with IMPMAP):
|
|
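For example (ISAMPLE value illustrative):

```
$EST METHOD=IMP INTERACTION EONLY=1 NITER=5 ISAMPLE=3000
```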
As of NM74, another choice is to set EONLY=2, and the Monte Carlo variability of the objective function can be significantly reduced if, like the population parameters, the conditional means and variances from the SAEM estimation are also not updated after each IMP iteration:
|
|
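A sketch (values illustrative):

```
$EST METHOD=IMP INTERACTION EONLY=2 NITER=5 ISAMPLE=1000
```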
Or, have the first iteration evaluate conditional modes and conditional MAP variances, and then use them for the subsequent iterations, by setting EONLY=3 and MAPITER=1:
|
|
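A sketch (values illustrative):

```
$EST METHOD=IMP INTERACTION EONLY=3 MAPITER=1 NITER=5 ISAMPLE=1000
```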
Another approach to use SAEM is to begin with a short ITS run to provide good initial ETA estimates for each subject, followed by the SAEM analysis. The SAEM step uses these initial ETA estimates as a starting point for its MCMC of each subject’s conditional (posterior) density, followed by objective function evaluation:
|
|
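The elided sequence was presumably along these lines (iteration counts illustrative):

```
$EST METHOD=ITS INTERACTION NITER=15
$EST METHOD=SAEM INTERACTION NBURN=2000 NITER=1000
```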
To have conditional mean values (values in "root.phi") evaluated by MCMC sampling used in the SAEM method, but at a constant set of the final fixed parameters, you could invoke EONLY=1 with the SAEM method as well:
|
|
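A sketch (values illustrative):

```
$EST METHOD=SAEM INTERACTION EONLY=1 NBURN=500 NITER=1000
```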
METHOD=BAYES
The Markov Chain Monte Carlo (MCMC) Bayesian Analysis method. The goal of the MCMC Bayesian analysis [12,13] is not to obtain the most likely THETAs, SIGMAs, and OMEGAs, but to obtain a large sample set of probable population parameters, usually 10000-30000. The samples are not statistically independent, but when the analysis is properly performed, they are uncorrelated overall. Various summary statistics of the population parameters may then be obtained, such as means, standard deviations, and even confidence (or credible) ranges. The mean population parameter estimates and their variances are evaluated with considerable stability. Maximum likelihood parameters are not obtained, but with problems of sufficient data, these sample mean parameters are similar to maximum likelihood values, and the standard deviations of the samples are similar to standard errors obtained with maximum likelihood methods. A maximum likelihood objective function is also not obtained, but a distribution of joint probability densities is obtained, from which 95% confidence bounds (assuming a type I error of 0.05 is desired) can be constructed and tested for overlap with those of alternative models.
As with the SAEM method, there are two phases to the BAYES analysis. The first phase is the burn-in mode, during which population parameters and likelihood may change in a very directional manner with each iteration, and which should not be used for obtaining statistical summaries. The second phase is the stationary distribution phase, during which the likelihood and parameters tend to vary randomly with each iteration, without changing on average. It is these samples that are used to obtain summary statistics.
The Bayesian method is specified by
|
|
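A sketch of such a record (values and file name illustrative):

```
$EST METHOD=BAYES INTERACTION NBURN=4000 NITER=10000 PRINT=100 FILE=bayes.txt
```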
METHOD=DIRECT
On rare occasions, direct Monte Carlo sampling may be desired. This method is the purest method for performing expectation maximization, in that it creates completely independent samples (unlike MCMC), and there is no chance of causing bias if the sampling density is not similar enough to the conditional density (unlike IMP). However, it is very inefficient, requiring ISAMPLE values of 10000 to 300000 to properly estimate the problem. The method can be implemented by issuing a command such as
|
|
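A sketch (values illustrative):

```
$EST METHOD=DIRECT INTERACTION NITER=50 ISAMPLE=30000
```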
On occasion it can have some use in jump starting an importance sampling method, especially if the first iteration of importance sampling fails because it relies on MAP estimation, and the problem is too unstable for it. Thus, one could perform the following, where just a few iterations of direct sampling begin the estimation process:
|
|
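The elided sequence was presumably of this shape (values illustrative):

```
$EST METHOD=DIRECT INTERACTION NITER=4 ISAMPLE=10000
$EST METHOD=IMP INTERACTION MAPITER=0 NITER=100 ISAMPLE=1000
```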
Notice that since MAPITER=0, the first iteration of IMP method relies on starting parameters for its sampling density that came from the DIRECT sampling method.
METHOD=NUTS (NM74)
The No-U-Turn Sampling (NUTS) algorithm was developed by Hoffman, Gelman, and others of the Stan Development Team ([1, 2], and communications with Bob Carpenter, Andrew Gelman, Matt Hoffman, Michael Betancourt, and Sebastian Weber). The algorithm implemented in NONMEM is a limited version of NUTS in Stan [1, 2], focused on analyzing population PK/PD models, with normally distributed THETA priors (or t-distributed THETA priors, see TTDF), and Wishart/Gamma distributed OMEGA and SIGMA priors, or the LKJ correlation prior (see OLKJDF and SLKJDF options) for OMEGAs and SIGMAs.
Typical Bayesian algorithms search for individual parameters (PHIs or ETAs) and population parameters (THETAs, OMEGAs, and SIGMAs) in separate stages. While this provides for rapid generation of an MCMC sample, the samples can be heavily correlated, especially if there are high correlations in the OMEGA matrix. The NUTS sampler uses a directed search, using partial derivatives and scaling techniques based on posterior density knowledge from previous samples, to reduce the correlation of the parameters from one iteration to the next. While each iteration takes longer to generate with NUTS, the samples may be 10-100 times less correlated than those of a standard MCMC sampler. Thus, 10000 samples with NUTS may be worth 100000 samples with traditional MCMC algorithms. See the NEFF (Number of EFFective samples) utility for analyzing the quality of an MCMC run, I.103 NEFF and NEFFI Utility Programs (NM74).
Because the NUTS algorithm relies on derivatives, it is best if analytical derivatives are created for each of the estimated parameters. For OMEGAs and SIGMAs, these are done automatically, but for THETAs, analytical derivatives are created only if they are MU referenced (see MU Reference section below). So MU reference all THETAs that are to be estimated. It is okay to set their OMEGA to 0, the analytical derivative will still be utilized.
The easiest way to use the NUTS algorithm in NONMEM is to use the AUTO option (see section I.46 Some General Options and Notes Regarding EM and Monte Carlo Methods for more details on the AUTO feature, AUTO=0 (default) (NM73)). For example, stanrb42.ctl uses AUTO=1:
|
|
The example stanrb10.ctl shows another setup for using NUTS:
|
|
The OLKJDF option specifies the degrees of freedom of the LKJ decorrelation density for the OMEGA prior. OLKJDF should be set to a value greater than 0 to use the LKJ decorrelation prior for OMEGAs, as recommended by the Stan group. Also see the comment below about using the LKJ decorrelation versus the inverse Wishart prior. Experience has shown that a weakly informative OLKJDF should be at least 2, and no greater than the number of OMEGA diagonals. If OLKJDF is set to 1, then there may be considerable pull towards high correlations, which reduces the efficiency of sampling. A single iteration of ITS is helpful to center the initial ETAs at their modes, as a good facilitator for initiating the NUTS run that follows it (NITER=0, so population parameters are not advanced).
An alternative method is to use the traditional BAYES method to rapidly generate samples for an initial mass matrix, which can then be passed on to the NUTS algorithm (example stanrb9):
|
|
Notice that the MASSRESET is set to 1 to initialize the mass matrix accumulator at the BAYES step, and then set MASSRESET=0 at the NUTS step, so that the mass matrix is not re-initialized, but rather, carried over, from the BAYES step. Also, you may wish to set KAPPA=0.75 during BAYES so that the accumulated mass matrix favors values collected during the latter portion of the BAYES analysis (for the NUTS step itself, KAPPA should be set back to 1). This technique can sometimes make the burn-in (warm-up) period for NUTS execute faster, and/or improve the de-correlation. MADAPT is set to ½ of NBURN, during which the mass matrix is continuously updated every NUTS_BASE iterations (which in this case by default is 0.025*NBURN=25).
Other stanrb* examples in the examples directory show the various ways in which the problem may be analyzed.
Several versions of a differential equation problem, example6, are also in the examples directory, example6hmt*, to be compared with example6classic2 and example6classico3. Comparing example6hmto26.ctl, which uses the pure NUTS algorithm, with example6hmto19, which uses a pre-warmup from a previous BAYES estimation, you can see that the pre-warmup can reduce the time of computation.
To request NUTS as estimation method, use
|
|
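A sketch of such a record (iteration counts illustrative; AUTO=1 supplies the recommended NUTS settings described under the AUTO option):

```
$EST METHOD=NUTS AUTO=1 NBURN=500 NITER=1000 PRINT=10
```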
METHOD=CHAIN
Allows the user to create a series of random initial values of THETAs and OMEGAs, or to read in initial population parameters from a file of rectangular (rows/columns) format. Applies only to the Estimation Step.
Consider
|
|
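From the discussion that follows, the elided record was presumably along these lines (the file name, NSAMPLE=5, ISAMPLE=3, and the comma delimiter are taken from the surrounding text; the SEED value is illustrative):

```
$EST METHOD=CHAIN FILE=example1.chn NSAMPLE=5 ISAMPLE=3 SEED=122234 DELIM=,
```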
The NSAMPLE=5 random samples of THETAS and OMEGAS will be generated and written to file "example1.chn", using “comma” as a delimiter. SEED sets the starting seed for the random number sequence.
As of nm75, the actual starting seed can be varied with CLOCKSEED=1, to produce different stochastic results for automated replications without the need to modify the seed value in the control stream file for each replication.
By default (CTYPE=0), random values of THETA are generated from a
uniform distribution within the interval [lower bound, upper bound]
from the $THETA record. If bounds are not specified, then the default bounded
interval is
[(1-IACCEPT)*THETA, (1+IACCEPT)*THETA].
For the SIGMA values their Cholesky-decomposed values are uniformly varied within (but see option DFS)
[(1-IACCEPT)*SIGMA, (1+IACCEPT)*SIGMA]
If CTYPE=1, then the lower and upper bounds from $THETA are ignored
and all THETAs are uniformly varied using the IACCEPT factor. If
CTYPE=2, then THETAs are initiated based on a normal distribution with
the initial $THETA as the mean, and the second set of $OMEGAs (or
$THETAPV) as the variance, if there is a $PRIOR command with NTHP
non-zero. This is the best and most complete way to define the
sampling density for the THETAs. Otherwise, if NTHP=0 (or no
$THETAP are provided), the variance for THETA is obtained from the
first set of $OMEGA, and requires that the THETAs be MU modeled, and
those THETAs not MU modeled will follow the uniform distribution as if CTYPE=0.
The OMEGAs are sampled using a Wishart distribution with variance from
$OMEGA, and DF is the degrees of freedom for randomly creating the
OMEGAS. If DF=0, then the dimensionality of the entire OMEGA matrix
is used as the degrees of freedom. As of NM73, setting DF>1E6 fixes OMEGA elements at their initial values.
The format of the chain file that is created is exactly the same as
the raw output files, including iteration numbers. In the above
example, after the 5 random samples are made, ISAMPLE=3 (the third
randomly created sample) is selected, and brought in as the initial
values. If ISAMPLE=0, then the initial values are not set to any of
the randomly generated samples, but will just be what was listed in
$THETA and $OMEGA of the control stream file.
If NSAMPLE=0, but ISAMPLE=some number, then it is expected that FILE already exists, and its iteration number specified by ISAMPLE is to be read in for setting initial values:
|
|
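A sketch (the ISAMPLE value is illustrative):

```
$EST METHOD=CHAIN FILE=example1.chn NSAMPLE=0 ISAMPLE=2
```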
One could create a control stream file that first creates a random set of population parameters, and then sequentially uses them as initial values for several trial estimation steps:
|
|
Here the five estimations are performed in sequence. To perform them in parallel, one may set up a control stream file with
|
|
and additionally five "child" control streams with the NSAMPLE=0 (so that it does not create a new chain file, but uses the already existing one), and different ISAMPLE's:
|
|
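The parent/child arrangement could be sketched as follows (file name, seed, and option values illustrative):

```
;# parent control stream: create the chain file, select no sample
$EST METHOD=CHAIN FILE=example1.chn NSAMPLE=5 ISAMPLE=0 SEED=11456
;# child control stream k (one per sample, k = 1..5): read sample k, then estimate
$EST METHOD=CHAIN FILE=example1.chn NSAMPLE=0 ISAMPLE=1
$EST METHOD=CONDITIONAL INTERACTION
```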
Each control stream file points to a different ISAMPLE position in the .chn file, so each would use these as the respective initial positions. Each of these "child" control stream files could be loaded on to a job queue, as separate processes. If the user is running a multi-core computer, this would be quite straightforward.
An existing chain file could actually be a raw output file from a previous analysis, with a list of iterations. In the following example:
|
|
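A sketch (the raw output file name is hypothetical):

```
$EST METHOD=CHAIN FILE=previous.ext NSAMPLE=0 ISAMPLE=-1000000000
```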
This could pick up the final result of the previous analysis, since ISAMPLE points to the iteration number, and -1000000000 is the iteration number for the final estimate. As of nm75, you may also select the table number TBLN, as raw output files can contain multiple tables of results with similar sample numbers. If TBLN is not specified, the first matching iteration (ISAMPLE) number is selected among all the tables in the file. Thus, the CHAIN method in this usage is really just an input command to bring in values from a raw output-type file format. Of course, users may have the chain file created by any program, not just NONMEM, so long as it has the raw output file format, with delimiter specified by DELIM/FORMAT (which is space by default).
In NM73+, if the option ISAMPEND is set to a value greater than ISAMPLE, then NONMEM will evaluate the objective function (using FOCEI method) for each sample between numbers ISAMPLE and ISAMPEND in the file, and then select the one with the smallest objective function. For example,
|
|
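A sketch (file name illustrative):

```
$EST METHOD=CHAIN FILE=example1.chn NSAMPLE=20 ISAMPLE=1 ISAMPEND=20
```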
This randomly creates 20 sets of initial parameters, and selects the one with the lowest objective function.
If METHOD=CHAIN is used, it must be the first $EST command in the
particular $PROB. Furthermore, because the settings it uses for FILE,
NSAMPLE, ISAMPLE, IACCEPT, CTYPE, and DF are functionally different
from the way the other $EST methods use them, these settings from
METHOD=CHAIN are not passed on to the next $EST command, which must be
an estimation method. However, other options such as DELIM, FORMAT,
SEED, and RANMETHOD will be passed on as defaults to the next $EST
command. Note, however, that RANMETHOD does not propagate to the
$CHAIN record.
Options
Setting an option to "-100" will force NONMEM to select the default value for that parameter.
ABORT | NOABORT | NOHABORT
ABORT requests that during the Estimation Step, NONMEM does not implement theta-recovery when PRED sets the error return code to 1. (The PRED error return code n is set by the statement "EXIT n [k]" in abbreviated code, or by the statement IERPRD=n in user-supplied code, or by PREDPP when it detects errors.) This is the default.
NOABORT requests that during the Estimation Step, NONMEM implements theta-recovery, i.e., attempts to avoid values of theta which result in PRED error return code 1. In addition, most non-positive-definite Hessian matrices will be forced to be positive definite, allowing the program to continue, and abnormal termination of the Estimation Step will occur less often.
NOHABORT requests performing positive definite correction at all levels of the estimation. This can hide a serious ill-posed problem, so use with care.
ATOL=n (NM72)
Absolute tolerance. Used only with ADVAN9, ADVAN13, ADVAN14, ADVAN15, ADVAN16, ADVAN17, and ADVAN18, for which TOL is a relative tolerance. Sets ANRD=ATOL. The default is 12 (that is, accuracy is 10**(-12)). Usually the problem runs quickly with this setting. On occasion, however, you may want to reduce ATOL (usually setting it equal to TOL), which can improve speed 3- to 4-fold.
By default, the value set on $SUBROUTINES record (or $TOL or TOL
subroutine) is used. If ATOL is coded on $ESTIMATION, it overrides the default for that step. If ATOL is coded on $COVARIANCE, it overrides $ESTIMATION and/or the default for that step.
With NONMEM 74, this feature is deprecated. A user-supplied TOL
subroutine should test NM_STEP and set ANRD accordingly.
AUTO=[0|1|2|3]
AUTO sets recommended options for METHOD used. This option is ignored by the classical NONMEM methods (FO/FOCE/Laplace).
- AUTO=0 (NM73): NONMEM does not provide best settings of certain options. This is the default.
- AUTO=1 (NM73): Several options will be set by NONMEM that will allow best settings to be determined. The user may still override those options set by AUTO by specifying them on the same $EST record:

$EST METHOD=ITS AUTO=1 PRINT=10
$EST METHOD=SAEM AUTO=1 PRINT=50
$EST METHOD=IMP PRINT=1 EONLY=1 NITER=5 ISAMPLE=1000
$EST METHOD=BAYES AUTO=1 NITER=1000 FILE=auto.txt PRINT=100

The setting of AUTO=1 for each METHOD is as follows.

METHOD=DIRECT INTERACTION ISAMPLE=1000 CTYPE=3 NITER=500 STDOBJ=10 ISAMPEND=10000 NOPRIOR=1 CITER=10 CINTERVAL=0 CALPHA=0.05 EONLY=0

METHOD=BAYES INTERACTION CTYPE=3 NITER=10000 NBURN=4000 NOPRIOR=0 CITER=10 CINTERVAL=0 CALPHA=0.05 IACCEPT=0.4 ISCALE_MIN=1.0E-06 ISCALE_MAX=1.0E+06 PACCEPT=0.5 PSCALE_MIN=0.01 PSCALE_MAX=1000 PSAMPLE_M1=1 PSAMPLE_M2=-1 PSAMPLE_M3=1 OSAMPLE_M1=-1 OSAMPLE_M2=-1 OACCEPT=0.5 ISAMPLE_M1=2 ISAMPLE_M1A=0 ISAMPLE_M2=2 ISAMPLE_M3=3 ISAMPLE_M1B=2 MCETA=0

METHOD=SAEM INTERACTION CTYPE=3 NITER=1000 NBURN=4000 ISAMPEND=10 NOPRIOR=1 CITER=10 CINTERVAL=0 CALPHA=0.05 IACCEPT=0.4 ISCALE_MIN=1.0E-06 ISCALE_MAX=1.0E+06 ISAMPLE_M1=2 ISAMPLE_M1A=0 ISAMPLE_M1B=2 ISAMPLE_M2=2 ISAMPLE_M3=2 CONSTRAIN=1 EONLY=0 ISAMPLE=2 MCETA=0

METHOD=ITS INTERACTION CTYPE=3 NITER=500 NOPRIOR=1 CITER=10 CINTERVAL=1 CALPHA=0.05

METHOD=IMP INTERACTION CTYPE=3 NITER=500 ISAMPLE=300 STDOBJ=10 ISAMPEND=10000 NOPRIOR=1 CITER=10 CINTERVAL=1 CALPHA=0.05 IACCEPT=0.0 ISCALE_MIN=0.1 ISCALE_MAX=10 DF=0 MCETA=3 EONLY=0 MAPITER=1 MAPINTER=-1

METHOD=IMPMAP INTERACTION CTYPE=3 NITER=500 ISAMPLE=300 STDOBJ=10 ISAMPEND=10000 NOPRIOR=1 CITER=10 CINTERVAL=1 CALPHA=0.05 IACCEPT=0.0 ISCALE_MIN=0.1 ISCALE_MAX=10 DF=0 MCETA=3 EONLY=0

;# NM74
METHOD=NUTS INTERACTION CTYPE=0 NITER=2000 NBURN=10000 NOPRIOR=0 NUTS_STEPITER=1 NUTS_STEPINTER=0 NUTS_TEST=0 NUTS_INIT=75 NUTS_BASE=-3 NUTS_TERM=50 NUTS_GAMMA=0.05 NUTS_DELTA=0.8 KAPPA=1.0 IKAPPA=1.0 NUTS_REG=0.0 MADAPT=-1 NUTS_EPARAM=0 NUTS_OPARAM=1 NUTS_SPARAM=1 NUTS_MASS=B NUTS_TRANSFORM=0 NUTS_MAXDEPTH=10

- AUTO=2: allows the user to set up the alternative sampling strategy.

;# (NM74) IMP estimation: settings in AUTO=1 with additional:
GRDQ=-1.0 DERCONT=1 RANMETHOD=3S2P
;# (NM75) SAEM estimation: settings in AUTO=1 with additional:
MAPITERS=1 ;# MAP assessment assist on the first iteration
MCETA=100
;# (NM75) BAYES estimation: settings in AUTO=1 with additional:
MAPITERS=1 ;# MAP assessment assist on the first iteration
MCETA=100
;# (NM74) NUTS estimation: settings in AUTO=1 but with changes to:
NUTS_EPARAM=2 NUTS_MASS=BD

- AUTO=3 (NM74): for NUTS only. Same settings as AUTO=1 but with the changes

NUTS_EPARAM=1 NUTS_MASS=D
The AUTO setting itself transfers to the next $EST within the same
$PROB, just like any other option settings explicitly set by the user
in the control stream file, so AUTO remains on or off until the next
AUTO option is specified. For example, with
|
|
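The elided example was presumably of this shape (the first method and option values are illustrative; the point is that the second record specifies no AUTO of its own):

```
$EST METHOD=ITS AUTO=1 NITER=50
$EST METHOD=IMP NITER=100
```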
the IMP statement also has AUTO=1. However, with
|
|
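The elided example was presumably of this shape (methods and option values illustrative):

```
$EST METHOD=ITS AUTO=1 NITER=50
$EST METHOD=IMP AUTO=0 NITER=100
$EST METHOD=BAYES AUTO=1 NITER=10000
```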
the AUTO setting is turned off for IMP, and turned back on for BAYES.
Any option settings implicitly set by the AUTO feature do not
transfer to the next $EST statement. Also, when using AUTO=1, the
transfer of any options settings explicitly set by the user from
previous $EST statements may or may not occur for those options set by
the AUTO option, depending on the situation.
BAYES_PHI_STORE=[0|1] (NM75)
As of NM75, samples of phi/eta are collected at each BAYES iteration, and summarized to provide mean phi and phc() in the root.phi table, as described above. By default, the individual phi values from each iteration are not stored. However, if you set
BAYES_PHI_STORE=1
then PHI and ETA values from each BAYES iteration will be stored in root.iph. For non-mixture problems, only records of SUBP=0 are recorded, as there are no sub-population divisions. For mixture problems, the SUBP=0 records contain the composite phis and ETAs (the average of these across all non-negative iterations are in the root.phi table), and the SUBP>0 records contain the phis and ETAs appropriate to each sub-population SUBP (the average of these across all non-negative iterations are in the root.phm table). The root.iph file can become quite large, so it should be used only on the final analysis. This root.iph file can be read by the utility program NEFFI (see I.103 NEFF and NEFFI Utility Programs (NM74)).
BIONLY=[0|1] (NM75)
BIONLY stands for Bayesian individual parameters only; when set to 1, new samples of individual parameters only will be created, while the population parameters are kept fixed. This is equivalent to EONLY=1 used for SAEM and IMP estimation. Of course, for this to be useful, either write statements capturing the individual parameter samples must be entered in the control stream (see example 8 near the end of this manual), or the BAYES_PHI_STORE option must be set to 1, to collect individual samples in the .iph file.
Example:
|
|
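A sketch of such a use (iteration count illustrative):

```
$EST METHOD=BAYES NITER=1000 BIONLY=1 BAYES_PHI_STORE=1
```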
The BIONLY option of BAYES, with collection of individual samples in the .iph file, is comparable to the ETASAMPLES option of SAEM, with collection of individual samples in the .ets file (ETASAMPLES=0 (default) (NM74) for SAEM).
BOOTDATA=[0|1] (NM75)
By default (BOOTDATA=0), when data are selected based on $SIML BOOTSTRAP, the randomly selected subjects are analyzed during the subsequent estimation method. If BOOTDATA=1, then the subjects not selected are analyzed. Usually, you may wish to simply perform an evaluation, not an estimation, on these non-selected data (MAXEVAL=0, or NITER=0), at the population parameter values based on estimation of the selected subjects. For example:
|
|
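A sketch (the $SIML bootstrap syntax shown here is schematic, not the manual's original example; MAXEVAL=0 requests evaluation only):

```
$SIML (1234567) BOOTSTRAP=1 ;# bootstrap selection of subjects (schematic)
$EST METHOD=CONDITIONAL INTERACTION MAXEVAL=0 BOOTDATA=1
```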
Or for EM methods:
|
|
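A corresponding EM sketch (schematic; NITER=0 requests evaluation only):

```
$SIML (1234567) BOOTSTRAP=1 ;# bootstrap selection of subjects (schematic)
$EST METHOD=IMP INTERACTION NITER=0 ISAMPLE=1000 BOOTDATA=1
```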
CALPHA=n
n between 0.01 and 0.05. Alpha error rate to use for linear regression test to assess statistical significance. Default is 0.05.
CENTERING | NOCENTERING
CENTERING requests that the average conditional estimates of each eta be constrained to be close to 0. May only be used with METHOD=1. Not permitted with INTERACTION.
NOCENTERING requests that the average conditional estimates of each ETA not be constrained. This is the default.
By default, model fitting assumes the ETAs are centered at 0. When results such as ETABAR and corresponding P-values show otherwise, it is possible that the fit using the noncentering conditional estimation method is not the best achievable, and a centering method might be tried. Using centering FOCE or centering Laplacian, one should notice that the P-values are greater (although perhaps some are still less than 0.05), and often the fit is improved. Centering methods should be used to examine model bias, when the noncentering method is suspected to be misspecified. Even when a model is well-specified, it may be so complicated (e.g. it uses difficult differential equations) that to use it with a conditional estimation method requires a great amount of computer time. In this case, if indeed a conditional estimation method is needed, one might use centering FOCE with the first-order model, even though centering per se may not be needed. In this situation, use of centering, along with the first-order model, is just a device allowing a conditional estimation method to be implemented with less of a computational burden. A compromise is achieved; the fit should appear to be an improvement over that obtained with the FO fit, but it may not be quite as good as one obtained with the noncentering FOCE or Laplacian methods. Because the first-order model is automatically obtained from the given model, the final form of the given model (obtained after completing model development) is readily available, and with this model one might try to implement one run using either the noncentering FOCE or Laplacian methods and compare results.
CINTERVAL=n
Every n iterations, the parameter estimates are submitted to the convergence test. If CINTERVAL=0, a suitable CINTERVAL will be determined automatically and then used.
CITER=n
n is the number of latest PRINT or CINTERVAL iterations on which to perform a linear regression test (where independent variable is iteration number, dependent variable is parameter value). If CITER=10, then 10 of the most recent PRINTed or CINTERVAL iterations are used for the linear regression test. Default is 10. May also be coded CNSAMP. If CINTERVAL is not specified, then the PRINT option is used.
CONSTRAIN=n (NM72)
Requests simulated annealing for parameters to slow the rate of reduction of the elements of OMEGA during the burn-in phase of the SAEM method, allowing for a more global search of parameters that minimize the objective function. Values for n are:
- 0 or 4: No simulated annealing.
- 1 (default) or 5: Requests simulated annealing for OMEGA.
- 2 or 6: Requests simulated annealing for SIGMA.
- 3 or 7: Requests simulated annealing for both OMEGA and SIGMA.
When CONSTRAIN>=4, simulated annealing is also performed on diagonal elements of OMEGA that are fixed to 0, to facilitate estimation of any THETAs associated with them.
Simulated annealing is performed by subroutine CONSTRAINT.
The $ANNEAL record facilitates EM search methods for this additional annealing technique. The subroutine CONSTRAINT may also be used to provide any kind of constraint pattern on any parameters.
The user may modify the subroutine CONSTRAINT that performs the simulated annealing algorithm.
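For example, a sketch of requesting simulated annealing on both OMEGA and SIGMA during the SAEM burn-in (the iteration counts are illustrative):

   $EST METHOD=SAEM CONSTRAIN=3 NBURN=2000 NITER=1000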
CTYPE=[0|1|2|3|4]
CTYPE is used to define the termination test to be applied to the burn-in phases for the SAEM and BAYES methods and to the estimation phases for the ITS, IMP, and IMPMAP methods. (CTYPE=4 applies to the classical FO/FOCE/Laplacian methods.)
- CTYPE=0 indicates no termination test, the default.
- CTYPE=1 indicates that the test should be applied to the objective function value, THETA's and SIGMA's but not to the OMEGA's.
- CTYPE=2 indicates that the test should be applied to the objective function value, THETA's, SIGMA's and diagonal elements of OMEGA.
- CTYPE=3 indicates that the test should be applied to the objective function value, THETA's, SIGMA's and all OMEGA elements.
- CTYPE=4 indicates that NONMEM should test whether the objective function has changed by no more than NSIG digits beyond the decimal point over 10 iterations, even though a parameter may oscillate at some digit. If this condition is satisfied, the estimation will terminate successfully. Applies to FO/FOCE/Laplacian methods.
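For example, a sketch combining the convergence-test options described above (the values shown are illustrative):

   $EST METHOD=IMP NITER=200 CTYPE=3 CINTERVAL=1 CITER=10 CALPHA=0.05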
DERCONT=[0|1] (NM73)
The derivative continuity test (DERCONT) is off (0) by default. When DERCONT=1, an additional test is performed on the partial derivative of the objective function with respect to each THETA, to determine whether a backward difference assessment is more accurate than a forward difference assessment. The two assessments can differ greatly in cases of extreme discontinuity, where varying certain THETAs by even just a small amount in the model results in a large change in objective function (such as a viral model in which a very small change in the potency of an anti-viral agent results in a widely varying time of return of viral load). Such discontinuity results in poorly assessed standard errors for THETAs that do not have inter-subject variances associated with them. Setting DERCONT=1 slows the analysis, but can provide more accurate assessments of SE in such models. DERCONT works only for the Monte Carlo EM algorithms such as IMP, DIRECT, IMPMAP, and SAEM.
DF=n
The proposal density is to be a t-distribution with n the number of degrees of freedom (DF). Default DF=0, a normal density. Used with the IMP and IMPMAP methods.
DFS=n (NM73)
Degrees of freedom for the SIGMA matrix for simulation purposes by CHAIN.
- DFS=-1: this is the default. The Cholesky elements are uniformly varied over the interval [(1-accept)*initial value, (1+accept)*initial value].
- DFS=n (n>=1): the SIGMA matrix is randomly created with an inverse Wishart distribution centered about the initial SIGMA values, with degrees of freedom DFS for dispersion.
- DFS=0: as above, but the size of the SIGMA matrix is used as the degrees of freedom.
- DFS>=1000000: SIGMA is fixed at its initial value.
EONLY=[0|1|2|3]
Evaluate the IMP objective function by performing only the expectation step, without advancing the population parameters (default is 0, population parameters are updated). When this method is used, NITER should be 5 to 10, to allow the proposal density to improve with each iteration, since the mean and variance of the normal or t-distribution proposal density are obtained from the previous iteration. It is also useful to obtain several objective function values in order to assess the Monte Carlo noise in them.
As of NM74, if EONLY=2, then not only are the population parameters not updated with each iteration, but neither are the individual conditional means/modes and conditional variances until the last iteration, if the variances of the population parameter estimates are to be estimated ($COV is requested). If EONLY=3, then the conditional modes and approximate variances from MAP estimation are saved on the first iteration and used for the sampling density of all subsequent iterations. This improves efficiency when selecting MAPITER=1, MAPINTER=0, so that the MAP estimation does not need to be repeatedly performed.
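For example, a sketch of evaluating the IMP objective function at the current parameter values without advancing them (ISAMPLE and NITER values are illustrative):

   $EST METHOD=IMP EONLY=1 NITER=10 ISAMPLE=1000 PRINT=1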
ETABARCHECK | NOETABARCHECK
ETABARCHECK: an ETABAR statistic (see ETABAR) is available from a previous problem, and the P-value associated with the ETABAR statistic of the problem at hand relates to a hypothesis test that the true ETABAR is the same as that of the previous problem.
NOETABARCHECK: the P-value associated with the ETABAR statistic relates to a hypothesis test that the true ETABAR is 0. This is the default.
ETADER=n (NM73)
For evaluating individual variances by numerical derivative methods. In evaluating the MAP objective function, the term log(Det(V)) must be evaluated to obtain the marginal or integrated posterior density, where V is the ETA variance matrix based on the subject's posterior density. With ETADER>0, the SLOW option may be needed.
- ETADER=0: expected value V, using analytical first derivatives.
- ETADER=1: expected value V, using forward finite difference numerical first derivatives. Needed if not all code evaluating F and Y derivatives with respect to eta are available for processing by NM-TRAN or in user supplied code.
- ETADER=2: expected value V, using central finite difference numerical first derivatives. Needed if not all code evaluating F and Y derivatives with respect to eta are available for processing by NM-TRAN or in user supplied code.
- ETADER=3: 2nd derivative method of evaluating V, using numerical second derivatives of -log(L) with respect to ETAs. This is equivalent to using the "Laplace NUMERICAL" method, even though FOCE may be selected.
ETASAMPLES=[0|1] (NM74)
Used with SAEM or BAYES. Default ETASAMPLES=0. ETASAMPLES=1 causes ISAMPLE random ETA samples per subject to be written to root.ets, where root is the root name of the control stream file. To do this, perform an SAEM analysis (or BAYES) after the primary estimation, keeping the population parameters fixed (EONLY=1), performing only the burn-in stage so that samples do not get accumulatively averaged, and setting ISAMPLE to 10 or higher to collect sufficient samples per subject. For example, note the $EST line in bold, after the usual SAEM analysis:
|
|
In the first SAEM step, MASSRESET is set to 1 to initialize the internal burn-in coefficients and to collect that information during the population parameter estimation. Then have the next SAEM step accumulate ETA samples (ETASAMPLES=1), with MASSRESET=0 so that the internal burn-in coefficients are not reset but instead are those obtained during the previous SAEM step. After this, you could request an IMP step (IMP needs to use its own burn-in or shaping coefficients, so re-initialize with MASSRESET=1).
Suppose your result came from a previous analysis whose control stream was called precevample.ctl. You could do the following for the present problem:
|
|
ETASTYPE (NM73)
ETASTYPE=0: ETA shrinkage is averaged for all subjects. This is the default.
ETASTYPE=1: ETA shrinkage is averaged only among subjects that provided a non-zero derivative of their data likelihood with respect to that ETA. (See ETASXI).
EVALSHRINK=[0|1] (NM760)
By default, when MAXEVAL=0 and population parameters are not input from an MSF file (using the $MSFI record), ETABAR and shrinkage information is not calculated and displayed in the NONMEM report file. If you wish to have this information displayed when MAXEVAL=0, set EVALSHRINK=1 on the $EST record.
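For example, a sketch of an evaluation-only run that still reports ETABAR and shrinkage:

   $EST MAXEVAL=0 EVALSHRINK=1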
FILE=filename
Name for the raw output file. Parameter estimates and objective function value will be printed to this file every printed iteration as indicated by the PRINT option. Default: root.ext, where root is the name of the control file (not including any extension; "nmbayes" if the name is not specified on the nmfe command line). Note that the names of additional output files are not affected by this option.
FNLETA=n (NM72)
Set FNLETA to 0 if you do not want NONMEM to spend time performing the final FNLMOD routine (which evaluates final mixture proportions for each subject in mixture models) and FNLETA routine (which evaluates final ETAs) using the original algorithm after the estimation and covariance steps are completed. You may want to turn this off if each objective function call takes a long time, as with very complex problems or large data sets. Be aware that certain $TABLE outputs, such as the traditional WRES, RES, and PRED, may not be properly evaluated if the FNLMOD and FNLETA steps are omitted.
Normally, when you do not set FNLETA, or when you set FNLETA to 1,
regardless of the method that was used (classical or EM/Monte Carlo)
to obtain the THETAs, OMEGAs and SIGMAs in the last $EST step, $TABLE
parameters are estimated based on a “post-hoc” evaluation of the ETAs
at the mode of the posterior density position (eta hat), and using the
final theta/OMEGAs/sigma estimates (classical and EM), or at the
posterior means of THETAs/OMEGAs/SIGMAs (BAYES) (so, population
parameters used are those listed on the -1000000000 line of the .ext
file). These eta hat values are identical to those evaluated during
the estimation for ITS/FOCE/Laplace methods, but differ from the mean
values estimated during an IMP, SAEM, BAYES analysis.
Setting FNLETA=0 prevents the post-hoc analysis, so that $TABLE
parameters are evaluated based on the eta/phi values generated by the
last iteration of the last $EST method implemented (those listed in
the .phi file), which are mode of posterior values for
ITS/FOCE/Laplace, and conditional means for IMP/SAEM.
Before nm75, the phis/ETAs after a BAYES analysis were the single sample position values of the very last iteration, and had limited use. With
nm75, $TABLE parameters use final MCMC posterior means (those listed
in the .phi file, collected during the stationary phase) for the phis,
and posterior means of the THETAs, OMEGAs, SIGMAs (those listed on the
-1000000000 line of the .ext file). For example, suppose you have a
user defined variable in $ERROR or $PK that is a function of THETA(1)
and ETA(1):
|
|
Then W will be output to the table as
|
|
If you wish to have the MCMC mean of W:
|
|
you will need to collect individual samples of W with an additional WRITE statement, and then summarize the results after the NONMEM analysis, as shown in example 8.
Regardless of the FNLETA setting, the .phi and .phm tables always output the PHI/ETA values used for the particular method (mode of posterior, and approximate Fisher information based variances for ITS/FOCE/Laplace methods, Monte Carlo assessed conditional means and conditional variances for SAEM/IMP methods).
If you set FNLETA=2 (NM73), then the estimation step is not done, and whatever ETAs are stored in memory at the time are used in any subsequent $TABLEs. This has value if you loaded the individual ETAs from an MSF file, or from an $ETAS or $PHIS record, and you want to calculate $TABLE items based on those ETAs, rather than from a new estimation. For example:
|
|
To summarize:
- FNLETA=1: Diagnostics depending on EBE’s such as CWRES, CIWRES, CIPRED, etc., will use EBE’s based on the final estimation method (conditional mode for FO/FOCE/Laplace/ITS, conditional mean for IMP/SAEM, MCMC posterior means for BAYES), while user selected items will use EBE’s from the FNLETA step (eta modes).
- FNLETA=0: All table outputs (diagnostics and user selected items) will use EBE’s from final estimation method (conditional modes for FO/FOCE/Laplace/ITS, conditional means for IMP/SAEM, MCMC posterior means for BAYES).
- FNLETA=2: All table outputs will use a common set of EBE’s from an imported source. When using an imported source, such as a *_ETAS.msf or *.phi file (from an $ETAS or $PHIS record), ensure that its ID ordering matches that of the $DATA data file.
- FNLETA=3 (as of nm74): Like FNLETA=1, will call FNLETA, and all table outputs (diagnostics and user selected items) will use EBE’s from the FNLETA step (eta modes).
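For example, a sketch of tabling from imported ETAs with FNLETA=2 (the file names are illustrative):

   $ETAS FILE=previous.phi
   $EST MAXEVAL=0 FNLETA=2
   $TABLE ID TIME IPRED CWRES FILE=mytab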
DELIM, FORMAT
Format for the raw output file and all additional output files. In FORMAT=s, the leading character s defines the delimiter [,|s(pace)|t(ab)] and is followed by a Fortran format specification. The default is s1PE12.5. For more details, see FORMAT. May also be coded DELIM.
FO | NOFO
FO: requests that the First-Order Model be used with METHOD=1 and CENTERING. Cannot be used with LAPLACIAN.
NOFO: requests that the First-Order Model not be used with METHOD=1 and centering. This is the default.
FPARAFILE (NM74)
Final ETAs (empirical Bayes estimates; EBE's) are evaluated after the last Estimation Step (when FNLETA=1). This computation is parallelized if parallelization is on for the final Estimation Step.
FPARAFILE=filename specifies a different parafile than was used for the Estimation Step.
FPARAFILE=ON turns on parallelization for the EBE's.
FPARAFILE=OFF turns off parallelization for the EBE's.
The FPARAFILE option may be specified on any $ESTIMATION record,
but applies only after the last Estimation Step.
GRD=[G|N|D|S]…
Specify a string of [G|N|D|S]'s, with each symbol representing a THETA or SIGMA parameter in numerical order. The first m letters of GRD refer to the m THETA's. The m+1th letter then refers to SIGMA(1,1), the m+2th to SIGMA(2,2), etc. (going along the diagonal of SIGMA). Omitted symbols are assumed to be D. G indicates that the THETA should be Gibbs sampled. N indicates the THETA should be sampled using the Metropolis-Hastings algorithm. S indicates that the THETA is being used to model a SIGMA parameter; S is used with Monte Carlo EM methods. D (default) indicates the program will decide. G and N are used only with the BAYES method. Default is DDDD…
An alternative syntax may be used:
|
|
T is the parameter type (T for THETA, S for sigma-like THETA), V is a letter (G, S, D, or N), and n is a number list. For example, to specify that THETAs 3 and 5 through 8 are to be Gibbs sampled, THETA 4 is sigma-like, and SIGMAs 1-3 are to be Metropolis-Hastings processed,
|
|
THETAs and SIGMAs not specified are given a default D designation.
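As a sketch of the alternative syntax for the case just described (the exact token layout TV(n) is an assumption patterned on the description above):

   $EST METHOD=BAYES GRD=TG(3,5-8):TS(4):SN(1-3)

THETAs and SIGMAs not mentioned keep the default D designation.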
GRDQ=0 (NM74)
GRDQ ("gradient quick") allows THETAs that must be gradient assessed (such as those that are not mu-referenced) and SIGMAs to be more quickly evaluated: rather than evaluating the gradients for every one of the ISAMPLE random samples, NONMEM chooses a subset of the most important samples. This reduces the computational cost, since gradients of the objective function with respect to such THETAs require more objective function calls than are usually required when evaluating mu-referenced THETAs.
If GRDQ>=1, then this is interpreted as the number of important samples to be used for theta gradient assessment per subject.
If GRDQ<1, then GRDQ is interpreted as the fraction of ISAMPLEs to be used (GRDQ*ISAMPLE samples are used for theta gradient assessment). When GRDQ<0.0, then the number of samples used is ABS(GRDQ)*ISAMPLE/(Number of subjects with observations).
When GRDQ=0 (default), then all ISAMPLE samples are used to evaluate the theta gradients. Some experience suggests that if GRDQ is too low (<30 samples), the quality of standard error assessments may deteriorate, so some trial and error may be needed to determine to what extent the GRDQ can be reduced. Suggestion of the GRDQ algorithm courtesy of Robert Leary.
During estimation, the iteration output for importance sampling contains the objective function, average effective number of samples (eff.), actual samples (smpl., which is usually ISAMPLE), and average fitness (fit.) of proposal (sampling) density to the posterior density. The average effective number of samples per total samples is essentially the average ratio (or weights) of posterior to proposal density at all sample positions, and NONMEM tries to adjust the scaling of the variance to the proposal density so that on average, eff./smpl. approximates the IACCEPT value. The average fitness is the average absolute difference between the posterior density and proposal density among the samples, and indicates to what extent that the proposal density matches the posterior density. Typically if the IACCEPT is not too low, then the closer the shape of the posterior density is to a normal density, the higher the fitness, with perfect fitness being 1 if IACCEPT is set to 1.
GRID=(nr,ns,r0,r1)
See details in option STIELTJES. Briefly, a grid is obtained by first taking the interval [r0,r1] of the length axis and dividing this interval into nr equal subintervals. ns may be thought of as the number of points in a single quadrant of a 2-dimensional ellipse in n-space. Constraints are nr<=100, ns<=9999, 0<r0<r1<1. If r1>.9999, there is no tail region. nr and ns should be integers. The default values are GRID=(3,1,.6,.9).
IACCEPT=[x|0]
The function of IACCEPT depends on the method.
- SAEM and BAYES: the scaling of OMEGA is adjusted so that samples are accepted x fraction of the time. See ISAMPLE_M2 option. Default is 0.4.
- Importance sampling: expand proposal (sampling) density variance relative to conditional density so that on average conditional density/proposal density=IACCEPT (default 0.4). For very sparse data or highly non-linear posterior densities (such as with categorical data), consider IACCEPT=0.1 to 0.3.
- IACCEPT=0: for importance sampling only. NONMEM will determine the most appropriate IACCEPT level for each subject, and if necessary, will use a t-distribution (by altering the degrees of freedom of the distribution for each subject) as well. In this case, the individual IACCEPT values and degrees of freedom (DF) values will be listed in "root.imp".
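For example, a sketch for importance sampling with very sparse or categorical data, using a value in the 0.1 to 0.3 range suggested above (other option values are illustrative):

   $EST METHOD=IMP NITER=200 ISAMPLE=1000 IACCEPT=0.3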
IACCEPTL=x (NM74)
If IACCEPTL is set to greater than 0, then NONMEM uses this value as a scale for a second multi-variate normal density to cover long tails in the posterior density (hence L for long tails), in combination with the IACCEPT value to cover the posterior density near the mode. For one half of the ISAMPLE samples, IACCEPT is used to scale a multivariate normal proposal density, and for the other half, IACCEPTL is used to scale another multivariate normal proposal density. Thus, a mixture of two normal densities with two different variance scales is used as the proposal density. This serves as a pseudo t-distribution while assuring radial symmetry as well as statistically independent samples, which may be useful when using the Sobol method. This method has been suggested by Robert Leary, who recommends IACCEPT=1.0 and IACCEPTL=0.01.
If IACCEPT is set to 0, and IACCEPTL is set to a value greater than 0, then a search for the best IACCEPTL for each subject is made, starting the testing at the IACCEPTL value given by the user, while IACCEPT is fixed to 1. The root.imp file will contain the final values selected for each subject, listing the two IACCEPT scale values, the first one being 1 for near the mode, and the second for the long tails.
IKAPPA=x (NM74)
The individual parameters are averaged using a weight N^(-IKAPPA) for the Nth iteration (so a simple average, with each iteration’s value equally weighted, corresponds to IKAPPA=1) in obtaining the mean and variance-covariance for the ISAMPLE_M1B mode. A value of 0.75 can sometimes provide improved decorrelation efficiency when performing a standard Bayesian analysis.
INTERACTION | NOINTERACTION
INTERACTION: The dependence on ETAs of the model for intra-individual random error is preserved in the computation of the objective function. INTERACTION cannot be used with CENTERING. With NONMEM 7.3, this is the default with EM/Bayes methods and is supplied if NOINTERACTION is not specified by the user. With NONMEM 7.4, INTERACTION is not supplied if LIKELIHOOD is present.
NOINTERACTION: Always set ETAs to 0 during the computation of the model for intraindividual random error. This is the default with non-EM and non-Bayes methods.
ISAMPEND=n (NM73)
For SAEM, if ISAMPEND is specified as an upper integer value (usually 10), then NONMEM will perform an ISAMPLE preprocess to determine the best ISAMPLE value. For the ISAMPLE preprocessing, the user-entered ISAMPLE value must be at least 2. NONMEM will perform 200 iterations during the ISAMPLE preprocess, and the last 50 iterations will be used to obtain the average conditional variance/OMEGA (ETA shrinkage) for each subject. The largest ETA shrinkage fraction*10 is the ISAMPLE for that subject. Thus,
|
|
assesses a best ISAMPLE for each subject. The ISAMPLE will not be higher than 10 or lower than 2.
See also STDOBJ below.
ISAMPLE=n
When used with the IMP or IMPMAP methods n is the number of random samples per subject used for the expectation step. Default n=300, but may require 1000-3000 for very sparse data.
When used with the SAEM or BAYES method n is the number of chains used by the Metropolis-Hastings algorithm for individual parameter estimation. The default n=2 for SAEM and n=1 for BAYES.
A kernel is the Metropolis-Hastings sampling and testing mode used. The ISAMPLE_Mx options define how many times to generate and test a sample for goodness-of-fit using a given kernel. ISAMPLE does not refer to a kernel, but defines the number of chains that are maintained, each chain having their own sample generation and testing sequence using the various kernels. Each chain retains a final sample for each subject, at the end of each iteration.
For each ISAMPLE, SAEM performs ISAMPLE_M1 mode (or kernel) 1 iterations using the population means and variances as proposal density, followed by ISAMPLE_M1B mode 1B iterations using the individual conditional mean and individual conditional variance collected from previous iterations as proposal density, followed by ISAMPLE_M1A mode 1A iterations, testing model parameters from other subjects as possible values (by default this is not used, ISAMPLE_M1A=0), followed by ISAMPLE_M2 mode 2 iterations, using the present parameter vector position as mean, and a scaled variance of OMEGA as variance [5]. Next, ISAMPLE_M3 mode 3 iterations are performed, in which samples are generated for each parameter separately. The scaling is adjusted so that samples are accepted IACCEPT fraction of the time. The final sample for a given chain is then kept. The averages of the ISAMPLE parameter vectors and their variances are used in updating the population means and variances. Usually, these options need not be changed.
The ISAMPLE_M1A method of sampling has limited use: it assists certain subjects in finding good parameter values by borrowing from their neighbors, in case the neighbors obtained good values while the present subject has difficulty finding good samples. This mode should generally not be used, and can be inaccurate if not all subjects share the same MU and OMEGA, such as in covariate modeling. Alternatively, use mode 1A sampling at the beginning of an SAEM analysis for a few burn-in iterations, then continue with a complete SAEM analysis with mode 1A sampling turned off, with more burn-in and accumulated sampling iterations, for example:
|
|
As of NM75, if the option MCETA is set to greater than ISAMPLE_M1, then for the first iteration of SAEM, ISAMPLE_M1, ISAMPLE_M2, and ISAMPLE_M3 will be set to MCETA, to facilitate a robust initial search for reasonable parameter values.
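For example, a sketch of the two-step approach just described, with mode 1A sampling only during a brief initial burn-in (iteration counts are illustrative):

   $EST METHOD=SAEM ISAMPLE_M1A=10 NBURN=50 NITER=0
   $EST METHOD=SAEM ISAMPLE_M1A=0 NBURN=2000 NITER=1000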
ISAMPLE_M1=n
The number of mode 1 iterations for the Metropolis-Hastings algorithm for estimating individual parameters using the population means and variances as proposal density. Used with the SAEM and BAYES methods. Default n=2.
For Bayesian analysis, the MCMC algorithm performs ISAMPLE_M1 mode 1 iterations using the population means and variances as proposal density, followed by ISAMPLE_M1B mode 1B iterations using the individual conditional mean and individual conditional variance collected from previous iterations as proposal density, followed by ISAMPLE_M1A mode 1A iterations, testing model parameters from other subjects as possible values (by default this is not used, ISAMPLE_M1A=0), followed by ISAMPLE_M2 mode 2 iterations, using the present parameter vector position as mean, and a scaled variance of OMEGA as variance [11]. Next, ISAMPLE_M3 mode 3 iterations are performed, in which samples are generated for each parameter separately. The scaling is adjusted so that samples are accepted IACCEPT fraction of the time. The final sample for a given chain is then kept. Usually, these options need not be changed. There is only one chain of samples produced for a given NONMEM run (ISAMPLE is not used for MCMC, only for SAEM). If you would like additional chains, then create separate control stream files with different starting seed numbers.
As of NM75, if the option MCETA is set to greater than ISAMPLE_M1, then for the first iteration of SAEM, ISAMPLE_M1, ISAMPLE_M2, and ISAMPLE_M3 will be set to MCETA, to facilitate a robust initial search for reasonable parameter values.
ISAMPLE_M1A=n (NM72)
n is the number of mode 1A iterations for the Metropolis-Hastings algorithm, testing model parameters from other subjects as possible values. Used with the SAEM and BAYES methods. Default is 0.
ISAMPLE_M1B=n (NM74)
n is the number of mode 1B iterations for the Metropolis-Hastings algorithm. Default is 2.
ISAMPLE_M2=n
n is the number of mode 2 iterations for the Metropolis-Hastings algorithm for estimating individual parameters using the current parameter vector position as mean and a scaled variance of OMEGA as variance. Used with the SAEM and BAYES methods. Default is 2.
ISAMPLE_M3=n
n is the number of mode 3 iterations for the Metropolis-Hastings algorithm for estimating individual parameters in which samples are generated for each parameter separately. Used with the SAEM and BAYES methods. Default is 2.
ISCALE_MIN, ISCALE_MAX (NM72)
In importance sampling, the scale factor used to vary the size of
the variance of the proposal density in order to meet the IACCEPT
condition is by default bounded by ISCALE_MIN=0.1 and
ISCALE_MAX=10.0. On rare occasions, the IMP objective function
varies widely, and the scale factor boundary may need to be
reduced, e.g. ISCALE_MIN=0.3, ISCALE_MAX=3. After the IMP
estimation, remember to revert these parameters to default
operation on the next $EST step:
|
|
In MCMC sampling (SAEM and BAYES), the scale factor used to vary the
size of the variance of the proposal density in order to meet the
IACCEPT condition, is by default bounded by ISCALE_MIN=1.0E-06 and
ISCALE_MAX=1.0E+06. This should be left alone for MCMC sampling, but on occasion there may be a reason to reduce the boundaries (e.g., to ISCALE_MIN=0.001, ISCALE_MAX=1000). As with IMP, after the SAEM estimation method, remember to revert these parameters back to default operation on the next $EST step.
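For example, a sketch of narrowing the scale factor bounds for an IMP estimation and then restoring the defaults on the next $EST step (values illustrative):

   $EST METHOD=IMP ISCALE_MIN=0.3 ISCALE_MAX=3
   $EST METHOD=IMP EONLY=1 ISCALE_MIN=0.1 ISCALE_MAX=10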
KAPPA=x (NM74)
The parameters are averaged using a weight N^(-KAPPA) (also called PKAPPA) for the Nth iteration (so a simple average, with each iteration’s value equally weighted, corresponds to KAPPA=1) in obtaining the mass matrix. A value of 0.75 gives the best results when preparing a mass matrix during the BAYES step in anticipation of the NUTS step.
KNUTHSUMOFF=n (NM74)
The Knuth summation method is used to allow the most accurate summation of individual objective function values, even with large variations in the values of the individual objective functions. To turn this off and allow a standard summation (not recommended except for comparison with earlier versions), set KNUTHSUMOFF=1. With the Knuth summation algorithm on by default, the SORT option is not necessary. Default is 0. May also be set with the $COVARIANCE record.
LAPLACIAN | NOLAPLACIAN
LAPLACIAN: use the Laplacian method, in which second derivatives
with respect to eta are used. Laplacian may not be used with
METHOD=0. It may be used with the EM/Monte Carlo methods, in
which case the Laplacian option will be properly utilized, such as
during MAP estimation used during IMP, IMPMAP, and ITS, or
ignored, such as during SAEM or BAYES. Cannot be used with
$ABBREVIATED DERIV2=NO unless NUMERICAL option is also specified.
NOLAPLACIAN: do not use the Laplacian method. This is the default.
LEVCENTER=[0|1] (NM75)
There is no default; this option is required when $LEVEL is used with $ESTIMATION. If LEVCENTER=1, the ETAs of super ID random levels are constrained to sum to 0. In earlier versions of NONMEM, this was the default (and only) action. To obtain results similar to those of earlier versions of NONMEM, set LEVCENTER=1.
If LEVCENTER=0, level ETAs are not forced to sum to 0.
See Cookbook: Nested random effects for examples.
LEVOBJTYPE=[0|1] (NM751)
If LEVOBJTYPE=0, then for IMP/ITS/FOCE/Laplace, the reported objective function (and what is used for estimation) is the average of two cycles of optimization. The first optimization floats all ETAs, and the second optimization fixes the super-ID ETAs to the appropriate average of ETAs per super-ID group from the first optimization, and then floats the ID (and below) level ETAs. If LEVOBJTYPE=1, then the OBJ reported (and used for estimation) is only that of the second optimization cycle. LEVOBJTYPE=0 provides the most consistent results between FOCE/Laplace and the EM and Bayes methods, particularly with super-ID levels higher than 1 (so, site ID and country ID, for example). LEVOBJTYPE=1 provides an OBJ more consistent with theory.
By default, NONMEM sets the estimation process to SLOW when a $LEVEL option is used. As of nm751, you may set NOSLOW or FAST. While there is an increase in speed, there may not be as good optimization progress as with the SLOW option. It may be best to use SLOW for the $EST step, and NOSLOW or FAST for the $COV step.
See Cookbook: Nested random effects for examples.
LEVWT=n (NM74)
This option applies when $LEVEL record is present. By default,
LEVWT=0, and weights each level value equally, regardless of number of subjects per level value. To weight according to number of
subjects for that value, set LEVWT=1.
LIKELIHOOD | -2LOGLIKELIHOOD
- LIKELIHOOD: designed mainly for use with discrete observed responses. Indicates that Y (with NM-TRAN abbreviated code) or F (with a user-supplied PRED or ERROR code) will be set to a (conditional) likelihood. Upon simulation it will be ignored, and the DV data item will be set directly to the simulated value in abbreviated or user code. Also, ETAs, if any, are understood to be population ETAs. Epsilon variables and the $SIGMA record may not be used. The L2 data item may not be used. The CONTR and CCONTR options of the $SUBROUTINES record may not be used. NONMEM cannot obtain the initial estimate for OMEGA. If the data are population, and MAXEVALS=0 is not coded, then NOINTERACTION is required. Compare with PREDICTION option.
- -2LOGLIKELIHOOD: Y (with NM-TRAN abbreviated code) or F (with a user-supplied PRED or ERROR code) is a -2 log (conditional) likelihood. All remarks for LIKELIHOOD apply. May also be coded "-2LLIKELIHOOD". Compare with PREDICTION option.
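For example, a sketch of a binary-response model coded as a likelihood (the logistic model itself is illustrative, not prescribed by this document):

   $PRED
     LGT = THETA(1) + ETA(1)
     P   = EXP(LGT)/(1+EXP(LGT))
     Y   = P**DV * (1-P)**(1-DV)
   $EST METHOD=CONDITIONAL LAPLACIAN LIKELIHOOD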
MADAPT=n (NM74)
If MADAPT/=-1 (MADAPT may also be coded PMADAPT), the mass matrix is
updated as a cumulative average throughout the burn-in period of the
NUTS analysis, the update occurring every NUTS_BASE iterations, for
the first MADAPT iterations, and then changes no further after that.
For example, if NBURN=300, NUTS_BASE=20, and MADAPT=200, then
during the first 20 burn-in iterations, an identity mass matrix is
used because there is no past history of samples collected yet, then
during the next 20 burn-in iterations the mass matrix derived from the
first twenty burn-in iterations is used, then for following 20 burn-in
iterations, the mass matrix from the first 40 burn-in iterations is
used, etc. If KAPPA/=1, the samples are not evenly weighted, but are
weighted according to the description of KAPPA below. If MASSRESET=0
and a previous $EST METHOD=BAYES … MASSRESET=1 estimation supplied an
initial mass matrix, then the first 20 burn-in iterations use this
pre-prepared BAYES mass matrix,
the next 20 burn-in iterations use a mass matrix derived from an
average of the pre-prepared BAYES mass matrix (weighted as 2*number of
population parameters to be estimated) plus the first 20 burn-in
iterations (weighted as 20), the next 20 burn-in iterations use a
mass matrix derived from an average of the pre-prepared BAYES mass
matrix (weighted as 2*number of population parameters to be estimated)
plus the first 40 burn-in iterations (weighted as 40), etc. This
weighting is modified according to the KAPPA value used.
In the following code portion (from examples\nuts_example6h.ctl), a brief Iterative Two Stage estimation is performed to center the parameters quickly in the stationary zone, followed by a BAYES estimation to quickly create a starting mass matrix, followed by the NUTS estimation, which uses the BAYES-derived mass matrix as a starting point:
|
|
In the above setting, the mass matrix is updated every 20 (NUTS_BASE) iterations during the entire 100 NBURN iterations (since PMADAPT=NBURN).
In another example (from examples\nuts_example6g.ctl), the mass matrix from the BAYES estimation is used directly for the stationary sampling phase of the NUTS estimation, as there is no burn-in period for the NUTS estimation (NBURN=0):
|
|
If MADAPT=-1, then the STAN method of warmup and mass matrix accumulation is used (according to the STAN manual [20], Optimization Algorithms section). When using MADAPT=-1, the tuning options NUTS_INIT, NUTS_BASE, and NUTS_TERM are useful. Use of the AUTO option allows for easy setup of the STAN method of matrix accumulation without having to deal with every one of these options. An example of this is examples\nuts_example6f.ctl:
|
|
The AUTO=1 option (see section I.46 Some General Options and Notes Regarding EM and Monte Carlo Methods for more details on the AUTO feature, AUTO=0 (default) (NM73)) automatically sets the various options to appropriate default values (including setting MADAPT=-1) for Stan-type mass matrix development, unless over-written by the user, as is done above for NUTS_INIT (default is 75), to reduce computation time. Using the Stan warmup method for creating the mass matrix takes longer than using the BAYES pre-analysis as done in nuts_example6g.
MAPCOV=[0|1|2] (NM74)
For iterations for which the MAP estimation is performed, by default (MAPCOV=1), the MAP estimated mode is used as the center (mean) for the sampling density, and the first order (or second order if Laplace option is used) approximate conditional variance is used as the variance of the sampling density. If MAPCOV=0, then only the mode is used for the sampling density’s center, and the Monte Carlo assessed variance of the previous iteration is used as the sampling density’s variance. If MAPCOV=2, then the Monte Carlo assessed conditional mean of the previous iteration is used, but the MAP first order (or second order if Laplace option) assessed variance is used for the sampling density. This option has been added for experimental purposes, and has no value for the user. It should be left at its default value of 1.
MAPINTER=n (NM72)
Every nth iteration, the MAP estimation is used to provide parameters to the sampling density. Thus, if MAPITER=20 and MAPINTER=5, then for the first 20 iterations, MAP estimation is used, and thereafter, every 5th iteration the MAP estimation is used. If MAPINTER=-1, then MAPINTER will be turned on only if the objective function increases consistently over several iterations. Setting any of the above parameters to -100 will force NONMEM to select the default value for that parameter.
MAPITER=n (NM72)
By default, MAP estimation is performed only on the first iteration, to obtain initial conditional values (modes and approximate variances) to be used for the sampling density. Subsequently, the Monte Carlo assessed conditional means and variances from the previous iteration are used as parameters to the sampling density. However, the user can select the pattern by which MAP estimations are intermittently done, and their conditional statistics used for the sampling density. MAPITER=n means the first n iterations are to use MAP estimation to assess parameters for the sampling density. After these n iterations, the Monte Carlo conditional means and variances of the previous iteration are used for the sampling density parameters of the present iteration. If MAPITER=0, then the first iteration will rely on conditional means and variances that are in memory. These may have come from an MSF file, or from a previous estimation step.
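For example, the MAP pattern described under MAPINTER and MAPITER might be requested as follows (a sketch; the method and iteration counts are arbitrary):

```
$EST METHOD=IMP NITER=100 MAPITER=20 MAPINTER=5
; MAP estimation parameterizes the sampling density for iterations 1-20,
; and for every 5th iteration thereafter
```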
MAPITERS=[0|1] (NM75)
By default, no MAP estimation is performed with SAEM or BAYES. To get good individual parameter values near the mode of the posterior density for the first iteration of SAEM, you can set MAPITERS=1. Alternatively, you can insert the record:
|
|
followed by
|
|
or
|
|
MASSRESET=n (NM74)
By default, mass matrix information accumulation is turned off. However, MASSRESET=1 should be set when performing a Gibbs/MH Bayesian analysis to initialize accumulation of mass matrix information, followed by the NUTS algorithm with MASSRESET=0 set, so the mass matrix accumulator information is not reset to 0, but adds to the information acquired during the previous Gibbs/MH process. If you have a BAYES record set up previous to the NUTS record, make sure to set the NUTS_MASS value at the BAYES record first, so it accumulates the correct type of mass matrix information, and allocates the appropriate memory for its storage. See the description of MADAPT=-1 (default) above.
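A hedged sketch of the Gibbs/MH-then-NUTS sequence just described (the iteration counts are arbitrary placeholders):

```
$EST METHOD=BAYES NBURN=2000 NITER=0 NUTS_MASS=B MASSRESET=1
; Gibbs/MH phase: initialize and accumulate mass matrix information;
; NUTS_MASS is set here so the correct matrix type is accumulated
$EST METHOD=NUTS NBURN=0 NITER=1000 MASSRESET=0
; NUTS phase: keep (do not reset) the accumulated mass matrix
```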
MAXEVALS=n
Maximum allowable number of evaluations of the objective function
during the Estimation Step. Default: a generous number. (Each
evaluation of the objective function requires one pass through
the data set. This is also referred to as a "function evaluation.") MAXEVALS=-1 may be specified when a $MSFI record is
present. It requests that NONMEM re-use the value from the previous run, and is the default with $MSFI.
MAXEVALS=0 requests that the Estimation Step be omitted. This is useful, for example, with POSTHOC (see above).
MCETA=n (NM73)
- MCETA=0: ETA=0 is initial setting for MAP estimation (ETA optimization). This is the default.
- MCETA=1: ETA=values of the previous iteration is the initial setting for MAP estimation, or ETA=0, whichever gives the lower objective function.
- MCETA>1: MCETA-1 random samples of ETA, drawn from a normal distribution with variance OMEGA, are tested, in addition to the previous ETA and ETA=0. The initial setting is taken as the ETA vector that gives the lowest OFV.
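For example (a sketch; the method choice is illustrative):

```
$EST METHOD=CONDITIONAL INTERACTION MCETA=10
; tests 9 random ETA samples, plus the previous ETA and ETA=0,
; as candidate starting points for each subject's MAP estimation
```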
LNTWOPI (NM74)
The objective function is reported including the \(N\log(2\pi)\) constant, where N is the total number of normally distributed data values in the data set. See also OLNTWOPI (can be used at the same time).
MSFO=filename
A Model Specification File is output to a file with the given filename. Filename may not contain embedded spaces. If filename contains commas, semicolons, or parentheses, then it must be surrounded by quotes (' or "). Filename may also contain equal signs if it is enclosed in quotes. Filename may contain at most 71 characters. If filename is the same as any option of the $ESTIMATION record, it must be enclosed in quotes. If the $NONPARAMETRIC record is present and also specifies the MSFO option, the filename is required on the record which appears first in the control stream. If filename is present on both, it must be the same. If the filename is omitted on the second of the two records, the MSF option must be the final option on that record. Default: If the MSFO option is not used, no MSF is output.
If a MSFO is output, then the iteration estimates may also be seen in the original parameterization for those iterations whose summaries appear in intermediate printout. These estimates may be found in file INTER.
When MAXEVAL=0 and the Covariance Step is implemented, the MSFO option may also be used, and then a Model Specification File will be output which will include information from the Covariance Step and from the input model specification file concerning the earlier Estimation Step (in this case there must be an input model specification file).
MUM=[M|N|D|X]…
Specify a string consisting of [M|N|D|X]'s with each letter representing a THETA parameter in numerical order. M indicates that the THETA should be Mu modeled. N indicates the THETA should not be Mu modeled. D indicates the program will decide if the parameter should be Mu modeled or not. X indicates that THETA is involved in a covariate-dependent mixture model and is required if this is the case. Default is DDDD…
An alternative syntax may be used:
|
|
V is a letter (N, M, D, or X), and n is a number list. For example, to specify that THETAs 3 and 5 through 8 should not be MU modeled, THETA 2 is a population mixture parameter, and THETAs 9 and 12 are to be MU modeled:
|
|
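As a hedged sketch of the string form described earlier (a hypothetical five-THETA model; the method and counts are placeholders):

```
$EST METHOD=SAEM NBURN=500 NITER=300 MUM=MMNDD
; THETAs 1-2 MU modeled, THETA 3 not MU modeled,
; THETAs 4-5 left for the program to decide
```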
NBURN=n
When used with the SAEM method n is the maximum number of iterations used to perform the stochastic phase. Default n=2000. During this time, the advance of the parameters may be monitored by observing the results in file specified by the FILE parameter (described later in the Format of Output Files section), and the advance of the objective function (SAEMOBJ) at the console may be monitored. When all parameters or the SAEMOBJ do not appear to drift in a specific direction, but appear to bounce around in a stationary region, then it has sufficiently “burned” in. A termination test is available (described later), that will give a statistical assessment of the stationarity of objective function and parameters. Note that the objective function SAEMOBJ that is displayed during SAEM analysis is not valid for assessing minimization or for hypothesis testing. It is highly stochastic, and does not represent a marginal likelihood that is integrated over all possible eta, but rather, is the likelihood for a given set of ETAs.
When used with the BAYES method n is the maximum number of iterations used to perform the burn-in phase. Default n=4000. During this time, the advance of the parameters may be monitored by observing the results in file specified by the FILE parameter, and/or the objective function displayed at the console. The objective function progress is also written in OFV.TXT, and the report file. Full sets of population parameters and likelihood functions are also written in the file specified with the FILE option. When all parameters and objective function do not appear to drift in a specific direction, but appear to bounce around in a stationary region, then it has sufficiently “burned” in. A termination test may be implemented to perform a statistical assessment of stationarity for the objective function and parameters. As mentioned earlier, the objective function (MCMCOBJ) that is displayed during BAYES analysis is not valid for assessing minimization or for hypothesis testing in the usual manner. It does not represent a likelihood that is integrated over all possible ETA (marginal density), but the likelihood at a given set of ETAs.
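A hedged sketch of burn-in settings for the two methods (the counts shown are the stated defaults; CTYPE stands in for the termination-test option described later and is a placeholder setting):

```
$EST METHOD=SAEM NBURN=2000 NITER=1000 CTYPE=3
; stochastic phase up to 2000 iterations, with stationarity testing
$EST METHOD=BAYES NBURN=4000 NITER=10000
; burn-in, then stationary-distribution samples
```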
NITER=n
When used with the ITS, IMP, and IMPMAP methods n is the maximum number of iterations. Default n=50.
When used with the SAEM method n is the number of iterations for the non-stochastic accumulation phase. Default n=1000.
When used with the BAYES method n is the number of iterations used to obtain the stationary distribution. Default n=10000. NITER may also be coded NSAMPLE.
NOCOV=[0|1] (NM73)
If covariance estimation is not desired for a particular estimation step, set NOCOV=1. It may be turned on again for the next
estimation step with NOCOV=0. If NOCOV=1 is set for an
FOCE/Laplace/FO step, this is equivalent to $COV NOFCOV setting.
For ITS and IMP, covariance estimation can take some time for
large problems, and you may wish to obtain only the objective
function, such as in the case of $EST METHOD=IMP EONLY=1 after an
SAEM estimation. NOCOV has no effect on BAYES analysis, as no
extra time is required in assessing covariance for BAYES.
By default, standard error information for the classical methods (FO/FOCE/Laplace) will be given only if they are the last estimation method, even if NOCOV=0 for an intermediate estimation step. If NOCOV=1 for the FOCE/LAPLACE/FO method, and it is the last estimation step, then standard error assessment for it will be turned off.
NOLABEL=[0|1]
1 indicates that the row of item names for FILE will not be written, otherwise 0, the default. Affects the raw output file and all additional output files.
NONINFETA=[0|1] (NM73)
NONMEM has traditionally not assessed post-hoc eta hat (also known as
empirical Bayes Estimates, EBE’s, conditional mode ETAs, or
conditional parametric ETAs (CPE)), if the derivative of the data
likelihood with respect to that eta is zero for a given subject, and
simply specified that eta as zero. This eta is called a
non-influential eta. The true EBE is zero anyway, if this eta is not
correlated by an off-diagonal omega element with an eta that is
influential. If the non-influential eta is correlated with an
influential eta, then the true EBE of the non-influential eta will in
general not be 0. When NONINFETA=0, the default, then this
traditional algorithm is in effect, so that all non-influential ETAs,
even those correlated with influential ETAs, will be reported as 0
when outputted with $TABLE. However, if NONINFETA=1, then all ETAs
are involved in the MAP estimation, regardless of their influence.
This will result in non-influential ETAs reported as a non-zero value,
if it is correlated with influential ETAs. From a pure statistical
stand-point, this is the true EBE, although intuitively it may be
puzzling for some users. Whether NONINFETA=1 or 0, the individual’s
objective function will change very little if at all, because NONMEM
provides a corrective algorithm to assess the correct objective
function. But for purposes of post-hoc evaluated ETAs, one may wish
to set NONINFETA depending on the desired interpretation. The
NONINFETA option applies only to FO/FOCE/Laplace. The Monte Carlo and
EM methods have always used (even with earlier versions of NONMEM 7)
the pure statistical option (NONINFETA=1).
NOPRIOR=[0|1]
If prior information was specified using the $PRIOR statement, then
normally the analysis is set up for three stage hierarchical analysis.
By default NOPRIOR=0, and this prior information will be used. If
NOPRIOR=1, then for the particular estimation, the prior information
is not included in the analysis. This is useful if you do not want to
use prior information during a maximization (METHOD=IMP, CONDITIONAL,
IMPMAP, SAEM, or ITS), but then use it for the Bayesian analysis
(METHOD=BAYES). With NOPRIOR=1, FOCE is still allowed to evaluate
an S MATRIX, since prior information is not used, i.e., $EST NOPRIOR=1 and $COV MATRIX=S are permitted.
With NONMEM 7.3, when NOPRIOR=1 is set, the estimation will not use TNPRI prior information (TNPRI should only be used with FO/FOCE/Laplace estimations). In previous versions of NONMEM, NOPRIOR=1 did not act on TNPRI priors.
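A sketch of the maximization-then-Bayes pattern described above (the $PRIOR NWPRI record and the prior specification it requires are assumed to be set up elsewhere in the control stream):

```
$PRIOR NWPRI
$EST METHOD=ITS NOPRIOR=1   ; maximize without the prior information
$EST METHOD=BAYES NOPRIOR=0 ; Bayesian analysis using the prior
```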
NOSUB=[0|1] (NM74)
With NOSUB=0, label substitution will be performed for final estimates in the NONMEM report file. (See $ABBREVIATED). This is the default. With NOSUB=1, label substitution will not be performed.
NOTITLE=[0|1]
1 indicates the header for FILE will not be written, otherwise 0, the default. Affects the raw output file and all additional output files.
NUMDER=[0|1|2|3] (NM73)
- NUMDER=0: file root.fgh is not produced. This is the default.
- NUMDER=1: NONMEM computes and displays numerically evaluated derivatives of Y or F with respect to eta and eps (G and H). These numeric values are displayed in root.fgh, but are not used in estimation.
- NUMDER=2: analytical derivative values are stored in root.agh.
- NUMDER=3: both root.agh and root.fgh are produced.
NUMERICAL | NONUMERICAL
NUMERICAL: requests that second eta-derivatives for the Laplacian method be obtained numerically.
NONUMERICAL: requests that second eta-derivatives for the Laplacian method be computed by PRED. Not permitted with the combination LAPLACIAN and INTERACTION. Otherwise, this is the default.
NUTS_BASE (NM74)
When using the STAN algorithm (MADAPT=-1) for mass matrix and step size development during the burn-in (warmup) stage, NUTS_BASE (if NUTS_BASE>=1) or NUTS_BASE*NBURN (if NUTS_BASE<1) serves as the number of iterations for the first segment of Stage II of the warmup period ([20]), and the iteration count doubles with each subsequent segment of Stage II. The total number of Stage II iterations is bounded by NBURN-NUTS_INIT-NUTS_TERM. If NUTS_INIT=75, NUTS_BASE=25, NUTS_TERM=150, and NBURN=1000, then there are 5 segments of Stage II, so that
NUTS_INIT+NUTS_TERM+NUTS_BASE*(1+2+4+8+16) = 75+150+25+50+100+200+400 = 1000 = NBURN
When MADAPT>0, this period is also used to update the mass matrix every NUTS_BASE iterations until MADAPT total iterations have been performed.
If NUTS_BASE<=-1.0, then NUTS_BASE will be set to the largest block section of the mass matrix plus 10. This assures that a large enough base set of samples is collected before the mass matrix is used.
If NUTS_BASE<-1, then in addition, the number of Stage II iterations is ABS(NUTS_BASE). The actual NBURN will be based on the above equation, but not to exceed the user-specified NBURN, which serves as the maximum NBURN. With NUTS_BASE<-1.0, set NBURN to a large number (4000 or so). The AUTO feature sets NUTS_BASE to -3.
Default is 0.025.
NUTS_DELTA (NM74)
This is essentially the sample acceptance rate for the NUTS sampling process, equivalent to PACCEPT in standard MH sampling. NUTS experts recommend 0.8.
NUTS_EPARAM (NM74)
When NUTS_EPARAM=0, parameters are entered into the NUTS algorithm parameterized as THETAs and phis, Cholesky. When NUTS_EPARAM=1, parameters are entered into the NUTS algorithm parameterized as THETAs and ETAs. When NUTS_EPARAM=2, parameters are entered into the NUTS algorithm parameterized as THETAs and Cholesky of OMEGA inverse*ETA. Default 0.
NUTS_GAMMA (NM74)
Gamma factor in the NUTS algorithm. Should not be changed. Default 0.05.
NUTS_INIT (NM74)
When using the STAN algorithm (MADAPT=-1) for mass matrix and step size development during the burn-in (warmup) stage, NUTS_INIT<1 serves as the fraction of NBURN iterations for Stage I of the warmup period ([2]). When NUTS_INIT>1, it is interpreted as the explicit number of iterations. Similarly, when MADAPT>0, this period is also used to accumulate NUTS_INIT*NBURN iterations before using the mass matrix. Default 0.075.
NUTS_MASS (NM74)
- NUTS_MASS=B (default): uses a block diagonal mass matrix for scaling its search. The THETAs/SIGMAs/OMEGAs and their correlations will be scaled with one block matrix, and parameter sets of each individual will have their own block matrix correlations. Correlations between THETAs and individual parameters will not be accounted for. Most efficient.
- NUTS_MASS=F: full mass matrix. A full correlation matrix between THETAs, OMEGAs, SIGMAs, and individual parameters among all individuals will be accounted for. Computationally very expensive. When using METHOD=BAYES as a preparation for NUTS, as shown in example stanrb_177.ctl, make sure you set the NUTS_MASS (if you will be using something different from the default B value) at the METHOD=BAYES record, as well as MASSRESET=1, as this is required to set the appropriate memory allocation, and store the posterior variance-covariance (mass matrix) information that the NUTS algorithm will then use.
- NUTS_MASS=D: diagonal mass matrix. No correlations between parameters will be considered.
- NUTS_MASS=BD: Block mass matrix covering THETAs, SIGMAs, OMEGAs, and diagonal mass matrix on individual parameters.
- NUTS_MASS=DB: diagonal mass matrix covering THETAs, SIGMAs, OMEGAs, and block mass matrix on individual parameters.
- NUTS_MASS=BBD: THETAs and SIGMAs are blocked together, OMEGAs are in their own block, and individual parameters are diagonal.
- NUTS_MASS=BBB: THETAs and SIGMAs are blocked together, OMEGAs are in their own block, and individual parameters are blocked within each subject.
The mass matrix is generated by accumulating previous samples of parameters and obtaining their variance-covariance, so that the NUTS algorithm performs an efficient search in the domain of the empirical posterior density. It is best to acquire this mass matrix by first performing a couple of thousand iterations using standard Gibbs Bayesian analysis, followed by a NUTS process.
In the following we discuss the combinations of NUTS_MASS, NUTS_EPARAM, NUTS_OPARAM, and NUTS_SPARAM options.
- NUTS_EPARAM=0, NUTS_MASS=B Default. The most efficient for many of the problems tested so far. They offer the greatest speed efficiency and sampling (Neff/Nsample) efficiency. On occasion, one or two THETAs will have low efficiencies relative to the rest. The AUTO=1 option allows an easy setup of this configuration (see section I.46 Some General Options and Notes Regarding EM and Monte Carlo Methods for more details on the AUTO feature, AUTO=0 (default) (NM73)). Example stanrb42.ctl uses the AUTO=1 feature.
- NUTS_EPARAM=1, NUTS_MASS=D When the problem is submitted to the NUTS algorithm with ETAs rather than phis (NUTS_EPARAM=1), NUTS_MASS=B does not yield good efficiencies on THETAs, so NUTS_MASS=D needs to be used. This reduces speed efficiency by about 5-fold, but evens out the THETA efficiencies, so that the lowest Neff/Nsample efficiency is about 3x higher than the lowest sampling efficiency of the NUTS_EPARAM=0 setting. The AUTO=3 option allows an easy setup of this configuration.
- NUTS_EPARAM=2, NUTS_MASS=BD NUTS_EPARAM=2 requests transformation from a centered to a non-centered parameterization [2,3], and offers sampling efficiencies 3- to 5-fold higher than those of the default settings. However, the speed is about 4-8 times slower than the default settings when performing NUTS in NONMEM, so there may or may not be greater overall efficiency, in terms of number of independent samples per unit time. Furthermore, this method uses a conditional likelihood equation that differs from the standard that the population analysis community is used to: it drops the NIND*LOG(DET(OMEGA)) term (where NIND=number of subjects), resulting in a conditional likelihood that has very different distribution properties and does not fully represent the contribution of the individual ETAs to the likelihood. An example of using NUTS_EPARAM=2 is ..\examples\stanrb19. The AUTO=2 option allows an easy setup of this configuration, and example ..\examples\stanrb39 uses this option.
NUTS_MAXDEPTH (NM74)
This sets the maximum number of total branchings to try in the NUTS algorithm in the search for the next decorrelated sample. If many messages are received of reaching the maximum buildtree level, increase NUTS_MAXDEPTH.
NUTS_OPARAM (NM74)
When NUTS_OPARAM=1, Omega elements are parameterized in a correlation cholesky format that constrains correlations to be between -1 and 1. When NUTS_OPARAM=0, then the full Omega elements are parameterized in Cholesky format. Default 1.
NUTS_REG (NM74)
By default, the mass matrix is made slightly diagonal dominant by adding a fraction of the diagonal element. If NUTS_REG>0.0, then the mass matrix is made slightly diagonal dominant by adding the value of NUTS_REG. When OLKJDF>0, then NUTS_REG=1.0 may provide a more efficient sampling process. Default 0.0.
NUTS_SPARAM (NM74)
When NUTS_SPARAM=1, Sigma elements are parameterized in a correlation cholesky format that constrains correlations to be between -1 and 1. When NUTS_SPARAM=0, then the full SIGMA elements are parameterized in Cholesky format. Default 1.
NUTS_STEPINTER (NM74)
An initial step size is calculated every NUTS_STEPINTER iterations. Default is 0.
NUTS_STEPITER (NM74)
An initial step size is calculated for the first NUTS_STEPITER iterations. Default 1.
NUTS_TERM (NM74)
When using the STAN algorithm (MADAPT=-1) for mass matrix and step size development during the burn-in (warmup) stage, NUTS_TERM serves as the number of iterations for Stage III of the warmup period ([2]), to make final adjustments in step size. Default 0.05.
NUTS_TEST (NM74)
NUTS_TEST=0 (default) tests the acceptance as in the original NUTS algorithm [1]. NUTS_TEST=1 uses the default Stan algorithm.
NUTS_TRANSFORM (NM74)
When NUTS_TRANSFORM=0, model parameters are transformed using the mass matrix, for population parameters. If NUTS_TRANSFORM=1, the momentum parameters are transformed using the mass matrix. It is best to set NUTS_TRANSFORM to 1 if NUTS_TEST is set to 1, and NUTS_TRANSFORM should be set to 0 when NUTS_TEST is set to 0. Default 0.
OACCEPT=n
Used only with the BAYES method. n has meaning only for OMEGA sampled by the Metropolis-Hastings algorithm. The scaling of degrees of freedom is adjusted so that samples are accepted n fraction of the time. See OSAMPLE_M2 option. Default is
OLKJDF=n (NM74)
Used with NUTS method, OLKJDF stands for OMEGA LKJ density degrees of freedom. When 0, the usual inverse Wishart prior is used for Omegas. When OLKJDF>0, then the LKJ density is used as the prior, with OLKJDF degrees of freedom for all omega blocks. In addition, only diagonal elements of the OMEGA prior are used, assuming a density dependent on the OVARF value. OLKJDF may be set >0 when using METHOD=BAYES as well, but THETAs will then be M-H sampled using the OSAMPLE_M1, OSAMPLE_M2, and OSAMPLE_M3 settings. Default is 0.
See $OLKJDF to specify LKJ correlation degrees of freedom for each OMEGA block.
Using LKJ correlation priors for supporting OMEGAs, rather than inverse Wishart priors, is reasonable when using LKJ decorrelation as an uninformative prior, and there is no previous knowledge of the scale of the variances. For example, one could use an identity matrix for the Omega priors (diagonal OMEGA values=1, and very low degrees of freedom), and the LKJ decorrelation prior will not introduce much bias, whereas the inverse Wishart prior would introduce considerable bias. However, the uninformative inverse Wishart prior is reasonable to use if the diagonal Omega values are in a reasonable range of where the data are. In PK/PD modeling, we have the benefit of obtaining reasonable variances by first performing a maximum likelihood analysis (using FOCE, ITS, IMP, or SAEM), which often does not require any priors, and then supplying these results as priors for the NUTS analysis, as long as the degrees of freedom are set to <=D, the dimension of the block matrix. In such cases, the inverse Wishart as an uninformative prior (DF<=D), but with the variances obtained from an earlier maximum likelihood analysis on the same data, is equivalent to the LKJ correlation prior in terms of quality and lack of bias.
When using OMEGA information from a previous study to supply as an informative prior to a present study, the Inverse Wishart format of the prior information conjugates well with the information in the cross products of ETAs provided by the present study, and the informative prior information from the previous study offers a natural statistical support, as if the data of the previous study were added to the present study. Furthermore, the inverse Wishart prior supplies the information of the entire block, off-diagonals and diagonals, whereas the LKJ correlation prior method (OLKJDF>0) only uses the diagonal elements, and some general notion of correlation in the OLKJDF value. Such a natural interpretation for the inverse Wishart is evident in the mathematical structure of the total likelihood, when using NUTS_EPARAM=0 or NUTS_EPARAM=1, and Omega priors with inverse Wishart distribution.
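For example, an LKJ prior on the OMEGA correlations might be requested as follows (a sketch; the degrees-of-freedom value and iteration counts are arbitrary placeholders):

```
$EST METHOD=NUTS NBURN=500 NITER=1000 OLKJDF=2.0
; LKJ density with 2 degrees of freedom as the prior for all OMEGA blocks;
; only the diagonal elements of the OMEGA prior are then used
```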
OLNTWOPI (NM74)
The objective function is reported including the
\(N_{\eta} N_{\text{ind}}\log(2\pi)\) constant term for SAEM and BAYES,
where \(N_{\eta}\) (NETA)
is the number of ETAs, and \(N_{\text{ind}}\) (NIND) is number
of individuals. See also LNTWOPI (can be used at the same time).
THETABOUNDTEST, OMEGABOUNDTEST, SIGMABOUNDTEST
With NONMEM VI, the estimation step sometimes terminates with the message PARAMETER ESTIMATE IS NEAR ITS DEFAULT BOUNDARY. These options request that the "default boundary test" be performed for THETA, OMEGA, and SIGMA, respectively. THETABOUNDTEST may also be coded TBT or TBOUNDTEST; OMEGABOUNDTEST may also be coded OBT or OBOUNDTEST; SIGMABOUNDTEST may also be coded SBT or SBOUNDTEST. These options are the defaults.
NOTHETABOUNDTEST, NOOMEGABOUNDTEST, NOSIGMABOUNDTEST
Instructs NONMEM to omit the "default boundary test" for this type of variable, i.e., to behave like NONMEM V in this regard. Any option listed above may be preceded by "NO". The THETA, OMEGA, and SIGMA choices are independent of each other. E.g., it is possible to specify NOOBT (to prevent the "default OMEGA boundary test") and permit both the "default THETA boundary test" and "default SIGMA boundary test".
OMITTED
The Estimation Step is not implemented.
OPTMAP=n (NM73)
For alternative MAP (eta optimization) methods. With OPTMAP>0, the SLOW option may be needed.
- OPTMAP=0: standard variable metric (Broyden, Fletcher, Goldfarb, and Shanno (BFGS)) optimization method used by NONMEM to find optimal eta values (referred to as eta hat) for each subject at the mode of their posterior densities, using analytical derivatives of F with respect to ETAs (G), and analytical derivatives of Y with respect to EPSs (H), that were supplied by NM-TRAN or by the user. This is the default.
- OPTMAP=1: variable metric method using numerical finite
difference methods for first derivatives of F with respect to
ETAs. Necessary when not all code used in evaluating F, G and H
for observation event records is abbreviated code (some may be in
verbatim code), and/or some portions of the computation of F,
G and H are evaluated in a hidden subroutine specified by
$SUBROUTINES OTHER=, and the user-written code does not compute the eta derivatives. When OPTMAP=1 is present, values of G and H are ignored during eta optimization. This may be used to test user-coded derivatives, because two runs, one with OPTMAP=1 and one without it, should give very similar values for the OBJV, WRES, etc. if the user-coded derivatives are correct.
- OPTMAP=2: Nelder-Mead method, which uses a secant method, rather than relying on derivatives.
ORDER=xxxf
The values of x may be T (Theta), S (Sigma), and O (Omega). The value of f may be U (Upper) or L (Lower). Affects the way theta, omega, and sigma are displayed in the raw and additional output files. xxx gives the overall order, and f gives the order within OMEGA and SIGMA. Affects the raw output file and all additional output files. The default is TSOL: THETA, SIGMA, OMEGA, in Lower triangular form. Does not affect the NONMEM report file.
On the $OMEGA record, elements of Omega are given in lower triangular order, e.g.,
|
|
This can also be coded as
|
|
The NONMEM report file is not consistent. The initial and final parameter estimates are printed in Lower triangular order, e.g.,
|
|
However, in the intermediate printout of the Estimation Step, OMEGA and SIGMA are in Upper triangular order. This was not evident with previous versions of NONMEM, because the PARAMETER values corresponding to OMEGA and SIGMA are displayed as unconstrained parameters (UCP), and these are not in a 1-1 mapping with the elements of OMEGA and SIGMA. But with NONMEM 7.2 and higher, the intermediate output of the Estimation Step includes a line NPARAMETR, in which OMEGA and SIGMA are converted to their "natural" space. E.g.,
|
|
Because OMEGA is symmetric, this is the same as
|
|
In the NONMEM report, upper triangular form is used for all of the "COVARIANCE MATRIX OF THE ESTIMATE", "CORRELATION MATRIX OF ESTIMATE" and the "INVERSE COVARIANCE MATRIX OF ESTIMATE". E.g.,
|
|
This is TOSU order.
- ORDER not used (default): The raw output file .ext and additional output files .cov, etc., are in a different order, TSOL: SIGMA precedes OMEGA, and the order within OMEGA is lower triangular, consistent with the initial and final estimates of OMEGA but different from the intermediate output in the NONMEM report. The order of the last line of the "COVARIANCE MATRIX OF ESTIMATE" in the NONMEM report differs from the order of the lines of the .cov file, and similarly for .cor and .coi.
- ORDER=TOSU: Each line of the raw output file .ext is in the same order as in the intermediate output portion of the NONMEM report. Also, each line of the .cov file is in the same order as the last line of the "COVARIANCE MATRIX OF ESTIMATE" in the NONMEM report, and similarly for .cor and .coi.
Note that the ORDER option does not affect the NONMEM report file, only the raw and additional output files. The SIGMA matrix is always printed in the same order as OMEGA, and the ORDER option applies to it as well.
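For example (a hedged sketch; the surrounding estimation options are placeholders), the report-consistent ordering is requested as:

```
; Make .ext/.cov/.cor/.coi line ordering match the intermediate
; output and covariance matrices of the NONMEM report:
; THETA, OMEGA, SIGMA overall; Upper triangular within OMEGA/SIGMA.
$ESTIMATION METHOD=CONDITIONAL INTERACTION ORDER=TOSU
```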
OSAMPLE_M1=n
Used only with the BAYES method. OSAMPLE_M1, OSAMPLE_M2, OSAMPLE_M3, and OACCEPT are the options for the MCMC Metropolis-Hastings algorithm for OMEGA sampling. If OSAMPLE_M1<0 (default), then the OMEGAs are Gibbs sampled using the appropriate inverse Wishart proposal density, and the other options (OSAMPLE_M2 and OACCEPT) are not relevant. Otherwise, for each iteration, a matrix of OMEGAs is generated using an inverse Wishart proposal density that has variance based on the previous samples, done OSAMPLE_M1 times. Next, a matrix of OMEGAs is generated using a Wishart proposal density at the present OMEGA position, with degrees of freedom (dispersion factor for variances) scaled to have samples accepted with OACCEPT frequency. This is done OSAMPLE_M2 times (if OSAMPLE_M2<0, the program performs this as many times as there are non-fixed OMEGA elements). Then, individual Cholesky elements of OMEGA are varied, each OSAMPLE_M3 times (if OSAMPLE_M3<0, the program performs this as many times as there are non-fixed OMEGA elements). The final OMEGA matrix is kept. Usually these options do not need to be changed from their default values.
OSAMPLE_M2=n
Used only with the BAYES method. n has meaning only for OMEGA sampled by the Metropolis-Hastings algorithm. n is the number of times that OMEGA is generated using an inverse Wishart proposal density at the present OMEGA position, with degrees of freedom scaled to have samples accepted with OACCEPT frequency.
OVARF=x (NM74)
Used with the NUTS method and the OLKJDF option. OVARF is the weight factor for the standard deviation of the prior to the log square root OMEGA diagonal elements: the prior is a normal density of the log square root of OMEGA, centered about the log square root of the OMEGA prior, and scaled by OVARF. That is, $$ \log(\sqrt{\Omega(i)}) \sim \mathcal{N}(\log(\sqrt{\Omega_{\text{Prior}}(i)}), 1/\text{OVARF}). $$ If OVARF<0, then a half-t distribution with ABS(OVARF) degrees of freedom is used as the prior to the square root of the OMEGA diagonal elements. Use OVARF=-1 for the half-Cauchy distribution. Default is 1. See also the $OVARF control record.
PACCEPT=n
Used only with the BAYES method. n has meaning only for population parameters sampled by the Metropolis-Hastings algorithm. The scaling of variance is adjusted so that samples are accepted n fraction of the time. See PSAMPLE_M2 option. Default is 0.5.
PARAFILE=filename
Name of the "parallel file" (the parallelization profile) that controls parallelization (distributed computing). The default file name, if not specified, is parallel.pnm, or the parafile name specified on the nmfe command.
PARAFILE=ON turns on parallelization for this $ESTIMATION record.
PARAFILE=OFF turns off parallelization for this $ESTIMATION
record.
PARAFPRINT=n (NM74)
The print iteration intervals to the parallelization log file can
be controlled by this option during parallelization of the $EST
step. See also $COVARIANCE record and nmfe command. Default is
PARAFPRINT=1.
PHITYPE=n (NM74)
Default is 0. By default, after an estimation is performed, the phi values (conditional means of the individual parameters) and their variances are reported in the "root.phi" file. If you wish to have conditional mean ETAs reported instead, set PHITYPE=1.
POSTHOC | NOPOSTHOC
POSTHOC: used when the FO method is used. After the Estimation Step terminates, the eta values are estimated for each individual. To estimate the ETAs based on the initial estimates of THETA, OMEGA, and SIGMA (found either in the control stream or in a model specification file), also specify MAXEVAL=0 (which omits the Estimation Step). The conditional estimates of the ETAs are referred to as Conditional Parametric ETAs (CPE).
NOPOSTHOC: ETAs are not estimated. This is the default with METHOD=0. May not be used with METHOD=1.
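A common use described above, sketched as a hypothetical fragment, is obtaining conditional ETA estimates at the initial parameter values:

```
; Skip the Estimation Step (MAXEVAL=0) but still compute the
; Conditional Parametric ETAs (CPE) at the initial
; THETA/OMEGA/SIGMA values.
$ESTIMATION METHOD=0 MAXEVAL=0 POSTHOC
```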
PREDICTION
Indicates that Y (with NM-TRAN abbreviated code) or F (with a
user-supplied PRED or ERROR code) will serve as a prediction
variable, i.e., it will be set to a prediction. Upon simulation, the
simulated observation is possibly also being set in Y or F. (However,
the DV data item may instead be set directly to the simulated
observation.) Also, ETAs (if any) are population ETAs only if
EPSILONs also appear. This is the default. Compare with
LIKELIHOOD | -2LOGLIKELIHOOD options.
PRINT=n
Iteration summaries are printed for the 0th, every nth iteration, and last iteration. When n=0, no summaries are printed. Default: 9999 (so that summaries are printed for 0th and last iterations).
PRIORC (NM74)
The objective function is reported including the constant term of the prior.
PSAMPLE_M1=n
Used only with the BAYES method. The options PSAMPLE_M1, PSAMPLE_M2, PSAMPLE_M3, and PACCEPT are for the MCMC Metropolis-Hastings algorithm. They only have meaning for population parameters (THETA/SIGMA) that are not Gibbs sampled. Normally NONMEM determines whether THETA and SIGMA parameters are Gibbs sampled or not, based on the model setup (see the MU_ Referencing section below). For each iteration, a vector of THETAs/SIGMAs is generated using a multivariate normal proposal density that has mean/variances based on the previous samples, done PSAMPLE_M1 times. Next, a vector of parameters is generated using a multivariate normal proposal density with mean at the present parameter position, and variance scaled to have samples accepted with PACCEPT frequency. This is done PSAMPLE_M2 times (if PSAMPLE_M2<0, the program performs this as many times as there are M-H parameters). Finally, each parameter is individually sampled PSAMPLE_M3 times. The final accepted parameter vector is kept. Usually these options do not need to be changed from their default values.
PSAMPLE_M2=n
Used only with the BAYES method. n has meaning only for population parameters sampled by the Metropolis-Hastings algorithm. n is the number of times that a vector of THETAs and SIGMAs is generated using a multivariate normal proposal density with mean at the present parameter position, and variance scaled to have samples accepted with PACCEPT frequency.
PSAMPLE_M3=n
Used only with the BAYES method. n has meaning only for population parameters sampled by the Metropolis-Hastings algorithm. n is the number of times in mode 3 that each parameter is individually sampled. Default is 1.
PSCALE_MIN=n, PSCALE_MAX=n (NM74)
In MCMC sampling, the scale factor used to vary the size of the variance of the proposal density for population parameters (THETA/SIGMA) that are not Gibbs sampled, in order to meet the PACCEPT condition, is by default bounded by PSCALE_MIN=0.01 and PSCALE_MAX=1000. These defaults should be left alone for most MCMC sampling, but on occasion there may be a reason to expand the boundaries (perhaps to PSCALE_MIN=1.0E-06, PSCALE_MAX=1.0E+06).
RANMETHOD=[n|M|S|m|P] (NM72)
- n: the random number generator used for all Monte Carlo EM and Bayesian methods.
- n=0: ran0 of reference [4], minimal standard generator
- n=1: ran1 of reference [4], Bays and Durham.
- n=2: ran2 of reference [4].
- n=3: ran3 of reference [4], Knuth. (Default)
- n=4: NONMEM's traditional random number generator used in
$SIMULATION
- S: Sobol method without scrambling, used during importance or direct sampling (methods IMP, IMPMAP, and DIRECT) and only for the purpose of creating quasi-random samples of ETAs. For example, RANMETHOD=2S selects random number generator ran2 for general purposes, and the Sobol sequence for eta vector generation.
- m: the type of scrambling desired
- m=0: no scrambling (S0 is the same as S);
- m=1: Owen type scrambling;
- m=2: Faure-Tezuka type scrambling;
- m=3: Owen plus Faure-Tezuka type scrambling.
For example, RANMETHOD=S1 indicates a Sobol sequence with Owen scrambling for eta vector generation. Since there is no integer in the first position of RANMETHOD, the general random number generator remains unchanged from the RANMETHOD specification previously specified, or ran method 3 if none was specified earlier. On the other hand, RANMETHOD=1S2 indicates a ran1 type random number generator for general purposes, and a Sobol sequence with Faure-Tezuka scrambling for eta vector generation.
The Sobol sequence method of quasi-random number generation can reduce the Monte Carlo noise in the objective function evaluation during importance sampling under some circumstances. When the sampling density fits the posterior density well, such as with rich, continuous data, the Sobol sequence method does not reduce the Monte Carlo noise by much. If you are fitting categorical data or sparse data, and perhaps you are using the t distribution (DF>0) for the importance sampling density, then Sobol sequence generation may be helpful in reducing Monte Carlo noise. In NM72, only the DIRECT and IMP/IMPMAP methods could utilize the Sobol quasi-random method. As of NM73, Sobol may be used for the BAYES and SAEM methods as well. From experience, the S0 and S1 methods produce considerable bias for SAEM and BAYES, whereas S2 and S3 perform better.
- P (NM73): each subject will receive its own seed path, that will stay with that subject regardless of whether the job is run as a single process or parallel process. This assures that stochastically similar answers will be obtained for Monte Carlo estimation methods, regardless of the number of processes or different kinds of parallelization setups used to solve the problem. There is additional memory cost in using this option because the seed and seed status (additional internal variables of the random number generator that establish the random sequence) must be stored for each subject, and for SOBOL/QR sampling there may even be a reduction in speed because the random sampling algorithm has to be re-set for each subject. To reiterate, a single job run without the P descriptor will not be stochastically similar to a single job run with the P descriptor (although they will be statistically similar), or to any parallel job run. But, a single job run using the P descriptor will be stochastically similar to any parallel job run also using the P descriptor. If maintaining stochastic similarity regardless of how the job is run (single or any parallel profile) is important to you, then always set the P descriptor (so, RANMETHOD=P, at least).
The RANMETHOD specification propagates to subsequent $EST records in a given problem, but does
not propagate to $CHAIN or $TABLE records.
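Putting the pieces together (a hedged sketch, not a recommendation for any particular problem; the NBURN/NITER values are placeholders), a SAEM run using ran3, Faure-Tezuka scrambled Sobol sampling, and per-subject seed paths could be requested as:

```
; ran3 generator (3), Sobol with Faure-Tezuka scrambling (S2, which
; from experience behaves better than S0/S1 for SAEM/BAYES), and
; per-subject seed paths (P) so that single-process and parallel
; runs are stochastically similar.
$ESTIMATION METHOD=SAEM NBURN=2000 NITER=1000 RANMETHOD=3S2P
```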
When using the t-distribution sampling density (DF>0), by default the algorithm creates a random vector from n independent univariate t-distributed samples. This is called the U algorithm, and the most efficient use of the U type t-distribution is when DF=1,2,4,5,8, or 10. These algorithms were designed to work well with the Sobol method’s ability to reduce Monte Carlo noise.
As of NM74, another way of producing the vector is from a multi-variate t-distribution algorithm (suggested by Robert Leary), which can be selected by placing an M in the RANMETHOD descriptor, placed after the random number method number, for example:
|
|
(the default setting is U, for composite univariate). The multivariate t-distribution algorithm (M) produces samples that have radially symmetric densities (that is, the density is a function of the sum of squares of vector elements) and may provide a better-fitting sampling density for some kinds of models, and hence more efficient sampling. However, the individual random elements in the vector are not statistically independent, and when used with the Sobol method, the M algorithm does not reduce Monte Carlo noise as much as the composite univariate t-distribution vector (U) does. An alternative method is a mixture of two normal densities, using IACCEPTL.
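For example (a sketch; the DF, ISAMPLE, and NITER values are placeholders), the multivariate t-distribution sampler is selected by appending M after the generator number:

```
; Importance sampling with a t-distribution proposal (DF=4), drawn
; with the multivariate t algorithm (M) rather than the default
; composite univariate (U) algorithm.
$ESTIMATION METHOD=IMP ISAMPLE=1000 NITER=50 DF=4 RANMETHOD=3M
```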
REPEAT | NOREPEAT
REPEAT: the search is repeated with the initial estimates being the final estimates from the first search and with new UCP, so that a UCP value of 0.1 now corresponds to a final estimate from the first search. Cannot be used with STIELTJES.
NOREPEAT: the estimate obtained at the end of the minimization search is taken to be the final parameter estimate. This is the default. Cannot be used with STIELTJES.
REPEAT1 | NOREPEAT1
REPEAT1: the search of the first stage of the Stieltjes method is repeated with the initial estimates being the final estimates from the first search and with new UCP, so that a UCP value of 0.1 now corresponds to a final estimate from the first search. May only be used with STIELTJES.
NOREPEAT1: the estimate obtained at the end of the search of the first stage of the Stieltjes method is taken to be the final parameter estimate at the first stage. This is the default. May only be used with STIELTJES.
REPEAT2 | NOREPEAT2
REPEAT2: the search of the second stage of the Stieltjes method is repeated with the initial estimates being the final estimates from the first search and with new UCP, so that a UCP value of 0.1 now corresponds to a final estimate from the first search. May only be used with STIELTJES.
NOREPEAT2: the estimate obtained at the end of the search of the second stage of the Stieltjes method is taken to be the final parameter estimate at the second stage. This is the default. May only be used with STIELTJES.
SADDLE_HESS=n (NM74)
Sometimes the variable metric search algorithm used for FO/FOCE/Laplace ends near a local minimum with an eigenvalue that is near zero, suggesting a saddle point or inestimable or non-identifiable parameters. You can request a saddle point reset, which repositions the values about 1 OFV unit away and resumes the search, in hopes of continuing toward a minimum with a smaller OFV. This is based on the method by Yasunori and Nyberg, in the Perl Speaks NONMEM software. If the final OFV is nearly the same value as just before the saddle point reset, and one or more of the final parameters differ from those just before the saddle reset (see the .ext file, or the NPARAMETR line on the iteration just before the saddle reset mark), then this suggests that those parameters may be inestimable or non-identifiable.
SADDLE_HESS=0 (default) selects the Hessian matrix last generated by the variable metric method. This Hessian is not the true second derivative, but is a guaranteed positive definite matrix. Perturbing the estimates using this matrix requires very little computation, and is often sufficient to reposition the problem away from a saddle point.
SADDLE_HESS=1 causes the full second
derivative information matrix (identical to R matrix in the $COV
step) to be evaluated. Using SADDLE_HESS=1 to reposition the estimates may work better than the SADDLE_HESS=0 setting, but the computational expense is high, equivalent to that of a $COV step.
SADDLE_RESET=n (NM74)
SADDLE_RESET is the number of times that a reset should occur in the course of the search. Normally it should be set to 1. Default is 0.
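A hedged sketch of requesting one saddle point reset with the inexpensive Hessian approximation:

```
; One saddle-point reset during the FOCE search, repositioning the
; estimates with the variable-metric (BFGS) Hessian approximation
; (SADDLE_HESS=0), which costs little extra computation.
$ESTIMATION METHOD=CONDITIONAL INTERACTION SADDLE_RESET=1 SADDLE_HESS=0
```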
SEED=n, CLOCKSEED=[0|1] (NM75)
SEED (default 11456) sets the initial seed for the random number generator used for the Monte-Carlo methods. Alternatively, one can set CLOCKSEED=1 (default CLOCKSEED=0) so that the actual starting seed will be 10000*(seconds after midnight)+SEED. This allows a control stream to produce different stochastic results for automated replications, without the need to modify the seed value in the control stream file in each replication.
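For example (a sketch), to let automated replications of the same control stream produce different stochastic results:

```
; The actual starting seed becomes
;   10000*(seconds after midnight) + 11456,
; so each replication of this control stream differs stochastically
; without editing the seed value in the file.
$ESTIMATION METHOD=SAEM SEED=11456 CLOCKSEED=1
```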
SELECT=[0|1|2|3] (NM73)
Used with METHOD=CHAIN and $CHAIN to specify how the sample is selected.
- SELECT=0: if ISAMPEND>=ISAMPLE, then the default action for selecting between ISAMPLE and ISAMPEND is taken, which for $EST METHOD=CHAIN is to find the one giving the best OBJ at the initial values, and for $CHAIN is to randomly select a sample, with replacement. This is the default.
- SELECT=1: the sample is selected sequentially from ISAMPLE to ISAMPEND with each new use of $CHAIN and $SIML with multiple sub-problems for the given problem, and with each new $EST METHOD=CHAIN with multiple sub-problems and across problems. When ISAMPEND is reached, the sample selection begins at ISAMPLE again.
- SELECT=2: uniform random selection of sample, without replacement. Should the sample selection become exhausted, which would occur if CHAIN or $CHAIN records are utilized more than ISAMPEND-ISAMPLE+1 times, subsequent sample selection then occurs with replacement.
- SELECT=3: uniform random selection of sample, with replacement (this is equivalent to SELECT=0 for $CHAIN).
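As a hypothetical fragment (the file name, sample range, and other CHAIN options are placeholders):

```
; Read an existing chain file and select samples sequentially from
; sample 1 through 10, wrapping back to ISAMPLE when ISAMPEND is
; reached.
$EST METHOD=CHAIN FILE=example.chn NSAMPLE=0 ISAMPLE=1 ISAMPEND=10 SELECT=1
```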
SIGDIGITS=n
Number of significant digits required in the final parameter estimate. SIGDIGITS is not used by the Monte-Carlo methods. Default: 3. May also be coded NSIGDIGITS.
SLKJDF=n (NM74)
Used with the NUTS method, SLKJDF stands for SIGMA LKJ density degrees of freedom. When 0, the usual inverse Wishart prior is used for SIGMAs. When SLKJDF>0, the LKJ density is used as the prior, with SLKJDF degrees of freedom. In addition, only diagonal elements of the SIGMA prior are used. SLKJDF may be set >0 when using METHOD=BAYES as well, but SIGMAs (in Cholesky format) will then be M-H sampled using the PSAMPLE_M1, PSAMPLE_M2, and PSAMPLE_M3 settings (Choleskys of SIGMA elements are treated as extensions of the THETA parameters in M-H sampling methods). Default is 0.
See record $SLKJDF (NM75) to specify LKJ correlation degrees of freedom for each SIGMA block.
SIGL=n
n is used to calculate the step-size for finite difference derivatives independent of the SIGDIGITS value. If n=0 or n=100 then SIGL is ignored and SIGDIGITS is used as in versions prior to NONMEM 7. SIGL should usually be 2 to 3 times the value of NSIG. It is not used by the SAEM or BAYES methods.
SIGLO=n (NM72)
The precision to which the individual ETAs are optimized. The SIGL value set by the user continues to be the precision (or delta) setting for the finite difference algorithms in the higher-level estimation process for THETAs, OMEGAs, and SIGMAs. By default, if SIGLO is not specified, then SIGLO is set to the same value as SIGL. Should SIGLO be used, the recommended setting would be:
|
|
SLOW | NOSLOW | FAST
SLOW (or SLOW=1, see option STIELTJES below for SLOW=2) requests a slower method of computation. Required when either a mixture model is used along with CENTERING, or NUMERICAL is used. If not present, the option is automatically supplied in these two cases. For problems where NONMEM VI does not behave as well (e.g. yields a higher OFV at termination) compared to NONMEM V, inclusion of the SLOW option may sometimes, but not always, yield NONMEM VI results that are similar to NONMEM V.
NOSLOW requests a faster method of computation. This is the default.
FAST (NM74): this option is available for the FOCE/ITS methods. The FAST method allows use of analytical theta derivatives to facilitate FOCE analysis. All THETAs should be MU-referenced. For THETAs that should not have inter-subject variability, or should not be MU referenced, MU-reference them anyway by adding additional ETAs and assigning them to these THETAs through MU referencing, but with the associated OMEGA FIXED to 0.0.
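The workaround described above, for a THETA without inter-subject variability, can be sketched as a hypothetical fragment (the parameter structure is illustrative only):

```
$PK
 MU_1 = THETA(1)
 CL = EXP(MU_1 + ETA(1))
 MU_2 = THETA(2)           ; THETA(2) has no true variability, but is
 V  = EXP(MU_2 + ETA(2))   ; still MU-referenced via a dummy ETA
$OMEGA 0.1
$OMEGA 0.0 FIX             ; dummy ETA(2) variance fixed to zero
$ESTIMATION METHOD=CONDITIONAL INTERACTION FAST
```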
SORT | NOSORT
SORT: individual contribution to the objective function value and individual contributions to the gradients are sorted before they are summed, so that smaller numbers are summed before larger numbers.
NOSORT: individual contribution to the objective function value and individual contributions to the gradients are summed in the order in which the individual records appear in the NONMEM data set, as was done prior to NONMEM VI. This is the default.
STDOBJ=x (NM74)
For importance sampling and direct sampling only, if ISAMPEND is specified as an upper integer value, and STDOBJ is set to a real value greater than 0, then NONMEM will vary the number of Monte Carlo samples under each subject between ISAMPLE and ISAMPEND, until the stochastic standard deviation of the objective function falls below STDOBJ.
STIELTJES
A set of tentative population estimates is first obtained using some 1st- or 2nd-order method (but not the METHOD=HYBRID method). A tentative value for the integral (i.e. an area) is obtained. Then numerical integration is used to obtain second-stage estimates. A more detailed description of STIELTJES follows:
Stieltjes is a two-stage method. In the first stage, a set of
tentative population estimates are "quickly" obtained using some
1st- or 2nd-order population conditional method specified on the
$ESTIM record in the usual way to obtain fairly reliable estimates. Include the METHOD=CONDITIONAL option on the $ESTIMATION
record. However, the METHOD=HYBRID method cannot be used. At
this first stage, for each individual, a tentative value for the
integral (i.e. an area) is obtained. The first-stage estimates
are used as initial estimates in the second stage, wherein
another objective function is minimized to obtain a final set of
parameter estimates. Numerical integration is used to obtain second-stage estimates. The integral will be exact in the case
where the intraindividual model is of mean-variance kind and is
linear in eta. The second stage takes much more time than the
first stage, and the resulting parameter estimates are at least
as good as the first-stage estimates, perhaps better. In the
second stage, conditional estimates of eta are not used, but the
functions that are minimized to obtain conditional estimates are
still evaluated at many values of eta. Generally though, sufficiently good estimates are obtained by using conditional estimation methods, or even the First-Order Estimation method, and one
needs to use the Stieltjes Estimation method only with somewhat
pathological data designs. Simply also include the STIELTJES
option (along with an optional GRID option) on the $ESTIMATION
record.
A default grid on eta-space is used for numerical integration, but one may set one's own grid. A grid is specified in terms of ellipsoidal coordinates, with origin at eta=0. With these coordinates, the length of a point P is given in fractile units: the first-stage estimate of the fraction of area enclosing all points with the same or less length as that of P. A direction is specified by m=n-1 angles, where n is the number of (active) ETAs.

At the second stage, the fraction of area enclosing all points of length less than r0 is taken to be identical to that computed at the first stage. To obtain the fraction of area enclosing points of length greater than r0, numerical integration is used. A grid is obtained by first taking the interval [r0,r1] of the length axis and dividing this interval into nr equal subintervals. For each such subinterval, there is a region enclosing all points of length lying in the subinterval. There is yet another region (the tail region) enclosing all points of length between r1 and .9999.

Within each region, the integrand is evaluated at a number of points, distributed fairly uniformly throughout the region, but with approximately the same length. The number of points is taken to be (ns**m)*(2**n), where ns is some integer. Think of ns as the number of points in a single quadrant of a 2-dimensional ellipse in n-space, and ns**m as the number of points in an orthant of m-dimensional space. 2**n is the total number of orthants. See the GRID option above to specify grid information.
SVARF=x (NM74)
Used with the NUTS method and the SLKJDF option. SVARF is the weight factor for the standard deviation of the prior to the log square root SIGMA diagonal elements: the prior is a normal density of the log square root of SIGMA, centered about the log square root of the SIGMA prior, and scaled by SVARF. That is, $$ \log(\sqrt{\Sigma(i)}) \sim \mathcal{N}(\log(\sqrt{\Sigma_{\text{Prior}}(i)}), 1/\text{SVARF}). $$
If SVARF<0, then a half-t distribution with ABS(SVARF) degrees of freedom is used as the prior to the square root of the SIGMA diagonal elements. Use SVARF=-1 for the half-Cauchy distribution. Default is 1. See also the $SVARF control record.
TBLN=n (NM75)
Used with $EST METHOD=CHAIN and $CHAIN to select a
table within a raw output file.
THIN=n (NM74)
The Bayesian records retained in the raw output file may be thinned to every THINth iteration. So, if THIN=10, then only every 10th iteration is recorded in the raw output file. The PRINT option controls only the iterations printed to the console and NONMEM report file. Default is 1.
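For example (the iteration counts are placeholders):

```
; Record every 10th Bayesian iteration in the raw output file,
; while printing a console/report summary only every 100 iterations.
$ESTIMATION METHOD=BAYES NBURN=4000 NITER=10000 THIN=10 PRINT=100
```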
TPU=n (NM75)
TTDF=n (NM74)
TTDF stands for THETA t-density degrees of freedom, used only for METHOD=NUTS and METHOD=BAYES. TTDF can be a real number. The default TTDF=0 indicates that the (multivariate) normal distribution prior is used for THETAs, and TTDF>0 indicates a t-distributed prior. With TTDF>0, METHOD=BAYES uses the M-H sampler for THETAs, using the PSAMPLE_M1, PSAMPLE_M2, and PSAMPLE_M3 settings.
See also $TTDF control record to specify degrees of freedom for each THETA. The value of TTDF overrides $TTDF.
ZERO=list
Required with METHOD=HYBRID. A list of indices for ETAs which are fixed to zero during the Estimation Step. "list" contains one or more integers. If more than one, they must be surrounded by parentheses. The list must be contained entirely on one line. The indices may be separated by commas or spaces.
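A hypothetical fragment fixing two ETAs to zero during a HYBRID estimation (the eta indices are illustrative):

```
; ETA(1) and ETA(3) are fixed to zero during the Estimation Step;
; the remaining ETAs are handled conditionally.
$ESTIMATION METHOD=HYBRID ZERO=(1,3) MAXEVAL=9999 PRINT=5
```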
Reserved Variables that are of Interest During the Estimation Step
MUFIRSTREC (NM74)
The MUFIRSTREC option can speed up the NUTS method, and also
ordinary BAYES, FAST FOCE, ITS, and the EM methods. Set
MUFIRSTREC=1 in $PRED or $PK. MUFIRSTREC=1 selects the covariate
of the first record of the subject, rather than averaging among
its records when using that covariate in a MU reference.
The first statement of the $PRED or $PK block should be:
include nonmem_reserved_general
OBJQUICK (NM74)
The OBJQUICK option can speed up the NUTS method, and also ordinary BAYES, FAST FOCE, ITS, and the EM methods. Set OBJQUICK in $PRED or $PK.
- OBJQUICK=0: Default. Standard NONMEM processing of the model occurs.
- OBJQUICK=1: Certain tests and initializations are skipped.
- OBJQUICK=2: A simplified modeling process occurs, which cannot be used when $LEVEL or $MIX is used in the model. In addition, parallelization is not performed.
The OBJQUICK and MUFIRSTREC options can also speed up the other analysis methods, such as ordinary BAYES, FAST FOCE, ITS, and the EM methods. The first statement of the $PRED or $PK block should include nonmem_reserved_general:
|
|
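A sketch of how these reserved variables are set in the abbreviated code (the MU reference and the WT covariate are hypothetical):

```
$PK
 include nonmem_reserved_general  ; must be the first statement
 OBJQUICK = 2    ; simplified processing (not with $LEVEL or $MIX,
                 ; and parallelization is not performed)
 MUFIRSTREC = 1  ; use each subject's first-record covariate value
                 ; in MU references, rather than averaging records
 MU_1 = THETA(1) + THETA(2)*WT    ; WT is a hypothetical covariate
 CL = EXP(MU_1 + ETA(1))
```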
Reference
[1] M.D. Hoffman, A. Gelman The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo. J. of Machine Learning Research 2014; 15: 1593-1623.
[2] Stan development team. Stan User’s Guide and Reference Manual. Stan Version 2.32.0.
[3] O. Papaspiliopoulos, G.O. Roberts, and M. Sköld. A General Framework for the Parametrization of Hierarchical Models. Statistical Science 2007; 22 (1): 59-73.
[4] W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery. Numerical Recipes: The Art of Scientific Computing. 2nd Edition, Cambridge University Press, New York, 1992, pp. 269-305.
[5] M. Lavielle. Monolix Users Manual. Version 2.1. Orsay, France: Laboratoire de Mathematiques, U. Paris-Sud; 2007.