$DATA

Describes the NM-TRAN data set

 $DATA  [filename|*] [(format)] [IGNORE=c1] [NULL=c2]
        [IGNORE=(list)...|ACCEPT=(list)...]
        [PRED_IGNORE_DATA]
        [NOWIDE|WIDE] [CHECKOUT]
        [RECORDS=n1|RECORDS=label]
        [LRECL=n2] [NOREWIND|REWIND]
        [NOOPEN] [LAST20=n3] [TRANSLATE=(list)]
        [BLANKOK]
        [MISDAT=r...]
        [FDATACSV...]
        [REPL=n...]

;# EXAMPLE
 $DATA       DATAFILE

Discussion

This record specifies the data set to be used. It is required with the first problem specification. It must precede any other NM-TRAN control record that refers to specific data item types. May also be coded $INFILE.

It is optional with the second or subsequent problems. If omitted, NONMEM re-uses the data set from the previous problem. Note that with the previous problem the user might have modified data items from those found in the NONMEM data set at the beginning of that problem (e.g. the modification may have occurred in the simulation step, or in the initialization/finalization step), and then the data set used in the current problem (NONMEM's internal copy of the data set) contains the modified data items.

Options

filename | *

Name of the file containing the data set. Must be the first option. If it contains commas, semicolons, or parentheses, then it must be surrounded by single quotes ' or double quotes ". Filename may also contain equal signs if it is enclosed in quotes. If the file is opened by NM-TRAN, filename may contain embedded spaces if it is enclosed in quotes, and may contain at most 80 characters. If the file is opened by NONMEM, the filename may not contain embedded spaces, and may contain at most 71 characters. If filename is the same as any option of the $DATA record, it must be enclosed in quotes.

One may use * in a problem subsequent to the first to omit the $DATA record (NONMEM is told to re-use the previous data set). It allows no other option except CHECKOUT.
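
As a minimal sketch of re-use across problems, the following fragment shows a second problem pointing back at the previous data set with *. The file name DATAFILE and the $INPUT labels are placeholders, and all other records are omitted:

```
$PROBLEM  First problem: the data file is read here
$INPUT    ID TIME AMT DV
$DATA     DATAFILE
; (remaining records of the first problem omitted)

$PROBLEM  Second problem: re-use the data set of the first problem
$INPUT    ID TIME AMT DV
$DATA     *
; (remaining records of the second problem omitted)
```

Because NONMEM's internal copy of the data set is re-used, any data items modified during the first problem carry over to the second.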

(format)

The format refers to a FORTRAN format specification to be used by NONMEM to read the NONMEM data records. Note that this specification is to be enclosed in parentheses. Format codes F, E, and X may be used, but not I. The format will also be used by NONMEM to read the NONMEM data records, after it has been modified to account for generated data items. If a format is provided, the label DROP, the WIDE option, or the NULL option may not be used. If the format is omitted, the NONMEM records can still be read, and a format specification which is appropriate for reading the NONMEM data records will be generated. In this case a NONMEM data set is found in file FDATA.

RECORDS=n1

The number of records to be read from the NM-TRAN data set. Comments are not counted. If NM-TRAN does not drop any records from its data set (see IGNORE list and ACCEPT list below), then n1 is also the number of records written to the NONMEM data set. If NM-TRAN drops records, then the total number of records written to the NONMEM data set is n1 minus the number of dropped records. With NONMEM 7.5, records may also be dropped using the PRED_IGNORE_DATA block of abbreviated code. The same total applies to these dropped records. See PRED_IGNORE_DATA below.

If omitted, the records written to the NONMEM data set are all the records in the NM-TRAN data set up to the end-of-file (or up to a NONMEM FINISH record) minus the number of comment and dropped records. May also be coded NRECORDS, RECS, or NRECS.

If the option is coded as RECORDS=label for data item label, NONMEM understands the data records for the problem to start with the first data record of the NONMEM data set (at the place where the file is positioned before data records are read; see the NOREWIND option below), and to include as well, those and only those subsequent contiguous data records having the same value of the data item as does the first record. It counts the total number of these data records, minus any comment or dropped records, and puts this number in the NONMEM control file.

In particular, the ID label may be used (or alternatively, the option may be coded RECORDS=IR, RECORDS=INDREC, or RECORDS=INDIVIDUALRECORD). If a label other than ID is used, the $INPUT record must precede the $DATA record. If the data are single-subject data, the ID data items used to determine the data records for the problem are those labeled ID (not .ID.).

If there is more than one problem specification with a $DATA record that includes an option of the form RECORDS=label, then either none of these $DATA records may also include a format specification, or all of them must include the same format specification.
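
The two forms of the RECORDS option might be coded as follows. This is an illustrative fragment showing two alternative $DATA records; the file name and the count 120 are placeholders:

```
; Read only the first 120 data records (comment records are not counted)
$DATA  DATAFILE RECORDS=120

; Alternatively: read the contiguous records that share the ID value
; of the first data record
$DATA  DATAFILE RECORDS=ID
```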

LRECL=n2

Only needed when the format is omitted, and when either (i) the operating system (e.g. IBM/CMS) raises a fatal error when a FORTRAN program tries to read more characters from a logical record than the number of characters in the record, or (ii) the operating system imposes a maximum record length which is smaller than 999 characters (e.g. CRAY/CTSS). The number n2 is the number of characters in the NONMEM data records.

IGNORE=c1

Specifies that any data record having the character c1 in column 1 should be ignored, i.e., these records are not included in the NONMEM data set. This allows comment records to be included in the NM-TRAN data set. In general, records having the character c1 in column 1 will be called "comment records". Also permitted: IGNORE='c' or IGNORE="c", where c may be any character except space. Default: "#". Therefore, when IGNORE is omitted, any record beginning with "#" is treated as a comment record.

IGNORE=@ signifies that any data record having an alphabetic character or @ as its first non-blank character (not just in column 1) should be ignored. Alphabetic characters are the letters A-Z and a-z. This permits a table file having header lines to be used as an NM-TRAN data set.
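
For instance, a table file written by an earlier run could be read back in as sketched below. The file name run1.tab and the $INPUT labels are hypothetical, and the fragment assumes the table file begins with a title line and a line of column labels:

```
; The title line and the line of column labels both start with a letter,
; so IGNORE=@ excludes them from the NONMEM data set
$INPUT  ID TIME DV PRED
$DATA   run1.tab IGNORE=@
```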

IGNORE | ACCEPT=(list)

"List" is a list of one or more data item labels, with logical operators and values, of the form "label=value", "label.EQ.value", "label.NE.value", "label.GT.value", "label.GE.value", "label.LT.value", and "label.LE.value". (Fortran 90 logical operators such as "==", "/=", "<", "<=", ">", ">=" may also be used.) Thus, the following are identical: "label=value", "label==value", "label.EQ.value". With NONMEM 7.3, "label.NEN.value" and "label.EQN.value" are permitted. (There is no Fortran 90 operator for this comparison.) If the logical operator is omitted, the default is "=". With each data record, the value of the data item with the given label and the value in the list are compared according to the logical operator, and if the result is "true", the record is ignored, i.e. it is not included in the NONMEM data set (see example below). Such records are called "dropped records". With "=", "==", "/=", ".EQ." and ".NE.", the value in the data record and the value in the list are compared as character strings. Otherwise, they are converted to numeric and compared numerically. (This is the case with .NEN. and .EQN.) This comparison is made prior to time translation. Hence, the TIME item cannot be compared numerically if it contains non-numeric characters such as ":".

Note: if the data file is a table file from a previous NONMEM run, values that had been integers (0,1,..) in the original data file will be real values (0.0000E+00, 1.0000E+00, …) in the table file. A comparison for equality or inequality should now be for the real value. E.g.

IGNORE=(OCC==1.0000E+00)

With NONMEM 7 the default format for the table file is PE11.4, as in the examples above. The FORMAT option of $TABLE may be used to change this and the values used in a subsequent IGNORE will have to be changed accordingly. The .NEN. and .EQN. operators that are described above will always work.
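
A sketch of the numeric alternative, which avoids matching the exact character string written to the table file (file name and $INPUT labels are hypothetical; .EQN. requires NONMEM 7.3 or later):

```
; .EQN. converts both values to numeric before comparing, so 1.0000E+00
; in the table file matches 1 whatever table format was used
$INPUT  ID TIME DV OCC
$DATA   run1.tab IGNORE=@ IGNORE=(OCC.EQN.1)
```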

A data item label along with a logical operator and value is called a condition. A list may contain several conditions; these should be separated by commas, and the list should be enclosed in parentheses. Up to 100 different conditions altogether can be specified. IGNORE=(list) may be used with IGNORE=c, where c is a character. Multiple IGNORE options with different lists may be used. A list may span one or more NM-TRAN records. The use of "=" after IGNORE is optional, but parentheses are required with this form of IGNORE. Values may be alphabetic or numeric, and may optionally be surrounded by single quotes ' or double quotes ". Quotes are required if a value contains special characters such as =. However, a value may not contain spaces or commas. No format specification is permitted with this form of IGNORE.

A data item type may be dropped from the NONMEM data set by means of the DROP or SKIP synonym on the $INPUT record, after records are dropped due to a condition based on the data item type. E.g.,

  $INPUT ... GEN=SKIP ...
  $DATA file IGNORE=(GEN='M')

Records having GEN equal to 'M' will be dropped, and the GEN data item type will then be omitted from the NONMEM data set. A dropped data item may be any alphanumeric string (without a data item delimiter - a blank or a comma).

If there is more than one condition, then records satisfying at least one of these conditions will be dropped. In effect, the conditions for dropping a record are connected by the implied conjunction ".OR.". E.g.

  IGNORE=(GEN.EQ.1,AGE.GT.60)

Records having GEN equal to 1 or AGE greater than 60 are dropped. All others are accepted.

Opposite to IGNORE, the ACCEPT option specifies conditions for acceptance of records. For example,

  ACCEPT=(GEN.EQ.1,AGE.GT.60)

means records with GEN=1 or AGE>60 are accepted, and all others are dropped. On the other hand

  ACCEPT=(GEN.NE.0,AGE.LE.60)

means records with GEN not equal to 0 or AGE less than or equal to 60 are accepted, and all others are dropped.

An ACCEPT list cannot be used together with an IGNORE list but may be used with the IGNORE=c option.
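
Because the conditions in a list are joined by an implied ".OR.", a requirement that a record satisfy several conditions at once cannot be written directly as an ACCEPT list. One way to express it, sketched here under the semantics described above (the file name is a placeholder), is to ignore the complement of each condition:

```
; Keep only records with GEN equal to 1 AND AGE greater than 60:
; a record is dropped if GEN is not 1, OR if AGE is less than or equal to 60
$DATA  DATAFILE IGNORE=(GEN.NE.1,AGE.LE.60)
```

Note that .NE. compares character strings, so a GEN value coded as 1.0 would not match 1 and the record would be dropped; with NONMEM 7.3, .NEN. compares numerically instead.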

PRED_IGNORE_DATA (NM75)

The $DATA IGNORE=(list) and $DATA ACCEPT=(list) options provide a limited means of filtering the input data set, which is performed by NMTRAN. To provide more elaborate filtering for excluding data, PRED can instruct NONMEM to filter out additional data records at the beginning of the run or problem.

This is done by creating a PRED_IGNORE_DATA_TEST==1 IF block in $INFN, $PK, or $PRED. For example:

$INFN
IF(PRED_IGNORE_DATA_TEST==1) THEN
 PRED_IGNORE_DATA=0
 IF(AGE>35.0) PRED_IGNORE_DATA=1
 IF( ID>10.AND.ID<18.OR.ID>60.AND.ID<70 ) PRED_IGNORE_DATA=1
 RETURN ;# Assures no additional code in INFN is executed (saves time)
ENDIF

or

$PRED
IF(PRED_IGNORE_DATA_TEST==1) THEN
 PRED_IGNORE_DATA=0
 IF(AGE>35.0) PRED_IGNORE_DATA=1
 IF( ID>10.AND.ID<18.OR.ID>60.AND.ID<70 ) PRED_IGNORE_DATA=1
 RETURN ;# Assures no additional code in PRED is executed (saves time)
ENDIF

If PRED_IGNORE_DATA is set to a non-zero value, then the data record is ignored, and excluded from the internal data set. This allows the user to use more complex, multi-line and FORTRAN syntax based, logical operations on data record exclusions.

When PRED_IGNORE_DATA_TEST=1, ICALL is set to -1. The following variables have properly defined values during this call:

NEWIND, NEWL2, IPROB, NPROB, S1NUM, S2NUM, S1NIT, S2NIT, S1IT, S2IT

So, it is possible to restrict PRED_IGNORE_DATA actions to a particular problem number:

IF(IPROB==2.AND.PRED_IGNORE_DATA_TEST==1) THEN
 PRED_IGNORE_DATA=0
 IF(AGE>35.0) PRED_IGNORE_DATA=1
 IF( ID>10.AND.ID<18.OR.ID>60.AND.ID<70 ) PRED_IGNORE_DATA=1
 RETURN
ENDIF

No other variables, such as THETAS or data record information such as NIREC, NDREC, etc., are properly defined during PRED_IGNORE_DATA_TEST=1, and no calls to complicated functions such as RANDOM() will be valid (simple functions, such as built-in FORTRAN functions, are fine). Furthermore, changes to data items may not be made during this call. Any other functions of $INFN, such as data modification, RANDOM() calls, etc., should be made with separate ICALL==0 or ICALL==1 blocks.

If the user writes their own INFN routine containing PRED_IGNORE_DATA code, then NONMEM needs to be informed with the $DATA PRED_IGNORE_DATA option. For example:

$PROB  THEOPHYLLINE   POPULATION DATA
$INPUT      ID DOSE=AMT TIME CP=DV WT
$DATA       THEOPP PRED_IGNORE_DATA
$SUBROUTINES  ADVAN2 INFN=myinfn.f90

with myinfn.f90

  SUBROUTINE INFN(ICALL,THETA,DATREC,INDXS,NEWIND)
  USE SIZES,     ONLY: DPSIZE,ISIZE
  USE NMPRD_INT, ONLY: PRED_IGNORE_DATA,PRED_IGNORE_DATA_TEST
  INTEGER(KIND=ISIZE) :: ICALL,INDXS,NEWIND
  REAL(KIND=DPSIZE) :: THETA

  REAL(KIND=DPSIZE) :: DATREC
  DIMENSION :: THETA(*),DATREC(*),INDXS(*)
  IF(PRED_IGNORE_DATA_TEST==1) THEN
    IF (DATREC(3)>3) THEN
      PRED_IGNORE_DATA=1
    ENDIF
  ENDIF
  RETURN
  END

If PRED_IGNORE_DATA_TEST and PRED_IGNORE_DATA are used only in verbatim code, it is also necessary to code the $DATA PRED_IGNORE_DATA option.

NULL=c2

Specifies a character c2 that replaces null data items in the NM-TRAN data set, e.g. NULL=0. Null data items consist of a single dot . or consecutive commas or consecutive tab characters. c2 may be any character except space (" ") or semicolon (";"). Also permitted: NULL='c' and NULL="c", where c may be any character. If the option is omitted, NM-TRAN replaces each null data item with a space.

NOWIDE | WIDE

NOWIDE requests that NM-TRAN attempt to limit FDATA to 80-character records. Space between adjacent columns may be suppressed and multi-line records may be generated. This is the default. WIDE requests that FDATA contain single-line records, and that at least one space separate columns. (Records in FDATA will never be wider than 300 characters.) With this option, there will be no FINISH (FIN) record in the NONMEM data set.

NOREWIND|REWIND

NOREWIND specifies that the file is not to be rewound before it is read, and REWIND specifies that the file is to be rewound. These options are ignored if used on the $DATA record appearing in the first problem specification of the NONMEM control stream, or on a $DATA record appearing in a subsequent problem specification when this record contains a file name different from that contained on the $DATA record of the prior problem specification. In these cases the file is automatically rewound. These options are relevant only when there are multiple problem specifications in the NONMEM control stream, and when the $DATA records appearing in two consecutive problem specifications, corresponding to two problems A and B, contain the same file name. In this situation:

When the $DATA REWIND option is used for problem B, the first NONMEM data set on the file is re-used for problem B. If NONMEM does not modify this data set, then an instruction to rewind the (same) file is also contained in the NONMEM control stream problem specification for problem B. If NONMEM does modify the data set, then the NONMEM data set for problem B is placed on FDATA after the last NONMEM data set already present on FDATA, and no instruction to rewind FDATA is contained in the NONMEM control stream problem specification for problem B.

If the $DATA NOREWIND option is used for problem B, or neither option is used, then the file is not rewound, and the NONMEM data set on this file that follows the one used for problem A is used for problem B. In this case note that the $DATA record with problem A must have contained the RECORDS option or the NONMEM data set used for problem A must end with a FINISH record. Also in this case, no instruction to rewind the file containing the NONMEM data set for problem B (whether this file is the file named in the $DATA record or whether it is FDATA) is contained in the NONMEM control stream problem specification for problem B.
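
A sketch of the two-problem situation described above, assuming a single file that holds two data sets one after the other. The file name, the $INPUT labels, and the record count 120 are placeholders, and all other records are omitted:

```
$PROBLEM  Problem A: uses the first data set in the file
$INPUT    ID TIME AMT DV
$DATA     DATAFILE RECORDS=120
; (remaining records of problem A omitted)

$PROBLEM  Problem B: uses the data set that follows in the same file
$INPUT    ID TIME AMT DV
$DATA     DATAFILE NOREWIND
; (remaining records of problem B omitted)
```

Coding REWIND in place of NOREWIND on the second $DATA record would instead re-use the first data set for problem B.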

CHECKOUT

Requests that NONMEM implement the data checkout mode, in which the PRED routine is not called and predictions, residuals, weighted residuals and the objective function are not computed. May also be coded CHECKDATA. No tasks other than $TABLE or $SCAT can be specified. With NONMEM 7.5, an additional file, FDATA.csv is produced that outputs the contents of its input data file (typically FDATA) in a comma delimited file format, so you can check how NONMEM interprets the input data. If the $DATA REPL option or the REPL_ data item is used, the replicated form of the data will appear in FDATA.csv.

NOOPEN

NM-TRAN will not open the named data file. This permits the data file to be created by one problem and used in a subsequent problem of the same run. May not be used with options IGNORE, DROP, or when data items ID, MDV, or EVID must be generated by NM-TRAN. With NOOPEN, a format specification is required. No day-time translation takes place.

LAST20=n3

Overrides the LAST20 constant in resource/TRGLOBAL.f90 (default: 50). One or two digit years > LAST20 are assumed to be in the 1900's. One or two digit years <= LAST20 are assumed to be in the 2000's. E.g., suppose LAST20=50. Then two digit years are interpreted as follows:

 00-50 = 2000-2050
 51-99 = 1951-1999

LAST20=-1 can be used when two digit years span the year 2051. All two digit years will be assumed to be in the same century. If year is recorded with four digits, it is always processed correctly and the value of LAST20 is of no consequence.
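
As a further illustration of the rule, a smaller cutoff could be coded as follows (the file name is a placeholder and the value 30 is chosen only for the example):

```
; Years <= 30 fall in the 2000's and years > 30 in the 1900's:
;  00-30 = 2000-2030
;  31-99 = 1931-1999
$DATA  DATAFILE LAST20=30
```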

TRANSLATE=(list)

"list" describes modifications to be made to the contents of the data file. It may contain one of:

TIME/F, TIME/F/D

and/or one of

II/F, II/F/D

F ("factor") may be an integer or a real value. If F is a real number, the translated value in FDATA will have the same number of digits after the decimal point. If F is an integer and D is omitted, there will be 2 digits. Alternately, the number of digits may be specified explicitly by D ("digits"). If D is a real number, it is truncated to integer. If D is specified as 0, it defaults to 2. The maximum value of D is 12. The number of digits that may be requested in F is limited by the precision of the computer.

For example, either of the following can be used to request values of TIME in FDATA that have 4 digits to the right of the decimal point:

TIME/1.0000
TIME/1/4

Another example is

II/0.01/6

which divides II values by 0.01, and writes 6 digits to the right of the decimal point.

If F is specified as "24" (or 24.0..), the options involving TIME (II) can be used to convert the units of time (of the steady-state interval) from hours to days. The TIME (II) data item is first processed as if the option were not present. Then the resulting value is divided by F.

Note: The value of TIME is divided by F, whether or not day-time translation occurs (i.e., whether or not relative times are being computed by NM-TRAN). Similarly, the value of II is divided by F whether or not ":" appears in any II data value.
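
A sketch of the hours-to-days conversion applied to both items. The file name is a placeholder, and separating the two list entries with a comma is an assumption here, since the text above does not state the separator:

```
; TIME and II are recorded in hours; both are divided by 24 to give days.
; F is an integer and D is omitted, so 2 digits follow the decimal point
; (e.g. a TIME of 36 hours becomes 1.50).
$DATA  DATAFILE TRANSLATE=(TIME/24,II/24)
```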

BLANKOK

Specifies that blank lines are permitted in the NM-TRAN data set. With all versions prior to NONMEM VI 2.0 a blank line was permitted, and was copied to the NONMEM data set. A warning message was issued. With later versions, NM-TRAN stops with an error message when there is a blank line in the NM-TRAN data set. Option BLANKOK restores the previous behavior. There is no abbreviation. BLANKOK must be coded in full.

MISDAT=r (NM74)

A numerical value indicating a missing data value in the data set, which is displayed on $TABLE table outputs, but is safely interpreted as 0 by other steps of NONMEM. May be used up to 20 times. Example:

  $DATA mydatafile MISDAT=1.0E-99 MISDAT=1.0E-102

REPL=n (NM75)

For clinical trial simulations, one can create a data file containing template subjects, then replicate them a number of times. As of NM75, with REPL=n NONMEM will replicate each subject n times, and then use this expanded data set [1]. For example, suppose a data file, called template.csv, contains 3 subjects, each one representing a particular covariate type, dosing type, and/or sample time pattern. The desire is that each of these subjects be replicated 100 times in the data set. This is done with the following record.

$DATA template.csv ignore=@ REPL=100

Make sure that the ID values of the template are integer valued. If the number of replications is greater than the difference between two consecutive ID values in the template, NONMEM will add a fractional value to each replicate subject, so that the integer portion of the ID is the original ID, and the fractional portion represents the replication number (NONMEM does not require integer valued ID’s, only unique ID numbers that have no more than 14 significant digits). Otherwise, the replicate ID’s will be integer incremented. To assure only integer ID’s are created, make sure the template ID’s are spaced by an amount greater than the number of replications desired for each subject.

To specify a different replication for each subject, a reserved data item called REPL_ has been introduced in NM75; its value is used as the replication number for that subject. If both are used, the REPL_ data item applies first, and the REPL option applies second. If the value is fractional, it will be truncated to an integer. Only the REPL_ value of the first record of each individual will be used to determine its replication. For example, with control stream

$INPUT ID TIME  AMT  RATE EVID MDV DV  REPL_
$DATA warfarin.dat ignore=C

and data file

CID TIME    AMT  RATE EVID MDV DV    REPL_
1.0 0.0    70.0  0.0  1    1   0.0   2
1.0 0.5     0.0  0.0  0    0   1.0   .
1.0 1.0     0.0  0.0  0    0   1.0   .
1.0 2.0     0.0  0.0  0    0   1.0   .
1.0 6.0     0.0  0.0  0    0   1.0   .
1.0 24.0    0.0  0.0  0    0   1.0   .
1.0 36.0    0.0  0.0  0    0   1.0   .
1.0 72.0    0.0  0.0  0    0   1.0   .
1.0 120.0   0.0  0.0  0    0   1.0   .
2.0 0.0    70.0  0.0  1    1   0.0   3
2.0 0.5     0.0  0.0  0    0   1.0   .
2.0 1.0     0.0  0.0  0    0   1.0   .
2.0 2.0     0.0  0.0  0    0   1.0   .
2.0 6.0     0.0  0.0  0    0   1.0   .
2.0 24.0    0.0  0.0  0    0   1.0   .
2.0 36.0    0.0  0.0  0    0   1.0   .
2.0 72.0    0.0  0.0  0    0   1.0   .
2.0 120.0   0.0  0.0  0    0   1.0   .

data of subject 1 will be replicated 2 times within NONMEM, and data of subject 2 will be replicated 3 times. The ID values of the replicates will be 1.00, 1.01, 2.00, 2.01, 2.02.

Using the $DATA REPL option along with the REPL_ data item such as:

$INPUT ID TIME  AMT  RATE EVID MDV DV  REPL_
$DATA warfarin.dat ignore=C REPL=100

will result in 200 replications for subject 1, and 300 replications for subject 2. They appear in the following order: two replicates of subject 1 followed by 3 replicates of subject 2, with this pattern repeated 100 times. If one subject type takes much longer than another to compute during estimation, this pattern allows the best load balancing for parallelization.

The DV patterns will be identically replicated as well, so replicated data templates make sense only if you will be using $SIML or $DESIGN to subsequently create the unique DV patterns for each replicated subject.

NOFDATACSV (NM760)

An additional file, FDATA.csv, is produced that outputs the contents of NONMEM's input data file (typically FDATA) in a comma delimited file format, so you can check how NONMEM interprets the input data. The records in FDATA.csv may differ from those in FDATA in the following cases. If REPL/REPL_ is used, the replicated form of the data will appear in FDATA.csv. Also, records excluded by PRED_IGNORE_DATA will not be present in FDATA.csv. The creation of FDATA.csv can cause a long period of data file loading and processing at the beginning of a run, if the FDATA file is very large (such as >1 MB). As of NM760, the creation of this file can be prevented by adding the $DATA NOFDATACSV option.

Another change in NONMEM VI 2.0 is that tab characters (and other characters that are smaller than blank in the computer's collating sequence, such as carriage return "^M") are permitted in NM-TRAN input files. In the NM-TRAN data set they are treated like commas, i.e., as field delimiters. In the NM-TRAN control stream they are converted to spaces. They are left unchanged in verbatim code. With NONMEM 7, the last non-blank character on the line is replaced by a space if it is a low-value character.

Note: The character ":" in TIME or II data items requests day-time translation of TIME or II values. These values must have the form hh:mm or hh:mm:ss (NM73).


[1] Note that the NONMEM data set is typically the file FDATA generated by NM-TRAN. If there is nothing for NM-TRAN to change and the format is supplied on the $DATA record, the file named on the $DATA record is the NONMEM data set.