Count statement for Count models

PML provides a count statement to simplify the coding for modeling count data, which are defined as the number of events occurred per time interval.

count(observed variable, hazard expression[, options][, action code])

This statement creates a special hidden differential equation to integrate the hazard expression (the second argument). This integration is reset right after each observation, so the integral extends from the time of previous observation to the time of the current observation. The log-likelihood is then automatically calculated based on a Poisson or Negative Binomial distribution whose mean is the integrated hazard since the last observation. The options are:

, beta=<expression>

The presence of the beta option makes the distribution Negative Binomial, where the variance is var=mean*[1+mean*alpha^power), and alpha=exp(beta)

, power=<expression>

Power of alpha, default(1).

Zero-inflation can be applied to either the Poisson or Negative-Binomial distributions, in one of two ways, by means of simply specifying the extra probability of zero, by giving the zprob argument, or by giving an inverse link function with ilink and its argument with linkp.

, zprob=<expression>

Zero-inflation probability, default(0) (incompatible with following two options)

, ilink=ilogit

or iprobit or other inverse link function. If not specified, it is blank.

, linkp=<expression>

Argument to inverse link function

, noreset

When present, this option will prevent the accumulated hazard from resetting after each observation.

For the default Poisson distribution, the mean and variance are equal. If that shows inadequate variance, use of the beta option, with optional power, may be indicated. Care should be taken, since if the value of beta*power becomes too negative, that means the distribution is very close to Poisson and the Negative Binomial distribution may incur a performance cost.

Note:The Negative-Binomial Distribution log likelihood expression can generate unreasonable results for reasonable cases. For example, if count data that is actually Poisson is used, the parameter r in the usual (r,p) parameterization can become very large (and p becomes very small). The large value of r can cause a problem in the log gamma function that is used in the evaluation. So such data, which is completely reasonable, may give an unreasonable fit if care is not taken in how the log likelihood is evaluated numerically.

All about Count models

Observing a count, such as the number of episodes of vertigo over a prior time interval of one week, is also a hazard-based measurement. The simplest model of the count is called a Poisson distribution and it is closely related to the coin-toss process in the time-to-event model (discussed in “All about Time-to-Event modeling”). In this model, the mean (average) count of events is simply the accumulated hazard. One can think of the hazard as simply the average number of events per time unit, so the longer the time, the more events. Also, in the Poisson distribution, the variance of the event count is equal to the mean.

In some cases, if one tries to fit a distribution to the event counts, it is found that the variance is greater than what a Poisson distribution would predict. In this case, it is usual to replace the Poisson distribution with a Negative Binomial distribution, which is very similar to the Poisson, except that it contains an additional parameter indicating how much to inflate the variance.

Another alternative is to notice that a count of 0 (zero) is more likely to occur than what would be predicted by either a Poisson or Negative Binomial distribution. If so, this is called “zero inflation.” All of these alternatives can be modeled with the count statement.

The simplest case is a Poisson-distributed count:

count(n, h)

where n is the name of the observed variable, taking any non-negative integer value. h is the hazard expression, which can be time-varying.

To make it a Negative Binomial distribution, the beta keyword can be employed:

count(n, h, beta=b)

where b is the beta expression, taking any real value. The beta expression inflates the variance of the distribution by a factor (1+exp(b)), so if b is strongly negative, exp(b) is very small, so the variance is inflated almost not at all. If b is zero or more, then exp(b) is one or more, so the variance is inflated by an appreciable factor. The reason for encoding the variance inflation this way is to make it difficult to have a variance inflation factor too close to unity, because 1) that makes it equivalent to Poisson, and 2) the computation of the log-likelihood can become very slow.

If the beta keyword is present, an additional optional keyword may be used, power:

count(n, h, beta=b, power=p)

in which case, the variance inflation factor is (1+exp(b)^p), which gives a little more control over the shape of the variance inflation function. If the power is not given, its default value is unity.

Whether the Poisson or Negative Binomial distribution is used, zero-inflation is an option, by using the zprob keyword:

count(n, h, zprob=z)

where z is the additional probability to be given to the response n=0. Alternatively, the probability can be given through a link function, using keywords ilink and linkp:

count(n, h, ilink=ilogit, linkp=x)

where x is any real-numbered value, meaning that ilogit(x) yields the probability to be used for zero-inflation.

Count models can of course be written with the LL statement, but it can get complicated. In the simple Poisson case, it is:

deriv(cumhaz=h)
LL(n, lpois(cumhaz, n), doafter={cumhaz=0})

where cumhaz is the accumulated hazard over the time interval since the preceding observation, and n is the observed number of events in the interval. lpois is the built-in function giving the log-likelihood of a Poisson distribution having mean cumhaz. After the observation, the accumulated hazard must be reset to zero, in preparation for the following observation.

If the simple Poisson model is to be augmented with zero-inflation, it looks like this:

  deriv(cumhaz=h)
  LL(n, (n == 0 ? log(z+(1-z)*ppois(cumhaz, 0)):
        log(1-z)+lpois(cumhaz, n)
       )
     , doafter={cumhaz=0}
    )

where z is the excess probability of seeing zero. Think of it this way: before every observation, the subject flips a coin. If it comes up “heads” (with probability z), then a count of zero is reported. If it comes up “tails” (with probability 1-z), then the reported count is drawn from a Poisson distribution, which might also report zero. So, if zero is seen, its probability is z+(1-z)*pois(cumhaz, 0), where pois is the Poisson probability function. On the other hand, if a count n > 0 is seen, the coin must have come up “tails,” so the probability of that happening is (1-z)*pois(cumhaz, n). The log-likelihood given in the LL statement is just the log of all that.

If it is desired to use the link function instead of z, z can be replaced by ilogit(x). (There are a variety of built-in link functions, including iloglog, icloglog, and iprobit.)

If the Negative Binomial model is to be used, because Poisson does not have sufficient variance, in place of:

lpois(cumhaz, n)

the expression:

lnegbin(cumhaz, beta, power, n)

is used instead, where the meaning of beta and power are as explained above. If beta is zero, that means the variance is inflated by a factor of two. Typically, beta would be something estimated. If it is not desired to use the power argument, the default power of one should be used.