Descriptive Statistics

Phoenix can compute summary statistics for variables in any worksheet. This feature is frequently used to create data for plotting means and standard errors, for preclinical summaries, to summarize modeling results, or to test whether data are normally distributed. Separate statistics for subgroups are obtained by mapping one or more sort variables.

Use one of the following to add a Descriptive Statistics object to a Workflow:

Right-click menu for a Workflow object: New > NCA and Toolbox > Descriptive Stats.
Or Main menu: Insert > NCA and Toolbox > Descriptive Stats.
Or right-click menu for a worksheet: Send To > NCA and Toolbox > Descriptive Stats.

Note: To view the object in its own window, select it in the Object Browser and double-click it or press ENTER. All instructions for setting up and execution are the same whether the object is viewed in its own window or in Phoenix view.

This section includes information on the following:

User interface description

Statistical results and computational formulas

Weighted summary statistics

User interface description

Main Mappings panel

Use the Main Mappings panel to identify which variables are used to compute statistics or weight the data. Required input is highlighted orange in the interface.

None: Data types mapped to this context are not included in any computation or output.

Summary: The variable(s) for which statistics are computed.

Sort: Categorical variable(s) identifying individual data profiles, such as subject ID or gender. Separate statistics are computed for each unique combination of sort variables.

Weight: If a weight variable is present in the dataset, it can be used to weight the summary statistics. If a weight is non-numeric or missing, the observation is excluded from the analysis. If a weight is negative, it is changed to zero (0) upon execution.
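The mapping rules above can be sketched in Python. This is a minimal illustration, not Phoenix's implementation. It assumes a worksheet is represented as a list of dictionaries, and the function names `to_number` and `group_rows` are hypothetical. The sketch applies the stated weight rules (a missing or non-numeric weight excludes the observation, and a negative weight becomes zero). It then collects the summary values for each unique combination of sort variables.

```python
import math
from collections import defaultdict


def to_number(value):
    """Return value as a float, or None if it is missing or non-numeric."""
    if value is None:
        return None
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(number) else number


def group_rows(rows, summary, sort_vars, weight=None):
    """Collect (value, weight) pairs per unique combination of sort values.

    rows is a list of dicts standing in for a worksheet (hypothetical layout).
    Summary values that are missing or non-numeric are kept as None so they
    can still be counted in Nmiss.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row.get(name) for name in sort_vars)
        w = 1.0  # unweighted analysis: every observation counts once
        if weight is not None:
            w = to_number(row.get(weight))
            if w is None:
                continue  # missing or non-numeric weight: drop the observation
            w = max(w, 0.0)  # negative weight is changed to zero
        groups[key].append((to_number(row.get(summary)), w))
    return dict(groups)
```

For example, when grouping by a Gender column, a row whose summary value is text is kept (and counted as missing), while a row whose weight is blank is dropped entirely.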

Options tab

See “Statistical results and computational formulas” for a full list of the available statistics.

[Figure: Descriptive Stats Options tab]

Use the tree on the right to select the statistics to compute and include in the report by checking the corresponding checkboxes.

The available statistics are grouped into categories, and checking or unchecking a category checkbox controls all of the checkboxes for statistics within that category. For example, unchecking the Spread Statistics checkbox unchecks the Min, Median, Max, and Range checkboxes.

Click the Select All button to check every checkbox in the list, or the Clear All button to uncheck every checkbox, with a single click.

Click the Expand All button to expand all categories in the tree. Click the Collapse All button to collapse all categories in the tree.

The Percentiles category includes a number of preset percentiles (1, 2.5, 5, 10, 25, 50, 75, 90, 95, 97.5, 99). To include other percentiles, enter them as a comma-separated list in the User-specified percentiles field.

To generate confidence interval statistics, type the desired confidence level in the Confidence Interval field.

Check the Confidence Intervals checkbox to compute all statistics in that category, or expand the category to select a subset of the statistics.

To generate standard deviation statistics, type the desired number of standard deviations in the Number of SD Statistics field. The value must be greater than 0 and less than or equal to 10.

Check the Number of SD Statistics checkbox to compute all statistics in that category or expand the category to select a subset of the statistics.

Statistical results and computational formulas

The Descriptive Stats object creates a Statistics worksheet and a Settings text file in the Results tab. The Statistics worksheet includes summaries of all statistical computations. The Settings file contains user-specified settings.

This table lists all possible descriptive statistics output.

Statistic

Description

CI GEO X% Lower

Lower limit of an X% confidence interval for the logs of the data, back-transformed to original scale:
$\exp(\text{Mean\_Log} - t_{\alpha/2} \times \text{SD\_Log})$

CI GEO X% Upper

Upper limit of an X% confidence interval for the logs of the data, back-transformed to original scale:
$\exp(\text{Mean\_Log} + t_{\alpha/2} \times \text{SD\_Log})$
where $(1-\alpha)\times 100$ is the percentage given for the confidence interval, and $t_{\alpha/2}$ is from the t-distribution with N – 1 degrees of freedom.

CI X% Lower

Lower limit of an X% confidence interval for the data (i.e., the range that is expected to contain X% of the data):
$\text{Mean} - t_{\alpha/2} \times \text{SD}$

CI X% Lower GEO Mean

Lower limit of an X% confidence interval for the Geometric Mean:
$\exp(\text{Mean\_Log} - t_{\alpha/2} \times \text{SD\_Log}/\sqrt{N})$
(equivalently, the exponential of the lower CI limit for Mean_Log).

CI X% Lower Mean

Lower limit of an X% confidence interval for the mean (i.e., an interval expected to contain the true mean with X% confidence):
$\text{Mean} - t_{\alpha/2} \times \text{SD}/\sqrt{N}$

CI X% Lower Var

Lower limit of an X% confidence interval for the variance (i.e., an interval expected to contain the true variance with X% confidence):
$\dfrac{(N-1)\times\text{Variance}}{\chi^2_U}$
where $\chi^2_U$ is from the $\chi^2$-distribution with N – 1 degrees of freedom and cuts off an upper tail of area $\alpha/2$, and $(1-\alpha)\times 100$ is the percentage for the confidence interval.

CI X% Upper

Upper limit of an X% confidence interval for the data:
$\text{Mean} + t_{\alpha/2} \times \text{SD}$
where $(1-\alpha)\times 100$ is the percentage given for the confidence interval, and $t_{\alpha/2}$ is from the t-distribution with N – 1 degrees of freedom.

CI X% Upper GEO Mean

Upper limit of an X% confidence interval for the Geometric Mean:
$\exp(\text{Mean\_Log} + t_{\alpha/2} \times \text{SD\_Log}/\sqrt{N})$
where $(1-\alpha)\times 100$ is the percentage given for the confidence interval, and $t_{\alpha/2}$ is from the t-distribution with N – 1 degrees of freedom.

CI X% Upper Mean

Upper limit of an X% confidence interval for the mean:
$\text{Mean} + t_{\alpha/2} \times \text{SD}/\sqrt{N}$
where $(1-\alpha)\times 100$ is the percentage given for the confidence interval, and $t_{\alpha/2}$ is from the t-distribution with N – 1 degrees of freedom.
Thus, a 95% confidence level corresponds to $\alpha = 0.05$. Note that for N > 30, the t-distribution is close to the normal distribution.

CI X% Upper Var

Upper limit of an X% confidence interval for the variance:
$\dfrac{(N-1)\times\text{Variance}}{\chi^2_L}$
where $\chi^2_L$ is from the $\chi^2$-distribution with N – 1 degrees of freedom.
$\chi^2_L$ cuts off a lower tail of area $\alpha/2$, where $(1-\alpha)\times 100$ is the percentage for the confidence interval.
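The confidence interval statistics above can be sketched for a single group as follows. The critical values are passed in rather than computed, because the Python standard library has no t or chi-square quantile function. With SciPy, they could be obtained from `scipy.stats.t.ppf(1 - alpha/2, n - 1)` for the t value, `scipy.stats.chi2.ppf(1 - alpha/2, n - 1)` for the upper-tail chi-square value, and `scipy.stats.chi2.ppf(alpha/2, n - 1)` for the lower-tail chi-square value. The function name `confidence_limits` is hypothetical, and the sketch assumes SD and SD_Log use the N – 1 divisor.

```python
import math
from statistics import mean, stdev


def confidence_limits(values, t_crit, chi2_lower, chi2_upper):
    """CI statistics for one group of positive numeric values (N >= 2).

    t_crit     -- t_{alpha/2}: upper alpha/2 quantile of t with N-1 df
    chi2_upper -- chi-square (N-1 df) value cutting off an upper tail of alpha/2
    chi2_lower -- chi-square (N-1 df) value cutting off a lower tail of alpha/2
    """
    n = len(values)
    m, sd = mean(values), stdev(values)       # stdev uses the N-1 divisor
    logs = [math.log(x) for x in values]      # log statistics need x > 0
    m_log, sd_log = mean(logs), stdev(logs)
    var = sd ** 2
    root_n = math.sqrt(n)
    return {
        "CI Lower": m - t_crit * sd,
        "CI Upper": m + t_crit * sd,
        "CI Lower Mean": m - t_crit * sd / root_n,
        "CI Upper Mean": m + t_crit * sd / root_n,
        "CI Lower Var": (n - 1) * var / chi2_upper,
        "CI Upper Var": (n - 1) * var / chi2_lower,
        "CI GEO Lower": math.exp(m_log - t_crit * sd_log),
        "CI GEO Upper": math.exp(m_log + t_crit * sd_log),
        "CI Lower GEO Mean": math.exp(m_log - t_crit * sd_log / root_n),
        "CI Upper GEO Mean": math.exp(m_log + t_crit * sd_log / root_n),
    }
```

For N = 8 at a 95% confidence level (7 degrees of freedom), standard tables give a t value of about 2.3646 and chi-square values of about 16.013 (upper tail) and 1.690 (lower tail).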

CV%

Coefficient of variation: (SD/Mean)*100

GEO Lower XSD and GEO Upper XSD

Range determined by adding or subtracting “X” log standard deviations from the log-mean, back-transformed to original scale:
exp(Mean_Log +/– X*SD_Log)
When X = 1, this range runs from Geometric_Mean/exp(SD_Log) to Geometric_Mean*exp(SD_Log).

Geometric CV%

Geometric coefficient of variation:
$\text{Geometric CV\%} = 100 \times \sqrt{\exp(\text{SD\_Log}^2) - 1}$
Note: This statistic was previously named “CV% Geometric Mean.” Because of the name change, backward compatibility is not maintained.
Be sure to remap the column in any objects that reference it.

Geometric Mean

Nth root of the product of the N observations; equivalently, the exponential of Mean_Log. All values must be greater than zero.
$\text{Geometric Mean} = \left(\prod_{i=1}^{N} x_i\right)^{1/N} = \exp(\text{Mean\_Log})$

Geometric SD

Geometric standard deviation: the exponential of the standard deviation of the natural logs of the observations:
exp(SD_Log)

Harmonic Mean

Reciprocal of the arithmetic mean of the reciprocals of the observations:
$\text{Harmonic Mean} = \dfrac{N}{\sum_{i=1}^{N} 1/x_i}$
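The geometric and harmonic statistics above can be sketched as follows. This sketch assumes all values are positive and that SD_Log uses the N – 1 divisor, which is what `statistics.stdev` does. The Geometric CV% line uses the common form 100 × sqrt(exp(SD_Log²) – 1), which is an assumption here. The function names `geometric_stats` and `harmonic_mean` are hypothetical.

```python
import math
from statistics import mean, stdev


def geometric_stats(values, x=1.0):
    """Geometric statistics for positive values; SD_Log assumed to use N-1."""
    logs = [math.log(v) for v in values]
    m_log, sd_log = mean(logs), stdev(logs)
    return {
        "Mean_Log": m_log,
        "SD_Log": sd_log,
        "Geometric Mean": math.exp(m_log),
        "Geometric SD": math.exp(sd_log),
        # assumed form: 100 * sqrt(exp(SD_Log^2) - 1)
        "Geometric CV%": 100 * math.sqrt(math.exp(sd_log ** 2) - 1),
        "GEO Lower XSD": math.exp(m_log - x * sd_log),
        "GEO Upper XSD": math.exp(m_log + x * sd_log),
    }


def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals (values nonzero)."""
    return len(values) / sum(1 / v for v in values)
```

With x = 1, the GEO Lower XSD and GEO Upper XSD entries reduce to Geometric Mean / Geometric SD and Geometric Mean × Geometric SD, matching the equivalence stated for that statistic.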

IQR

The interquartile range is the third quartile minus the first quartile (75th minus 25th percentile), which spans the middle 50% of the data. IQR is only included in the output when the Include Percentiles checkbox is checked.

KS PValue

Kolmogorov-Smirnov normality test p-value. The test quantifies the distance between the empirical distribution function of the data and the cumulative distribution function of the Normal distribution.

The empirical distribution function $F_n$ for n independent and identically distributed observations $X_i$ is defined as:
$F_n(x) = \dfrac{1}{n}\sum_{i=1}^{n} I_{(-\infty,\,x]}(X_i)$
where $I_{(-\infty,\,x]}(X_i)$ is the indicator function, equal to 1 if $X_i \le x$ and 0 otherwise.

The Kolmogorov-Smirnov statistic for a given cumulative distribution function F(x) is:
$D_n = \sup_x \left| F_n(x) - F(x) \right|$
where $\sup_x$ is the supremum of the set of distances. A p-value is then computed to determine the significance of $D_n$.
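Because $F_n$ changes only at the observed values, the supremum can be found by checking both sides of each step in the sorted data. The sketch below computes $D_n$ only; the p-value step is not shown. It assumes the reference Normal distribution uses the sample mean and SD, since the text does not state which parameters are used. The name `ks_statistic` is hypothetical, and the data must contain at least two distinct values.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic(values):
    """D_n = sup_x |F_n(x) - F(x)| against a Normal CDF.

    Assumption: F is the Normal CDF with the sample mean and SD plugged in.
    """
    xs = sorted(values)
    n = len(xs)
    cdf = NormalDist(mean(xs), stdev(xs)).cdf
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # F_n jumps from (i-1)/n to i/n at x, so check both sides of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```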

Kurtosis

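Sample coefficient of excess (sample excess kurtosis):
$\text{Kurtosis} = \dfrac{N(N+1)}{(N-1)(N-2)(N-3)}\sum_{i=1}^{N}\left(\dfrac{x_i-\bar{x}}{\text{SD}}\right)^4 - \dfrac{3(N-1)^2}{(N-2)(N-3)}$
Sample Excess Kurtosis=[Population Excess Kurtosis*(N+1)+6]*(N–1)/[(N–2) x (N–3)]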

Kurtosis Pop

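Population coefficient of excess (population excess kurtosis):
$\text{Kurtosis Pop} = \dfrac{\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^4}{\left(\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2\right)^2} - 3$
Population Excess Kurtosis=((Sample Excess Kurtosis x (N–2) x (N–3)/(N–1)) –6)/(N+1)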
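A sketch of both kurtosis statistics follows. The population version is computed from central moments with an N divisor. The sample version uses the bias-adjusted form with the N – 1 sample SD. The conversion formula stated above between the two gives a convenient numeric check. The function name `excess_kurtosis` is hypothetical, and at least four observations are needed.

```python
from statistics import stdev


def excess_kurtosis(values):
    """Return (Kurtosis, Kurtosis Pop); needs N >= 4."""
    n = len(values)
    m = sum(values) / n
    dev = [x - m for x in values]
    # Population version: central moments with an N divisor.
    m2 = sum(d ** 2 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    pop = m4 / m2 ** 2 - 3
    # Sample version: deviations standardized by the N-1 sample SD.
    s = stdev(values)
    sample = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
              * sum((d / s) ** 4 for d in dev)
              - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return sample, pop
```

For the data 2, 4, 4, 4, 5, 5, 7, 9, this gives Kurtosis Pop = –0.21875 and Kurtosis = 0.940625, which satisfy the conversion formula.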

Lower XSD and Upper XSD

Range determined by adding or subtracting X standard deviations from the mean:
Mean +/– X*SD

Max

Maximum value

Mean

Arithmetic average:
$\text{Mean} = \bar{x} = \dfrac{1}{N}\sum_{i=1}^{N} x_i$

Mean log

Arithmetic average of the natural logs of the observations:
$\text{Mean\_Log} = \dfrac{1}{N}\sum_{i=1}^{N} \ln(x_i)$

Median

Median value: the 50th percentile, taken from the percentile computations.

Min

Minimum value

N

Number of observations with non-missing data (i.e., numeric observations).

Nmiss

Number of observations with missing data (i.e., non-numeric observations such as text or blanks).

Nobs

Number of observations (i.e., N + Nmiss)

P(ercentiles)

The Pth percentile divides the distribution at a point such that P percent of the distribution lies below this point.
For a sample size of n, the quantile corresponding to the proportion p (0<p<1) is defined as:
Q(p)=(1 – f)*x(j)+f*x(j+1)
where:
     j = int(p*(n+1)), (integer part)
     f = p*(n+1) – j, (fractional part)
     x(j) = the j-th order statistic
The above is used if 1 ≤ j < n.
Otherwise, the empirical quantile is the smallest order statistic for j=0 or the largest order statistic for j=n.
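The percentile definition above can be sketched directly. This sketch follows the stated formula, which places the proportion p at position p(n + 1) in the sorted data. The function name `percentile` is hypothetical. Because the formula uses 1-based order statistics, $x_{(j)}$ is `xs[j - 1]` in Python's 0-based indexing.

```python
def percentile(values, p):
    """Q(p) for 0 < p < 1 using j = int(p*(n+1)) and f = p*(n+1) - j."""
    xs = sorted(values)
    n = len(xs)
    pos = p * (n + 1)
    j = int(pos)        # integer part
    f = pos - j         # fractional part
    if j < 1:
        return xs[0]    # smallest order statistic
    if j >= n:
        return xs[-1]   # largest order statistic
    # x(j) is the j-th order statistic, i.e. xs[j - 1] with 0-based indexing
    return (1 - f) * xs[j - 1] + f * xs[j]
```

For the data 1, 2, 3, 4, this gives 1.25, 2.5, and 3.75 for p = 0.25, 0.5, and 0.75. These values also match `statistics.quantiles(data, n=4)` with its default `method="exclusive"`, which uses the same (n + 1)p positioning for interior points.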

Pseudo SD

Jackknife estimate of the standard deviation of the harmonic mean.
For n points, x1, … xn, the pseudo standard deviation is:
pseudo SD = descstats00066.png
where
descstats00068.png
and
descstats00070.png

Range

Range of values (maximum value minus minimum value).

SD

Standard Deviation:
$\text{SD} = \sqrt{\dfrac{\sum_{i=1}^{N}(x_i-\bar{x})^2}{N-1}}$

SD Log

Standard deviation of the natural logs of the observations:
$\text{SD\_Log} = \sqrt{\dfrac{\sum_{i=1}^{N}(\ln x_i-\text{Mean\_Log})^2}{N-1}}$

SE

Standard Error:
$\text{SE} = \text{SD}/\sqrt{N}$

Skewness

Sample coefficient of skewness (sample skewness):
$\text{Skewness} = \dfrac{N}{(N-1)(N-2)}\sum_{i=1}^{N}\left(\dfrac{x_i-\bar{x}}{\text{SD}}\right)^3$
Sample Skewness=Population Skewness*sqrt(N*(N–1))/(N–2)

Skewness Pop

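Population coefficient of skewness (population skewness):
$\text{Skewness Pop} = \dfrac{\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^3}{\left(\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2\right)^{3/2}}$
Population Skewness=Sample Skewness*(N–2)/sqrt(N*(N–1))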
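The two skewness statistics can be sketched in the same way as kurtosis. The population version uses central moments with an N divisor, and the sample version uses the N – 1 sample SD. The conversion formula above relates the two. The function name `skewness` is hypothetical, and at least three observations are needed.

```python
from statistics import stdev


def skewness(values):
    """Return (Skewness, Skewness Pop); needs N >= 3."""
    n = len(values)
    m = sum(values) / n
    dev = [x - m for x in values]
    # Population version: third central moment over m2^(3/2), N divisors.
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    pop = m3 / m2 ** 1.5
    # Sample version: deviations standardized by the N-1 sample SD.
    s = stdev(values)
    sample = n / ((n - 1) * (n - 2)) * sum((d / s) ** 3 for d in dev)
    return sample, pop
```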

Sum

Sum of the values in the column mapped to Summary:
$\text{Sum} = \sum_{i=1}^{N} x_i$

Variance

Unbiased sample variance:
$\text{Variance} = \dfrac{\sum_{i=1}^{N}(x_i-\bar{x})^2}{N-1}$
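The core unweighted statistics in this table can be sketched with Python's standard `statistics` module. `statistics.variance` uses the N – 1 divisor, matching the unbiased sample variance above. The function name `basic_stats` and its `x` parameter (the number of standard deviations for Lower XSD and Upper XSD) are hypothetical. The sketch assumes missing values have already been removed and at least two values remain.

```python
import math
from statistics import mean, variance


def basic_stats(values, x=1.0):
    """Core summary statistics for one group of numeric values (N >= 2)."""
    n = len(values)
    m = mean(values)
    var = variance(values)  # unbiased: divides by N - 1
    sd = math.sqrt(var)
    return {
        "N": n,
        "Mean": m,
        "Variance": var,
        "SD": sd,
        "SE": sd / math.sqrt(n),
        "CV%": sd / m * 100,
        "Lower XSD": m - x * sd,
        "Upper XSD": m + x * sd,
        "Min": min(values),
        "Max": max(values),
        "Range": max(values) - min(values),
        "Sum": sum(values),
    }
```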

Units

When summary statistics are calculated for a variable with units, some of the output will have units. Assuming that the variable summarized is x and has x-units specified, the units for the summary statistics are:

Statistic

Units

N, Nmiss, Nobs

No units

CV%, Geometric CV%

No units

Skewness, Skewness Pop,
Kurtosis, Kurtosis Pop,
KS PValue

No units

Mean Log, SD Log

No units

Variance

x-unit²

CI Lower Var, CI Upper Var

x-unit²

Everything else

x-unit

If more than one Summary variable is mapped, with at least two of those variables having units in the input dataset, and the units differ, a stacked Units column is displayed in the Statistics output worksheet that reports the units of the Summary variables. In cases where the input data does not have units, or the units are all the same, then the units of the statistics are displayed in the column headers.

Weighted summary statistics

Summary statistics can be weighted by selecting a column in the dataset in the Main Mappings panel to provide weights. If a weight is non-numeric or missing, the observation is excluded from the analysis. Weighted descriptive statistics output also includes a text file called Settings that contains user-specified settings. Some summary statistics and all percentiles are excluded in weighted output.

Results and computational formulas for weighted calculations

The output for weighted summary statistics contains a column identifying the summary variable(s), one column for each sort variable, and the statistics listed below.

Statistic

Description

CI X% Lower

Lower limit of an X% confidence interval for the weighted data:
Weighted Mean – $t_{\alpha/2}$ x Weighted SD

CI X% Lower Mean

Lower limit of an X% confidence interval for the weighted mean.

CI X% Lower Var

Lower limit of an X% confidence interval for the weighted variance.

CI X% Upper

Upper limit of an X% confidence interval for the weighted data:
Weighted Mean + $t_{\alpha/2}$ x Weighted SD

CI X% Upper Mean

Upper limit of an X% confidence interval for the weighted mean.

CI X% Upper Var

Upper limit of an X% confidence interval for the weighted variance.

CV%

Weighted coefficient of variation:
(Weighted SD/Weighted Mean)*100

Kurtosis Pop

Weighted coefficient of excess (population excess kurtosis):
descstats00090.png

Lower XSD and Upper XSD

Range determined by adding or subtracting X weighted standard deviations from the weighted mean:
Weighted Mean +/– X*Weighted SD

Max

Maximum value

Mean

Weighted arithmetic average:
$\bar{x}_w = \dfrac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}$
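A minimal sketch of the weighted mean follows. It assumes the weights have already been cleaned according to the Weight mapping rules (missing or non-numeric weights dropped, negative weights set to zero). Observations with weight zero stay in the data but contribute nothing to the sums. The text does not describe what happens when all weights in a group are zero, so raising an error there is an assumption. The function name `weighted_mean` is hypothetical.

```python
def weighted_mean(pairs):
    """Weighted arithmetic average sum(w*x) / sum(w).

    pairs: (value, weight) tuples whose weights were already cleaned
    (missing/non-numeric dropped, negatives set to 0). Pairs with a missing
    value (None) are skipped; zero weights contribute nothing.
    """
    used = [(x, w) for x, w in pairs if x is not None]
    total_w = sum(w for _, w in used)
    if total_w == 0:
        # Behavior for an all-zero weight group is not documented; assumed error.
        raise ValueError("weights sum to zero")
    return sum(w * x for x, w in used) / total_w
```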

Min

Minimum value

N

Number of non-missing observations (including those with weights=zero)

Nmiss

Number of observations with missing data

Nobs

Number of observations (including observations with weights=zero)

Range

Range of values (maximum value minus minimum value)

SD

Weighted standard deviation:
descstats00094.png

SE

Weighted standard error:
descstats00096.png

Skewness Pop

Weighted coefficient of skewness (population skewness):
descstats00098.png

Sum

Weighted Sum:
$\sum_{i=1}^{N} w_i x_i$

Variance

Weighted variance:
descstats00102.png 
where descstats00104.png is the weighted mean.


Last modified date: 6/26/19
Certara USA, Inc.
© 2019 Certara USA, Inc. All rights reserved.