Descriptive Statistics object

Descriptive Statistics

Phoenix can compute summary statistics for variables in any worksheet. This feature is frequently used to create data to plot means and standard errors, for preclinical summaries, to summarize modeling results, or to test for normal distribution of data. Separate statistics for subgroups are obtained through the use of one or more sort variables.

Use one of the following to add a Descriptive Statistics object to a Workflow:

Right-click menu for a Workflow object: New > NCA and Toolbox > Descriptive Stats.
Or Main menu: Insert > NCA and Toolbox > Descriptive Stats.
Or right-click menu for a worksheet: Send To > NCA and Toolbox > Descriptive Stats.

Note: To view the object in its own window, select it in the Object Browser and double-click it or press ENTER. All instructions for setup and execution are the same whether the object is viewed in its own window or in Phoenix view.

This section includes information on the following:

• User interface description

• Statistical results and computational formulas

• Weighted summary statistics

User interface description

Main Mappings panel
Options tab

Main Mappings panel

Use the Main Mappings panel to identify which variables are used to compute statistics or weight the data. Required input is highlighted orange in the interface.

None: Data types mapped to this context are not included in any computation or output.

Summary: The variable(s) for which statistics are computed.

Sort: Categorical variable(s) identifying individual data profiles, such as subject ID or gender. Separate statistics are computed for each unique combination of sort variables.

Weight: If a weight variable is present in the dataset, it can be used to weight the summary statistics. If a weight is non-numeric or missing, the observation is excluded from the analysis. If a weight is negative, it is changed to zero (0) upon execution.
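The weight rules above can be illustrated with a short Python sketch. It is not Phoenix code: the function name `clean_weighted_rows` and the use of `None` or text to represent missing or non-numeric cells are assumptions made for the example, and only the weight column is checked (the summary values are assumed to be numeric already).

```python
import math

def clean_weighted_rows(values, weights):
    """Return (value, weight) pairs after applying the weight rules.

    Rows whose weight is missing or non-numeric are dropped;
    negative weights are replaced by 0. (Hypothetical helper.)
    """
    kept = []
    for value, weight in zip(values, weights):
        try:
            w = float(weight)
        except (TypeError, ValueError):
            continue  # missing (None) or non-numeric (text) weight: exclude row
        if math.isnan(w):
            continue  # treat NaN as a missing weight
        kept.append((value, max(w, 0.0)))  # negative weight becomes zero
    return kept

# Second row has a text weight, third a negative weight, fourth a missing one.
rows = clean_weighted_rows([10.0, 12.0, 9.5, 11.0], [1, "n/a", -2, None])
```

In this example only the first and third rows survive, and the third row is kept with a weight of zero.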

Options tab

See “Statistical results and computational formulas” for a full list of the available statistics.

Use the tree on the right to select the statistics to compute and include in the report by checking the corresponding checkboxes.

The available statistics are grouped into categories and checking/unchecking a category checkbox controls all of the checkboxes for statistics within that category. For example, unchecking the Spread Statistics checkbox unchecks the Min, Median, Max, and Range checkboxes.

Click the Select All or Clear All buttons to quickly check/uncheck all checkboxes in the list, respectively, with a single click.

Use the expand button to expand all categories in the tree, and the collapse button to collapse all categories in the tree.

There are a number of preset percentiles available in the Percentiles category (1, 2.5, 5, 10, 25, 50, 75, 90, 95, 97.5, 99). However, if you wish to include other percentiles, enter them as a comma-separated list in the User-specified percentiles field.

To generate confidence interval statistics, type the desired confidence interval in the Confidence Interval field.

Check the Confidence Intervals checkbox to compute all statistics in that category or expand the category to select a subset of the statistics.
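As a rough illustration of what the confidence level controls, the following Python sketch computes a t-based interval for the mean, assuming the usual form Mean ± t × SD/sqrt(N) with N–1 degrees of freedom. The function name `ci_mean` is hypothetical, and because the Python standard library has no t-distribution quantile function, the critical value is passed in by the caller (2.776 is the two-sided 95% value for 4 degrees of freedom).

```python
import math
import statistics

def ci_mean(data, t_crit):
    """Return (lower, upper) limits: Mean -/+ t_crit * SD / sqrt(N).

    t_crit is the t critical value for N-1 degrees of freedom,
    supplied by the caller (an assumption of this sketch).
    """
    n = len(data)
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # sample SD (N-1 denominator)
    return mean - t_crit * se, mean + t_crit * se

data = [4.0, 5.0, 6.0, 7.0, 8.0]
T_CRIT_95_DF4 = 2.776  # two-sided 95% critical value, 4 degrees of freedom
lower, upper = ci_mean(data, T_CRIT_95_DF4)
```

For these five values the interval is approximately 4.04 to 7.96.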

To generate standard deviation statistics, type the desired number of standard deviations in the Number of SD Statistics field. The value must be greater than 0 and less than or equal to 10.

Check the Number of SD Statistics checkbox to compute all statistics in that category or expand the category to select a subset of the statistics.
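The Number of SD setting can be pictured with a small Python sketch that applies the stated limits (greater than 0, at most 10) and returns Mean ± X × SD. The function name `sd_range` is hypothetical, and the use of the sample standard deviation (N–1 denominator) is an assumption of the example.

```python
import statistics

def sd_range(data, x):
    """Return (Mean - x*SD, Mean + x*SD) for 0 < x <= 10."""
    if not 0 < x <= 10:
        raise ValueError("number of SDs must be > 0 and <= 10")
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)  # sample SD (assumed)
    return mean - x * sd, mean + x * sd

low, high = sd_range([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0], 2)
```

Here the mean is 5 and the sample SD is about 2.138, so the range is roughly 0.72 to 9.28.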

Statistical results and computational formulas

The Descriptive Stats object creates a Statistics worksheet and a Settings text file in the Results tab. The Statistics worksheet includes summaries of all statistical computations. The Settings file contains user-specified settings.

This table lists all possible descriptive statistics output.

Statistic	Description
CI GEO X% Lower	Lower limit of an X% confidence interval for the logs of the data, back-transformed to original scale: exp(Mean_Log – $t_{\alpha/2}$ × SD_Log)
CI GEO X% Upper	Upper limit of an X% confidence interval for the logs of the data, back-transformed to original scale: exp(Mean_Log + $t_{\alpha/2}$ × SD_Log), where $(1 - \alpha) \times 100$ is the percentage given for the confidence interval, and $t_{\alpha/2}$ is from the t-distribution with N–1 degrees of freedom.
CI X% Lower	Lower limit of an X% confidence interval for the data (i.e., the range that is expected to contain X% of the data): Mean – $t_{\alpha/2}$ × SD
CI X% Lower GEO Mean	Lower limit of an X% confidence interval for the Geometric Mean: exp(Mean_Log – $t_{\alpha/2}$ × SD_Log/sqrt(N)) (equivalently, exp of the lower CI for Mean_Log).
CI X% Lower Mean	Lower limit of an X% confidence interval for the mean (i.e., the interval that contains the mean with X% confidence): Mean – $t_{\alpha/2}$ × SD/sqrt(N)
CI X% Lower Var	Lower limit of an X% confidence interval for the variance (i.e., the interval that contains the variance with X% confidence): (N–1) × Variance/$\chi^2_U$, where $\chi^2_U$ is from the $\chi^2$-distribution with N–1 degrees of freedom. $\chi^2_U$ cuts off an upper tail of area $\alpha/2$, where $(1 - \alpha) \times 100$ is the percentage for the confidence interval.
CI X% Upper	Upper limit of an X% confidence interval for the data: Mean + $t_{\alpha/2}$ × SD, where $(1 - \alpha) \times 100$ is the percentage given for the confidence interval, and $t_{\alpha/2}$ is from the t-distribution with N–1 degrees of freedom.
CI X% Upper GEO Mean	Upper limit of an X% confidence interval for the Geometric Mean: exp(Mean_Log + $t_{\alpha/2}$ × SD_Log/sqrt(N)), where $(1 - \alpha) \times 100$ is the percentage given for the confidence interval, and $t_{\alpha/2}$ is from the t-distribution with N–1 degrees of freedom.
CI X% Upper Mean	Upper limit of an X% confidence interval for the mean: Mean + $t_{\alpha/2}$ × SD/sqrt(N), where $(1 - \alpha) \times 100$ is the percentage given for the confidence interval, and $t_{\alpha/2}$ is from the t-distribution with N–1 degrees of freedom. Thus, a 95% confidence level indicates that $\alpha = 0.05$. Note that for N>30, the t-distribution is close to the normal distribution.
CI X% Upper Var	Upper limit of an X% confidence interval for the variance: (N–1) × Variance/$\chi^2_L$, where $\chi^2_L$ is from the $\chi^2$-distribution with N–1 degrees of freedom. $\chi^2_L$ cuts off a lower tail of area $\alpha/2$, where $(1 - \alpha) \times 100$ is the percentage for the confidence interval.
CV%	Coefficient of variation: (SD/Mean)*100
GEO Lower XSD and GEO Upper XSD	Range determined by adding or subtracting X log standard deviations from the log-mean, back-transformed to original scale: exp(Mean_Log ± X × SD_Log). When X=1, this range is equivalent to Geometric_Mean/exp(SD_Log) and Geometric_Mean × exp(SD_Log).
Geometric CV%	Geometric coefficient of variation. Note: Due to a statistic name change from “CV% Geometric Mean,” backward compatibility is not maintained. Be sure to remap the column in referencing objects.
Geometric Mean	Nth root of the product of the N observations. Equivalently, the exponential of the Mean_Log. Each value must be greater than zero.
Geometric SD	Geometric standard deviation, the exponential of the standard deviation of the natural logs of the observations: exp(SD_Log)
Harmonic Mean	Reciprocal of the arithmetic mean of the reciprocals of the observations: $N / \sum_{i=1}^{N} (1/x_i)$
IQR	Interquartile range is the difference between the first and third quartiles (i.e., the middle 50% of the data). IQR is only included in the output when the Include Percentiles checkbox is checked.
KS PValue	Kolmogorov-Smirnov normality test value. This quantifies the distance between the empirical distribution function of the data and the cumulative distribution function of the Normal distribution. The empirical distribution function $F_n$ for n independent and identically distributed observations $X_i$ is defined as: $F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x)$, where $I(X_i \le x)$ is the indicator function, equal to 1 if $X_i \le x$ and 0 otherwise. The Kolmogorov-Smirnov statistic for a given cumulative distribution function F(x) is: $D_n = \sup_x |F_n(x) - F(x)|$, where $\sup_x$ is the supremum of the set of distances. A p-value is then computed to determine the significance of $D_n$.
Kurtosis	Sample coefficient of excess (sample excess kurtosis): Sample Excess Kurtosis = [Population Excess Kurtosis × (N+1) + 6] × (N–1)/[(N–2) × (N–3)]
Kurtosis Pop	Population coefficient of excess (population excess kurtosis): Population Excess Kurtosis = ((Sample Excess Kurtosis × (N–2) × (N–3)/(N–1)) – 6)/(N+1)
Lower XSD and Upper XSD	Range determined by adding or subtracting X standard deviations from the mean: Mean +/– X*SD
Max	Maximum value
Mean	Arithmetic average
Mean Log	Arithmetic average of the natural logs of the observations: $\frac{1}{N} \sum_{i=1}^{N} \ln(x_i)$
Median	Median value (the 50th percentile from the percentiles computations).
Min	Minimum value
N	Number of observations with non-missing data (i.e., numeric observations).
Nmiss	Number of observations with missing data (i.e., non-numeric observations such as text or blanks).
Nobs	Number of observations (i.e., N + Nmiss)
P(ercentiles)	The Pth percentile divides the distribution at a point such that P percent of the distribution is below this point. For a sample of size n, the quantile corresponding to the proportion p (0 < p < 1) is defined as: $Q(p) = (1 - f)\,x_{(j)} + f\,x_{(j+1)}$, where $j = \mathrm{int}(p(n+1))$ is the integer part, $f = p(n+1) - j$ is the fractional part, and $x_{(j)}$ is the j-th order statistic. This formula is used if $1 \le j < n$. Otherwise, the empirical quantile is the smallest order statistic for j=0 or the largest order statistic for j=n.
Pseudo SD	Jackknife estimate of the standard deviation of the harmonic mean, computed from the n points x₁, …, x_n.
Range	Range of values (maximum value minus minimum value).
SD	Standard Deviation: sqrt(Variance)
SD Log	Standard deviation of the natural logs of the observations: $\sqrt{\sum_{i=1}^{N} (\ln(x_i) - \mathrm{Mean\_Log})^2 / (N-1)}$
SE	Standard Error: SD/sqrt(N)
Skewness	Sample coefficient of skewness (sample skewness): Sample Skewness = Population Skewness × sqrt(N(N–1))/(N–2)
Skewness Pop	Population coefficient of skewness (population skewness): Population Skewness = Sample Skewness × (N–2)/sqrt(N(N–1))
Sum	Sum of the values in the column mapped to Summary.
Variance	Unbiased sample variance: $\sum_{i=1}^{N} (x_i - \bar{x})^2 / (N-1)$, where $\bar{x}$ is the Mean.
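Several of the formulas in the table can be sketched in a few lines of Python. The functions below are illustrative only, with hypothetical names, and are not Phoenix's implementation: `quantile` follows the Q(p) definition in the P(ercentiles) row, `harmonic_mean` and `geometric_mean` follow their verbal definitions, and the two kurtosis helpers follow the conversions given in the Kurtosis and Kurtosis Pop rows.

```python
import math

def quantile(data, p):
    """Q(p) = (1 - f) * x(j) + f * x(j+1), with j and f taken from p * (n + 1)."""
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    xs = sorted(data)
    n = len(xs)
    pos = p * (n + 1)
    j = int(pos)      # integer part
    f = pos - j       # fractional part
    if j < 1:
        return xs[0]   # j = 0: smallest order statistic
    if j >= n:
        return xs[-1]  # j = n: largest order statistic
    # x(j) is 1-based in the formula, so it is xs[j - 1] here.
    return (1 - f) * xs[j - 1] + f * xs[j]

def harmonic_mean(data):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(data) / sum(1.0 / x for x in data)

def geometric_mean(data):
    """Exponential of the mean of the natural logs (values must be > 0)."""
    return math.exp(sum(math.log(x) for x in data) / len(data))

def sample_kurtosis_from_pop(pop_kurt, n):
    """Sample excess kurtosis from population excess kurtosis."""
    return (pop_kurt * (n + 1) + 6) * (n - 1) / ((n - 2) * (n - 3))

def pop_kurtosis_from_sample(sample_kurt, n):
    """Population excess kurtosis from sample excess kurtosis."""
    return (sample_kurt * (n - 2) * (n - 3) / (n - 1) - 6) / (n + 1)

values = list(range(1, 11))   # 1, 2, ..., 10
q25 = quantile(values, 0.25)  # p(n+1) = 2.75, so j = 2 and f = 0.75
q95 = quantile(values, 0.95)  # p(n+1) = 10.45, so j = n and the maximum is used
```

With ten values, the 25th percentile interpolates three quarters of the way between the second and third order statistics, while the 95th percentile falls back to the largest value because j equals n.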

Units

When summary statistics are calculated for a variable with units, some of the output will have units. Assuming that the variable summarized is x and has x-units specified, the units for the summary statistics are:

Statistic	Units
N, Nmiss, Nobs	No units
CV%, Geometric CV%	No units
Skewness, Skewness Pop, Kurtosis, Kurtosis Pop, KS PValue	No units
Mean Log, SD Log	No units
Variance	x-unit²
CI X% Lower Var, CI X% Upper Var	x-unit²
Everything else	x-unit
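The unit rules in this table amount to a simple lookup, sketched below in Python. The function name `statistic_units`, the exact statistic name strings, and the `(unit)^2` text used for squared units are assumptions made for the example, not Phoenix output formats.

```python
# Statistics that the table lists as having no units.
NO_UNITS = {
    "N", "Nmiss", "Nobs", "CV%", "Geometric CV%",
    "Skewness", "Skewness Pop", "Kurtosis", "Kurtosis Pop",
    "KS PValue", "Mean Log", "SD Log",
}

def statistic_units(stat, unit):
    """Return the units of a statistic given the summary variable's unit."""
    if stat in NO_UNITS:
        return ""
    # Variance and its confidence limits carry squared units.
    if stat == "Variance" or stat.endswith(" Var"):
        return f"({unit})^2"
    return unit  # everything else keeps the variable's own unit

mean_units = statistic_units("Mean", "ng/mL")
var_units = statistic_units("CI 95% Lower Var", "ng/mL")
```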

If more than one Summary variable is mapped, with at least two of those variables having units in the input dataset, and the units differ, a stacked Units column is displayed in the Statistics output worksheet that reports the units of the Summary variables. In cases where the input data does not have units, or the units are all the same, then the units of the statistics are displayed in the column headers.

Weighted summary statistics

Summary statistics can be weighted by selecting a column in the dataset in the Main Mappings panel to provide weights. If a weight is non-numeric or missing, the observation is excluded from the analysis. Weighted descriptive statistics output also includes a text file called Settings that contains user-specified settings. Some summary statistics and all percentiles are excluded in weighted output.

Results and computational formulas for weighted calculations

The output for weighted summary statistics contains a column indicating the summary variable(s), one column for each sort variable, and the statistics listed below.

Statistic	Description
CI X% Lower	Lower limit of an X% confidence interval for the weighted data: Weighted Mean – $t_{\alpha/2}$ × Weighted SD
CI X% Lower Mean	Lower limit of an X% confidence interval for the weighted mean.
CI X% Lower Var	Lower limit of an X% confidence interval for the weighted variance.
CI X% Upper	Upper limit of an X% confidence interval for the weighted data: Weighted Mean + $t_{\alpha/2}$ × Weighted SD
CI X% Upper Mean	Upper limit of an X% confidence interval for the weighted mean.
CI X% Upper Var	Upper limit of an X% confidence interval for the weighted variance.
CV%	Weighted coefficient of variation: (Weighted SD/Weighted Mean)*100
Kurtosis Pop	Weighted coefficient of excess (population excess kurtosis):
Lower XSD and Upper XSD	Range determined by adding or subtracting X weighted standard deviations from the weighted mean: Weighted Mean +/– X*Weighted SD
Max	Maximum value
Mean	Weighted arithmetic average: $\sum_{i} w_i x_i / \sum_{i} w_i$
Min	Minimum value
N	Number of non-missing observations (including those with weights=zero)
Nmiss	Number of observations with missing data
Nobs	Number of observations (including observations with weights=zero)
Range	Range of values (maximum value minus minimum value)
SD	Weighted standard deviation: sqrt(Weighted Variance)
SE	Weighted standard error:
Skewness Pop	Weighted coefficient of skewness (population skewness):
Sum	Weighted sum: $\sum_{i} w_i x_i$
Variance	Weighted variance, computed about the weighted mean.
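As a minimal illustration of weighting, the Python sketch below computes a weighted sum and a weighted mean using the conventional definitions (sum of w × x, and that sum divided by the sum of the weights). That Phoenix uses exactly these forms is an assumption of the example, and the function names are hypothetical. The weighted variance, skewness, and kurtosis are deliberately left out because their normalization is not specified here.

```python
def weighted_sum(values, weights):
    """Sum of w_i * x_i over all observations."""
    return sum(w * x for x, w in zip(values, weights))

def weighted_mean(values, weights):
    """Weighted sum divided by the total weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero; weighted mean is undefined")
    return weighted_sum(values, weights) / total_weight

values = [10.0, 20.0, 30.0]
weights = [1.0, 1.0, 2.0]
w_sum = weighted_sum(values, weights)    # 10 + 20 + 60 = 90
w_mean = weighted_mean(values, weights)  # 90 / 4 = 22.5
```

An observation with a weight of zero still counts toward N and Nobs, but it contributes nothing to either quantity computed here.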