Statistical results and computational formulas

The Descriptive Statistics object creates a Statistics worksheet and a Settings text file in the Results tab. The Statistics worksheet includes summaries of all statistical computations. The Settings file con­tains user-specified settings.

CI GEO X% Lower: Lower limit of an X% confidence interval for the logs of the data, back-trans­formed to original scale:

exp(Mean_Log – ta/2 x SD_Log)

CI GEO X% Upper: Upper limit of an X% confidence interval for the logs of the data, back-trans­formed to original scale:

exp(Mean_Log + ta/2 x SD_Log)

where (1 – a)*100 is the percentage given for the confidence interval
and ta/2 is from the t-distribution with N–1 degrees of freedom.

CI X% Lower: Lower limit of an X% confidence interval for the data (i.e., confidence interval that tells the range that is expected to have X% of the data)

Mean – (ta/2 x SD)

CI X% Lower GEO Mean: Lower limit of an X% confidence interval for the Geometric Mean:

descstats00014.png 

(equivalently, exp of the lower CI for Mean_Log).

CI X% Lower Mean: Lower limit of an X% confidence interval for the mean (i.e., the confidence inter­val in which the mean exists with X% certainty)

descstats00016.png 

CI X% Lower Var: Lower limit of an X% confidence interval for the variance (i.e., the confidence inter­val in which the variance exists with X% certainty):

descstats00018.png 

where cu2 is from the  c2-distribution with N–1 degrees of freedom.

cu2 cuts off an upper tail of area a/2 where (1–a)*100 is the percentage for the confidence inter­val.

CI X% Upper: Upper limit of an X% confidence interval for the data:

descstats00020.png 

where (1–a)*100 is the percentage given for the confidence interval,
and ta/2 is from the t-distribution with N–1 degrees of freedom.

CI X% Upper GEO Mean: Upper limit of an X% confidence interval for the Geometric Mean:

descstats00022.png 

where (1 – a)*100 is the percentage given for the confidence interval,
and ta/2 is from the t-distribution with N – 1 degrees of freedom.

CI X% Upper Mean: Upper limit of an X% confidence interval for the mean:

descstats00024.png 

where (1–a)*100 is the percentage given for the confidence interval,
and ta/2 is from the t-distribution with N–1 degrees of freedom.

Thus, a 95% confidence level indicates that a=0.05. Note that for N>30, the t-distribution is close to the normal distribution.

CI X% Upper Var: Upper limit of an X% confidence interval for the variance:

descstats00026.png 

where cL2is from the c2-distribution with N – 1 degrees of freedom.

cL2 cuts off a lower tail of area a/2 where (1 – a)*100 is the percentage for the confidence inter­val.

CV%: Coefficient of variation: (SD/Mean)*100

GEO Lower XSD and GEO Upper XSD: Range determined by adding or subtracting “X” log stan­dard deviations from the log-mean, back-transformed to original scale:

descstats00028.png 

When X=1, this range is equivalent to: (Geometric_Mean/exp(SD_Log) and Geomet­ric_Mean*exp(SD_Log))

Geometric CV%: Geometric coefficient of variation:

descstats00030.png 

Geometric Mean: Nth root of the product of the N observations. Equivalently, the exponential of the Mean_Log. Each value must be > zero:

descstats00032.png 

Geometric SD: Geometric standard deviation of the natural logs of the observations exp (SD_Log)

Harmonic Mean: Reciprocal of the arithmetic mean of the reciprocals of the observations:

descstats00034.png 

IQR: Interquartile range is the difference between the first and third quartiles (i.e., the middle 50% of the data). IQR is only included in the output when the Include Percentiles checkbox is checked.

KS PValue: Kolmogorov-Smirnov normality test r value. This quantifies the distance between the empirical distribution function of the data and the cumulative distribution function of the Normal distri­bution. The empirical distribution function Fn for n independent and identically distributed observa­tions Xi is defined as:

descstats00036.png 

where descstats00038.png is the indicator function and = 1 if descstats00040.png, otherwise 0.

The Kolmogorov-Smirnov statistic for a given cumulative distribution function F(x) is:

descstats00042.png 

where sup x is the supremum of the set of distances. A r-value is then computed to determine the significance of Dn.

Kurtosis: Sample coefficient of excess (sample excess kurtosis):

descstats00044.png 

Sample Excess Kurtosis=[Population Excess Kurtosis*(N+1)+6]*(N–1)/[(N–2) x (N–3)]

Kurtosis Pop: Population coefficient of excess (population excess kurtosis):

descstats00046.png 

Population Excess Kurtosis=((Sample Excess Kurtosis x (N–2) x (N–3)/(N–1)) –6)/(N+1)

Lower XSD and Upper XSD: Range determined by adding or subtracting X standard deviations from the mean Mean +/– X*SD

Max: Maximum value

Mean: Arithmetic average

descstats00048.png 

Mean log: Arithmetic average of the natural logs of the observations

descstats00050.png 

Median: Median value — from the percentiles computations, 50th percentile.

Min: Minimum value.

N: Number of observations with non-missing data (i.e., numeric observations).

NMiss: Number of observations with missing data (i.e., non-numeric observations such as text or blanks).

NObs: Number of observations (i.e., N+NMiss)

P(ercentiles): The Pth percentile divides the distribution at a point such that P percent of the distribu­tion are below this point. For a sample size of n, the quantile corresponding to the proportion p (0<p<1) is defined as:

Q(p)=(1 – f)*x(j)+f*x(j+1)

where:
     j = int(p*(n+1)), (integer part)
     f = p*(n+1) – j, (fractional part)
     x(j) = the j-th order statistic

The above is used if 1 £  j < n. Otherwise, the empirical quantile is the smallest order statistic for j=0 or the largest order statistic for j=n.

(Definition 6 in Hyndman and Fan, "Sample Quantiles in Statistical Packages", American Statistician, Nov 1996. Equivalent to SAS PCTLDEF 4, Excel PERCENTILE.EXC, and NIST Engineering Statistics Handbook, Section 7.2.6.2, Percentiles.)

Pseudo SD: Jackknife estimate of the standard deviation of the harmonic mean. For n points, x1, … xn, the pseudo standard deviation is:

descstats00052.png 

where:

descstats00054.png 

and:

descstats00056.png 

Range: Range of values (maximum value minus minimum value).

SD: Standard Deviation:

descstats00058.png 

SD Log: Standard deviation of the natural logs of the observations:

descstats00060.png 

SE: Standard Error:

descstats00062.png 

Skewness: Sample coefficient of skewness (sample skewness):

descstats00064.png 

Sample Skewness=Population Skewness*sqrt(N*(N–1))/(N–2)

Skewness Pop: Population coefficient of skewness (population skewness):

descstats00066.png 

Population Skewness=Sample Skewness*(N–2)/sqrt(N*(N–1))

Sum: Sum of the values in the column mapped to Summary:

descstats00068.png 

Variance: Unbiased sample variance:

descstats00070.png 

Units

When summary statistics are calculated for a variable with units, some of the output will have units. Assuming that the variable summarized is x and has x-units specified, the units for the summary sta­tistics are:

N, NMiss, NObs: No units
CV%, Geometric CV%: No units
Skewness, Skewness Pop, Kurtosis, Kurtosis Pop, KSP Value: No units
Mean Log, SD Log: No units
Variance: x-unit2 
Cl Lower Var, Cl Upper Var: x-unit2 
Everything else: x-unit

If more than one Summary variable is mapped, with at least two of those variables having units in the input dataset, and the units differ, a stacked Units column is displayed in the Statistics output work­sheet that reports the units of the Summary variables. In cases where the input data does not have units, or the units are all the same, then the units of the statistics are displayed in the column headers.


Last modified date:7/9/20
Certara USA, Inc.
Legal Notice | Contact Certara
© 2020 Certara USA, Inc. All rights reserved.