Appendix R
The Practice Success Prescription: Team-Based Veterinary Healthcare Delivery by Drs. Leak. Morris Humphries
Thomas E. Catanzaro, DVM, MHA, FACHE, DACHE

Basic Statistical Terminology

Data: A collection of "n" observations on a variable "x" constitutes a data set.

Average (central tendency): A measure of location of a set of observations that describes central tendency, such as the mean, median, and mode. In the definition used by Dr. Tom Catanzaro, Veterinary Consulting International®, for veterinary practices, it is:

The best of the worst, or the worst of the best.
No one should ever want to be either in veterinary healthcare.

 Mean (arithmetic mean, x̄): A measure of location. It is the sum of the observations divided by the number of observations in the set. If the continuous variable is "x" and there are "n" observations in the sample, then the sample mean, x̄ (pronounced "X-bar"), is:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

The mean has the disadvantage that its value is influenced by outliers. An outlier is an observation whose value is highly inconsistent with the main body of the data; an excessively large outlier will increase the mean, and a very small one will decrease it. The mean is an appropriate measure of central tendency if the distribution of the data is symmetrical. The mean is pulled to the right -- increased in value -- if the distribution is skewed to the right, and pulled to the left -- decreased in value -- if the distribution is skewed to the left.
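
As a minimal Python sketch (the weights below are hypothetical), the influence of a single outlier on the mean can be seen directly:

```python
from statistics import mean

# Hypothetical body weights (kg) for five patients
weights = [4.1, 4.3, 4.4, 4.6, 4.8]
with_outlier = weights + [12.0]      # one excessively large observation

print(mean(weights))                 # 4.44 -- central tendency of the main body of data
print(mean(with_outlier))            # 5.70 -- the single outlier pulls the mean upward
```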

 Median: A measure of location. It is the central value in a set of n observations that have been arranged in rank order, that is, in increasing or decreasing order of magnitude. The arithmetic mean and median are close or equal in value if the distribution is symmetrical.

 If n is odd, the median is found by starting with the smallest observation in the ordered set and counting until the observation in position (n + 1)/2 is reached. This observation is the median.

 If n is even, the median lies midway between the two central observations.
The advantage of the median is that it is not affected by outliers or by skewness in the distribution of the data. The median is less than the mean if the data are skewed to the right, and greater than the mean if the data are skewed to the left. A disadvantage of the median is that it does not incorporate all the observations in its calculation, and it is hard to handle mathematically.
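
A small Python sketch (values hypothetical) of the odd-n and even-n cases, and of the median's insensitivity to an outlier:

```python
from statistics import median

odd_n  = [3, 5, 7, 9, 11]        # n = 5: the median is the value in position (n + 1)/2 = 3
even_n = [3, 5, 7, 9, 11, 13]    # n = 6: the median lies midway between the 3rd and 4th values

print(median(odd_n))             # 7
print(median(even_n))            # 8.0
print(median([3, 5, 7, 9, 110])) # 7 -- unchanged by the outlier 110
```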

 Mode: A well-known, but infrequently used, measure of central tendency, defined as the most commonly occurring observation in a set of observations. The mode often has a different value from the mean or median. The modal group, or modal class, is the group or class into which most observations fall in a histogram. An example is the most common litter size for a breed of dog.

 unimodal: A distribution that has a single mode or modal group.

 bimodal: A distribution that has two humps (two modal groups) separated by a trough, even if the frequency of occurrence in the two modal classes is not equal.
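
A brief Python sketch (litter sizes hypothetical); note that `statistics.multimode` only reports values tied for the highest frequency, so a bimodal shape with unequal peaks is better judged from a histogram:

```python
from statistics import mode, multimode

litter_sizes = [5, 6, 6, 7, 7, 7, 8, 9]
print(mode(litter_sizes))             # 7 -- the most commonly occurring observation

# multimode lists every value tied for the highest frequency
print(multimode([2, 2, 3, 7, 8, 8]))  # [2, 8]
```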

Histogram: A two-dimensional diagram illustrating the frequency distribution of a continuous variable. Usually the horizontal axis represents the units of measurement of the variable, while the vertical axis indicates the frequency for each class.
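
A minimal text-based sketch in Python (weights hypothetical) of how observations are grouped into classes and their frequencies counted:

```python
from collections import Counter

# Hypothetical body weights (kg), grouped into 1-kg classes for the horizontal axis
weights = [4.2, 4.7, 5.1, 5.3, 5.4, 5.8, 6.2, 6.4, 7.9]
classes = Counter(int(w) for w in weights)       # class lower bound -> frequency

for lower in sorted(classes):                    # print one bar per class
    print(f"{lower}-{lower + 1} kg: {'*' * classes[lower]}")
```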

Measures of Dispersion: There are several measures of the spread of the data, each of which has different attributes:

 Range: The difference between the largest and smallest observations. It gives undue weight to extreme values and will, therefore, overestimate the dispersion of most of the observations if outliers are present. The range tends to increase in value as the number of observations increases.

 Variance: This is determined by calculating the deviation of each observation from the mean. A deviation is large if the observation is far from the mean and small if it is close to the mean. Some deviations are positive numbers and some are negative, so the effect of the sign is annulled by squaring every deviation, since a square is always positive. The arithmetic mean of these squared deviations is the variance; for a sample, the sum of squared deviations is divided by (n - 1) rather than n, so that s² estimates the population variance σ²:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

The variance uses every available observation and is a sensible measure of spread, though it is not intuitively appealing, since it is expressed in the square of the original units. We rarely calculate the variance from first principles in this age of hand-held calculators and computers, so we will make no attempt here to show the mechanics of the calculation.
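
Despite that, a short Python sketch (values hypothetical) of the calculation from first principles, checked against the standard library:

```python
from statistics import mean, variance

x = [4.1, 4.3, 4.4, 4.6, 4.8]
x_bar = mean(x)

# Square each deviation so that positive and negative deviations cannot cancel
squared_deviations = [(xi - x_bar) ** 2 for xi in x]

# Sample variance: sum of squared deviations divided by (n - 1)
s2 = sum(squared_deviations) / (len(x) - 1)

print(s2)            # 0.073
print(variance(x))   # same value from the standard library
```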

 Standard Deviation (abbreviated SD, or σ = sigma): This is defined as the square root of the variance. The standard deviation may be regarded as an average of the deviations of the observations from the arithmetic mean. It is often denoted by s in the sample, estimating σ in the population, and is given by:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

The standard deviation uses all the observations in the data set. It is a measure of spread whose dimensionality is the same as that of the original observations; that is, it is measured in the same units as the observations. The SD is of greatest use in relation to a symmetrical distribution of data. For a Normal distribution, the interval mean ± 2 SD (a span of four standard deviations) contains the great majority, roughly 95%, of the values in the population.
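
A brief Python sketch (weights hypothetical) of the standard deviation and the mean ± 2 SD coverage for a Normal distribution, using the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist, mean, stdev

x = [4.1, 4.3, 4.4, 4.6, 4.8]
s = stdev(x)                       # square root of the variance, in the same units (kg) as the data
print(mean(x), s)                  # 4.44  ~0.27

# For a Normal distribution, mean +/- 2 SD covers roughly 95% of the population
nd = NormalDist(mu=4.44, sigma=0.27)
print(nd.cdf(4.44 + 2 * 0.27) - nd.cdf(4.44 - 2 * 0.27))   # ~0.954
```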

ANOVA (analysis of variance): A powerful collection of parametric statistical procedures for analyzing data, essentially comparing the means of various groups of data. It relies on separating the total variation into its component parts, which are associated with defined sources of variation.
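
If SciPy is available, a minimal one-way ANOVA sketch (group values hypothetical) comparing the means of three groups:

```python
from scipy.stats import f_oneway   # one-way analysis of variance

# Hypothetical response values for three treatment groups
group_a = [5.1, 4.9, 5.3, 5.0]
group_b = [5.8, 6.1, 5.9, 6.0]
group_c = [5.2, 5.0, 5.4, 5.1]

f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f_stat, p_value)             # reject H0 of equal group means if p_value < 0.05
```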

Gaussian Distribution (Normal Distribution): A continuous probability distribution. It is a unimodal, bell-shaped distribution and is approximated by many biological variables.

Chi-Squared (χ²) Distribution: A continuous probability distribution, which is often used in hypothesis testing of proportions.

Chi-Squared (χ²) Test for Trend: A specific chi-squared test used to determine whether there is a linear trend in proportions classified by an ordinal variable.

Regression Coefficient (β = true value of a regression coefficient, while b refers to an estimated regression coefficient): This usually refers to the coefficient corresponding to the explanatory variable in a simple linear regression equation; that is, the gradient or slope of the line (β estimated by b).

Correlation Coefficient (ρ): This measures the degree of linear association between two variables. Its non-parametric counterpart is the rank correlation coefficient, which measures the association, not necessarily linear, between two variables, which may be ordinal.
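
If SciPy is available, a short sketch (weight and intake values hypothetical) contrasting the Pearson correlation coefficient with the Spearman rank correlation coefficient:

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: body weight (kg) versus daily food intake (g)
weight = [4.0, 5.5, 6.1, 7.2, 8.0, 9.3]
intake = [210, 260, 275, 330, 340, 400]

r, p = pearsonr(weight, intake)           # degree of linear association
rho, p_rank = spearmanr(weight, intake)   # rank correlation; association need not be linear

print(r, rho)                             # both near 1 for this strongly increasing relationship
```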

Hypothesis Testing: The process of formulating and testing a proposition about the population, using the sample data.

 Null Hypothesis (H0): The term given to the proposition that is under test in a hypothesis testing procedure. In general, it is expressed in terms of no treatment effect: for example, no difference in means, or a regression slope of zero. We reject the null hypothesis if P < 0.05.

 P-Value: The P-value in a hypothesis test is the probability of obtaining the observed results, or more extreme results, if the null hypothesis is true. In almost all cases, the P-value is obtained from computer output.

 Decision: To reject the null hypothesis or not. Usually, but not necessarily, reject H0 if P < 0.05. Note that when there is no linear relationship between the two variables, both the slope, β, and the correlation coefficient, ρ, are equal to zero.

t-Test (single group, paired, and unpaired): These are significance tests based on the t-distribution (SEb = standard error of a statistic b):

 t-distribution: A continuous probability distribution. The distribution is symmetrical about the mean and is characterized by its degrees of freedom. In general, the test statistic is a statistic b divided by its estimated standard error:

$$t = \frac{b}{SE_b}$$

 Test statistic: This is the difference in the sample means divided by its estimated standard error. Most computer packages will perform this calculation, but it is useful to have its derivation. The test statistic follows the t-distribution:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)}$$

 Degrees of freedom (df): The number of independent observations contributing to the value of a statistic; that is, the number of observations available to evaluate that statistic minus the number of restrictions on those observations. As the degrees of freedom increase, the t-distribution becomes more like the Normal Distribution.
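
A minimal Python sketch (sample values hypothetical) of the unpaired (two-sample) test statistic computed from this formula, with a pooled estimate of the standard error:

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical weights (kg) in two independent groups
group_1 = [4.1, 4.3, 4.4, 4.6, 4.8]
group_2 = [4.9, 5.1, 5.2, 5.4, 5.6]
n1, n2 = len(group_1), len(group_2)

# Pooled variance, then the estimated standard error of the difference in means
pooled_var = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / (n1 + n2 - 2)
se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))

t = (mean(group_1) - mean(group_2)) / se_diff   # follows a t-distribution with n1 + n2 - 2 df
print(t, n1 + n2 - 2)                           # roughly -4.7 on 8 degrees of freedom
```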
