With many of these terms, it is necessary to distinguish between the
characteristics of a parent distribution and the estimates of those
characteristics obtained from a specific sample from the parent population.
For example, one may want to estimate characteristics of the parent population
from measurements taken on only a specific subset from that population.
A common convention, followed here, is to use Greek letters for population
characteristics and Roman letters for sample characteristics. Thus, for
example, $\bar{x}$ will denote the average of a set of measurements, but
$\mu$ will denote an average characteristic of the underlying population.
Precision is a measure of reproducibility or scatter in the results, without regard for the accuracy of the result. It is a measure of random error only; systematic errors will not affect the precision of a result, although they do affect the accuracy.
The mean of a set of $N$ measurements $x_i$ is the average:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \tag{2.1}$$
The expectation value of a quantity is the value expected if
averaged over the entire parent population, and will be denoted by angle
brackets: $\langle x \rangle$.
For example, the mean in the parent population that corresponds to the
sample mean $\bar{x}$ is

$$\mu = \langle x \rangle \tag{2.2}$$
There is an important distinction to be made between the standard deviation characterizing the random error of a measurement and the standard deviation characterizing a set of accurate observations and hence reflecting physical reality. The latter is often encountered in experimental research, and pertains to the natural variability in the parameter being measured. The former represents the precision with which a constant value of that parameter could be measured in a particular experiment. For example, in experiments using airborne instrumentation, variance spectra for measured variables seldom show evidence of noise except at low levels that correspond to digitization noise. This indicates that random measurement errors seldom contribute significantly to the uncertainty in such a measurement. However, there usually is high natural variability that causes repeated sets of measurements in presumably identical conditions to vary significantly, and the standard deviations among repeated measurements of, for example, fluxes of water vapor are large. This standard deviation reflects natural variability, not the random error in the measurement. It results from the variability of particular samples about the underlying population mean, and that variability would still characterize measurements from error-free sensors.
The median is the value that divides the population into equal halves; i.e., half the members lie above and half below the median. The most probable value is that observed most frequently, sometimes referred to as the mode of a distribution. As an example, the expected distribution of time intervals between randomly occurring events is

$$N(t) = \frac{N_0}{\tau}\, e^{-t/\tau} \tag{2.3}$$

where $N(t)$ is the number of events per time interval that
occur at time $t$, $N_0$ is the total number of events, $t$
is the time, and $\tau$ is a time constant characterizing the process.
For this distribution, the mean time is $\tau$, the median time is
$\tau \ln(2)$, and the mode occurs for $t=0$.
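The three values quoted for this distribution can be checked with a small simulation. The sketch below is only illustrative: the time constant `TAU`, the number of events, the seed, and the bin width are all hypothetical choices, not values from the text.

```python
import math
import random
import statistics

# Hypothetical parameters chosen only for illustration.
TAU = 2.0            # time constant of the process
N_EVENTS = 100_000   # number of simulated intervals

rng = random.Random(12345)  # fixed seed so the run is repeatable

# random.expovariate takes the rate (1 / tau), not the time constant.
intervals = [rng.expovariate(1.0 / TAU) for _ in range(N_EVENTS)]

sample_mean = statistics.fmean(intervals)
sample_median = statistics.median(intervals)

# Crude check of the mode: the bin starting at t = 0 should hold
# more events than the adjacent bin of the same width.
bin_width = 0.1 * TAU
first_bin = sum(1 for t in intervals if t < bin_width)
second_bin = sum(1 for t in intervals if bin_width <= t < 2 * bin_width)

print(f"mean   : {sample_mean:.3f} (expected tau = {TAU:.3f})")
print(f"median : {sample_median:.3f} "
      f"(expected tau*ln(2) = {TAU * math.log(2):.3f})")
print(f"counts in first two bins: {first_bin}, {second_bin}")
```

The sample mean and median should land close to $\tau$ and $\tau \ln(2)$, with differences of the size expected from sampling variability.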
A deviation $d_i = x_i - \mu$
is the difference between a specific measurement or value and the mean.
The standard deviation $\sigma$
is the "root-mean-square" value of the deviations, obtained from

$$\sigma^2 = \langle (x - \mu)^2 \rangle \tag{2.4}$$
For a sample of $N$ measurements, the conventional estimate $s$ of
the population standard deviation $\sigma$ is

$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2} \tag{2.5}$$
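The variance is the average of the squares of the deviations,
or the square of the standard deviation. For a sample, expanding the
square in (2.5) gives

$$s^2 = \frac{1}{N-1}\left[\sum_{i=1}^{N} x_i^2 - \frac{1}{N}\left(\sum_{i=1}^{N} x_i\right)^2\right] \tag{2.6}$$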
This form has the computational advantage that all quantities can be
calculated in one pass through the data, while the preceding form requires
two passes, one to calculate the mean and the second to calculate the deviations
from that mean.
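The difference between the two-pass and one-pass calculations can be sketched as follows. The function names and the data are hypothetical, and both functions use the conventional $N-1$ divisor for a sample estimate.

```python
import math

def std_two_pass(data):
    """Sample standard deviation: first pass for the mean, second for deviations."""
    n = len(data)
    mean = sum(data) / n                          # pass 1
    sum_sq_dev = sum((x - mean) ** 2 for x in data)  # pass 2
    return math.sqrt(sum_sq_dev / (n - 1))

def std_one_pass(data):
    """Sample standard deviation from running sums of x and x**2."""
    n = 0
    sum_x = 0.0
    sum_x2 = 0.0
    for x in data:                                # single pass
        n += 1
        sum_x += x
        sum_x2 += x * x
    variance = (sum_x2 - sum_x * sum_x / n) / (n - 1)
    # Round-off can make the difference slightly negative; clamp at zero.
    return math.sqrt(max(variance, 0.0))

# Hypothetical measurements, for illustration only.
measurements = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
s_two = std_two_pass(measurements)
s_one = std_one_pass(measurements)
print(s_two, s_one)
```

One caution about the one-pass form: it subtracts two large, nearly equal sums, so it can lose precision in floating-point arithmetic when the mean is large compared with the standard deviation. In that case the two-pass form is the safer choice.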
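If $x_j$ is a possible observation, the observed fraction of observations having the value $x_j$ is $P(x_j)=N(x_j)/N$, where $N$ is the total number of observations and $N(x_j)$ is the number having value $x_j$. The underlying population distribution function is then the limit of this fraction for a large number of observations:

$$P(x_j) = \lim_{N \to \infty} \frac{N(x_j)}{N} \tag{2.7}$$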
The preceding quantities can then be expressed in terms of the distribution function; for example, the mean is

$$\mu = \sum_j x_j \, P(x_j) \tag{2.8}$$

and the variance is

$$\sigma^2 = \sum_j (x_j - \mu)^2 \, P(x_j) \tag{2.9}$$
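As a minimal sketch of these sums, the following builds the observed fractions $P(x_j)$ from a list of discrete observations and then evaluates the mean and variance from them. The helper names and the data are hypothetical.

```python
from collections import Counter

def distribution(observations):
    """Observed fraction P(x_j) = N(x_j) / N for each distinct value x_j."""
    n_total = len(observations)
    counts = Counter(observations)
    return {x_j: n_j / n_total for x_j, n_j in counts.items()}

def mean_from_distribution(p):
    """Sum over j of x_j * P(x_j)."""
    return sum(x_j * p_j for x_j, p_j in p.items())

def variance_from_distribution(p):
    """Sum over j of (x_j - mu)**2 * P(x_j)."""
    mu = mean_from_distribution(p)
    return sum((x_j - mu) ** 2 * p_j for x_j, p_j in p.items())

# Hypothetical observations, for illustration only.
observations = [1, 2, 2, 3, 3, 3]
p = distribution(observations)
mu = mean_from_distribution(p)
var = variance_from_distribution(p)
print(p, mu, var)
```

Because the fractions here come from a finite sample, the variance computed this way uses a divisor of $N$ and not the $N-1$ of the conventional sample estimate.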
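The extensions to continuous distribution functions are these:

$$\int_{-\infty}^{\infty} P(x)\, dx = 1 \tag{2.10}$$

$$\mu = \int_{-\infty}^{\infty} x \, P(x)\, dx \tag{2.11}$$

$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 \, P(x)\, dx \tag{2.12}$$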
Similarly, the expectation value for any function $f$ of measurable characteristics $x$ is

$$\langle f(x) \rangle = \int f(x)\, P(x)\, dx \tag{2.13}$$

where $x$ can be a set of variables and the multidimensional integration must then cover all possible values of $x$.
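For a single variable, the expectation-value integral can be approximated numerically. The sketch below uses a simple midpoint rule and, as an assumed test case, the exponential distribution of time intervals with a hypothetical time constant `TAU`. The integration is cut off at 40 time constants, where the remaining tail is negligible.

```python
import math

def expectation(f, pdf, lower, upper, steps=100_000):
    """Midpoint-rule estimate of the integral of f(x) * pdf(x) over [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * width   # midpoint of the k-th subinterval
        total += f(x) * pdf(x)
    return total * width

# Hypothetical time constant, for illustration only.
TAU = 1.5

def exponential_pdf(t):
    """Normalized distribution of intervals, exp(-t / tau) / tau."""
    return math.exp(-t / TAU) / TAU

UPPER = 40.0 * TAU  # assumed cutoff; the neglected tail is of order exp(-40)

norm = expectation(lambda t: 1.0, exponential_pdf, 0.0, UPPER)
mean = expectation(lambda t: t, exponential_pdf, 0.0, UPPER)
variance = expectation(lambda t: (t - mean) ** 2, exponential_pdf, 0.0, UPPER)
print(norm, mean, variance)
```

For this distribution the results should be close to 1, $\tau$, and $\tau^2$ respectively.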
Other characteristics sometimes cited are the probable error, the magnitude of the deviation exceeded by 50% of the deviations, and the average deviation, the expectation value for the absolute magnitude of the deviations. For a Gaussian distribution, the probable error, average deviation, and standard deviation have the approximate ratios 0.674:0.798:1.
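These Gaussian ratios can be reproduced from standard results, as a quick check. For a unit-variance Gaussian, the probable error is the 75th-percentile point (half of the deviations have a smaller magnitude), and the average deviation is $\sqrt{2/\pi}$. The variable names below are illustrative.

```python
import math
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Probable error: |deviation| exceeded by 50% of deviations, so
# 25% lie above +PE and 25% lie below -PE.
probable_error = standard_normal.inv_cdf(0.75)

# Average deviation: expectation value of |x - mu| for a Gaussian.
average_deviation = math.sqrt(2.0 / math.pi)

print(f"probable error    / sigma = {probable_error:.4f}")
print(f"average deviation / sigma = {average_deviation:.4f}")
```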
If the distribution in measurement errors follows a known probability
distribution, then confidence intervals determined from that distribution
can be used to obtain quantitative estimates of probabilities associated
with such errors. It is this relationship that establishes the often-used
correspondence between standard deviation and probability for the Gaussian
distribution. Specifically, measurements falling more than two standard
deviations ($\pm 2\sigma$) from the true value are expected with about 0.05
probability, so $\pm 2\sigma$ limits correspond to approximate limits
providing 95% coverage. Other distribution functions can be treated in the
same way.
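The correspondence between standard deviations and probability for a Gaussian distribution can be evaluated directly. The following sketch computes the two-sided probability of a deviation larger than $k\sigma$, along with the multiplier that gives exactly 95% coverage. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def prob_beyond(k):
    """Two-sided probability that a Gaussian deviation exceeds k standard deviations."""
    return math.erfc(k / math.sqrt(2.0))

p_two_sigma = prob_beyond(2.0)

# Multiplier k such that +/- k*sigma covers exactly 95% of a Gaussian.
k_95 = NormalDist().inv_cdf(0.975)

for k in (1.0, 2.0, 3.0):
    print(f"P(|deviation| > {k:.0f} sigma) = {prob_beyond(k):.4f}")
print(f"95% coverage corresponds to +/- {k_95:.3f} sigma")
```

The probability beyond two standard deviations is somewhat below 0.05, which is why the 95% coverage quoted for $\pm 2\sigma$ limits is approximate.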