Next: 7.3 The F-test Up: 7. Hypothesis testing Previous: 7.1 Purpose

7.2 The Chisquare distribution

Section 4.4 developed the connection between the method of least squares and the maximum-likelihood method, and showed that the one-standard-deviation limits for the fit correspond to changes in fitted parameters that increase the chisquare by one. Consider, for example, the chisquare function for the mean u of a set of measurements:
\begin{displaymath}\chi^2(u) = \sum_i {{(y_i-u)^2}\over{\sigma_i^2}} \end{displaymath} (7.1)
 
\begin{displaymath}\chi^2(u\pm\sigma_u) = \sum_i{{(y_i-(u\pm\sigma_u))^2}\over{\sigma_i^2}} \end{displaymath} (7.2)
 
\begin{displaymath}~~~~~~ = \sum_i{{(y_i-u)^2\mp2\sigma_u(y_i-u)+\sigma_u^2}\over{\sigma_i^2}} \end{displaymath} (7.3)
 
\begin{displaymath}~~~~~~ = \chi^2(u)+\sigma_u^2\sum_i{{1}\over{\sigma_i^2}} .\end{displaymath} (7.4)
 

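The term linear in $\sigma_u$ drops out in (7.4) because $u$ is the weighted mean that minimizes (7.1), for which $\sum_i (y_i-u)/\sigma_i^2 = 0$. Because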

\begin{displaymath}\sigma_u^2 = {{1}\over{\sum_i {{1}\over{\sigma_i^2}}}} , \end{displaymath} (7.5)
 

\begin{displaymath}\chi^2(u\pm\sigma_u) = \chi^2(u) + 1, \end{displaymath} (7.6)
 

the same result obtained more generally in section 5.3.
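
The result (7.6) can be checked numerically. The following is a minimal sketch, not part of the original notes: the measurements and uncertainties are made-up illustrative values, and the function names `chisq` and `weighted_mean` are hypothetical. It evaluates (7.1) at the weighted mean and at one standard deviation (7.5) on either side of it.

```python
def chisq(u, y, sigma):
    """Chisquare of eq. (7.1) for a trial value u of the mean."""
    return sum((yi - u) ** 2 / si ** 2 for yi, si in zip(y, sigma))


def weighted_mean(y, sigma):
    """Return the weighted mean and its standard deviation from eq. (7.5)."""
    weights = [1.0 / si ** 2 for si in sigma]
    wsum = sum(weights)
    u = sum(w * yi for w, yi in zip(weights, y)) / wsum
    sigma_u = (1.0 / wsum) ** 0.5
    return u, sigma_u


# Illustrative (made-up) measurements and their uncertainties.
y = [10.2, 9.7, 10.5, 9.9, 10.1]
sigma = [0.3, 0.2, 0.4, 0.2, 0.3]

u, sigma_u = weighted_mean(y, sigma)
chi0 = chisq(u, y, sigma)
delta_plus = chisq(u + sigma_u, y, sigma) - chi0
delta_minus = chisq(u - sigma_u, y, sigma) - chi0

# Both increases should equal 1 up to floating-point rounding.
print(delta_plus, delta_minus)
```

The increase is 1 on both sides only because `u` is the weighted mean; for any other trial value the term linear in `sigma_u` would not cancel.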

It is worth re-emphasizing that this result depends on the validity of the expected Gaussian distribution of errors, and has the same connection to this assumption as does the least-squares method of fitting to data. The basic equation used here for the chisquare function, (7.1), depends on this assumption, and (7.2) is only valid if the errors from individual measurements entering the mean are uncorrelated. Inference based on the chisquare distribution will not be valid if these conditions are not satisfied.

It is useful to consider the distribution of values expected for the chisquare function when these conditions are satisfied. Consider the case where the correct functional form f(x) is used in a fit to measurements $\{y_i\}$ obtained at values $\{x_i\}$ of the independent variable x. If there are N measurements in the fit, each characterized by the same measurement uncertainty $\sigma$, the variance of the measurements about the best-fit relationship is

\begin{displaymath}V = {{1}\over{N}} \sum_i (y_i-f(x_i))^2 = {{1}\over{N}}\sigma^2\chi^2. \end{displaymath} (7.7)
 

If there are n parameters in the fit,

\begin{displaymath}\langle V\rangle = \sigma^2 {{N-n}\over{N}} \end{displaymath} (7.8)
 

when f(x) is the correct functional relationship, so

\begin{displaymath}\langle\chi^2\rangle = N-n = {\rm degrees\ of\ freedom.}\end{displaymath} (7.9)
 

For example, for 25 measurements and a fit with three parameters, $\chi^2\approx 22$ is expected if the functional relationship is correct.
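
The expectation (7.9) can be illustrated with a small simulation. This is a sketch under stated assumptions, not the procedure used in these notes: it uses a straight-line fit (two parameters rather than the three of the example above), a made-up true relationship $y = 1 + 2x$, Gaussian errors with the same $\sigma$ for every point, and hypothetical function names `fit_line` and `mean_chisq`.

```python
import random


def fit_line(x, y):
    """Least-squares straight line y = a + b*x for equal uncertainties."""
    n_pts = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    b = (n_pts * sxy - sx * sy) / (n_pts * sxx - sx * sx)
    a = (sy - b * sx) / n_pts
    return a, b


def mean_chisq(n_points=25, sigma=0.5, trials=4000, seed=1):
    """Average chisquare over many simulated data sets fitted with a line."""
    rng = random.Random(seed)
    x = [float(i) for i in range(n_points)]
    total = 0.0
    for _ in range(trials):
        # Simulated measurements scattered about the assumed true line.
        y = [1.0 + 2.0 * xi + rng.gauss(0.0, sigma) for xi in x]
        a, b = fit_line(x, y)
        total += sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / sigma ** 2
    return total / trials


avg = mean_chisq()
# With N = 25 points and n = 2 fitted parameters, the average
# should lie near N - n = 23 rather than near N = 25.
print(avg)
```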

For the chisquare distribution function, the expected value and the distribution about that expected value both depend on the number of degrees of freedom. The distribution is that expected for the sum of the squares of $\nu=N-n$ independent unit-normal variables. The functional form of the chisquare distribution is

\begin{displaymath}P(z,\nu) = {{1}\over{2\Gamma(\nu/2)}}\Bigl({{z}\over{2}}\Bigr)^{{{\nu}\over{2}}-1} e^{-z/2} \end{displaymath} (7.10)
 

where $\nu$ is the number of degrees of freedom and z is the value of the chisquare function. $\Gamma$ is the generalized factorial function, defined so that $\Gamma(1)=1$ and $\Gamma(n+1)=n\Gamma(n)$. (For integer $n$, $\Gamma(n+1)=n!$, and $\Gamma(1/2)=\sqrt{\pi}$.)
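
Equation (7.10) is straightforward to evaluate with a gamma-function routine. The sketch below assumes only the Python standard library (`math.gamma`); the names `chisq_pdf` and `integrate` are hypothetical, and the crude midpoint integration is there only to check that the density integrates to one and has mean $\nu$.

```python
import math


def chisq_pdf(z, nu):
    """Chisquare probability density of eq. (7.10)."""
    if z <= 0.0:
        # Density is taken as zero here; the value at exactly z = 0
        # is not needed for the checks below.
        return 0.0
    half = nu / 2.0
    return (z / 2.0) ** (half - 1.0) * math.exp(-z / 2.0) / (2.0 * math.gamma(half))


def integrate(func, lo, hi, steps=20000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (k + 0.5) * h) for k in range(steps))


nu = 5
# The upper limit of 100 stands in for infinity; the tail beyond it is negligible.
norm = integrate(lambda z: chisq_pdf(z, nu), 0.0, 100.0)
mean = integrate(lambda z: z * chisq_pdf(z, nu), 0.0, 100.0)

# norm should be close to 1 and mean close to nu.
print(norm, mean)
```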

Figure 7.1 shows the probability that $\chi^2$ will exceed various limits, as a function of the number of degrees of freedom. These curves make it possible to judge how consistent a fit is with the data. If a chisquare is obtained that corresponds to a very unlikely value (e.g., if only about 5% of all observations are expected to have this large a chisquare), then the fit is not very good and the functional dependence is probably not correct.


 
Figure 7.1: Probability of exceeding various values of the chisquare, for the degrees of freedom labeled on the curves.
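
Probabilities of the kind plotted in Figure 7.1 can also be computed directly. The following is a minimal sketch, assuming only the Python standard library; the function name `chisq_exceed_prob` is hypothetical, and the series expansion of the incomplete gamma function is one common choice, not necessarily the method used to produce the figure. It is intended for moderate values of the chisquare, where subtracting from one does not lose much precision.

```python
import math


def chisq_exceed_prob(z, nu, tol=1e-12, max_terms=10000):
    """Probability that a chisquare with nu degrees of freedom exceeds z."""
    if z <= 0.0:
        return 1.0
    a = nu / 2.0
    x = z / 2.0
    # Series for the lower incomplete gamma function:
    #   gamma(a, x) = x**a * exp(-x) * sum_k x**k / (a * (a+1) * ... * (a+k))
    term = 1.0 / a
    total = term
    for k in range(1, max_terms):
        term *= x / (a + k)
        total += term
        if term < tol * total:
            break
    # Work with logarithms to avoid overflow for large nu, then
    # normalize by Gamma(a) to get the cumulative probability.
    log_lower = a * math.log(x) - x + math.log(total) - math.lgamma(a)
    return 1.0 - math.exp(log_lower)


# 25 measurements and 3 fitted parameters give nu = 22.
# A chisquare near 33.9 should then be exceeded about 5% of the time.
print(chisq_exceed_prob(33.924, 22))
```

For two degrees of freedom the result reduces to $e^{-z/2}$, which gives a quick check of the routine.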


The following properties of the chisquare distribution function are often useful:
1. $\langle\chi^2\rangle=\nu$, as shown before, where $\nu$ is the number of degrees of freedom in the fit.
2. The variance in the value of $\chi^2$ is $V(\chi^2)=2\nu$.
3. The chisquare distribution approaches a Gaussian distribution with the above mean and variance, for large $\nu$.
4. The most probable value in the distribution occurs for $\chi^2=\nu-2$ (for $\nu\ge 2$).

One valuable use of the chisquare statistic is in judging how many parameters are needed to account for the observed variability. If the chisquare test indicates an unsatisfactory fit, it may be necessary to add parameters to the fit.
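
The mean, variance, and most probable value listed above can be checked with a short simulation. This is a sketch under stated assumptions: it draws chisquare values as sums of squared unit-normal variables using the standard library's `random.gauss`, uses $\nu = 10$ as an arbitrary illustrative choice, and locates the peak on a grid from the shape of (7.10). The names `sample_chisq` and `log_density_shape` are hypothetical.

```python
import math
import random


def sample_chisq(nu, rng):
    """One draw: the sum of squares of nu independent unit-normal variables."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))


def log_density_shape(z, nu):
    """Logarithm of eq. (7.10) up to an additive constant."""
    return (nu / 2.0 - 1.0) * math.log(z) - z / 2.0


nu = 10
rng = random.Random(7)
draws = [sample_chisq(nu, rng) for _ in range(20000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)

# Grid search for the peak of the density on 0 < z <= 40.
grid = [0.01 * k for k in range(1, 4001)]
mode = max(grid, key=lambda z: log_density_shape(z, nu))

# Expect sample_mean near nu, sample_var near 2*nu, and mode near nu - 2.
print(sample_mean, sample_var, mode)
```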




NCAR Advanced Study Program
http://www.asp.ucar.edu