next up previous contents
Next: 4.2.6 Estimating exponential-decay time Up: 4.2 Applications Previous: 4.2.4 An example: Fitting

4.2.5 Maximum-likelihood estimation in cases with correlated errors

Consider the Taylor-series expansion of W about the maximum set of values for the parameters, $\{{\it a^*}\}$:
\begin{displaymath}W(a) = W(a^*) + \sum_j {{\partial W}\over{\partial a_j}}\Bigr......tial a_k}}\Bigr\vert _{a^*} (a_j-a_j^*)(a_k-a_k^*) + \cdots . \end{displaymath} (4.27)
 

The second term on the right equals zero, because that is the condition for the validity of the maximum-likelihood solution. The reason for expanding W, rather than ${\cal L}$, is that the joint probability function can be approximated by a Gaussian distribution function and (4.27) gives a generalized Gaussian form for ${\cal L}$:

\begin{displaymath}{\cal L}(\{a\}) = A \exp\{-{{1}\over{2}}\sum_j\sum_k \deltaa_j H_{jk} \delta a_k \} \end{displaymath} (4.28)
 

where A=exp $\{W(a^*)\}$$\delta a_j$(aj-aj*), and

\begin{displaymath}H_{jk} = - {{\partial^2W}\over{\partial a_j\partial a_k}}\Bigr\vert _{a^*}\ . \end{displaymath} (4.29)
 

The matrix H is sometimes called the information matrix, and its inverse is the covariance or error matrix discussed in Chapter 2.

For uncorrelated fluctuations, the mixed derivatives of the likelihood function are zero because the variation of W with ak is not a function of aj if the two parameters are independent. In this case, the likelihood function reduces to the product of two independent Gaussian functions. However, if the parameters are not independent, there are terms in the resulting likelihood that depend on the mixed products of the fluctuations; e.g.,

\begin{displaymath}{\cal L}(a_1,a_2) = {\cal L}_{max} \exp\{-{{H_{11}\deltaa_1^......r{2}} -{{(H_{12}+H_{21}) \delta a_1\delta a_2}\over{2}}\} \ . \end{displaymath} (4.30)
 

In all but degenerate cases, it is possible to transform the parameters $\{a\}$ into new parameters that are independent (or to diagonalize the error matrix). In the two-dimensional case, consider a rotation by an angle $\theta$ so that new parameters y and z are formed from the old parameters a1 and a2:

\begin{displaymath}y = a_1 \cos \theta + a_2 \sin \theta \end{displaymath} (4.31)
 
\begin{displaymath}z = -a_1\sin \theta + a_2 \cos \theta \end{displaymath} (4.32)
 

Contours of constant probability, in terms of the new variables, are specified by the equation

 
\begin{displaymath}{\rm constant} = H_{11}[\cos^2\theta\delta y^2 -2\cos\theta\sin\theta\delta y\delta z + \sin^2\theta\delta z^2]\end{displaymath}
 
\begin{displaymath}~~~~~~~~~~ + H_{22}[\sin^2\theta\delta y^2 +2\cos\theta\sin\theta\delta y\delta z + \cos^2\theta \delta z^2] \end{displaymath}
 
 \begin{displaymath}~~~~~~~~~~ + (H_{12} + H_{21})[\cos\theta\sin\theta\deltay^2......\theta\delta y\delta z -\cos\theta\sin\theta\delta z^2] \ \ .\end{displaymath} (4.33)
 

This has the form of an ellipse, and an appropriate choice of $\theta$ can give the standard form for an ellipse,

\begin{displaymath}{{\delta y^2}\over{A^2}} + {{\delta z^2}\over{B^2}} = 1 \ , \end{displaymath} (4.34)
 

which does not involve cross-terms in y and z. The required rotation can be found by setting the coefficient of the cross product in (4.33) to zero:

\begin{displaymath}-2H_{11}\cos\theta\sin\theta+2H_{22}\cos\theta\sin\theta +(H_{12}+H_{21})(\cos^2\theta-\sin^2\theta) = 0 \end{displaymath} (4.35)
 
\begin{displaymath}\sin 2\theta (H_{22}-H_{11}) + \cos 2\theta (H_{12}+H_{21}) = 0 \end{displaymath} (4.36)
 
\begin{displaymath}\tan 2\theta = {{H_{12}+H_{21}}\over{H_{22}-H_{11}}} \ . \end{displaymath} (4.37)
 

In the transformed variables,

\begin{displaymath}{\cal L} = {\cal L}_{max} \exp\{-{{H_{11}^\prime \deltay^2}\over{2}} - {{H_{22}^\prime \delta z^2}\over{2}}\} \end{displaymath} (4.38)
 

and

\begin{displaymath}\sigma_y^2 = {{1}\over{H_{11}^\prime}} , \end{displaymath} (4.39)
 
\begin{displaymath}\sigma_z^2 = {{1}\over{H_{22}^\prime}} . \end{displaymath} (4.40)
 

The variables y and z are uncorrelated, so the appropriate standard deviation in the initial parameters can be estimated from

\begin{displaymath}\delta a_1 = \delta y \cos\theta - \delta z \sin\theta \end{displaymath} (4.41)
 
\begin{displaymath}\overline{\delta a_1^2} = \cos^2\theta \overline{\delta y^2} +\sin^2\theta \overline{\delta z^2} \end{displaymath} (4.42)
 
\begin{displaymath}\sigma^2_{a_1} = \cos^2\theta \sigma_y^2 + \sin^2\theta\sigma_z^2 . \end{displaymath} (4.43)
 

The procedure can be generalized as follows. Write $\beta$ = ( ${\bf a-a}^*$), so that

 \begin{displaymath}{\cal L}({\bf a}) = {\cal L}_{max} \exp\{-{{1}\over{2}} {\bf\betaH\beta}^t\}\end{displaymath} (4.44)
 

Transform according to the rotation matrix U, to new parameters $\gamma$:

\begin{displaymath}{\bf\gamma} = {\bf U}{\bf\beta} \end{displaymath} (4.45)
 

Because the inverse of a rotation matrix is its transpose, (4.44) can be written

\begin{displaymath}{\cal L}({\bf a}) = {\cal L}_{max} \exp\{-{{1}\over{2}}\beta {\bf U}^{-1}{\bf U H U}^{-1}{\bf U}\beta^t\} \end{displaymath} (4.46)
 
\begin{displaymath}~~~~ = {\cal L}_{max} \exp\{-{{1}\over{2}} \gamma {\bfh}\gamma\} \end{displaymath} (4.47)
 

where

\begin{displaymath}{\bf h} = {\bf U H U}^t . \end{displaymath} (4.48)
 

For appropriate choice of U, h can be made a diagonal matrix. In that case, ${\cal L}$ is the product of independent Gaussian distributions with variances specified by the components h-1jj. The variances in the initial variables $\beta$ can then be found in terms of the (uncorrelated) variances in the variables $\gamma$:

 
 \begin{displaymath}\gamma = {\bf U}\beta \end{displaymath} (4.49)
 
\begin{displaymath}{\bf U}^t\gamma = \beta = \gamma{\bf U} \end{displaymath} (4.50)
 
 \begin{displaymath}\beta^t\beta = {\bf U}^t\gamma^t\gamma{\bf U} \end{displaymath} (4.51)
 
\begin{displaymath}\langle\beta^t\beta\rangle = {\bfU}^t\langle\gamma^t\gamma\rangle U\end{displaymath} 
 
 \begin{displaymath}~~~~~ = {\bf U}^t{\bf h}^{-1}{\bf U}\end{displaymath} 
 
 \begin{displaymath}~~~~~ = ({\bf U}^t{\bf h U})^{-1} . \end{displaymath} (4.52)
 

But, from (4.48),

\begin{displaymath}{\bf H} = {\bf U}^t{\bf h U} \end{displaymath} (4.53)
 

so

\begin{displaymath}\langle\beta^t\beta\rangle = {\bf H}^{-1} . \end{displaymath} (4.54)
 

This again shows the connection between H, the information matrix, and the variances and covariances of the observations.

One approach to determining the errors in the multivariate case with correlated errors is to determine the matrix H by finding the second derivatives of W, then invert H to find the error matrix. The elements of the error matrix then give the variances and covariances for the parameters.


next up previous contents
Next: 4.2.6 Estimating exponential-decay time Up: 4.2 Applications Previous: 4.2.4 An example: Fitting 

NCAR Advanced Study Program

http://www.asp.ucar.edu