Next: 5.6 Fitting to minimize ... Up: 5. Least-Squares Methods ... Previous: 5.4 Fitting an arbitrary function ...

5.5 Fitting subject to constraints

In many cases, theoretical arguments or assumptions lead to constraints on fitted functions that should be incorporated into fits. For example, the fitted result might be known to pass through the origin. In a straight-line fit this is accomplished simply by fitting with the function g(x)=bx, where the constant term of section 5.2 is omitted. In this case, (5.11) gives the best-fit slope:
\begin{displaymath}b = {{\overline{xy}}\over{\overline{x^2}}} . \end{displaymath} (5.39)
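As an illustration of (5.39), the following minimal sketch computes the slope of a line forced through the origin. It assumes all points carry equal weight; the function name `slope_through_origin` and the sample data are made up for this example.

```python
import numpy as np

def slope_through_origin(x, y):
    """Best-fit slope b for g(x) = b*x, i.e. b = mean(x*y) / mean(x**2) as in (5.39).

    Assumes equally weighted points (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean(x * y) / np.mean(x * x)

# Made-up data lying close to a line through the origin.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
b = slope_through_origin(x, y)

# Cross-check against a generic least-squares solve with the single basis function x.
b_lstsq = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]
```

The cross-check uses the same model with no constant term, so the two slopes should agree to rounding error.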
 

In many other cases, it is natural to incorporate desired constraints into the functions used for the fit, and this is usually the easiest approach when it is possible. Other examples are the use of Lagrange polynomials to force polynomial expansions to particular defined values, or the use of solutions to the governing differential equations to fit fields known to be constrained by those equations.

In cases where the constraint cannot be incorporated so simply, there is a powerful general method for incorporating constraints, the method of Lagrange multipliers. For example, it may be necessary to constrain a fit so that at specified values of the (multidimensional) parameters $\{{\bf X}_I\}$ the function has specified values $\{{\bf Y}_I\}$:

\begin{displaymath}\sum_k a_kg_k({\bf X}_I) = {\bf Y}_I . \end{displaymath} (5.40)
 

Examples where such constraints are applicable include:

The least-squares fitting problem is that of finding the minimum value of the $\chi^2$ function subject to a set of constraining equations, such as (5.40), that may be expressed in the form
\begin{displaymath}F_\ell(a_1,a_2,\dots) = 0 \ \ {\rm for}\ \ell=(1,2,\dots,L) .\end{displaymath} (5.41)
 

 

The total derivative of the $\chi^2$ function must be zero at the solution:

\begin{displaymath}d\chi^2 = \sum_{j=1}^J {{\partial\chi^2}\over{\partial a_j}} da_j= 0 . \end{displaymath} (5.42)
 

If the infinitesimal increments $da_j$ are all independent,

\begin{displaymath}{{\partial\chi^2}\over{\partial a_j}} = 0 \end{displaymath} (5.43)
 

as before. However, (5.41) requires that not all the variations $da_j$ be independent; the L relationships among the parameters reduce the number of independent parameters to (J-L).

The method of Lagrange multipliers involves the introduction of L new parameters $\lambda_\ell$. Multiply the constraining equations (5.41) by these new parameters and add the total derivative of the result to the minimization equation (5.42):

\begin{displaymath}\sum_{j=1}^J \Bigl({{\partial\chi^2}\over{\partial a_j}} +\sum_\ell \lambda_\ell {{\partial F_\ell}\over{\partial a_j}}\Bigr) da_j = 0 . \end{displaymath} (5.44)
 

Because each constraint requires $dF_\ell = \sum_j (\partial F_\ell/\partial a_j)\,da_j = 0$, this equation is valid for arbitrary $\lambda_\ell$, so the Lagrange multipliers can be selected to satisfy the L equations

\begin{displaymath}{{\partial\chi^2}\over{\partial a_j}} + \sum_\ell \lambda_\ell {{\partial F_\ell}\over{\partial a_j}} = 0 \ \ {\rm for}\ (J-L)<j\le J . \end{displaymath} (5.45)
 

The first J-L of the parameters $\{a_j\}$ can then be considered as independent, so

\begin{displaymath}{{\partial\chi^2}\over{\partial a_j}} + \sum_\ell \lambda_\ell {{\partial F_\ell}\over{\partial a_j}} = 0\ \ {\rm for}\ 1\le j\le J-L. \end{displaymath} (5.46)
 

These J-L equations and the L equations (5.45) are J simultaneous equations to be solved for the J-L independent parameters $\{a_j\}$ and the L Lagrange multipliers $\lambda_\ell$; the remaining L dependent parameters then follow from the constraints (5.41).
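For a fit that is linear in the parameters with linear constraints of the form (5.40), the stationarity conditions (5.45)-(5.46) and the constraints (5.41) can be written as one linear system. The sketch below is one possible way to do this, not necessarily the procedure intended in the text: it solves for all J parameters and the L multipliers at once rather than separating independent from dependent parameters. The function name `constrained_linear_fit`, the matrices `A`, `C`, `d`, and the sample data are assumptions made for illustration.

```python
import numpy as np

def constrained_linear_fit(A, y, C, d, sigma=None):
    """Minimize chi^2 = sum_i ((y_i - sum_k a_k A[i, k]) / sigma_i)**2 subject to C @ a = d.

    A[i, k] = g_k(x_i) holds the basis functions at the data points,
    C[l, k] = g_k(X_l) holds them at the constraint points, and d[l] = Y_l.
    Returns the parameters a and the Lagrange multipliers lam.
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    C = np.atleast_2d(np.asarray(C, dtype=float))
    d = np.atleast_1d(np.asarray(d, dtype=float))
    if sigma is None:
        w = np.ones_like(y)  # equal weights when no uncertainties are given
    else:
        w = np.broadcast_to(1.0 / np.asarray(sigma, dtype=float) ** 2, y.shape)
    J, L = A.shape[1], C.shape[0]
    AtWA = A.T @ (w[:, None] * A)
    AtWy = A.T @ (w * y)
    # d(chi^2)/da_j + sum_l lam_l dF_l/da_j = 0  ->  2 A^T W A a + C^T lam = 2 A^T W y
    # F_l = 0                                     ->  C a = d
    K = np.block([[2.0 * AtWA, C.T], [C, np.zeros((L, L))]])
    rhs = np.concatenate([2.0 * AtWy, d])
    solution = np.linalg.solve(K, rhs)
    return solution[:J], solution[J:]

# Made-up data: quadratic fit a_0 + a_1 x + a_2 x^2 forced through the point (0, 1).
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
y = np.array([2.1, 0.3, 1.2, 2.9, 6.1])
A = np.vander(x, 3, increasing=True)   # columns 1, x, x^2
C = np.array([[1.0, 0.0, 0.0]])        # basis functions evaluated at X = 0
d = np.array([1.0])                    # required value Y = 1
a, lam = constrained_linear_fit(A, y, C, d)
```

The system is solvable when the columns of `A` are linearly independent and the constraints are not redundant. The returned `a` satisfies the constraint to rounding error, and the size of `lam` indicates how strongly the constraint pulls the fit away from the unconstrained minimum.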
 


Example 5.1: Consider a set of measurements $\{\alpha_i\}$, $\{\beta_i\}$, and $\{\gamma_i\}$ of the three interior angles of a triangle. If the measurement uncertainties are characterized by $\sigma_\alpha$, $\sigma_\beta$, and $\sigma_\gamma$, respectively, the $\chi^2$ function to be minimized by a least-squares fit to determine the best values of the angles ($\alpha^*, \beta^*, \gamma^*$) is
\begin{displaymath}\chi^2(\alpha^*,\beta^*,\gamma^*) = \sum_i\Bigl({{(\alpha_i-\alpha^*)^2}\over{\sigma_\alpha^2}} + {{(\beta_i-\beta^*)^2}\over{\sigma_\beta^2}} + {{(\gamma_i-\gamma^*)^2}\over{\sigma_\gamma^2}}\Bigr) . \end{displaymath} (5.47)
 

Minimization of this function with respect to $\alpha^*$, $\beta^*$, and $\gamma^*$ would give the best-fit values $\alpha^*=\overline{\alpha}$, $\beta^*=\overline{\beta}$, and $\gamma^*=\overline{\gamma}$. However, we know that the sum of the angles must be 180$^\circ$, and this fit does not necessarily satisfy this constraint.

To incorporate the constraint, define the constraining function

\begin{displaymath}F(\alpha^*,\beta^*,\gamma^*) = \alpha^*+\beta^*+\gamma^*-180^\circ = 0 . \end{displaymath} (5.48)
 

The equations to be solved are then

\begin{displaymath}{{\partial\chi^2}\over{\partial\phi^*}} + \lambda{{\partial F}\over{\partial\phi^*}} = -\sum_i{{2(\phi_i-\phi^*)}\over{\sigma_\phi^2}} + \lambda = 0 \end{displaymath} (5.49)
 

where $\phi$ is any of the angles ($\alpha$, $\beta$, or $\gamma$). With N measurements of each angle, we then obtain four equations from which to determine the three best-fit angles and the Lagrange multiplier $\lambda$:

\begin{displaymath}\alpha^* = \overline{\alpha} - {{\lambda}\over{2N}}\sigma_\alpha^2 \end{displaymath} (5.50)
 
\begin{displaymath}\beta^* = \overline{\beta} - {{\lambda}\over{2N}} \sigma_\beta^2\end{displaymath} (5.51)
 
\begin{displaymath}\gamma^* = \overline{\gamma} - {{\lambda}\over{2N}}\sigma_\gamma^2 \end{displaymath} (5.52)
 
\begin{displaymath}\alpha^* + \beta^* + \gamma^* = 180^\circ . \end{displaymath} (5.53)
 

The solution for the Lagrange multiplier is

\begin{displaymath}\lambda ={{2N(\overline{\alpha}+\overline{\beta}+\overline{\gamma}-180^\circ)}\over{\sigma_\alpha^2+\sigma_\beta^2+\sigma_\gamma^2}}, \end{displaymath} (5.54)
 

a value that determines how much the fit must be adjusted from the unconstrained result to enforce the constraint. Notice that each angle is adjusted by an amount proportional to the square of the measurement uncertainty in that angle. If all angles are measured with the same accuracy, the adjustment required to enforce the constraint is applied to each angle equally. The final result is

\begin{displaymath}\alpha^* = \overline{\alpha} - {{\sigma_\alpha^2(\overline{\alpha}+\overline{\beta}+\overline{\gamma}-180^\circ)}\over{\sigma_\alpha^2+\sigma_\beta^2+\sigma_\gamma^2}}\end{displaymath} (5.55)
 
\begin{displaymath}\beta^* = \overline{\beta} - {{\sigma_\beta^2 (\overline{\alpha}+\overline{\beta}+\overline{\gamma}-180^\circ)}\over{\sigma_\alpha^2+\sigma_\beta^2+\sigma_\gamma^2}}\end{displaymath} (5.56)
 
\begin{displaymath}\gamma^* = \overline{\gamma} - {{\sigma_\gamma^2(\overline{\alpha}+\overline{\beta}+\overline{\gamma}-180^\circ)}\over{\sigma_\alpha^2+\sigma_\beta^2+\sigma_\gamma^2}}\end{displaymath} (5.57)
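The closed-form result of Example 5.1 is straightforward to evaluate numerically. The following minimal sketch computes the multiplier and the adjusted angles from the mean measurements, assuming each angle is measured the same number of times N and that angles are given in degrees. The function name `adjust_triangle_angles` and the measurement values are hypothetical.

```python
import numpy as np

def adjust_triangle_angles(alpha, beta, gamma, sigma_alpha, sigma_beta, sigma_gamma):
    """Constrained best-fit angles (degrees) whose sum is exactly 180 degrees.

    alpha, beta, gamma are equal-length sequences of repeated measurements;
    the sigma_* arguments are the corresponding measurement uncertainties.
    Returns (best_angles, lam), where lam is the Lagrange multiplier.
    """
    n = len(alpha)  # N measurements of each angle (assumed equal for all three)
    means = np.array([np.mean(alpha), np.mean(beta), np.mean(gamma)])
    variances = np.array([sigma_alpha, sigma_beta, sigma_gamma], dtype=float) ** 2
    excess = means.sum() - 180.0                 # violation by the unconstrained fit
    lam = 2.0 * n * excess / variances.sum()     # Lagrange multiplier
    best = means - lam * variances / (2.0 * n)   # adjusted angles
    return best, lam

# Made-up measurements: two readings of each angle, with gamma measured less precisely.
best, lam = adjust_triangle_angles(
    alpha=[60.2, 60.4], beta=[59.9, 60.1], gamma=[60.3, 60.1],
    sigma_alpha=0.1, sigma_beta=0.1, sigma_gamma=0.2,
)
```

In this made-up case the means sum to 180.5 degrees, and the 0.5 degree excess is removed in the ratio 1:1:4 of the variances, so the least precisely measured angle absorbs most of the correction.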
 



The NCAR Advanced Study Program
http://www.asp.ucar.edu