Lagrange Multipliers Intuition ELI5
November 13, 2020
TLDR
- Lagrange multipliers is a constrained optimization method
- Lagrange multipliers restricts solutions to points that are feasible and stationary
- Key intuitive points: $\nabla{c}$ of our constraint function is normal to the surface. If $\nabla{f}$, the function we want to optimize, is also normal to the surface, it is parallel to $\nabla{c}$
- If a point satisfies this parallelism, it is feasible and stationary.
- The lagrangian, that combines the constraints and main function into one function, lets you use calculus to find these stationary points where $\nabla{f} = \lambda\nabla{c}$, since they are parallel they must be a multiple $\lambda$ of one another.
The method of Lagrange multipliers is a constrained optimization technique. I first came across it in my Calculus 3 class, and probably knew less about it after “learning” it.
Motivation
Imagine we want to optimize a function $$min f(x)$$ such that a constraint function $$c(x) = 0$$
The canonical and good choice of an example to show is a constraint of a circle.
TODO: figure.
We note two key points about our functions $f$ and $c$. Firstly, the gradient w.r.t $c$ anywhere inside the feasible region is normal to the surface. The reason is that along the direction of the feasible region that still complies with the constraint, the gradient is always 0. What would the direction be where the gradient is maximal? Examine and convince yourself this is true.
With that in mind, secondly now consider what $\nabla{f}$ is along the constraint surface. It varies and depends on $f$. But sometimes, it’s also normal to the surface, and therefore parallel or anti-parallel to $\nabla{g}$.
What does it mean when it is normal to the surface, and when they are parallel? Well, if we can ONLY move along the feasible region to decrease the function we want to minimize, but the direction of maximal change is normal to the feasible region, we are stuck! This means we have a stationary point.
We can find them and test them out, using lagrange multipliers.
Shoptalk
We noted that at some feasible points, $\nabla{f}$ and $\nabla{c}$ are parallel, therefore $\nabla{f} = \lambda\nabla{c}$ for some constant $\lambda$.
What if we created a function that let’s us find where this happens? That’s where our lagrangian function comes into play.
$$L(x, \lambda) = f(x) + \lambda c(x)$$
We have moved our constraints into this one function. And when we derive $L$ partially w.r.t $x$ and $\lambda$, we get stationary points. We can then plug in these points into our original function $f$. Finding lambda itself doesn’t actually matter.
We take the gradient of our lagrangian function w.r.t each variable, get a system of equations to solve, and come up with variables to plug in.
More to come
- What do we do if our constraint is an inequality? KKT
- Does lagrange multipliers always work?
- More detailed version with all math.