Lagrange Multipliers Intuition ELI5

November 13, 2020

TLDR

Lagrange multipliers is a constrained optimization method
Lagrange multipliers restricts solutions to points that are feasible and stationary
Key intuitive points: $\nabla{c}$ of our constraint function is normal to the surface. If $\nabla{f}$, the function we want to optimize, is also normal to the surface, it is parallel to $\nabla{c}$
If a point satisfies this parallelism, it is feasible and stationary.
The lagrangian, that combines the constraints and main function into one function, lets you use calculus to find these stationary points where $\nabla{f} = \lambda\nabla{c}$, since they are parallel they must be a multiple $\lambda$ of one another.

The method of Lagrange multipliers is a constrained optimization technique. I first came across it in my Calculus 3 class, and probably knew less about it after “learning” it.

Motivation

Imagine we want to optimize a function $$min f(x)$$ such that a constraint function $$c(x) = 0$$

The canonical and good choice of an example to show is a constraint of a circle.

TODO: figure.

We note two key points about our functions $f$ and $c$. Firstly, the gradient w.r.t $c$ anywhere inside the feasible region is normal to the surface. The reason is that along the direction of the feasible region that still complies with the constraint, the gradient is always 0. What would the direction be where the gradient is maximal? Examine and convince yourself this is true.

Convince me

$c(x) = 0$ everywhere inside the feasible region. You can traverse along the path of your feasible region and stay feasible, but what is the gradient when you move along this path? It's $0$, to stay feasible. The point of maximum change is definitely not along this path. In fact, how do you get as far away from it as possible? You MUST be normal to the feasible surface.

With that in mind, secondly now consider what $\nabla{f}$ is along the constraint surface. It varies and depends on $f$. But sometimes, it’s also normal to the surface, and therefore parallel or anti-parallel to $\nabla{g}$.

What does it mean when it is normal to the surface, and when they are parallel? Well, if we can ONLY move along the feasible region to decrease the function we want to minimize, but the direction of maximal change is normal to the feasible region, we are stuck! This means we have a stationary point.

Convince me

If we are at $x^*$, and we want to decrease our function $f(x^*)$, and the only way we can move is away, as much as possible, from our feasible region, we are stuck.

And since we are maximizing/minimizing, what if some of these points were optima - possibly global optima?

We can find them and test them out, using lagrange multipliers.

Shoptalk

We noted that at some feasible points, $\nabla{f}$ and $\nabla{c}$ are parallel, therefore $\nabla{f} = \lambda\nabla{c}$ for some constant $\lambda$.

What if we created a function that let’s us find where this happens? That’s where our lagrangian function comes into play.

$$L(x, \lambda) = f(x) + \lambda c(x)$$

We have moved our constraints into this one function. And when we derive $L$ partially w.r.t $x$ and $\lambda$, we get stationary points. We can then plug in these points into our original function $f$. Finding lambda itself doesn’t actually matter.

We take the gradient of our lagrangian function w.r.t each variable, get a system of equations to solve, and come up with variables to plug in.

More to come

What do we do if our constraint is an inequality? KKT
Does lagrange multipliers always work?
More detailed version with all math.