Stephen Huan

Miscellaneous mathematical observations

  1. Gradient is orthogonal to contours
  2. Positive definite implies Cauchy–Schwarz
  3. Proofs are absolute (証明は絶対)
    1. References

Gradient is orthogonal to contours

Geometric intuition for the gradient.

Definition. For a scalar function $f: \R^n \to \R$, we say a contour or level set of $f$ is a parameterized curve $\bm{r}(t): \R \to \R^n$ such that for every point on the curve, the function value is the same, that is, there is a fixed constant $c \in \R$ such that
$$\begin{aligned} f(\bm{r}(t)) = c \end{aligned} \tag{1}$$

Claim. The gradient $\nabla f$ is orthogonal to the contour at every point.

Proof. Take the derivative of both sides of (1) with respect to $t$.

$$\begin{aligned} f(\bm{r}(t)) &= c \\ \frac{df}{dt} &= 0 \end{aligned}$$

Expanding using the multivariate chain rule,

$$\begin{aligned} \frac{df}{dt} = \frac{\partial f}{\partial r_1} \frac{d r_1}{dt} + \dotsb + \frac{\partial f}{\partial r_n} \frac{d r_n}{dt} &= 0 \\ \begin{pmatrix} \frac{\partial f}{\partial r_1} & \cdots & \frac{\partial f}{\partial r_n} \end{pmatrix} \begin{pmatrix} \frac{d r_1}{dt} \\ \vdots \\ \frac{d r_n}{dt} \end{pmatrix} &= 0 \\ \langle \nabla f(\bm{r}(t)), \bm{r}'(t) \rangle &= 0 \end{aligned}$$

which is what we wanted to show. $\square$

Thus the gradient is orthogonal to every tangent of the contour, so moving along the contour means moving orthogonally to the gradient. This should make intuitive sense, since the change in the function value after moving in the direction $\Delta \bm{x}$ is approximately (to a first-order Taylor approximation)

$$\begin{aligned} f(\bm{x} + \Delta \bm{x}) \approx f(\bm{x}) + \nabla f(\bm{x})^\top \Delta \bm{x} \end{aligned}$$

Therefore, if we want the function value to stay the same (which is the definition of a contour), the direction of movement should be orthogonal to the gradient, which is precisely what we have shown.
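As a quick numerical sanity check (a minimal sketch not in the original post, using a made-up example function), take $f(x, y) = x^2 + y^2$: its level set at $c = 1$ is the unit circle $\bm{r}(t) = (\cos t, \sin t)$, and the gradient at each point should be orthogonal to the tangent $\bm{r}'(t)$.

```python
import numpy as np

def f(x):
    """Example function f(x, y) = x^2 + y^2 (chosen for illustration)."""
    return x[0] ** 2 + x[1] ** 2

def grad_f(x):
    """Gradient of f, computed by hand: (2x, 2y)."""
    return np.array([2 * x[0], 2 * x[1]])

def r(t):
    """Parameterization of the contour f = 1 (the unit circle)."""
    return np.array([np.cos(t), np.sin(t)])

def r_prime(t):
    """Tangent vector r'(t) of the contour."""
    return np.array([-np.sin(t), np.cos(t)])

for t in np.linspace(0, 2 * np.pi, 13):
    x = r(t)
    # the function value is constant along the contour ...
    assert np.isclose(f(x), 1.0)
    # ... and the gradient is orthogonal to the tangent
    assert np.isclose(grad_f(x) @ r_prime(t), 0.0)
print("gradient is orthogonal to the contour at every sampled point")
```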

Positive definite implies Cauchy–Schwarz

A quick derivation of the Cauchy–Schwarz inequality.

Theorem (Cauchy–Schwarz inequality). Every pair of vectors $u, v$ satisfies
$$\begin{aligned} \langle u, u \rangle \langle v, v \rangle \geq \langle u, v \rangle^2 \end{aligned}$$
Proof. For vectors $u, v \in \R^n$, let $V \coloneqq \begin{pmatrix} u & v \end{pmatrix} \in \R^{n \times 2}$ be the matrix with $u$ and $v$ as columns.
$$\begin{aligned} \Theta \coloneqq V^{\top} V &= \begin{pmatrix} \langle u, u \rangle & \langle u, v \rangle \\ \langle v, u \rangle & \langle v, v \rangle \end{pmatrix} \\ \det(\Theta) = \langle u, u \rangle \langle v, v \rangle - \langle u, v \rangle^2 &\geq 0 \\ \implies \langle u, u \rangle \langle v, v \rangle &\geq \langle u, v \rangle^2 \end{aligned}$$
where we use that $\Theta$ is a Gram matrix and hence symmetric positive semi-definite, so its determinant, the product of its nonnegative eigenvalues, is nonnegative. $\square$
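As a quick check (a numerical sketch, not part of the original argument), we can draw random $u, v$, form the Gram matrix $\Theta = V^\top V$, and confirm that $\det(\Theta)$ equals the Cauchy–Schwarz gap $\langle u, u \rangle \langle v, v \rangle - \langle u, v \rangle^2$ and is nonnegative.

```python
import numpy as np

rng = np.random.default_rng(0)

for _ in range(1000):
    n = rng.integers(2, 10)
    u, v = rng.standard_normal(n), rng.standard_normal(n)
    V = np.column_stack([u, v])      # matrix with u and v as columns
    Theta = V.T @ V                  # 2 x 2 Gram matrix
    gap = (u @ u) * (v @ v) - (u @ v) ** 2
    # det(Theta) is exactly the Cauchy-Schwarz gap ...
    assert np.isclose(np.linalg.det(Theta), gap)
    # ... and it is nonnegative (up to floating point error)
    assert gap >= -1e-10
print("Cauchy-Schwarz holds for every sampled pair")
```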

A similar line of reasoning shows that the variance is nonnegative.

Theorem. For any random variable $X$, define its variance as
$$\begin{aligned} \mathbb{V}\text{ar}[X] \coloneqq \mathbb{E}[X^2] - \mathbb{E}[X]^2 \end{aligned}$$
Then $\mathbb{V}\text{ar}[X] \geq 0$.

The usual approach is to show that $\mathbb{V}\text{ar}[X] = \mathbb{E}[(X - \mathbb{E}[X])^2]$ and then argue that the expectation of a nonnegative quantity must be nonnegative. Alternatively, it follows directly from Jensen's inequality applied to the convex function $f(x) = x^2$. We will take a different approach.
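For reference, the first identity is a one-line expansion using the linearity of expectation (spelled out here for completeness, not in the original):
$$\begin{aligned} \mathbb{E}[(X - \mathbb{E}[X])^2] = \mathbb{E}[X^2] - 2 \mathbb{E}[X] \mathbb{E}[X] + \mathbb{E}[X]^2 = \mathbb{E}[X^2] - \mathbb{E}[X]^2 = \mathbb{V}\text{ar}[X] \end{aligned}$$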

Lemma. For any random variable $X$, let $\mu_i \coloneqq \mathbb{E}[X^i]$ be its $i$-th moment. Then the matrices collecting its moments up to order $n$
$$\begin{aligned} M_r \coloneqq \begin{pmatrix} 1 & \mu_1 & \mu_2 & \cdots & \mu_r \\ \mu_1 & \mu_2 & \mu_3 & \cdots & \mu_{r + 1} \\ \mu_2 & \mu_3 & \mu_4 & \cdots & \mu_{r + 2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mu_r & \mu_{r + 1} & \mu_{r + 2} & \cdots & \mu_{2 r} \end{pmatrix} \end{aligned} \tag{2}$$
are all positive semi-definite for $r = 1, 2, \dotsc, \lfloor n/2 \rfloor$.
Proof. Take the expectation of the squared polynomial in $X$ with coefficients $a_0, \dotsc, a_r$; since the square is nonnegative, so is its expectation:
$$\begin{aligned} \mathbb{E}[(a_0 + a_1 X + a_2 X^2 + \dotsb + a_r X^r)^2] \geq 0 \end{aligned}$$
Expanding the square directly, we have
$$\begin{aligned} \mathbb{E}\left[\sum_{0 \leq i, j \leq r} a_i a_j X^{i + j}\right] = \sum_{0 \leq i, j \leq r} a_i a_j \mathbb{E}[X^{i + j}] = \sum_{0 \leq i, j \leq r} a_i a_j \mu_{i + j} \end{aligned}$$
where we use the linearity of expectation and the definition of the moments. But this is precisely the quadratic form $\bm{a}^{\top} M_r \bm{a}$ for $\bm{a} \coloneqq (a_0, \dotsc, a_r)$ and $(M_r)_{i, j} \coloneqq \mu_{i + j}$ (indexing entries from $0$), matching the definition in (2). Since $\bm{a}^{\top} M_r \bm{a} \geq 0$ holds for any $\bm{a} \in \R^{r + 1}$, $M_r$ is positive semi-definite by definition. $\square$
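As a sanity check of the lemma (a numerical sketch with an arbitrarily chosen distribution, not from the original), we can estimate the moments from samples, assemble $M_r$, and confirm both that its eigenvalues are nonnegative and that $\bm{a}^{\top} M_r \bm{a}$ matches $\mathbb{E}[(a_0 + a_1 X + \dotsb + a_r X^r)^2]$.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.exponential(scale=2.0, size=200_000)  # any distribution works

r = 3
# empirical moments mu_0, ..., mu_{2r}
mu = np.array([np.mean(samples ** i) for i in range(2 * r + 1)])
# Hankel moment matrix (M_r)_{i, j} = mu_{i + j}, indices 0..r
M = np.array([[mu[i + j] for j in range(r + 1)] for i in range(r + 1)])

# M_r is positive semi-definite (up to floating point error)
assert np.linalg.eigvalsh(M).min() >= -1e-8 * np.abs(M).max()

# a^T M_r a equals the expectation of the squared polynomial
a = rng.standard_normal(r + 1)
poly = sum(a[i] * samples ** i for i in range(r + 1))
assert np.isclose(a @ M @ a, np.mean(poly ** 2))
print("moment matrix is PSD and matches the squared-polynomial expectation")
```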

Now, to prove the nonnegativity of the variance, we need only apply the lemma with $r = 1$.

Proof. Take $M_1 = \left( \begin{smallmatrix} 1 & \mu_1 \\ \mu_1 & \mu_2 \end{smallmatrix} \right) = \left( \begin{smallmatrix} 1 & \mathbb{E}[X] \\ \mathbb{E}[X] & \mathbb{E}[X^2] \end{smallmatrix} \right)$. Since $M_1$ is positive semi-definite, its determinant is nonnegative, so we have $\det(M_1) = \mathbb{E}[X^2] - \mathbb{E}[X]^2 \geq 0$, showing $\mathbb{V}\text{ar}[X] \geq 0$. $\square$
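Concretely (a small numerical sketch with made-up samples), the $r = 1$ determinant computation recovers the usual variance:

```python
import numpy as np

rng = np.random.default_rng(1)
x = 3 * rng.standard_normal(100_000) + 5  # arbitrary samples

M1 = np.array([[1.0,       x.mean()],
               [x.mean(),  np.mean(x ** 2)]])
# det(M_1) = E[X^2] - E[X]^2 is exactly the (biased) sample variance
assert np.isclose(np.linalg.det(M1), x.var())
print(f"det(M_1) = {np.linalg.det(M1):.4f} >= 0")
```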

Proofs are absolute (証明は絶対)

A joke transcribed from page 9 of 『数学女子: 1』 (by 安田まさえ) [1].

References


[1] 安田まさえ, 数学女子: 1. Tōkyō: 竹書房, 2010.
