Miscellaneous mathematical observations
2022-09-23 · # math
2022-09-23
Geometric intuition for the gradient.
Definition. For a scalar function $f: \R^n \to \R$, a contour or level set of $f$ is a parameterized curve $\bm{r}(t): \R \to \R^n$ along which the function value is constant; that is, there is a fixed constant $c \in \R$ such that

$$
f(\bm{r}(t)) = c \tag{1}
$$
Claim. The gradient $\nabla f$ is orthogonal to the contour at every point.
Proof. Take the derivative of both sides of (1) with respect to $t$.

$$
\begin{aligned}
f(\bm{r}(t)) &= c \\
\frac{df}{dt} &= 0
\end{aligned}
$$
Expanding using the multivariate chain rule,
$$
\begin{aligned}
\frac{df}{dt} = \frac{\partial f}{\partial r_1} \frac{d r_1}{dt} + \dotsb + \frac{\partial f}{\partial r_n} \frac{d r_n}{dt} &= 0 \\
\begin{pmatrix} \frac{\partial f}{\partial r_1} & \cdots & \frac{\partial f}{\partial r_n} \end{pmatrix}
\begin{pmatrix} \frac{d r_1}{dt} \\ \vdots \\ \frac{d r_n}{dt} \end{pmatrix} &= 0 \\
\langle \nabla f(\bm{r}(t)), \bm{r}'(t) \rangle &= 0
\end{aligned}
$$
which is what we wanted to show. $\square$
Thus the gradient is orthogonal to every tangent of the contour, so moving along the contour means moving orthogonally to the gradient. This should make intuitive sense, since the change in the function value after moving in the direction $\Delta \bm{x}$ is, to a first-order Taylor approximation,

$$
f(\bm{x} + \Delta \bm{x}) \approx f(\bm{x}) + \nabla f(\bm{x})^\top \Delta \bm{x}
$$
Therefore, if we want the function value not to change (which is the definition of a contour), the direction of movement must be orthogonal to the gradient, which is precisely what we have shown.
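
As a quick numerical sanity check (a minimal sketch, not part of the argument above), we can pick a concrete function, say $f(x, y) = x^2 + y^2$, parameterize one of its circular contours, and confirm that $\langle \nabla f(\bm{r}(t)), \bm{r}'(t) \rangle$ vanishes at sample values of $t$. The function and parameterization are illustrative choices only.

```python
import numpy as np

# Example function f(x, y) = x^2 + y^2 and its gradient.
def grad_f(p):
    x, y = p
    return np.array([2 * x, 2 * y])

# The contour f = c^2 is the circle r(t) = (c cos t, c sin t).
c = 2.0
def r(t):
    return np.array([c * np.cos(t), c * np.sin(t)])

def r_prime(t):
    return np.array([-c * np.sin(t), c * np.cos(t)])

# <grad f(r(t)), r'(t)> should be zero for every t.
for t in np.linspace(0.0, 2 * np.pi, 7):
    inner = grad_f(r(t)) @ r_prime(t)
    print(f"t = {t:.3f}   <grad f, r'> = {inner: .2e}")
```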
2022-09-23
A quick derivation of the Cauchy–Schwarz inequality.
Theorem (Cauchy–Schwarz inequality). Every pair of vectors $u, v$ satisfies

$$
\langle u, u \rangle \langle v, v \rangle \geq \langle u, v \rangle^2
$$
Proof. For vectors $u, v \in \R^n$, let $V \coloneqq \begin{pmatrix} u & v \end{pmatrix} \in \R^{n \times 2}$ be the matrix with $u$ and $v$ as columns.
$$
\begin{aligned}
\Theta \coloneqq V^\top V &= \begin{pmatrix} \langle u, u \rangle & \langle u, v \rangle \\ \langle v, u \rangle & \langle v, v \rangle \end{pmatrix} \\
\det(\Theta) = \langle u, u \rangle \langle v, v \rangle - \langle u, v \rangle^2 &\geq 0 \\
\implies \langle u, u \rangle \langle v, v \rangle &\geq \langle u, v \rangle^2
\end{aligned}
$$
where we use that $\Theta$ is a Gram matrix and therefore symmetric positive semi-definite ($\bm{a}^\top \Theta \bm{a} = \lVert V \bm{a} \rVert^2 \geq 0$ for every $\bm{a}$), so its determinant is nonnegative.
$\square$
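
The Gram-matrix argument is easy to verify numerically. Below is a minimal sketch (randomly drawn vectors, purely illustrative) that builds $\Theta = V^\top V$ and checks that $\det(\Theta) = \langle u, u \rangle \langle v, v \rangle - \langle u, v \rangle^2 \geq 0$.

```python
import numpy as np

rng = np.random.default_rng(0)

for _ in range(5):
    n = int(rng.integers(2, 10))
    u, v = rng.normal(size=n), rng.normal(size=n)

    V = np.column_stack([u, v])   # n x 2 matrix with u, v as columns
    Theta = V.T @ V               # Gram matrix of u and v

    lhs = (u @ u) * (v @ v)       # <u, u><v, v>
    rhs = (u @ v) ** 2            # <u, v>^2
    assert np.isclose(np.linalg.det(Theta), lhs - rhs)
    assert lhs >= rhs             # Cauchy–Schwarz
    print(f"<u,u><v,v> - <u,v>^2 = {lhs - rhs:.4f} >= 0")
```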
A similar line of reasoning can be used to show that the variance is nonnegative.
Theorem. For any random variable $X$, define its variance as

$$
\mathbb{V}\text{ar}[X] \coloneqq \mathbb{E}[X^2] - \mathbb{E}[X]^2
$$
Then $\mathbb{V}\text{ar}[X] \geq 0$.
The usual approach is to show that $\mathbb{V}\text{ar}[X] = \mathbb{E}[(X - \mathbb{E}[X])^2]$ and then argue that the expectation of a nonnegative quantity must be nonnegative. Alternatively, it follows directly from Jensen's inequality applied to the convex function $f(x) = x^2$. We will take a different approach.
Lemma. For any random variable $X$ with finite moments up to order $n$, let $\mu_i \coloneqq \mathbb{E}[X^i]$ be its $i$-th moment. Then the matrices collecting its moments,

$$
M_r \coloneqq \begin{pmatrix}
1 & \mu_1 & \mu_2 & \cdots & \mu_r \\
\mu_1 & \mu_2 & \mu_3 & \cdots & \mu_{r+1} \\
\mu_2 & \mu_3 & \mu_4 & \cdots & \mu_{r+2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mu_r & \mu_{r+1} & \mu_{r+2} & \cdots & \mu_{2r}
\end{pmatrix} \tag{2}
$$

are all positive semi-definite for $r = 1, 2, \dotsc, \lfloor n/2 \rfloor$.
Proof. Take the expectation of a squared polynomial in $X$ with arbitrary coefficients $a_0, \dotsc, a_r$; since the quantity inside the expectation is a square, the expectation is nonnegative.

$$
\mathbb{E}[(a_0 + a_1 X + a_2 X^2 + \dotsb + a_r X^r)^2] \geq 0
$$
Expanding the square directly, we have
$$
\mathbb{E}\left[\sum_{0 \leq i, j \leq r} a_i a_j X^{i+j}\right]
= \sum_{0 \leq i, j \leq r} a_i a_j \, \mathbb{E}[X^{i+j}]
= \sum_{0 \leq i, j \leq r} a_i a_j \, \mu_{i+j}
$$
where we use the linearity of expectation and the definition of moments.
But this is precisely the quadratic form $\bm{a}^\top M \bm{a}$ for $\bm{a} \coloneqq (a_0, \dotsc, a_r)$ and $M_{i,j} \coloneqq \mu_{i+j}$, matching the definition in (2).
Since $\bm{a}^\top M \bm{a} \geq 0$ holds for any $\bm{a} \in \R^{r+1}$, $M$ is positive semi-definite by definition.
$\square$
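
To see the lemma in action, here is a small sketch that builds $M_r$ for a concrete random variable. An $\operatorname{Exp}(1)$ random variable is used purely as an example; its moments are $\mathbb{E}[X^i] = i!$, so the entries of $M_r$ are exact.

```python
import math
import numpy as np

# Moments of an Exp(1) random variable: E[X^i] = i!.
def mu(i):
    return float(math.factorial(i))

# Hankel moment matrix M_r with entry (i, j) = mu_{i+j}, for 0 <= i, j <= r.
def moment_matrix(r):
    return np.array([[mu(i + j) for j in range(r + 1)] for i in range(r + 1)])

for r in range(1, 5):
    M = moment_matrix(r)
    eigvals = np.linalg.eigvalsh(M)    # M is symmetric, so use eigvalsh
    print(f"r = {r}: smallest eigenvalue = {eigvals.min():.6f}")
    assert eigvals.min() >= -1e-9      # positive semi-definite
```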
Now, to prove the nonnegativity of the variance, we need only apply the lemma with $r = 1$.
Proof. Take $M_1 = \left(\begin{smallmatrix} 1 & \mu_1 \\ \mu_1 & \mu_2 \end{smallmatrix}\right) = \left(\begin{smallmatrix} 1 & \mathbb{E}[X] \\ \mathbb{E}[X] & \mathbb{E}[X^2] \end{smallmatrix}\right)$, the $2 \times 2$ case of (2). Since $M_1$ is positive semi-definite, its determinant is nonnegative, so we have $\det(M_1) = \mathbb{E}[X^2] - \mathbb{E}[X]^2 \geq 0$, showing $\mathbb{V}\text{ar}[X] \geq 0$.
$\square$
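
Finally, the same determinant computation can be checked numerically for an arbitrary small discrete distribution (the values and probabilities below are made up purely for illustration):

```python
import numpy as np

# A small discrete random variable: values with probabilities (example data).
values = np.array([-1.0, 0.0, 2.0, 5.0])
probs = np.array([0.1, 0.4, 0.3, 0.2])
assert np.isclose(probs.sum(), 1.0)

m1 = values @ probs            # E[X]
m2 = (values ** 2) @ probs     # E[X^2]

M1 = np.array([[1.0, m1], [m1, m2]])
var = m2 - m1 ** 2             # Var[X] = E[X^2] - E[X]^2

print(f"det(M1) = {np.linalg.det(M1):.6f}, Var[X] = {var:.6f}")
assert np.isclose(np.linalg.det(M1), var) and var >= 0
```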
2023-06-02
A joke transcribed from page 9 of 『数学女子: 1』 (by 安田まさえ) [1].
[1] 安田まさえ, 数学女子: 1. Tōkyō: 竹書房, 2010.