Notes on an ML course (2014 SJTU), 2

2 - Matrices and analysis

Notes taken from the course by Zhihua Zhang, 2014 @ SJTU.

Matrices and matrix norms

Matrices

Norms

A norm is called: unitarily invariant if $\|UAV\| = \|A\|$ for all unitary $U, V$; submultiplicative if $\|AB\| \le \|A\|\,\|B\|$.

Operator norm

Induced $p$-norm: for $p = 2$ this is the spectral norm $\|A\|_2 = \sigma_{\max}(A)$, which is also a Schatten norm (the Schatten-$\infty$ norm).

Schatten $p$-norms

Schatten norms are unitarily invariant and submultiplicative.

$$\|A\|_p = \left(\sum_{i=1}^{\min\{m,\,n\}} \sigma_i^p(A)\right)^{1/p}$$
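A minimal numerical sketch of the formula above (assuming NumPy; `schatten_norm` is a name chosen here, not from the notes): compute the singular values by SVD and take their $\ell_p$ norm. For $p = 2$ this recovers the Frobenius norm, and as $p \to \infty$ it approaches the spectral norm.

```python
import numpy as np

def schatten_norm(A, p):
    """Schatten p-norm: the l_p norm of the singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values, descending
    return (s ** p).sum() ** (1.0 / p)

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

# p = 2 recovers the Frobenius norm; large p approaches the spectral norm.
print(schatten_norm(A, 2), np.linalg.norm(A, "fro"))
print(schatten_norm(A, 100), np.linalg.norm(A, 2))
```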

Nuclear norm ($p = 1$)

$$\|A\|_* = \sum_i \sigma_i(A)$$

Frobenius norm ($p = 2$)

$$\|A\|_F = \left(\sum_i \sigma_i^2(A)\right)^{1/2}$$

$$\|A\|_F^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2 = \operatorname{tr}(A^{\dagger}A) = \sum_i \sigma_i^2(A)$$
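The three expressions for $\|A\|_F^2$ can be checked numerically (a sketch assuming NumPy; the random matrix is only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

entrywise = (np.abs(A) ** 2).sum()                            # sum_ij |a_ij|^2
trace     = np.trace(A.conj().T @ A)                          # tr(A^dagger A)
singular  = (np.linalg.svd(A, compute_uv=False) ** 2).sum()   # sum_i sigma_i^2

# All three equal ||A||_F^2.
print(entrywise, trace, singular)
```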

Invariant under rotation: for any unitary matrix $R$ (e.g. a rotation, a change of orthonormal basis, or the $U, V$ factors in an SVD):

$$\|AR\|_F^2 = \operatorname{tr}\!\left(R^{\mathsf T} A^{\mathsf T} A R\right) = \operatorname{tr}\!\left(R R^{\mathsf T} A^{\mathsf T} A\right) = \operatorname{tr}\!\left(A^{\mathsf T} A\right) = \|A\|_F^2$$
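A quick numerical check of this invariance (a sketch assuming NumPy): draw a random orthogonal $Q$ via a QR factorization and compare Frobenius norms before and after multiplying by it on either side.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthogonal matrix

# ||A||_F is unchanged by orthogonal multiplication on either side.
print(np.linalg.norm(A, "fro"),
      np.linalg.norm(A @ Q, "fro"),
      np.linalg.norm(Q @ A, "fro"))
```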

Derivatives

Directional derivative & Gâteaux differentiable

Let $f : \mathbf{E} \rightarrow \mathbb{R}$. The directional derivative of $f$ at $x$ in a direction $d \in \mathbf{E}$ is the limit:

$$f'(x, d) = \lim_{t \searrow 0} \frac{f(x + t\,d) - f(x)}{t}$$

if the limit exists. If $f'(x, d)$ is linear in $d$, i.e. $f'(x, d) = \left\langle a, d \right\rangle$ (an inner product) for some $a$, then we say $f$ is (Gâteaux) differentiable at $x$ with (Gâteaux) derivative $a$.
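The defining limit can be approximated by a one-sided finite difference (a sketch assuming NumPy; `directional_derivative` and the quadratic test function are choices made here, not from the notes). For a smooth $f$, the result should match $\langle a, d\rangle$ with $a = \nabla f(x)$:

```python
import numpy as np

def directional_derivative(f, x, d, t=1e-6):
    """One-sided finite-difference estimate of f'(x, d) = lim_{t->0+} (f(x + t d) - f(x)) / t."""
    return (f(x + t * d) - f(x)) / t

f = lambda x: 0.5 * x @ x        # f(x) = ||x||^2 / 2, so grad f(x) = x
x = np.array([1.0, -2.0])
d = np.array([0.5, 0.5])

# Linear in d: f'(x, d) = <a, d> with a = x here.
print(directional_derivative(f, x, d), x @ d)
```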

Subgradient

See convex analysis for more.

$\phi$ is said to be a subgradient of $f$ at $x_0$ if it satisfies $\left\langle \phi, x - x_0 \right\rangle \le f(x) - f(x_0)$ for all $x \in E$, where $E$ is a convex open set. The set of all subgradients at $x_0$ is called the subdifferential of $f$ at $x_0$ and is denoted $\partial f(x_0)$.
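The classic example is $f(x) = |x|$ at $x_0 = 0$, where $\partial f(0) = [-1, 1]$. A minimal sketch (assuming NumPy; `is_subgradient` is a name chosen here) checks the subgradient inequality on a grid of test points:

```python
import numpy as np

def is_subgradient(f, phi, x0, xs):
    """Check <phi, x - x0> <= f(x) - f(x0) on a grid of test points xs."""
    return bool(np.all(phi * (xs - x0) <= f(xs) - f(x0) + 1e-12))

f = np.abs                           # f(x) = |x|, not differentiable at 0
x0 = 0.0
xs = np.linspace(-2.0, 2.0, 101)

# Every phi in [-1, 1] is a subgradient at 0; anything outside is not.
for phi in (-1.0, 0.0, 0.7, 1.0, 1.5):
    print(phi, is_subgradient(f, phi, x0, xs))
```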