Notes on the Elements of Statistical Learning, Ch2

(Elements of Statistical Learning) notes on Ch. 1–2

Key points

  • The pair $X \in \mathbf{R}^p,\ y \in \mathbf{R}^K$ ($K=1$ for a scalar response). SL studies a function $f$ as an estimate of $y$, i.e. $\hat{y} = f(x)$.
    • and thus you need a loss $l = l(f, y)$ and a risk $R = \mathbf{E}_{x,y}(l)$ to evaluate the estimated functions. Minimizing the risk under constraints on the space of $f$, $\mathcal{F}$, yields the "learned" estimate; the problem has the form:

$\min_{f} \mathbf{R}$

$\text{s.t.}\quad f \in \mathcal{F} = \{f \mid f : X \mapsto Y\}$

  • Three main classes in SL
    • Parametric: $f(x) = f(x \mid \theta)$. Since the model has an analytic form with a fixed group of parameters, the space of $f$ is restricted.
      • examples: linear regression, logistic, LDA/QDA
    • Non-parametric: local/neighborhood methods. This class depends heavily on the data, i.e. the observed samples.
    • Mixture of both.
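A toy contrast between the first two classes. This is my own sketch, not from the book: the data-generating model, the through-the-origin parameterization, and `knn_predict` are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy model: y = 2x + noise.
n = 500
x = rng.uniform(0, 1, n)
y = 2 * x + rng.normal(0, 0.1, n)

# Parametric: restrict f to lines through the origin, f(x) = b*x,
# and estimate the single parameter b by least squares.
b_hat = (x @ y) / (x @ x)

# Non-parametric (neighborhood): k-nearest-neighbor average at a query point.
def knn_predict(x0, k=20):
    idx = np.argsort(np.abs(x - x0))[:k]  # indices of the k nearest samples
    return y[idx].mean()                  # local mean of the responses

x0 = 0.5
print(b_hat * x0, knn_predict(x0))  # both should land near 2 * 0.5 = 1
```

The parametric fit estimates one global parameter; the neighborhood method re-estimates locally from whatever samples fall near the query point.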

Statistical decision theory

Concepts: loss, risk (expected prediction error, EPE)

  • @true fact: the EPE notation used in "The Elements of Statistical Learning" is basically a frequentist's risk.

The loss function

the squared loss

  • if $f$ is unconstrained, $f \in \mathbb{C}_0(\mathbb{R}^m)$, then $f(x) = \mathbf{E}(y \mid x)\ \forall x$
  • if $f$ is restricted to the linear class $f(x) = x'\beta$, we have the estimate:
    • $\beta = \mathbf{E}(xx')^{-1}\mathbf{E}(xy)$
  • if $f$ is a neighborhood-type method:
    • k-nearest neighbors is akin to a simple-function approximation, with $f(x) = \mathbf{E}(y \mid X \in N(x))$
    • trees: at each leaf node you take the mean of the responses.
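A quick numerical check of the squared-loss claim above — my own sketch, not from the text. With no covariates, the sample analogue of $f(x) = \mathbf{E}(y \mid x)$ is a single constant, and a grid search shows the squared loss is minimized at the sample mean:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(1.0, 1.0, 10_000)  # pretend all observations share the same x

# Mean squared loss of predicting the constant c for every observation.
grid = np.linspace(-1, 3, 401)
losses = [np.mean((y - c) ** 2) for c in grid]
c_star = grid[np.argmin(losses)]

# The minimizer sits at the sample mean (up to the 0.01 grid spacing).
print(c_star, y.mean())
```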

0-1 loss in classification

  • for the Bayes classifier when $y$ is discrete (categorical)
  • where:
    • $L = \mathbf{1}\mathbf{1}' - I_n$, we have:
    • $f(x) = e_k$, where $k = \arg\max_k \mathbf{P}(y_k = 1 \mid x)$
    • K-classification: equivalence of the $\mathcal{L}_2$ loss for regression and the 0-1 loss for the Bayes classifier
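The rule takes only a few lines. A hedged sketch: the posterior matrix `post` is made up for illustration, not taken from the book.

```python
import numpy as np

# Assumed posterior probabilities P(y_k = 1 | x_i) for 4 points, K = 3 classes.
post = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.3, 0.4, 0.3],
    [0.5, 0.5, 0.0],
])

bayes = post.argmax(axis=1)  # f(x) = e_k with k = argmax_k P(y_k = 1 | x)

# Expected 0-1 loss at each point is 1 - max_k P(y_k = 1 | x);
# any other deterministic rule can only do worse pointwise.
bayes_risk = 1 - post.max(axis=1)
other_rule = np.zeros(len(post), dtype=int)  # e.g. always predict class 0
other_risk = 1 - post[np.arange(len(post)), other_rule]
print(bayes, bayes_risk.mean(), other_risk.mean())
```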

Remark

  • if $f$ is an additive model:
    • the optimal estimate for the additive model can use techniques such as k-nearest neighbors to approximate the univariate conditional expectations simultaneously for each coordinate function
      • P.F 1.4
  • if $l$ is the absolute loss:
    • $f(x) = \text{median}(y \mid x)$
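The same grid-search check as for the squared loss (again my own sketch): under the absolute loss the minimizing constant is the sample median, not the mean, which is easiest to see with skewed data.

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.exponential(1.0, 10_000)  # skewed: mean near 1.0, median near 0.69

# Mean absolute loss of predicting the constant c for every observation.
grid = np.linspace(0, 3, 601)
abs_losses = [np.mean(np.abs(y - c)) for c in grid]
c_star = grid[np.argmin(abs_losses)]

print(c_star, np.median(y), y.mean())  # c_star tracks the median, not the mean
```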

miscellanea

Mean squared error and residual sum of squares:

P.F

Ch1

P.F 1.1

$\mathbf{R} = \int_{y,x}(y-f(x))^2\; d\mathbf{P}(x,y)$; with $f$ constrained to the linear class, pointwise minimization cannot be used:
$\mathbf{R} = \int_{y,x}(y-\beta'x)^2\; d\mathbf{P}(x,y)$, convex in $\beta$
$\partial \mathbf{R}/\partial \beta = -\int_{y,x} 2x(y-x'\beta)\; d\mathbf{P}(x,y) = 0 \Rightarrow \mathbf{E}(xy) = \mathbf{E}(xx')\beta$
$\beta = \mathbf{E}(xx')^{-1}\mathbf{E}(xy)$

$\mathbf{E}(xx')$ may not be invertible
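When the sample analogue $X'X$ of $\mathbf{E}(xx')$ is singular — e.g. a duplicated column — one common fix is to replace the inverse with the Moore–Penrose pseudoinverse, which picks the minimum-norm solution of the normal equations. A sketch of my own, not from the book:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1, rng.normal(size=n)])  # first two columns identical
y = X @ np.array([1.0, 1.0, -0.5]) + rng.normal(0, 0.1, n)

XtX = X.T @ X                           # rank 2: not invertible
beta = np.linalg.pinv(XtX) @ X.T @ y    # minimum-norm normal-equation solution

# lstsq also returns the minimum-norm least-squares solution,
# so the two fits (and in particular their predictions) coincide.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(X @ beta, X @ beta_lstsq))
```

Note that only the fitted values are identified here; the individual coefficients on the duplicated columns are not, and the pseudoinverse splits their (identifiable) sum between them.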

P.F 1.3

The Bayesian classifier:
$\mathbf{R} = \int_{y,x} y'Lf(x)\; d\mathbf{P}(x,y)$; pointwise minimization, $\forall x$:
$\mathbf{R}_x = \int_{y|x} y'L f\; d\mathbf{P}(y|x)$
$\quad = \int_{y|x} y'\, d\mathbf{P}(y|x) \cdot L \cdot f = \mathbf{E}(y|x)'(\mathbf{11}' - I_n)\, f$
$\quad = 1 - \mathbf{E}(y|x)'f$ (since $\mathbf{E}(y|x)'\mathbf{1} = 1$ and $\mathbf{1}'f = 1$ for one-hot $f$), thus we have $\forall x$ in the support:

$f(x) = e_k,\quad k = \arg\max_k \mathbf{E}(y_k \mid x) = \arg\max_k \mathbf{P}(y_k = 1 \mid x)$
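A numerical spot-check of this derivation (my own sketch): for a probability vector $p = \mathbf{E}(y \mid x)$ and one-hot $f = e_k$, the pointwise risk $p'(\mathbf{11}' - I)e_k$ equals $1 - p_k$, so it is minimized by the argmax class.

```python
import numpy as np

rng = np.random.default_rng(4)
K = 5
p = rng.dirichlet(np.ones(K))     # E(y|x): posterior class probabilities
L = np.ones((K, K)) - np.eye(K)   # 0-1 loss matrix, L = 11' - I

# Pointwise risk of predicting f = e_k, for each k.
risks = np.array([p @ L @ np.eye(K)[:, k] for k in range(K)])

print(np.allclose(risks, 1 - p))     # R_x = 1 - E(y|x)'f
print(risks.argmin() == p.argmax())  # minimized by the argmax class
```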

P.F. 1.4

Consider the regression model for K-classification using the $\mathcal{L}_2$ loss:
$\mathbf{R} = \int_{y,x}\|y-f(x)\|^2\; d\mathbf{P}(x,y)$
In this case the response takes values $y \in \{\mathbf{e}_k\}_{k=1}^{K}$, while $f$ maps $\mathcal{X} \subseteq \mathbb{R}^m$ to $\mathbb{R}^K$
$f = \mathbf{E}(y \mid x)$, i.e. coordinate-wise $f_k(x) = \mathbf{P}(y_k = 1 \mid x)$; the result is exact
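A small simulation of this last point — my own sketch, with an assumed two-level covariate and made-up class probabilities `true_p`: least squares on one-hot targets recovers the conditional class probabilities $\mathbf{E}(y_k \mid x) = \mathbf{P}(y_k = 1 \mid x)$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 5_000, 3
x = rng.integers(0, 2, n)                    # a binary covariate
true_p = np.array([[0.6, 0.3, 0.1],          # P(y | x = 0)
                   [0.2, 0.2, 0.6]])         # P(y | x = 1)
labels = np.array([rng.choice(K, p=true_p[xi]) for xi in x])
Y = np.eye(K)[labels]                        # one-hot responses

# Saturated design: one indicator per level of x, so least squares
# reduces to groupwise means of the one-hot responses.
D = np.column_stack([x == 0, x == 1]).astype(float)
B, *_ = np.linalg.lstsq(D, Y, rcond=None)    # row j estimates P(y | x = j)
print(B.round(2))
```

Each fitted row is a probability vector (sums to one) and approaches the corresponding row of `true_p` as $n$ grows.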