Notes on the Elements of Statistical Learning, Ch2
(E of SL) Ch. 1~2 notes
Key points
- The pair $(X, Y)$, $X \in \mathbb{R}^p$, $Y \in \mathbb{R}$ (if a scalar-valued function); SL studies a function $f(X)$ as an estimate of $Y$, i.e. $\hat{Y} = f(X)$.
- and thus you need a loss $L(Y, f(X))$ and a risk $R(f) = \mathbb{E}\,L(Y, f(X))$ to evaluate the estimated functions. By minimizing the risk under constraints on the space of $f$, we obtain the "learned" estimate; the problem has the form $\hat{f} = \arg\min_{f \in \mathcal{F}} \mathbb{E}\,L(Y, f(X))$.
- Three main classes in SL
- Parametric: the model has an analytic form with a set of parameters $\theta$, which limits the space of $f$.
- examples: linear regression, logistic regression, LDA/QDA
- non-parametric: local/neighborhood methods; this class is highly dependent on the data, i.e. on the samples at hand.
- Mixture of both.
Statistical decision theory
Concepts: loss, risk (expected prediction error, EPE)
- @true fact: the EPE notation used in "The Elements of Statistical Learning" is basically a frequentist's risk.
The loss function
the squared loss: $L(Y, f(X)) = (Y - f(X))^2$
- $\mathrm{EPE}(f) = \mathbb{E}\,(Y - f(X))^2$; if $f$ is unconstrained, pointwise minimization gives the conditional expectation (the regression function):
- $f(x) = \mathbb{E}[\,Y \mid X = x\,]$
- if $f$ is constrained to be linear, $f(x) \approx x^{\mathsf{T}}\beta$, we have the estimate:
- $\beta = [\mathbb{E}(XX^{\mathsf{T}})]^{-1}\,\mathbb{E}(XY)$ (P.F: see P.F 1.1)
- replacing expectations with sample averages gives the least-squares solution $\hat{\beta} = (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{y}$
- if $f$ is a neighborhood-type method: $\hat{f}(x) = \mathrm{Ave}(y_i \mid x_i \in N_k(x))$
- the k-nearest-neighbor fit is very similar to simple functions, where $\hat{f}(x) = \sum_j c_j\,\mathbf{1}\{x \in R_j\}$: piecewise constant over the regions $R_j$ on which the neighborhood $N_k(x)$ does not change (see the sketch after this list)
- trees: at each leaf node you compute the mean of the responses.
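Below is a minimal sketch (my own toy example, not from the book) showing two estimates of the conditional mean $\mathbb{E}[Y \mid X = x]$ under squared loss: the linear least-squares fit and the kNN neighborhood average. The synthetic data and all names are illustrative.

```python
# Both least squares and kNN can be read as estimates of the conditional
# mean E[Y | X = x] under squared loss. Synthetic 1-D data, illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 15
x = rng.uniform(-3, 3, size=n)
y = np.sin(x) + rng.normal(scale=0.3, size=n)   # true E[Y|X=x] = sin(x)

# Linear model f(x) = x^T beta, with an intercept column; solve the
# normal equations, i.e. beta_hat = (X^T X)^{-1} X^T y.
X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)

def knn_mean(x0, k=k):
    """kNN estimate: average y over the k nearest x_i, Ave(y_i | x_i in N_k(x0))."""
    idx = np.argsort(np.abs(x - x0))[:k]
    return y[idx].mean()

x0 = 1.0
print("truth      :", np.sin(x0))
print("linear fit :", np.array([1.0, x0]) @ beta)
print("kNN average:", knn_mean(x0))
```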
0-1 loss in classification
- for the Bayes classifier with discrete (categorical) $y$: $\hat{G}(x) = \arg\max_{g \in \mathcal{G}} P(g \mid X = x)$
- where: the loss is the 0-1 loss $L(G, \hat{G}(X)) = \mathbf{1}\{G \neq \hat{G}(X)\}$ and $P(g \mid X = x)$ is the posterior class probability (P.F: see P.F 1.3)
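A toy sketch of the Bayes classifier (my own illustration, assuming a known two-class Gaussian generative model): with the posterior available, classification is just $\arg\max_g P(g \mid X = x)$.

```python
# With the generative model known, the Bayes classifier takes
# argmax_g P(g | X = x) via Bayes' rule. Illustrative toy example.
import numpy as np

priors = np.array([0.5, 0.5])          # P(G = g), two classes
means  = np.array([-1.0, 1.0])         # class-conditional densities N(mean_g, 1)

def bayes_classify(x0):
    # Unnormalized posteriors: P(g | x) is proportional to P(x | g) P(g);
    # the shared normalizer drops out of the argmax.
    lik = np.exp(-0.5 * (x0 - means) ** 2)
    return int(np.argmax(lik * priors))

print(bayes_classify(-0.3))  # 0: closer to the mean of class 0
print(bayes_classify(2.0))   # 1
```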
Remark
- if additive model: $f(X) = \sum_{j=1}^{p} f_j(X_j)$
- It turns out that the optimal estimate for the additive model uses techniques such as k-nearest neighbors to approximate univariate conditional expectations simultaneously for each of the coordinate functions
- P.F 1.4
- if $L$ = absolute loss, $L(Y, f(X)) = |Y - f(X)|$, the solution is the conditional median $\hat{f}(x) = \operatorname{median}(Y \mid X = x)$ (a small empirical check follows below)
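As a quick empirical check of the absolute-loss claim (my own sketch, illustrative data): the constant minimizing the mean absolute deviation is the sample median, while the mean minimizes squared loss.

```python
# The empirical risk under absolute loss is minimized by the median,
# just as squared loss picks out the mean. Skewed data makes them differ.
import numpy as np

rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=10_000)   # skewed sample: mean != median

grid = np.linspace(0, 5, 501)
abs_risk = np.array([np.abs(y - c).mean() for c in grid])

print("argmin of E|Y - c| :", grid[abs_risk.argmin()])
print("sample median      :", np.median(y))   # should agree with the argmin
print("sample mean        :", y.mean())       # minimizer under squared loss
```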
miscellanea
Mean squared error and residual sum of squares:
- https://stats.stackexchange.com/questions/73540/mean-squared-error-and-residual-sum-of-squares
- MSE :->> (usually frequentist) risk function: estimator to population quantities e.g.
- RSS/SSE :->> not a risk function: estimator to real samples, e.g.
- https://en.wikipedia.org/wiki/Errors_and_residuals
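A small sketch of the distinction (my own example): MSE needs repeated draws against a known population quantity, while RSS is computed once against the observed sample.

```python
# MSE compares an estimator to a *population* quantity across simulations;
# RSS compares fitted values to the *observed* samples of one dataset.
import numpy as np

rng = np.random.default_rng(2)
theta = 3.0                                    # population mean (known here)

# MSE of the sample mean as an estimator of theta: a risk, estimated by
# averaging squared errors over many simulated datasets.
estimates = np.array([rng.normal(theta, 1.0, size=50).mean() for _ in range(5000)])
print("MSE(theta_hat):", ((estimates - theta) ** 2).mean())   # ~ 1/50

# RSS of a fit on one concrete sample: no population quantity involved.
y = rng.normal(theta, 1.0, size=50)
y_hat = np.full_like(y, y.mean())              # fitted values (constant fit)
print("RSS:", ((y - y_hat) ** 2).sum())
```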
P.F (proofs)
Ch1
P.F 1.1
$\mathrm{EPE}(\beta) = \mathbb{E}\,(Y - X^{\mathsf{T}}\beta)^2$; pointwise minimization CANNOT BE USED, since $f$ is constrained to the linear form rather than free at each $x$:
EPE is convex in $\beta$, so setting the derivative to zero gives $\mathbb{E}[X(Y - X^{\mathsf{T}}\beta)] = 0$, i.e. $\beta = [\mathbb{E}(XX^{\mathsf{T}})]^{-1}\,\mathbb{E}(XY)$
Note: $\mathbb{E}(XX^{\mathsf{T}})$ may not be invertible (see the sketch below)
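A minimal numerical illustration of the note (mine, not from the book): with a perfectly collinear design, $\mathbf{X}^{\mathsf{T}}\mathbf{X}$ is singular, and a pseudoinverse yields the minimum-norm least-squares solution.

```python
# When X^T X (the sample analogue of E(XX^T)) is singular -- e.g. because
# of collinear columns -- the inverse does not exist; np.linalg.pinv picks
# the minimum-norm least-squares solution instead. Illustrative data.
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
X = np.column_stack([x1, 2.0 * x1])            # perfectly collinear design
y = x1 + rng.normal(scale=0.1, size=100)

print(np.linalg.matrix_rank(X.T @ X))          # 1 < 2: X^T X is not invertible
beta = np.linalg.pinv(X.T @ X) @ X.T @ y       # minimum-norm solution
print(beta)
```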
P.F 1.3
The Bayesian classifier:
$\mathrm{EPE} = \mathbb{E}_X \sum_{g \in \mathcal{G}} L(g, \hat{G}(X))\, P(g \mid X)$; pointwise minimization with the 0-1 loss: $\hat{G}(x) = \arg\min_{g} [\,1 - P(g \mid X = x)\,]$
$P(g \mid X = x)$ being the posterior, thus we have, on the support of $X$: $\hat{G}(x) = \arg\max_{g \in \mathcal{G}} P(g \mid X = x)$
P.F 1.4
Consider a regression model for K-class classification using squared loss, with the response coded as indicator (dummy) variables: $Y \in \{e_1, \dots, e_K\} \subset \mathbb{R}^K$.
In this case the fitted function maps to the set of class indicators, and each coordinate targets $\mathbb{E}(Y_k \mid X) = P(G = k \mid X)$.
Classifying to the largest fitted value, $\hat{G}(x) = \arg\max_k \hat{f}_k(x)$, recovers the Bayes rule; the result is exact.
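A minimal sketch of this construction (my own toy data): one-hot responses, one least-squares fit per coordinate, then an argmax over the fitted values.

```python
# K-class classification via regression on indicator responses: code the
# classes as one-hot vectors, fit each coordinate by least squares (each
# fit estimates P(G = k | X)), and classify to the largest fitted value.
import numpy as np

rng = np.random.default_rng(4)
n, K = 300, 3
g = rng.integers(0, K, size=n)                 # true class labels
x = rng.normal(loc=g.astype(float), size=n)    # class-dependent feature

Y = np.eye(K)[g]                               # n x K one-hot responses
X = np.column_stack([np.ones(n), x])           # design with intercept
B = np.linalg.solve(X.T @ X, X.T @ Y)          # one least-squares fit per class

g_hat = np.argmax(X @ B, axis=1)               # classify to largest fitted value
print("train accuracy:", (g_hat == g).mean())
```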