Statistical Learning
My reference notebook for the concept of statistical learning.
- Motivation
- Goals
- Classification
- Bias Variance Tradeoff
- Helper Functions
- Plot for the Blog Post
- Sources
- References
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
sns.set()
Motivation
The idea is that we have observed some data $D_n=\{x_i, y_i\}_{i=1}^n$, where $x_i$ contains the $p$ variables observed for observation $i$. Our model for the data is
$y_i=f(x_i) +\varepsilon_i$
where $f:R^p \rightarrow R$ is some unknown function and $\varepsilon_i \sim N(0, \sigma^2)$.
The goal is to estimate $f$ with some function $\hat{f}$ such that $\hat{f}(x_i)$ is "close" to $y_i$ in some sense. More precisely, we want
$E(f(X) - Y)^2$
to be small. There are two components which need to be discussed. The first is the expected value: it means that we want the error to be small on average, which makes intuitive sense. The second is the quadratic distance measure. Why not use another norm, e.g. the absolute value? The reason is that the quadratic function is much nicer to work with than the absolute value (it is differentiable, for example). The next question to ask is whether there is a function $m^*$ which minimizes the above quantity:
$E(m^*(X) - Y)^2 = \min_f E(f(X) - Y)^2$
One can show that the conditional expectation $m^*(x) = E(Y| X=x)$ solves this minimization problem (a short sketch of the argument is given below), so our initial model becomes
$y_i=m^*(x_i) + \varepsilon_i = E(Y| X=x_i) +\varepsilon_i$
This means $E(y_i \mid x_i) = m^*(x_i)$ and, if $x_i$ is itself treated as random, $Var(y_i) = Var(m^*(x_i)) + \sigma^2$ by the law of total variance.
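For reference, here is a short sketch of the standard argument for why the conditional mean minimizes the expected squared error. For any candidate function $f$, add and subtract $m^*(X)$ and expand the square:
$E(f(X) - Y)^2 = E(f(X) - m^*(X))^2 + 2E[(f(X) - m^*(X))(m^*(X) - Y)] + E(m^*(X) - Y)^2$
Conditioning on $X$ makes the cross term vanish, because $E(Y| X) = m^*(X)$. Hence
$E(f(X) - Y)^2 = E(f(X) - m^*(X))^2 + E(m^*(X) - Y)^2 \geq E(m^*(X) - Y)^2$
with equality exactly when $f = m^*$.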
Until now $m^*$ is a theoretical construct, since we do not observe the whole distribution but only a finite sample $D_n$. Hence we need some estimate $\hat{m}^*$ of $m^*$. Thus we are now interested in the error
$E(\hat{m}^*(X) - Y)^2$
x = np.linspace(-1, 1, 200)

def f(x):
    # piecewise-defined true regression function m*(x)
    y = x.copy()
    mask1 = (x >= -1) & (x < -.5)
    mask2 = (x >= -.5) & (x < 0)
    mask3 = (x >= 0) & (x < .5)
    mask4 = (x >= .5) & (x <= 1)
    y[mask1] = ((y[mask1] + 2) ** 2) / 2
    y[mask2] = y[mask2] / 2 + .875
    y[mask3] = - 5 * (y[mask3] - .2) ** 2 + 1.075
    y[mask4] = y[mask4] + .125
    return y

mean = f(x)
# heteroscedastic noise level (np.random.normal expects the standard deviation, not the variance)
std = .2 - .1 * np.cos(2 * np.pi * x)
y = np.random.normal(mean, std, 200)

plt.scatter(x, y, alpha = .5, label = 'data')
plt.plot(x, f(x), color = 'r', label = r'$m(x)$')
plt.legend();
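To make the last point concrete, here is a minimal sketch of estimating $\hat{m}^*$ from the simulated sample above. The polynomial least-squares fit via np.polyfit and the degree are an arbitrary choice for illustration, not the method discussed later in the notebook.
# fit a simple polynomial estimate of m* on the observed sample (degree chosen ad hoc)
coeffs = np.polyfit(x, y, deg = 7)
m_hat = np.poly1d(coeffs)

# fresh draws from the same model to mimic new observations
y_new = np.random.normal(mean, std, 200)

# empirical squared errors on the fresh sample; on average the true m* attains the smaller one
print('error of the estimate:', np.mean((m_hat(x) - y_new) ** 2))
print('error of the true m*:', np.mean((f(x) - y_new) ** 2))

plt.scatter(x, y, alpha = .5, label = 'data')
plt.plot(x, f(x), color = 'r', label = r'$m(x)$')
plt.plot(x, m_hat(x), color = 'g', label = r'$\hat{m}(x)$')
plt.legend();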
Goals
There are two reasons to perform statistical learning. The first is prediction: we want to use our estimated function $\hat{f}$ to make accurate predictions for new, possibly unseen data. The second is inference, where we are more concerned with how the explanatory variables influence $y$. For the inference task we may prefer to estimate a simple function which can be interpreted easily.
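As a toy illustration of the two goals (my own example, using an ordinary straight-line least-squares fit on the simulated data from above), the same fitted model can serve both purposes: its slope is directly interpretable (inference), and evaluating it at new inputs gives predictions.
# inference: a straight-line fit gives an interpretable slope and intercept
slope, intercept = np.polyfit(x, y, deg = 1)
print('estimated effect of x on y (slope):', slope)

# prediction: evaluate the fitted function at new, unseen inputs
x_new = np.array([-.3, .1, .7])
print('predictions:', slope * x_new + intercept)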
Sources
- Chapter 2 of (James et al., 2013)
- Chapter 1 and 2 of (Györfi et al., 2006)
References
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112). Springer.
- Györfi, L., Kohler, M., Krzyzak, A., & Walk, H. (2006). A distribution-free theory of nonparametric regression. Springer Science & Business Media.