
Ridge regression to fight ill-conditioning

My lecture notes on ridge regression go as follows:

> The eigenvalues of $(\phi^T \phi + \lambda' I)$ are at least $\lambda'$. Therefore ridge regression improves the condition number of the Gram matrix.

As a reminder, the ridge regression estimator is given by $(\lambda'I + \phi^T \phi)^{-1} \phi^T y$. I understand the proof that the eigenvalues are at least $\lambda'$, but how does this improve the condition number?

Since the matrix $A = \phi^T\phi + \lambda' I$ is symmetric, its condition number (in the spectral norm) is given by

$$ \kappa = \frac{|\lambda_\max(A)|}{|\lambda_\min(A)|}, $$

where $\lambda_\max(A)$ and $\lambda_\min(A)$ are the eigenvalues of $A$ of largest and smallest modulus, respectively. All eigenvalues of $A$ are of the form $\lambda' + \lambda$, where $\lambda$ belongs to $\sigma(\phi^T\phi) \subset [0,\infty)$, the spectrum of $\phi^T\phi$ (which is positive semidefinite, so its eigenvalues are nonnegative). Therefore,

$$ \kappa = \frac{\lambda' + \lambda_\max(\phi^T\phi)}{\lambda' + \lambda_\min(\phi^T\phi)}. $$
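For a concrete sense of scale: if $\lambda_\min(\phi^T\phi) = 10^{-8}$ and $\lambda_\max(\phi^T\phi) = 1$, the unregularized condition number is $10^{8}$, whereas taking $\lambda' = 10^{-2}$ gives $\kappa = (10^{-2} + 1)/(10^{-2} + 10^{-8}) \approx 101$.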

Observe that by adding $\lambda' > 0$ to the denominator we are no longer dividing by the potentially very small number $\lambda_\min(\phi^T\phi)$; in particular, $\kappa \le \frac{\lambda' + \lambda_\max(\phi^T\phi)}{\lambda'}$, so the condition number can drop drastically when $\phi^T\phi$ is nearly singular.
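To see the effect numerically, here is a minimal NumPy sketch (the design matrix, noise level, and the choice $\lambda' = 10^{-2}$ are all illustrative): it builds a nearly collinear $\phi$, compares the condition numbers of $\phi^T\phi$ with and without the $\lambda' I$ term, and solves the ridge system from the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative design matrix: two nearly collinear columns,
# which makes the Gram matrix phi^T phi badly conditioned.
phi = rng.standard_normal((100, 2))
phi[:, 1] = phi[:, 0] + 1e-6 * rng.standard_normal(100)

y = phi @ np.array([1.0, -1.0]) + 0.1 * rng.standard_normal(100)

gram = phi.T @ phi
lam = 1e-2  # illustrative regularization strength lambda'
ridge_gram = gram + lam * np.eye(gram.shape[0])

# Condition numbers in the spectral norm (ratio of extreme eigenvalues,
# since both matrices are symmetric positive semidefinite).
print("kappa(phi^T phi)              =", np.linalg.cond(gram))
print("kappa(phi^T phi + lambda' I)  =", np.linalg.cond(ridge_gram))

# Ridge solution (lambda' I + phi^T phi)^{-1} phi^T y, computed via solve
# rather than an explicit inverse for numerical stability.
w_ridge = np.linalg.solve(ridge_gram, phi.T @ y)
print("ridge coefficients:", w_ridge)
```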
