
Optimality of the least squares estimator in Gaussian linear regression

Let $\hat{\beta}$ denote the least squares estimator of $\beta$ in the regression problem $Y = X\beta + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$. In a statistics course, I came across this statement:

> The least squares estimator of $\beta$, $\hat{\beta} = (X^TX)^{-1}X^TY$, is the unique minimum-variance unbiased estimator of $\beta$.

Where does this come from? I thought about the Cramér-Rao lower bound, but that bound is tight only in the case of a one-parameter exponential family, and here $(\beta, \sigma^2)$ are two parameters. It would not even account for the claimed uniqueness. Moreover, the statement cannot rely on the Gauss-Markov theorem, because after stating the above we demonstrate that this theorem remains valid if we drop the Gaussian hypothesis and keep only moment assumptions.
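For concreteness, here is a minimal NumPy sketch (my own simulated data, not from the course) that computes $\hat{\beta} = (X^TX)^{-1}X^TY$ via the normal equations and cross-checks it against NumPy's built-in least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated Gaussian linear model: Y = X beta + eps, eps ~ N(0, sigma^2 I)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, -2.0, 0.5])
Y = X @ beta_true + 0.3 * rng.normal(size=n)

# Closed-form least squares estimator: solve the normal equations
# (X^T X) beta_hat = X^T Y rather than inverting X^T X explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat)                           # close to beta_true
print(np.allclose(beta_hat, beta_lstsq))  # True
```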

When you set out to minimize the squared error, which is equivalent to finding the ordinary least squares estimator, you don't have to assume normality to show that $\hat{\beta}$ is the best linear unbiased estimator. The Gauss-Markov theorem assumes only that the errors are uncorrelated and homoscedastic; the same holds if you derive the estimator using Lagrange multipliers, and likewise for the purely algebraic approach of minimizing the sum of squared errors. The only approach that requires a specified parametric distribution is maximum likelihood estimation, as you cannot define the likelihood function without assuming a particular density.
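To illustrate the Gauss-Markov point numerically, here is a simulation sketch (an assumed setup of my own, not part of the answer): errors are drawn from a uniform distribution, so they are non-Gaussian but still homoscedastic and uncorrelated, and OLS is compared with another linear unbiased estimator, a weighted least squares fit with arbitrary fixed weights. Both come out unbiased, but OLS should show the smaller sampling variance, as Gauss-Markov predicts.

```python
import numpy as np

rng = np.random.default_rng(1)

n, n_rep = 100, 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])

# A competing *linear unbiased* estimator: weighted least squares with
# arbitrary fixed positive weights (unbiased for any such fixed weights)
w = rng.uniform(0.5, 2.0, size=n)
A_ols = np.linalg.solve(X.T @ X, X.T)                  # beta_hat = A @ Y
A_wls = np.linalg.solve((X * w[:, None]).T @ X, (X * w[:, None]).T)

# Non-Gaussian (uniform) errors: homoscedastic and uncorrelated,
# so the Gauss-Markov assumptions still hold
eps = rng.uniform(-1.0, 1.0, size=(n_rep, n))
Y = X @ beta_true + eps                                # shape (n_rep, n)

beta_ols = Y @ A_ols.T                                 # shape (n_rep, 2)
beta_wls = Y @ A_wls.T

print("OLS mean:", beta_ols.mean(axis=0))  # ~ beta_true: unbiased
print("WLS mean:", beta_wls.mean(axis=0))  # ~ beta_true: also unbiased
print("OLS var :", beta_ols.var(axis=0))   # componentwise smaller...
print("WLS var :", beta_wls.var(axis=0))   # ...than the WLS variance
```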
