
Transposing a matrix when differentiating

Hi, so I am trying to understand the solution of linear regression with matrices (found at the following link) and am confused about how, on page 10, he says the derivative of $2Y'XB$ with respect to $B$ is $2X'Y$. I get that you drop the $B$ because it is a linear term, but why do you transpose the matrices $X$ and $Y$? Why wouldn't they just remain the same, like getting 5 after taking the derivative of $5x$? Any help would be appreciated, thanks!

Most likely you are confused because he wrote every variable as a capital letter, instead of following the usual convention that matrices are uppercase and vectors are lowercase. In your situation, $y$ is a column vector, $X$ is a matrix, $b$ is a column vector, and $e$ is a column vector.

Therefore you have:

$$e'e=y'y-2y'Xb+b'X'Xb$$

Taking the gradient with respect to the elements of the column vector $b$ (when $b$ is a column vector, the convention is that the gradient of any scalar function $f(b)$ is also written as a column vector):

$$\cfrac{d e'e}{db} = 0 - 2X'y + 2X'Xb$$
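
To see where the transpose comes from, write the linear term out element by element. Because $y'Xb$ is a scalar, it equals its own transpose, so

$$y'Xb = (y'Xb)' = b'X'y = \sum_{i=1}^{p}(X'y)_i\,b_i,$$

and differentiating with respect to each component $b_j$ gives $\partial(y'Xb)/\partial b_j = (X'y)_j$. Stacking these partials into a column vector yields $d(y'Xb)/db = X'y$. This is the matrix analogue of your $5x$ example: the "constant" multiplying $b$ is $y'X$, a $1\times p$ row vector, and the column-vector convention for the gradient forces you to write it as its transpose, $X'y$.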

Note that if we did _not_ transpose $y'X$, we would be left with a row vector, and you cannot add a row vector to column vectors. To keep everything consistent, we transpose $y'X$ so that all three terms are column vectors: $0$, $-2X'y$, and $2X'Xb$.
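
If you want to convince yourself numerically, here is a minimal sketch in NumPy (the data are made up, and the variable names just mirror the post) comparing the analytic gradient $-2X'y + 2X'Xb$ with a finite-difference approximation of $e'e$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))  # design matrix: n observations, p predictors
y = rng.normal(size=n)       # response vector
b = rng.normal(size=p)       # an arbitrary coefficient vector

def sse(b):
    """Residual sum of squares e'e = (y - Xb)'(y - Xb)."""
    e = y - X @ b
    return e @ e

# Analytic gradient from the expansion above: -2X'y + 2X'Xb.
grad_analytic = -2 * X.T @ y + 2 * X.T @ X @ b

# Central finite differences, one coordinate of b at a time.
eps = 1e-6
grad_numeric = np.array([
    (sse(b + eps * np.eye(p)[j]) - sse(b - eps * np.eye(p)[j])) / (2 * eps)
    for j in range(p)
])

print(np.allclose(grad_analytic, grad_numeric, atol=1e-4))  # expect: True
```

As a side note, setting the gradient to zero gives the normal equations $X'Xb = X'y$, whose solution $b = (X'X)^{-1}X'y$ is the familiar least-squares estimator.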
