Of course you can use other norms. In fact, if you use the first absolute moment (least absolute deviations), linear regression becomes more robust to outliers.
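To make the robustness point concrete, here is a minimal sketch for the simplest possible regression, an intercept-only model. In that case the least-squares fit is the mean and the least-absolute-deviations fit is the median. The data and the helper names `l1_loss` and `l2_loss` are made up for illustration.

```python
from statistics import mean, median

def l1_loss(c, xs):
    """Sum of absolute deviations of xs from the constant c."""
    return sum(abs(x - c) for x in xs)

def l2_loss(c, xs):
    """Sum of squared deviations of xs from the constant c."""
    return sum((x - c) ** 2 for x in xs)

clean = [1.0, 2.0, 3.0, 4.0, 5.0]
dirty = [1.0, 2.0, 3.0, 4.0, 100.0]  # same data, last point replaced by an outlier

# For an intercept-only model, the mean minimizes l2_loss
# and the median minimizes l1_loss.
l2_clean, l2_dirty = mean(clean), mean(dirty)
l1_clean, l1_dirty = median(clean), median(dirty)

print("least squares fit:       ", l2_clean, "->", l2_dirty)
print("least abs. deviation fit:", l1_clean, "->", l1_dirty)
```

A single outlier drags the least-squares fit far away from the bulk of the data, while the least-absolute-deviations fit does not move at all in this example.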
Why go higher than first order, then? Because the utility function is often concave, so we need a convex function to penalize dispersion/error.
Why the second moment, then, and not the third? I think one reason is that the second moment usually gives a nice analytical closed-form solution.
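As a sketch of what "closed form" means here, take the standard linear model with design matrix $X$, response $y$ and coefficients $\beta$ (my notation, not from the question), and assume $X$ has full column rank. Setting the gradient of the squared error to zero gives a linear system in $\beta$:

$$
\begin{aligned}
L(\beta) &= \lVert y - X\beta \rVert_2^2, \\
\nabla_\beta L(\beta) &= -2X^\top (y - X\beta) = 0 \\
\Rightarrow\ \hat\beta &= (X^\top X)^{-1} X^\top y .
\end{aligned}
$$

If the residuals are penalized with $|r|^p$ for $p \neq 2$, the first-order condition is no longer linear in $\beta$, so in general you have to solve it iteratively.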
Last but certainly not least, in case you did not notice, we humans observe and understand the world in Euclidean terms: the most potent evidence is that we measure distance in $l^2$ space.
I can totally imagine a world where creatures measure distance in $l^3$ space and study Least Cubes problems instead of Least Squares problems, and where someone posts a question online: "Why do we always measure dispersion using third moments?"