Suppose your linear model predicted $\hat{y}_i = \overline{y}$ for each sample point $x_i$. Then the ESS would be zero, but the linear model would be a terrible fit: it would ignore $x$ entirely and explain none of the variation in $y$ (unless the underlying distribution of $y$ really were a point mass, which is unlikely).
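Writing out the sums for this constant predictor makes the point concrete. This assumes the usual definitions $\text{TSS} = \sum_i (y_i - \overline{y})^2$, $\text{ESS} = \sum_i (\hat{y}_i - \overline{y})^2$ and $\text{RSS} = \sum_i (y_i - \hat{y}_i)^2$:

$$
\text{ESS} = \sum_{i=1}^{n} (\overline{y} - \overline{y})^2 = 0,
\qquad
\text{RSS} = \sum_{i=1}^{n} (y_i - \overline{y})^2 = \text{TSS},
\qquad
R^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = 0.
$$

All of the variation is left in the residuals.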
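Basically, the goal in simple linear regression is to draw the line through the points that minimizes RSS. For that least-squares line (with an intercept), the total sum of squares splits as $\text{TSS} = \text{ESS} + \text{RSS}$, and TSS depends only on the data, so the smaller the RSS, the larger the share of TSS that is explained by the ESS. (This equivalence relies on the decomposition, which holds for least-squares fits; an arbitrary steep line can have both a huge ESS and a huge RSS.) So your model is "good" when RSS is small or, equivalently, when ESS (also called RegSS, the regression sum of squares) is a large fraction of TSS.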
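A small numerical check may help. The sketch below uses plain Python with no libraries. It fits the least-squares line in closed form and compares three sets of predictions on made-up data: the constant $\overline{y}$ predictor, the least-squares line, and an arbitrary steep line. The helper names `sums_of_squares` and `ols_fit` and the toy data are hypothetical, chosen only for illustration.

```python
def sums_of_squares(y, y_hat):
    """Return (TSS, ESS, RSS) for observations y and predictions y_hat."""
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total variation around the mean
    ess = sum((fi - y_bar) ** 2 for fi in y_hat)           # variation of the fitted values
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # leftover (residual) variation
    return tss, ess, rss


def ols_fit(x, y):
    """Closed-form least-squares intercept and slope for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

y_bar = sum(y) / len(y)
x_bar = sum(x) / len(x)
b0, b1 = ols_fit(x, y)

predictions = {
    "constant (y_bar)": [y_bar for _ in x],               # ignores x entirely
    "least squares": [b0 + b1 * xi for xi in x],          # minimizes RSS
    "steep line": [y_bar + 10 * (xi - x_bar) for xi in x],  # arbitrary, too steep
}

results = {}
for name, y_hat in predictions.items():
    tss, ess, rss = sums_of_squares(y, y_hat)
    results[name] = (tss, ess, rss)
    print(f"{name:>16}: TSS={tss:.3f} ESS={ess:.3f} RSS={rss:.3f} ESS+RSS={ess + rss:.3f}")
```

For the constant predictor you should see ESS = 0 and RSS = TSS. For the least-squares line, ESS + RSS matches TSS up to rounding. The steep line has a much larger ESS than the least-squares line yet a far worse RSS, because the decomposition TSS = ESS + RSS holds only for least-squares fits.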