It comes from the fact that $D$ wants to maximize its objective over all potential examples it sees, not just a single example. For real data points, this means maximizing the expectation, taken over the true data distribution, of $\log(D(x))$ (your first term). For fake points (which $G$ generates from noise samples $z$ drawn from the noise prior), it wants to maximize the expectation over those points of $\log(1-D(G(z)))$, which is the second term.
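As a quick illustration, both expectations can be estimated with Monte Carlo averages over samples. The `D` and `G` below are toy stand-ins I've made up for this sketch (a sigmoid and an affine map), not trained networks; the point is only the shape of the objective $\mathbb{E}_x[\log D(x)] + \mathbb{E}_z[\log(1-D(G(z)))]$ that $D$ tries to maximize:

```python
import numpy as np

rng = np.random.default_rng(0)

def D(x):
    # Toy discriminator: squashes its input into (0, 1) with a sigmoid.
    return 1.0 / (1.0 + np.exp(-x))

def G(z):
    # Toy generator: maps noise in [0, 1) to fake samples via an affine map.
    return 2.0 * z - 1.0

real = rng.normal(loc=1.0, scale=0.5, size=1000)  # samples from the "data" distribution
z = rng.uniform(size=1000)                        # samples from the noise prior

# Empirical estimate of E_x[log D(x)] + E_z[log(1 - D(G(z)))].
# A real discriminator would adjust its parameters to push this value up.
objective = np.mean(np.log(D(real))) + np.mean(np.log(1.0 - D(G(z))))
print(objective)
```

Since $D$ outputs values strictly between 0 and 1, both log terms are negative, so the objective is bounded above by 0; maximizing it means pushing $D(x)$ toward 1 on real samples and $D(G(z))$ toward 0 on fakes.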