Maybe it feels more natural to include $bab$, since it seems to help you cover the entire graph, but there is a much simpler reason to include $bab^{-1}$ instead: $bab$ wraps around two circles, while $bab^{-1}$ wraps around only one. With $bab^{-1}$, each generator in $\langle a, b^2, bab^{-1}\rangle$ wraps around exactly one circle, and no two generators share a circle.
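To make "wraps around one circle" concrete, here is a minimal sketch built on an assumed model of the graph, since the graph itself is not spelled out here. It has two vertices, 0 (the basepoint) and 1, an $a$-loop at each vertex, and two $b$-edges that swap the vertices. We collapse the spanning tree consisting of the $b$-edge from 0 to 1 and record which remaining (non-tree) edges a loop crosses, after cancelling any back-and-forth. The edge labels, the spanning-tree choice, and the capital-letter-for-inverse convention are all hypothetical choices made for illustration.

```python
# Assumed graph model (hypothetical, for illustration only):
#   vertices 0 (basepoint) and 1; an a-loop at each vertex;
#   b-edges swap 0 <-> 1. ("a", v) / ("b", v) names the edge starting at v.
# Words use lowercase for a, b and uppercase for a^{-1}, b^{-1}.

def target(letter, v):
    """Vertex reached by following `letter` forwards from v."""
    return v if letter == "a" else 1 - v

# Assumed spanning tree: the single b-edge from vertex 0 to vertex 1.
TREE = {("b", 0)}

def non_tree_edges(word):
    """Trace `word` from vertex 0 and return the freely reduced list of
    (edge, sign) pairs for the non-tree edges it crosses."""
    v = 0
    path = []
    for ch in word:
        letter = ch.lower()
        if ch.islower():
            edge, sign, v_next = (letter, v), 1, target(letter, v)
        else:
            # Backwards step: use the edge of this letter that ends at v.
            start = v if letter == "a" else 1 - v
            edge, sign, v_next = (letter, start), -1, start
        if edge not in TREE:
            if path and path[-1] == (edge, -sign):
                path.pop()  # crossing an edge and immediately back cancels
            else:
                path.append((edge, sign))
        v = v_next
    assert v == 0, "word must be a closed loop at the basepoint"
    return path

for w in ["a", "bb", "baB", "bab"]:
    print(w, non_tree_edges(w))
```

In this model `a`, `bb`, and `baB` each cross exactly one non-tree edge, and each crosses a different one, while `bab` crosses two (the $a$-loop at vertex 1 and the second $b$-edge).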
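Note that $bab^{-1}b^2 = bab$ and, conversely, $bab\,b^{-2} = bab^{-1}$, so the two generating sets produce the same subgroup: $\langle a, b^2, bab\rangle = \langle a, b^2, bab^{-1}\rangle$.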
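The two identities can be checked mechanically by free reduction in the free group on $a, b$. In the minimal sketch below, uppercase letters stand for inverses. This is a convention chosen here, and the helper names are hypothetical.

```python
def reduce_word(word):
    """Freely reduce a word in F(a, b); uppercase letters denote inverses."""
    stack = []
    for ch in word:
        if stack and stack[-1] == ch.swapcase():
            stack.pop()  # x followed by x^{-1} (or vice versa) cancels
        else:
            stack.append(ch)
    return "".join(stack)

def inverse(word):
    """Inverse of a word: reverse it and invert each letter."""
    return "".join(ch.swapcase() for ch in reversed(word))

print(reduce_word("baB" + "bb"))            # bab^{-1} b^2  -> bab
print(reduce_word("bab" + inverse("bb")))   # bab b^{-2}    -> baB, i.e. bab^{-1}
```

Because each third generator can be written in terms of the other set, either list generates the same subgroup.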