The usual argument goes as follows. Suppose $A$ is invertible and you perturb it by a small $X$, so that $(A+X)^{-1}=A^{-1}+Y+O(\|X\|^2)$, where $Y$ is the first-order (i.e. linear in $X$) change in $A^{-1}$. Expanding the left-hand side of $\left(A^{-1}+Y+O\left(\|X\|^2\right)\right)(A+X)=I$ gives $I+YA+A^{-1}X+O\left(\|X\|^2\right)=I$, because $YX=O\left(\|X\|^2\right)$. Comparing the first-order terms on both sides, you get $YA+A^{-1}X=0$. Hence $Y=-A^{-1}XA^{-1}$ and $$(A+X)^{-1}=A^{-1}+Y+O\left(\|X\|^2\right)\approx A^{-1}+Y=A^{-1}-A^{-1}XA^{-1}. $$
**Edit.** The argument above is rigorous provided that $A\mapsto A^{-1}$ is differentiable in the first place. This is indeed the case: by the formula $A^{-1}=\frac1{\det(A)}\operatorname{adj}(A)$, each entry of $A^{-1}$ is a rational function of the entries of $A$, and its denominator $\det(A)$ is nonzero on the set of invertible matrices.
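As a sanity check, the approximation $(A+X)^{-1}\approx A^{-1}-A^{-1}XA^{-1}$ can be tested numerically. The sketch below is only an illustration for $2\times 2$ matrices in pure Python. The helper names (`matmul`, `inverse`, `first_order_error`) and the particular matrices `A` and `X0` are arbitrary choices made for this example, not part of the argument. It scales the perturbation as $X=tX_0$ and measures the largest entrywise error of the linear approximation.

```python
# Numerical check of (A + X)^{-1} ≈ A^{-1} - A^{-1} X A^{-1} for 2x2 matrices.
# All helper names here are illustrative; nothing comes from an external library.

def matmul(P, Q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(M):
    """Inverse of a 2x2 matrix via adj(M) / det(M)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def first_order_error(A, X0, t):
    """Largest entrywise error of the linear approximation for X = t * X0."""
    X = [[t * x for x in row] for row in X0]
    A_plus_X = [[A[i][j] + X[i][j] for j in range(2)] for i in range(2)]
    A_inv = inverse(A)
    # Y = -A^{-1} X A^{-1}, the first-order change in the inverse.
    Y = [[-y for y in row] for row in matmul(matmul(A_inv, X), A_inv)]
    approx = [[A_inv[i][j] + Y[i][j] for j in range(2)] for i in range(2)]
    exact = inverse(A_plus_X)
    return max(abs(exact[i][j] - approx[i][j])
               for i in range(2) for j in range(2))

A = [[2.0, 1.0], [1.0, 3.0]]     # arbitrary invertible matrix (det = 5)
X0 = [[1.0, -2.0], [0.5, 1.0]]   # arbitrary perturbation direction

scales = (1e-1, 1e-2, 1e-3)
errors = [first_order_error(A, X0, t) for t in scales]
for t, err in zip(scales, errors):
    print(f"t = {t:g}: error = {err:.3e}")
```

If the remainder really is $O(\|X\|^2)$, shrinking $t$ by a factor of $10$ should shrink the printed error by roughly a factor of $100$.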