To give an example from deep learning: the number of parameters is so huge (in the millions or more) that simulated annealing may well take longer than just running gradient descent from whatever (random) initial state your weights are currently in.
So, in the case of deep learning, it doesn't make (economic) sense to do simulated annealing.
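To make the intuition concrete, here is a minimal sketch (not from any particular library, and using an assumed toy quadratic loss, step sizes, and cooling schedule purely for illustration) that contrasts gradient descent with a naive simulated-annealing loop in a high-dimensional parameter space. The point is that a blind random perturbation of many weights at once almost never lowers the loss, while a gradient step always moves downhill.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 10_000                      # stand-in for "millions" of weights
w_star = rng.normal(size=dim)     # unknown optimum of the toy loss

def loss(w):
    # simple quadratic bowl centered at w_star (illustrative assumption)
    return 0.5 * np.sum((w - w_star) ** 2)

def grad(w):
    return w - w_star

# --- gradient descent: follows the slope directly ---
w = rng.normal(size=dim)
for step in range(200):
    w -= 0.1 * grad(w)
print("GD loss after 200 steps:", loss(w))

# --- simulated annealing: random proposals with accept/reject ---
w = rng.normal(size=dim)
current = loss(w)
temperature = 1.0
for step in range(200):
    proposal = w + 0.1 * rng.normal(size=dim)   # blind random move
    new = loss(proposal)
    # accept improvements, or worse moves with a temperature-dependent probability
    if new < current or rng.random() < np.exp((current - new) / temperature):
        w, current = proposal, new
    temperature *= 0.99                          # geometric cooling (assumed schedule)
print("SA loss after 200 steps:", current)
```

Running this, gradient descent drives the loss essentially to zero in a few hundred steps, while the annealing loop barely moves, because in 10,000 dimensions a random proposal is overwhelmingly likely to increase the loss; scaling that up to millions of parameters only makes the gap worse.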