
Why don't people do simulated annealing before gradient descent? It seems obvious to me to first explore the optimization landscape widely (which is effectively what simulated annealing does) and get a sense of the problem structure, and only then, after finding which hill to climb, perform gradient descent. Why isn't this done more often?
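
For concreteness, here is a minimal sketch of the two-stage idea on a small, low-dimensional problem, assuming SciPy is available. The Rastrigin test function and the 2-D bounds are illustrative choices on my part, not anything standard for this workflow:

```python
import numpy as np
from scipy.optimize import dual_annealing, minimize

def rastrigin(x):
    """Classic multimodal test function with many local minima."""
    x = np.asarray(x)
    return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

# Illustrative 2-D search box.
bounds = [(-5.12, 5.12)] * 2

# Stage 1: simulated annealing explores the landscape globally.
sa_result = dual_annealing(rastrigin, bounds, maxiter=200)

# Stage 2: gradient-based refinement starts from the basin annealing found.
gd_result = minimize(rastrigin, sa_result.x, method="L-BFGS-B", bounds=bounds)

print("after annealing :", sa_result.x, rastrigin(sa_result.x))
print("after refinement:", gd_result.x, rastrigin(gd_result.x))
```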

Take deep learning as an example: the number of parameters is so huge (often in the millions or more) that simulated annealing may take far longer than simply running gradient descent from whatever (random) initial state your weights happen to be in.

So, in the case of deep learning, it doesn't make (economic) sense to do simulated annealing first.
