
What does it mean to "marginalize noise"? I'm reading this article on the dropout method for neural networks. Here (page 21/30; the link should jump straight to that page), they talk about viewing dropout as a way to introduce noise, and about "marginalizing this noise". A Google search didn't yield any results that explain what this means.

I think they mean that dropout is a _stochastic_ regularization technique: it regularizes the network implicitly, through the randomness injected during training. By "marginalizing out the noise", they instead derive an explicit, _deterministic_ penalty that can be added to the objective function and that is equivalent to dropout (at least in expectation).
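For concreteness, here is the simplest case where this can be done in closed form (my example, not from the paper): linear regression with an i.i.d. Bernoulli dropout mask $m$ applied to the inputs. Taking the expectation over the mask gives

$$\mathbb{E}_{m}\!\left[\big(y - w^\top (m \odot x)\big)^2\right] = \big(y - p\, w^\top x\big)^2 + p(1-p)\sum_i w_i^2 x_i^2, \qquad m_i \sim \mathrm{Bernoulli}(p).$$

The first term is an ordinary deterministic squared loss (with the weights scaled by the keep probability $p$), and the second is an explicit, data-dependent L2-style penalty: the noise has been marginalized out of the objective.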

They probably use the word "marginalize" because they are "removing" the stochasticity of the random variables (i.e. the dropout units) while achieving the same effect, just as you compute a marginal distribution by integrating out the other random variables. It's a little confusing because nothing else is random (unless you count the stochastic gradients, or treat dropout as approximate variational inference), so they are not really integrating anything out to get a probability distribution; they simply take the expectation of the objective over the dropout noise, which removes the stochasticity.
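A quick numerical check of the identity above (a minimal NumPy sketch; the data, weights, and keep probability are all made up):

```python
# Verify that the Monte Carlo average of the stochastic dropout loss
# matches the closed-form "marginalized" objective for linear regression:
#   E_m[(y - w.(m*x))^2] = (y - p*w.x)^2 + p*(1-p) * sum_i w_i^2 x_i^2
import numpy as np

rng = np.random.default_rng(0)
d = 10
x = rng.normal(size=d)      # one input example (made up)
w = rng.normal(size=d)      # model weights (made up)
y = 1.5                     # target (made up)
p = 0.8                     # keep probability

# Monte Carlo estimate: average the loss over many sampled dropout masks
masks = rng.random((200_000, d)) < p          # m_i ~ Bernoulli(p)
mc_loss = np.mean((y - (masks * x) @ w) ** 2)

# Deterministic, marginalized objective: scaled loss + explicit penalty
marginal_loss = (y - p * (w @ x)) ** 2 + p * (1 - p) * np.sum(w**2 * x**2)

print(f"Monte Carlo estimate: {mc_loss:.4f}")
print(f"Closed form:          {marginal_loss:.4f}")  # agree up to MC error
```

The point of the exercise: once you have the right-hand side, you can train on it directly with no sampling at all, which is exactly what "finding a deterministic penalty equivalent to dropout in expectation" means.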
