I think they mean that dropout is a _stochastic_ regularization technique: the noise it injects regularizes the network implicitly during training. By "marginalizing out the noise", they instead derive an explicit, _deterministic_ penalty that can be added to the objective function and that is equivalent to dropout (at least in expectation).
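A concrete case where this expectation can be computed in closed form (the standard textbook example, not necessarily the setting of the paper) is linear regression with dropout on the inputs: a Bernoulli mask $z_j$ with keep probability $p$, rescaled by $1/p$. Then

$$
\mathbb{E}_z\!\left[\big(y - w^\top (z \odot x)/p\big)^2\right]
= \big(y - w^\top x\big)^2 + \frac{1-p}{p}\sum_j w_j^2 x_j^2,
$$

which follows from $\mathbb{E}[u^2] = (\mathbb{E}[u])^2 + \operatorname{Var}(u)$ and the independence of the mask entries. The marginalized objective is just the ordinary squared error plus a deterministic, data-dependent ridge-like penalty, and no sampling of masks is needed anymore.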
They probably use the word "marginalize" because they are "removing" the stochasticity of the random variables (i.e. the dropout masks) while achieving the same effect, much like you compute a marginal distribution by integrating out the other random variables. It's a little confusing because nothing else is random (unless you count the stochastic gradients, or treat dropout as approximate variational inference), so they are not really integrating anything out to obtain a probability distribution; they just take the expectation of the objective over the dropout noise, which removes the stochasticity.
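If it helps to see the "expectation removes the stochasticity" point numerically, here is a small sketch (my own toy example with made-up numbers, not from the paper) that checks the identity above by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression setup (hypothetical values, just for illustration).
d = 5
x = rng.normal(size=d)   # one input example
w = rng.normal(size=d)   # fixed weights
y = 1.3                  # target
p = 0.8                  # keep probability of the dropout mask

# Stochastic (dropout) objective: average the squared error
# over many sampled Bernoulli masks with inverted-dropout scaling 1/p.
n_samples = 200_000
masks = rng.binomial(1, p, size=(n_samples, d))
preds = (masks * x / p) @ w
mc_loss = np.mean((y - preds) ** 2)

# Marginalized (deterministic) objective: plain squared error plus
# the closed-form penalty obtained by taking the expectation over masks.
det_loss = (y - w @ x) ** 2 + (1 - p) / p * np.sum(w**2 * x**2)

print(f"Monte Carlo dropout loss : {mc_loss:.4f}")
print(f"Marginalized loss        : {det_loss:.4f}")
```

The two numbers agree up to Monte Carlo error, which is the sense in which the deterministic penalty is "dropout with the noise marginalized out".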