You don't have the right kind of data to construct a uniform measure of difficulty that would put the different results on one scale. For that, the relative difficulty of the tasks has to be calibrated from a large number of cases where the same individual does multiple tasks, with enough overlap between tasks to quantify all of them in relation to each other.
If you pretend that every class has a test-result distribution drawn from a family with two or fewer parameters, such as the Gaussians, then you can use the two given pieces of information (and possibly also the number of students in the class, if that is available) to fit that distribution for each class, place everyone's results as points on one model distribution, and take those positions as the normalized scores.
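As a rough illustration of the Gaussian case, here is a minimal sketch. It assumes the two pieces of information per class are the mean and the standard deviation, which may not match your data; the function names and the reference scale are hypothetical, chosen only for this example.

```python
from statistics import NormalDist


def normalize_score(score, class_mean, class_sd, ref_mean=0.0, ref_sd=1.0):
    """Map a raw score onto a common Gaussian reference scale.

    Assumption: the class results are treated as Gaussian with the given
    mean and standard deviation. With the default reference scale the
    result is the ordinary z-score.
    """
    if class_sd <= 0:
        raise ValueError("class_sd must be positive")
    z = (score - class_mean) / class_sd
    return ref_mean + ref_sd * z


def percentile(score, class_mean, class_sd):
    """Fraction of the (assumed Gaussian) class expected to score below `score`."""
    return NormalDist(mu=class_mean, sigma=class_sd).cdf(score)


# Two students from different classes, compared on one scale.
a = normalize_score(80, class_mean=70, class_sd=10)  # one SD above the class mean
b = normalize_score(60, class_mean=50, class_sd=5)   # two SDs above the class mean
```

Under this model the second student ranks higher despite the lower raw score. That conclusion is only as good as the Gaussian assumption, and it says nothing about whether one class was stronger or one test harder than the other.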