First, you have to make a decision: do you want to use the "real" AlexNet (with grouping) or what most frameworks call AlexNet (without grouping)?
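To get a feeling for what grouping changes: with groups, each filter only sees a fraction of the input channels, so the layer has fewer weights. Here is a minimal sketch. The helper name `conv_weight_count` is made up for illustration, and the numbers (96 input channels, 256 filters of size 5x5, 2 groups) are the ones usually quoted for the second convolutional layer of the original AlexNet, so treat them as an assumption and check them against the variant you pick.

```python
def conv_weight_count(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights (biases not counted) in a 2D convolution layer."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    # With grouping, each filter only sees in_channels / groups input channels.
    return out_channels * (in_channels // groups) * kernel_size * kernel_size


# Assumed values for AlexNet's second conv layer (see note above).
grouped = conv_weight_count(96, 256, 5, groups=2)
ungrouped = conv_weight_count(96, 256, 5, groups=1)
print(grouped, ungrouped)
```

So the ungrouped variant has twice as many weights in that layer, which is one reason the two versions are not interchangeable.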
If you choose the version without grouping, you might want to have a look at Table D2 of my master's thesis for a better overview of the layers, especially the output size, number of filters, and stride. Don't take the number of FLOPs too seriously; it is only a ballpark estimate.
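If you want to double-check output sizes yourself, the usual formula is `floor((input + 2 * padding - kernel) / stride) + 1`. A small sketch of that follows. The function name `output_size` is my own, and the example values (227x227 input, 11x11 kernel with stride 4, then 3x3 pooling with stride 2) are the commonly quoted first-layer settings, not something taken from the table, so adjust them to your variant.

```python
def output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor rounding)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# Assumed first-layer settings (see note above).
conv1 = output_size(227, kernel_size=11, stride=4)
pool1 = output_size(conv1, kernel_size=3, stride=2)
print(conv1, pool1)  # 55 27
```

The same function works for pooling layers, since only the kernel size, stride, and padding matter for the spatial size.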
Then you have to ask yourself the following:
* How would you implement convolution?
* How would you implement max pooling?
* How would you implement a fully connected layer?
I will not give you the answer directly here, but I recommend having a look at the implementation in a framework of your choice. Or you could search for "pure numpy cnn implementation" or something similar, e.g.
<