⟩ Tell us why do we have max-pooling in classification CNNs?
Again as you would expect this is for a role in Computer Vision. Max-pooling in a CNN allows you to reduce computation since your feature maps are smaller after the pooling. You don’t lose too much semantic information since you’re taking the maximum activation. There’s also a theory that max-pooling contributes a bit to giving CNNs more translation in-variance.