Deep Learning with PyTorch – Custom Weight Initialization – 1.5

From the images below of the Sigmoid and Tanh activation functions, we can see that for large positive or large negative values of z (on the x-axis, where z = wx + b), the derivative is zero or very close to zero. So for such values of z, we will have vanishing gradients.
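The claim about near-zero derivatives is easy to verify numerically. Below is a minimal sketch (not from the original post) that uses PyTorch's autograd to evaluate the gradients of Sigmoid and Tanh at a few values of z, from zero up to a large magnitude:

```python
import torch

# Sample values of z = wx + b, from zero to large magnitude.
z = torch.tensor([0.0, 2.0, 5.0, 10.0], requires_grad=True)

# Derivative of sigmoid: peaks at 0.25 for z = 0, then shrinks rapidly.
torch.sigmoid(z).sum().backward()
print("sigmoid'(z):", z.grad)  # ~0.25 at z=0, ~4.5e-5 at z=10

z.grad = None  # clear accumulated gradients before the second pass

# Derivative of tanh: peaks at 1.0 for z = 0, then shrinks even faster.
torch.tanh(z).sum().backward()
print("tanh'(z):   ", z.grad)  # ~1.0 at z=0, ~8.2e-9 at z=10
```

The printed gradients show why large |z| is a problem: a derivative on the order of 1e-5 or smaller, multiplied across many layers during backpropagation, drives the upstream gradients toward zero, which is exactly the vanishing-gradient issue that motivates careful weight initialization.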