Deep Learning with Pytorch – Custom Weight Initialization – 1.5

Posted on May 26, 2019 by Aritra Sen

From the plots of the Sigmoid and Tanh activation functions below, we can see that for very high (or very low) values of Z (on the x-axis, where z = wx + b) the derivative is almost zero.

Tanh function and its derivative
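To see this numerically, here is a minimal sketch (using PyTorch's autograd; this snippet is not from the original post) that evaluates the derivatives of sigmoid and tanh at increasing values of Z and shows how quickly they collapse towards zero:

```python
import torch

# Evaluate d(sigmoid)/dz and d(tanh)/dz at increasing values of z.
z = torch.tensor([0.0, 2.0, 5.0, 10.0], requires_grad=True)

torch.sigmoid(z).sum().backward()
print("sigmoid'(z):", z.grad)    # ~[0.25, 0.105, 0.0066, 4.5e-05]

z.grad = None                    # clear gradients before the next backward pass
torch.tanh(z).sum().backward()
print("tanh'(z):   ", z.grad)    # ~[1.0, 0.071, 1.8e-04, 8.2e-09]
```

Even at Z = 5 the derivatives are already only a few parts in ten thousand, which is what slows or stalls learning.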

So for high values of Z we run into the vanishing gradients problem, where the network either learns slowly or does not learn at all. The value of Z also depends on the values of the weights, so weight initialization plays an important part in the performance of a neural network. Below is another example of how initializing the weights from a random normal distribution (with mean zero and standard deviation one) can also cause problems.

Source : Deeplizard

For the above example, assume that the outputs from the previous layer (which has 250 nodes) are all one, and that the weights are initialized from a random normal distribution with mean zero and standard deviation equal to one. Then the value of Z would simply be the sum of the weights (since X = 1 for every node in the previous layer). What would the mean and variance of Z be? Knowing the variance or standard deviation of Z, you would be able to guess the range of values Z can take.

Mean of Z: Z, as a sum of normally distributed numbers each with mean zero, will also have a mean of zero.
What about the variance: The variance of Z would be much greater than 1 because each of the numbers (weights) has a variance equal to one. Since the weights are independent, the variance of Z is the sum of their individual variances, i.e. (number of weights * 1) = 250, which implies a standard deviation of about sqrt(250) ≈ 15.8, as shown above. This will lead to very high (or very negative) values of Z.

So we can say that by controlling or reducing the variance of Z, we can control the range of values Z takes.
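As a quick check of the arithmetic above, here is a minimal sketch (not from the original notebook) that samples 250 standard-normal weights many times and measures the spread of Z, the sum of the weights:

```python
import torch

torch.manual_seed(0)

n_inputs = 250          # nodes in the previous layer, each outputting X = 1
trials = 10_000

# Each row is one draw of the 250 weights; Z is just their sum because X = 1.
weights = torch.randn(trials, n_inputs)   # mean 0, standard deviation 1
z = weights.sum(dim=1)

print(f"mean of Z: {z.mean():.3f}")   # close to 0
print(f"std of Z:  {z.std():.3f}")    # close to sqrt(250) ≈ 15.8
```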

Below are a few weight initialization schemes we can use to control the variance of the weights –
Normal Initialization:
As we saw above, with plain normal initialization the variance of Z grows with the number of inputs.
Lecun Initialization:
In Lecun initialization we set the variance of the weights to 1/n, where n is the number of input units in the weight tensor. This is the default initialization in Pytorch, which means we don't need to make any code changes to use it. It works reasonably well with almost all activation functions.
Xavier (Glorot) Initialization:
Works better with sigmoid activations. In Xavier initialization we set the variance of the weights to Var(W) = 2 / (n_in + n_out), where n_in and n_out are the number of input and output units of the layer.

Kaiming (He) Initialization:
Works better for layers with ReLU or LeakyReLU activations. In He initialization we set the variance of the weights to Var(W) = 2 / n, where n is the number of input units in the weight tensor.

Now let’s see how we can implement this weight initialization in Pytorch.
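Below is a minimal sketch of how these schemes can be applied with torch.nn.init (the network architecture and layer sizes here are illustrative assumptions, not the code from the original notebook):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 64)
        self.out = nn.Linear(64, 10)

        # Xavier (Glorot) initialization: suited to sigmoid/tanh layers.
        nn.init.xavier_normal_(self.fc1.weight)
        nn.init.zeros_(self.fc1.bias)

        # Kaiming (He) initialization: suited to ReLU/LeakyReLU layers.
        nn.init.kaiming_normal_(self.fc2.weight, nonlinearity='relu')
        nn.init.zeros_(self.fc2.bias)

    def forward(self, x):
        x = torch.tanh(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.out(x)

# Alternatively, re-initialize every Linear layer in one pass with model.apply().
def init_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
        nn.init.zeros_(m.bias)

model = Net()
model.apply(init_weights)
```

The model.apply() pattern is handy when you want a single initialization rule for the whole network instead of setting each layer by hand.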

Reference:
https://www.deeplearningwizard.com/deep_learning/boosting_models_pytorch/weight_initialization_activation_functions/#zero-initialization-set-all-weights-to-0
Do like, share and comment if you have any questions.

Category: Machine Learning, Python
