Deep Learning – IIT Ropar Week 8 Assignment Answers (Jan-Apr 2026)


1. Why does using sigmoid activation in deep hidden layers slow down training?

  • Sigmoid outputs are always zero
  • Sigmoid causes exploding gradients
  • Sigmoid leads to vanishing gradients
  • Sigmoid increases model variance
Answer: c (Sigmoid leads to vanishing gradients)
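
A rough sketch of why this happens: the sigmoid derivative s(x)(1 - s(x)) never exceeds 0.25, so the activation-derivative factors alone shrink a backpropagated gradient by at least 4x per sigmoid layer. The depth of 10 below is an arbitrary illustrative choice, and the weight terms in the chain rule are ignored.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

max_grad = sigmoid_grad(0.0)      # the derivative peaks at x = 0
depth = 10                        # illustrative depth, not from the assignment
upper_bound = max_grad ** depth   # best case for the product of derivative factors

print(max_grad)      # 0.25
print(upper_bound)   # below 1e-6
```

Even in this best case the gradient reaching the first layer is scaled by less than one millionth, which is why early layers learn slowly.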

2. Which property of ReLU helps improve training speed in deep neural networks?

  • Outputs are bounded between 0 and 1
  • It introduces non-linearity by squaring inputs
  • It avoids gradient saturation for positive inputs
  • It normalizes the input distribution
Answer: c (It avoids gradient saturation for positive inputs)
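
As a small illustration, the sketch below compares the two derivatives at a few positive inputs. The sample inputs are arbitrary, and taking the ReLU derivative at exactly 0 to be 0 is a convention assumed here.

```python
import math

def relu_grad(x):
    # Derivative of max(0, x); the value at x == 0 is a convention (0 here).
    return 1.0 if x > 0 else 0.0

def sigmoid_grad(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

for x in (0.5, 5.0, 50.0):
    # ReLU keeps a gradient of 1 however large x gets; sigmoid's shrinks toward 0.
    print(x, relu_grad(x), sigmoid_grad(x))
```

The ReLU column stays at 1.0 while the sigmoid column decays toward 0 as the input grows, which is the saturation the question refers to.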

3. What is the approximate output of the sigmoid activation for an input value of +3?

  • 0.05
  • 0.50
  • 0.95
  • 3.00
Answer: c (0.95)
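
The value can be checked directly from the definition sigmoid(x) = 1 / (1 + e^(-x)). A minimal check:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

print(round(sigmoid(3.0), 4))    # 0.9526, closest to the 0.95 option
print(round(sigmoid(-3.0), 4))   # 0.0474, which is where the 0.05 distractor comes from
print(sigmoid(0.0))              # 0.5
```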

4. Which of the following statements about activation functions are correct in this case?

  • Sigmoid activation can cause vanishing gradients in deep networks
  • ReLU outputs zero for negative inputs
  • ReLU always produces outputs between 0 and 1
  • ReLU improves gradient flow for positive inputs
Answer: a, b, d (ReLU outputs are unbounded above, so the third option is false)

5. Why is ReLU preferred over sigmoid in the hidden layers for this employee attrition model?

  • Faster convergence during training
  • Reduced risk of vanishing gradients
  • Guaranteed higher validation accuracy
  • Better scalability to deeper networks
Answer: a, b, d (ReLU does not guarantee higher validation accuracy)

6. If the input to a ReLU neuron is -3, the output of the neuron is __________.

Answer: 0
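
ReLU is defined as max(0, x), so any negative input maps to 0 and any positive input passes through unchanged. A minimal check, where the second input (7.5) is an arbitrary value chosen to show that outputs are not capped at 1:

```python
def relu(x):
    return max(0.0, x)

print(relu(-3.0))   # 0.0
print(relu(7.5))    # 7.5
```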

7. Why did naïve random initialization lead to unstable training in this deep traffic prediction network?

  • The dataset was too small
  • Variance of activations increased across layers
  • ReLU activation removes non-linearity
  • Learning rate was fixed
Answer: b (Variance of activations increased across layers)
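
The growth in activation variance can be reproduced with a small simulation. This is only a sketch under stated assumptions: "naive" is taken to mean every weight drawn from N(0, 1) regardless of fan-in, the layers use ReLU, and the width (64) and depth (4) are illustrative choices, not details of the assignment's traffic network.

```python
import random

def relu(z):
    return max(0.0, z)

def mean_square(values):
    return sum(v * v for v in values) / len(values)

rng = random.Random(0)
width, depth = 64, 4   # illustrative sizes, not from the assignment
activations = [rng.gauss(0.0, 1.0) for _ in range(width)]
history = [mean_square(activations)]

for _ in range(depth):
    # Naive initialization: weights ~ N(0, 1), with no scaling by fan-in.
    weights = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(width)]
    activations = [relu(sum(w * a for w, a in zip(row, activations)))
                   for row in weights]
    history.append(mean_square(activations))

for layer, value in enumerate(history):
    print(layer, value)
```

Each layer multiplies the mean squared activation by roughly width / 2, so the values grow by orders of magnitude within a few layers.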

8. Why is He initialization preferred over Xavier initialization for this network?

  • The network uses sigmoid activations
  • The network uses ReLU activations
  • He initialization reduces model depth
  • Xavier initialization increases variance
Answer: b (The network uses ReLU activations)
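
The difference can be worked out with the usual variance-propagation argument. Assuming i.i.d. zero-mean weights with variance v, zero biases, and symmetric pre-activations, a ReLU layer with fan-in n scales the pre-activation variance by n * v / 2 (ReLU zeroes half of a symmetric distribution). The sketch below uses a fan-in of 128 and a depth of 10 as illustrative values and takes the Xavier variance in its 1/n form.

```python
def relu_layer_gain(fan_in, weight_var):
    # Factor by which the pre-activation variance changes across one ReLU layer.
    return fan_in * weight_var / 2.0

fan_in, depth = 128, 10        # illustrative values
he_var = 2.0 / fan_in          # He initialization
xavier_var = 1.0 / fan_in      # Xavier initialization, 1/n form

he_scale = relu_layer_gain(fan_in, he_var) ** depth
xavier_scale = relu_layer_gain(fan_in, xavier_var) ** depth

print(he_scale)       # 1.0: variance preserved through all layers
print(xavier_scale)   # 0.0009765625: variance halves at every layer
```

He initialization supplies the extra factor of 2 that compensates for ReLU discarding the negative half of its inputs.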

9. If a hidden layer has 128 input neurons, what is the variance of weights using Xavier initialization?

  • 0.0078
  • 0.0156
  • 0.0312
  • 0.0625
Answer: a (0.0078, taking the variance as 1/n_in = 1/128)
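
The arithmetic, as a short check. The question gives only the number of inputs, so the 1/n_in form of Xavier initialization is assumed; the fan-averaged form 2 / (n_in + n_out) gives the same value only under the further assumption that the layer also has 128 outputs.

```python
fan_in = 128
fan_out = 128   # assumption: the question does not state the output size

xavier_var = 1.0 / fan_in
xavier_avg_var = 2.0 / (fan_in + fan_out)
he_var = 2.0 / fan_in

print(round(xavier_var, 4))       # 0.0078
print(round(xavier_avg_var, 4))   # 0.0078 when fan_out == fan_in
print(round(he_var, 4))           # 0.0156, the He value that appears as a distractor
```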

10. Which of the following outcomes indicate successful training stabilization after switching to He initialization?

  • Stable activation variance across layers
  • Smooth decrease in training loss
  • Guaranteed zero test error
  • Reduced risk of exploding gradients
Answer: a, b, d (no initialization scheme guarantees zero test error)

11. Why is correct initialization especially critical in deep networks like this traffic prediction model?

  • Errors amplify across multiple layers
  • Initialization determines gradient magnitude
  • Poor initialization slows convergence
  • Initialization eliminates the need for normalization
Answer: a, b, c (good initialization does not eliminate the need for normalization)

12. What is the primary motivation for using greedy layer-wise pretraining in this deep speech recognition model?

  • To increase dataset size
  • To simplify the network architecture
  • To improve training of earlier layers
  • To eliminate the need for fine-tuning
Answer: c (To improve training of earlier layers)

13. During greedy layer-wise pretraining, what happens to the parameters of previously trained layers?

  • They are randomly reinitialized
  • They are frozen
  • They are removed from the network
  • They are updated with a higher learning rate
Answer: b (They are frozen)
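
The freezing step can be sketched with a toy stack of scalar tanh autoencoders. Everything here is a simplification chosen for illustration: the 1-D data, the helper names `train_autoencoder` and `encode`, the three-layer depth, and the hyperparameters are assumptions, not details of the assignment's speech recognition model.

```python
import math
import random

def train_autoencoder(inputs, steps=2000, lr=0.05, seed=0):
    """Train one scalar tanh autoencoder with SGD; return its encoder (w, b)."""
    rng = random.Random(seed)
    w, b = rng.uniform(-0.5, 0.5), 0.0   # encoder parameters
    v, c = rng.uniform(-0.5, 0.5), 0.0   # decoder parameters, discarded afterwards
    for _ in range(steps):
        x = rng.choice(inputs)
        h = math.tanh(w * x + b)          # encode
        x_hat = v * h + c                 # decode
        d = 2.0 * (x_hat - x)             # gradient of (x_hat - x)^2 w.r.t. x_hat
        dz = d * v * (1.0 - h * h)        # backpropagate through tanh
        v -= lr * d * h
        c -= lr * d
        w -= lr * dz * x
        b -= lr * dz
    return w, b

def encode(params, inputs):
    w, b = params
    return [math.tanh(w * x + b) for x in inputs]

data_rng = random.Random(1)
data = [data_rng.uniform(-1.0, 1.0) for _ in range(200)]

layers = []        # encoder parameters of already-trained layers
features = data
for k in range(3):
    # Only the new layer is trained; it sees the outputs of the frozen stack.
    params = train_autoencoder(features, seed=k)
    layers.append(params)                 # frozen: never passed to training again
    features = encode(params, features)   # inputs for the next layer

print(layers)
```

Earlier layers are only used in the forward direction to produce inputs for the layer currently being trained; their parameters would be unfrozen again only in a later supervised fine-tuning stage.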

14. Why does greedy layer-wise pretraining help alleviate vanishing gradient issues?

  • It removes nonlinear activation functions
  • It ensures each layer learns useful representations independently
  • It increases the learning rate automatically
  • It reduces the depth of the network
Answer: b (It ensures each layer learns useful representations independently)

15. Which of the following are valid benefits of greedy layer-wise pretraining in this case?

  • Faster convergence during supervised fine-tuning
  • Improved gradient flow in early layers
  • Guaranteed perfect recognition accuracy
  • Better initialization for deep networks
Answer: a, b, d (pretraining does not guarantee perfect recognition accuracy)