Deep Learning – IIT Ropar Week 8 Assignment Answers (Jan-Apr 2026)
1. Why does using sigmoid activation in deep hidden layers slow down training?
- (a) Sigmoid outputs are always zero
- (b) Sigmoid causes exploding gradients
- (c) Sigmoid leads to vanishing gradients
- (d) Sigmoid increases model variance
Answer : c
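The reasoning behind this answer can be checked numerically. The sketch below uses only the standard library and relies on the fact that the sigmoid derivative never exceeds 0.25; `gradient_bound` is a hypothetical helper name, and it ignores the weight matrices, so it only bounds the contribution of the activation derivatives.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# The derivative peaks at x = 0, where it equals 0.25.
max_grad = sigmoid_grad(0.0)

def gradient_bound(depth):
    # Hypothetical helper: upper bound on the product of activation
    # derivatives after backpropagating through `depth` sigmoid layers.
    return max_grad ** depth

for depth in (1, 5, 10):
    print(depth, gradient_bound(depth))
```

Each extra sigmoid layer multiplies the backpropagated signal by at most 0.25, so the bound shrinks geometrically with depth.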
2. Which property of ReLU helps improve training speed in deep neural networks?
- (a) Outputs are bounded between 0 and 1
- (b) It introduces non-linearity by squaring inputs
- (c) It avoids gradient saturation for positive inputs
- (d) It normalizes the input distribution
Answer : c
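A small comparison makes the saturation point concrete. This is an illustrative sketch; the input values are arbitrary choices, not part of the assignment.

```python
import math

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise
    # (the value at exactly 0 is a convention; 0 is used here).
    return 1.0 if x > 0 else 0.0

def sigmoid_grad(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

inputs = [0.5, 2.0, 5.0, 10.0]  # arbitrary positive inputs
relu_grads = [relu_grad(x) for x in inputs]
sigmoid_grads = [sigmoid_grad(x) for x in inputs]

for x, r, s in zip(inputs, relu_grads, sigmoid_grads):
    print(f"x={x}: relu'={r}, sigmoid'={s:.6f}")
```

The ReLU derivative stays at 1 however large the positive input becomes, while the sigmoid derivative decays toward zero.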
3. What is the approximate output of the sigmoid activation for an input value of +3?
- (a) 0.05
- (b) 0.50
- (c) 0.95
- (d) 3.00
Answer : c
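The value follows from the definition sigmoid(x) = 1 / (1 + e^(-x)). A quick check with the standard library:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

value = sigmoid(3.0)
print(round(value, 4))  # approximately 0.9526
```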
4. Which of the following statements about activation functions are correct in this case?
- (a) Sigmoid activation can cause vanishing gradients in deep networks
- (b) ReLU outputs zero for negative inputs
- (c) ReLU always produces outputs between 0 and 1
- (d) ReLU improves gradient flow for positive inputs
Answer : a, b, d
5. Why is ReLU preferred over sigmoid in the hidden layers for this employee attrition model?
- (a) Faster convergence during training
- (b) Reduced risk of vanishing gradients
- (c) Guaranteed higher validation accuracy
- (d) Better scalability to deeper networks
Answer : a, b, d
6. If the input to a ReLU neuron is -3, the output of the neuron is __________.
Answer : 0
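ReLU is defined as max(0, x), so the result can be verified directly. The second call is added only for contrast and is not part of the question.

```python
def relu(x):
    # ReLU passes positive inputs through and clips negatives to zero.
    return max(0.0, x)

negative_output = relu(-3.0)
positive_output = relu(5.0)  # contrast case: unchanged, and not capped at 1

print(negative_output, positive_output)
```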
7. Why did naïve random initialization lead to unstable training in this deep traffic prediction network?
- (a) The dataset was too small
- (b) Variance of activations increased across layers
- (c) ReLU activation removes non-linearity
- (d) Learning rate was fixed
Answer : b
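The effect can be reproduced with a small simulation. The sketch below is a toy, not the network from the question: the width, depth, seed, and the choice of unit-variance Gaussian weights as the "naive" scheme are all assumptions made for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

WIDTH = 64  # hypothetical layer width
DEPTH = 5   # hypothetical number of hidden layers

def relu(values):
    return [max(0.0, v) for v in values]

def mean_square(values):
    return sum(v * v for v in values) / len(values)

# Start from a unit-variance input vector.
activations = [random.gauss(0.0, 1.0) for _ in range(WIDTH)]
layer_scale = []

for _ in range(DEPTH):
    # Naive initialization (assumed): every weight drawn from N(0, 1),
    # with no scaling by the number of inputs.
    weights = [[random.gauss(0.0, 1.0) for _ in range(WIDTH)]
               for _ in range(WIDTH)]
    pre_activations = [sum(w * a for w, a in zip(row, activations))
                       for row in weights]
    activations = relu(pre_activations)
    layer_scale.append(mean_square(activations))

for index, scale in enumerate(layer_scale, start=1):
    print(f"layer {index}: mean squared activation = {scale:.3e}")
```

Because the weights are not scaled by fan-in, the mean squared activation grows by a large factor at every layer, which is the instability the answer describes.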
8. Why is He initialization preferred over Xavier initialization for this network?
- (a) The network uses sigmoid activations
- (b) The network uses ReLU activations
- (c) He initialization reduces model depth
- (d) Xavier initialization increases variance
Answer : b
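The preference can be motivated with the standard variance analysis for ReLU layers: with zero-mean weights, each layer scales the pre-activation variance by roughly fan_in * Var(w) / 2, because ReLU zeroes about half of the units. The sketch below applies that approximation; `relu_variance_gain`, the fan-in of 128, and the depth of 20 are illustrative assumptions, and Xavier is taken in its simplified 1 / fan_in form.

```python
def relu_variance_gain(fan_in, weight_variance):
    # Approximate per-layer scaling of pre-activation variance
    # for a ReLU layer with zero-mean weights.
    return fan_in * weight_variance / 2.0

fan_in = 128  # hypothetical layer width
depth = 20    # hypothetical number of layers

xavier_gain = relu_variance_gain(fan_in, 1.0 / fan_in)  # Var(w) = 1 / fan_in
he_gain = relu_variance_gain(fan_in, 2.0 / fan_in)      # Var(w) = 2 / fan_in

xavier_after = xavier_gain ** depth
he_after = he_gain ** depth

print("Xavier gain per layer:", xavier_gain, "after", depth, "layers:", xavier_after)
print("He gain per layer:", he_gain, "after", depth, "layers:", he_after)
```

Under this approximation, the simplified Xavier variance halves the signal at every ReLU layer, while the factor of 2 in He initialization keeps the per-layer gain at 1.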
9. If a hidden layer has 128 input neurons, what is the variance of weights using Xavier initialization?
- (a) 0.0078
- (b) 0.0156
- (c) 0.0312
- (d) 0.0625
Answer : a
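The answer corresponds to the simplified Xavier rule Var(w) = 1 / fan_in, which is the only form computable from the number given. The sketch below also shows the fan-average form 2 / (fan_in + fan_out); the fan_out of 128 used there is an assumption, since the question does not state it.

```python
def xavier_variance_fan_in(fan_in):
    # Simplified Xavier rule: variance depends on the input count only.
    return 1.0 / fan_in

def xavier_variance_fan_avg(fan_in, fan_out):
    # Fan-average (Glorot) form: uses both input and output counts.
    return 2.0 / (fan_in + fan_out)

simple = xavier_variance_fan_in(128)
averaged = xavier_variance_fan_avg(128, 128)  # fan_out = 128 is assumed

print(simple)    # 0.0078125
print(averaged)  # same value when fan_in == fan_out
```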
10. Which of the following outcomes indicate successful training stabilization after switching to He initialization?
- (a) Stable activation variance across layers
- (b) Smooth decrease in training loss
- (c) Guaranteed zero test error
- (d) Reduced risk of exploding gradients
Answer : a, b, d
11. Why is correct initialization especially critical in deep networks like this traffic prediction model?
- (a) Errors amplify across multiple layers
- (b) Initialization determines gradient magnitude
- (c) Poor initialization slows convergence
- (d) Initialization eliminates the need for normalization
Answer : a, b, c
12. What is the primary motivation for using greedy layer-wise pretraining in this deep speech recognition model?
- (a) To increase dataset size
- (b) To simplify the network architecture
- (c) To improve training of earlier layers
- (d) To eliminate the need for fine-tuning
Answer : c
13. During greedy layer-wise pretraining, what happens to the parameters of previously trained layers?
- (a) They are randomly reinitialized
- (b) They are frozen
- (c) They are removed from the network
- (d) They are updated with a higher learning rate
Answer : b
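The freezing schedule can be sketched as simple bookkeeping. Nothing is actually trained below: `train_step`, the layer names, and the three-layer stack are hypothetical stand-ins meant only to show which layers receive updates at each stage.

```python
# Toy bookkeeping only: no real training happens here.
layers = [{"name": f"layer{i}", "trainable": False, "updates": 0}
          for i in range(3)]

def train_step(stack):
    # Stand-in for an optimizer step: only trainable layers are updated.
    for layer in stack:
        if layer["trainable"]:
            layer["updates"] += 1

schedule = []
for k in range(len(layers)):
    # Freeze every layer, then unfreeze only the one being pretrained.
    for i, layer in enumerate(layers):
        layer["trainable"] = (i == k)
    # The stack up to layer k is used, but earlier layers stay fixed.
    train_step(layers[: k + 1])
    schedule.append([l["name"] for l in layers[: k + 1] if l["trainable"]])

print(schedule)
```

At stage k, only layer k is updated; the layers trained in earlier stages are used to produce its inputs but keep their parameters unchanged.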
14. Why does greedy layer-wise pretraining help alleviate vanishing gradient issues?
- (a) It removes nonlinear activation functions
- (b) It ensures each layer learns useful representations independently
- (c) It increases the learning rate automatically
- (d) It reduces the depth of the network
Answer : b
15. Which of the following are valid benefits of greedy layer-wise pretraining in this case?
- (a) Faster convergence during supervised fine-tuning
- (b) Improved gradient flow in early layers
- (c) Guaranteed perfect recognition accuracy
- (d) Better initialization for deep networks
Answer : a, b, d