Deep Learning – IIT Ropar Week 4 Assignment Answers (Jan-Apr 2026)
1. If the team switches from batch gradient descent to mini-batch gradient descent with a batch size of 4,000, how many parameter updates occur in one epoch?
- 2
- 4,000
- 2,000
- 8,000
Answer : c
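The count in question 1 comes from dividing the dataset size by the batch size. The sketch below assumes a dataset of 8,000,000 examples. That figure is not stated in this answer key; it is only inferred from answer c (2,000 updates × 4,000 examples per batch). The function name is illustrative.

```python
def updates_per_epoch(num_examples: int, batch_size: int) -> int:
    """Mini-batch updates in one pass over the data (the last batch may be partial)."""
    return (num_examples + batch_size - 1) // batch_size


# Assumption: dataset size inferred from the answer key, not given in the source.
ASSUMED_NUM_EXAMPLES = 8_000_000

print(updates_per_epoch(ASSUMED_NUM_EXAMPLES, 4_000))                 # 2000 (mini-batch)
print(updates_per_epoch(ASSUMED_NUM_EXAMPLES, ASSUMED_NUM_EXAMPLES))  # 1 (batch gradient descent)
```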
2. Why does mini-batch gradient descent generally converge faster than batch gradient descent in this scenario?
- It always finds the global minimum
- It updates weights more frequently
- It eliminates noisy gradients completely
- It does not require tuning learning rates
Answer : b
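To make "updates weights more frequently" concrete, here is a minimal sketch on a toy problem (fitting y = 3x with a single weight). The data, learning rate, and function name are assumptions chosen for illustration, not the assignment's scenario. For the same number of epochs, the mini-batch run performs ten times as many updates as the full-batch run and ends closer to the true weight.

```python
import random


def train(xs, ys, batch_size, epochs, lr=0.5, seed=0):
    """Fit y = w * x by gradient descent on mean squared error.

    Returns the final weight and the number of parameter updates performed.
    """
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    w, updates = 0.0, 0
    for _ in range(epochs):
        rng.shuffle(indices)  # new batch composition each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this batch only.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
            updates += 1
    return w, updates


# Toy, noise-free data (assumed for illustration): y = 3x on 100 points.
xs = [i / 100 for i in range(1, 101)]
ys = [3.0 * x for x in xs]

w_mini, n_mini = train(xs, ys, batch_size=10, epochs=20)
w_full, n_full = train(xs, ys, batch_size=len(xs), epochs=20)
print(n_mini, n_full)  # 200 20
```

Because the toy data has no noise, this only illustrates the effect of update frequency; it says nothing about gradient noise.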
3. The team observes oscillations in loss while using mini-batch gradient descent. Which change would most directly reduce these oscillations?
- Increasing learning rate
- Removing mini-batches
- Adding momentum
- Increasing batch size to full dataset
Answer :
4. Which optimizer would best handle noisy gradients while also adapting learning rates automatically?
- Vanilla Gradient Descent
- Momentum-based Gradient Descent
- RMSProp
- Adam
Answer : d
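For reference, a minimal sketch of a single Adam update on one scalar parameter, using the commonly quoted default hyperparameters. The function name and signature are illustrative and do not belong to any library. The first moment `m` smooths noisy gradients, and dividing by the square root of the second moment `v` rescales the step per parameter.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero initialisation
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# On the first step the update size is close to lr whatever the gradient's scale.
big, _, _ = adam_step(theta=1.0, grad=100.0, m=0.0, v=0.0, t=1)
small, _, _ = adam_step(theta=1.0, grad=0.01, m=0.0, v=0.0, t=1)
```

Both calls move the parameter by roughly 0.001, which is the sense in which the learning rate adapts to the gradient's magnitude.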
5. If each parameter update takes 80 ms, how long (in seconds) will 3 epochs take with a batch size of 4,000?
- 240
- 480
- 160
- 192
Answer : b
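The arithmetic behind answer b, as a short sketch. It takes the 2,000 updates per epoch from the answer to question 1; the function name is illustrative.

```python
def training_time_seconds(updates_per_epoch: int, epochs: int, ms_per_update: float) -> float:
    """Total wall-clock time, assuming every update takes the same time."""
    total_updates = updates_per_epoch * epochs
    return total_updates * ms_per_update / 1000  # milliseconds to seconds


# 2,000 updates/epoch × 3 epochs × 80 ms = 480,000 ms
print(training_time_seconds(2_000, 3, 80))  # 480.0
```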
6. What is the most likely cause of this behavior?
- Exploding gradients
- Vanishing gradients
- Overfitting
- High batch variance
Answer : b
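A rough sketch of why gradients can vanish in deep networks. It assumes sigmoid activations, which this answer key does not confirm for the scenario. The derivative of the sigmoid is at most 0.25, so the activation-derivative factor accumulated through `depth` layers is bounded by 0.25 raised to that depth (weight magnitudes are ignored here).

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z: float) -> float:
    """Derivative of the sigmoid; its maximum value, 0.25, occurs at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def activation_factor_bound(depth: int) -> float:
    """Upper bound on the product of sigmoid derivatives across `depth` layers."""
    return sigmoid_grad(0.0) ** depth


for depth in (1, 5, 10):
    print(depth, activation_factor_bound(depth))
```

At a depth of 10 the bound is already below one in a million, so early layers receive very small updates.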
7. Which optimizer would most effectively mitigate this issue?
- Batch Gradient Descent
- SGD without momentum
- Adam
- Fixed learning rate GD
Answer : c
8. Which additional technique would help improve gradient flow in this neural network?
- Increasing depth further
- Input normalisation
- Removing non-linearities
- Using larger batch sizes
Answer : b
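One common form of input normalisation is standardisation: shifting each feature to zero mean and scaling it to unit variance. The sketch below is a minimal standard-library version for a single feature column. The function name and sample values are illustrative, and the assignment may have a different normalisation scheme in mind.

```python
from statistics import fmean, pstdev


def standardize(column):
    """Return the column rescaled to zero mean and unit (population) standard deviation."""
    mu = fmean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical feature values on a large scale.
raw = [10.0, 20.0, 30.0, 40.0]
scaled = standardize(raw)
```

Keeping inputs on a comparable scale helps keep pre-activations out of the saturated regions of the activation function, where gradients are small.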
9. If the network performs 250 updates per epoch, how many updates occur after 6 epochs?
Fill in the blank: __________
Answer : 1500
10. Which change would likely worsen the learning problem in this scenario?
- Reducing learning rate further
- Adding momentum
- Switching to Adam
- Normalizing inputs
Answer : a
11. What is the most likely reason for this behavior?
- Low momentum
- High learning rate
- Large dataset
- Small batch size
Answer : b
12. Which adjustment would best stabilize training without slowing convergence too much?
- Remove momentum
- Reduce learning rate slightly
- Increase batch size drastically
- Use batch gradient descent
Answer : b
13. Which optimizer allows computing gradients after a look-ahead step?
- Adam
- RMSProp
- Nesterov Accelerated Gradient
- Vanilla SGD
Answer : c
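A minimal sketch of the look-ahead step in Nesterov Accelerated Gradient, in one common formulation: the gradient is evaluated at the point the current velocity would carry the parameter to, not at the current parameter. The function names and the toy objective f(x) = x² are assumptions for illustration.

```python
def nag_step(theta, velocity, grad_fn, lr=0.1, beta=0.9):
    """One Nesterov update for a scalar parameter."""
    lookahead = theta - beta * velocity    # where momentum alone would take us
    grad = grad_fn(lookahead)              # gradient computed after the look-ahead
    velocity = beta * velocity + lr * grad
    theta = theta - velocity
    return theta, velocity


def grad_x_squared(x):
    """Gradient of the toy objective f(x) = x**2."""
    return 2.0 * x


theta, velocity = 5.0, 0.0
for _ in range(200):
    theta, velocity = nag_step(theta, velocity, grad_x_squared)
```

Plain momentum differs only in the second line: it would call `grad_fn(theta)` instead of `grad_fn(lookahead)`.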
14. What happens if the momentum coefficient is set very close to 1?
- Training stops
- Training becomes noiseless
- Model may diverge
- Convergence is guaranteed
Answer : c
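One way to see why a momentum coefficient near 1 is risky: under a constant gradient, the velocity in the update v = beta * v + lr * grad settles at lr * grad / (1 - beta), so the effective step grows as beta approaches 1. The sketch below checks this numerically; the function name and the chosen values are illustrative.

```python
def steady_state_velocity(lr, beta, grad=1.0, steps=20_000):
    """Iterate the momentum recurrence under a constant gradient."""
    velocity = 0.0
    for _ in range(steps):
        velocity = beta * velocity + lr * grad
    return velocity


lr = 0.01
for beta in (0.5, 0.9, 0.99, 0.999):
    # The limit is lr / (1 - beta): 2x, 10x, 100x and 1000x the plain step.
    print(beta, steady_state_velocity(lr, beta), lr / (1 - beta))
```

With beta = 0.999 the effective step is about a thousand times the nominal learning rate, which can overshoot a minimum unless the learning rate is reduced to compensate.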
15. Why does mini-batch gradient descent often generalize better than full batch gradient descent?
- It uses fewer parameters
- It introduces controlled noise in updates
- It always converges faster
- It removes the need for regularisation
Answer : b