Deep Learning – IIT Ropar Week 4 Assignment Answers (Jan-Apr 2026)


1. If the team switches from batch gradient descent to mini-batch gradient descent with a batch size of 4,000, how many parameter updates occur in one epoch?

  • 2
  • 4,000
  • 2,000
  • 8,000
Answer : c
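
The count is the dataset size divided by the batch size. The scenario that states the dataset size is not reproduced on this page, so the sketch below assumes 8,000,000 training examples, a figure inferred from the answer (2,000 updates of 4,000 examples each). The helper name is illustrative.

```python
def updates_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of parameter updates in one pass over the data."""
    # Integer ceiling: a final partial batch still triggers one update.
    return (num_examples + batch_size - 1) // batch_size

# Assumed dataset size (inferred from the answer, not stated on this page).
NUM_EXAMPLES = 8_000_000

print(updates_per_epoch(NUM_EXAMPLES, 4_000))         # mini-batch: 2000
print(updates_per_epoch(NUM_EXAMPLES, NUM_EXAMPLES))  # full batch: 1
```

Under the same assumption, full-batch gradient descent makes a single update per epoch, which is the contrast question 2 relies on.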

2. Why does mini-batch gradient descent generally converge faster than batch gradient descent in this scenario?

  • It always finds the global minimum
  • It updates weights more frequently
  • It eliminates noisy gradients completely
  • It does not require tuning learning rates
Answer : b

3. The team observes oscillations in loss while using mini-batch gradient descent. Which change would most directly reduce these oscillations?

  • Increasing learning rate
  • Removing mini-batches
  • Adding momentum
  • Increasing batch size to full dataset
Answer : c
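
One of the options, adding momentum, can be pictured as averaging recent mini-batch gradients so that step-to-step noise partly cancels. The sketch below is a toy illustration under assumptions: it uses the exponential-moving-average form of momentum, and the noisy gradients are synthetic (a hypothetical true gradient of 1.0 plus Gaussian noise), not data from the assignment.

```python
import random
import statistics

def ema_smooth(gradients, beta=0.9):
    """Exponential moving average of a gradient sequence (momentum-style)."""
    smoothed, velocity = [], 0.0
    for g in gradients:
        velocity = beta * velocity + (1.0 - beta) * g
        smoothed.append(velocity)
    return smoothed

rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_GRADIENT = 1.0      # hypothetical full-batch gradient
# Synthetic mini-batch gradients: true gradient plus zero-mean noise.
noisy = [TRUE_GRADIENT + rng.gauss(0.0, 1.0) for _ in range(5000)]
smoothed = ema_smooth(noisy)

# Skip the warm-up steps, where the average is still climbing from zero.
raw_spread = statistics.pstdev(noisy[100:])
smooth_spread = statistics.pstdev(smoothed[100:])
print(f"raw std: {raw_spread:.3f}, smoothed std: {smooth_spread:.3f}")
```

The smoothed sequence fluctuates much less than the raw one while staying centred on the same value, which is the damping effect the question is after.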

4. Which optimizer would best handle noisy gradients while also adapting learning rates automatically?

  • Vanilla Gradient Descent
  • Momentum-based Gradient Descent
  • RMSProp
  • Adam
Answer : d
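
Adam combines a momentum-style running mean of the gradient with an RMSProp-style running mean of its square, which is why it covers both halves of the question. Below is a minimal scalar sketch of the standard update rule; the function name, the toy objective f(x) = x^2, and the learning rate of 0.1 are illustrative assumptions.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; t is the 1-based step index."""
    m = beta1 * m + (1.0 - beta1) * grad          # momentum-like first moment
    v = beta2 * v + (1.0 - beta2) * grad * grad   # RMSProp-like second moment
    m_hat = m / (1.0 - beta1 ** t)                # bias correction for the zero start
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy objective f(x) = x^2 with gradient 2x (hypothetical example).
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 4):
    x, m, v = adam_step(x, 2.0 * x, m, v, t, lr=0.1)
    print(t, round(x, 4))
```

Because the step is the ratio of the two moments, the very first update has magnitude close to `lr` whether the raw gradient is large or small; that per-parameter rescaling is the adaptive part.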

5. If each parameter update takes 80 ms, how long (in seconds) will 3 epochs take with the batch size 4000?

  • 240
  • 480
  • 160
  • 192
Answer : b
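
The arithmetic is total updates multiplied by time per update. The sketch below takes the 2,000 updates per epoch from question 1 as given; the function name is illustrative.

```python
def training_time_seconds(updates_per_epoch: int, epochs: int, ms_per_update: float) -> float:
    """Wall-clock time spent on parameter updates, in seconds."""
    total_updates = updates_per_epoch * epochs
    return total_updates * ms_per_update / 1000.0  # milliseconds to seconds

# 2,000 updates/epoch x 3 epochs x 80 ms = 480,000 ms
print(training_time_seconds(2_000, 3, 80))  # 480.0
```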

6. What is the most likely cause of this behavior?

  • Exploding gradients
  • Vanishing gradients
  • Overfitting
  • High batch variance
Answer : b

7. Which optimizer would most effectively mitigate this issue?

  • Batch Gradient Descent
  • SGD without momentum
  • Adam
  • Fixed learning rate GD
Answer : c

8. Which additional technique would help improve gradient flow in this neural network?

  • Increasing depth further
  • Input normalisation
  • Removing non-linearities
  • Using larger batch sizes
Answer : b

9. If the network performs 250 updates per epoch, how many updates occur after 6 epochs?

Fill in the blank: __________

Answer : 1500

10. Which change would likely worsen the learning problem in this scenario?

  • Reducing learning rate further
  • Adding momentum
  • Switching to Adam
  • Normalizing inputs
Answer : a

11. What is the most likely reason for this behavior?

  • Low momentum
  • High learning rate
  • Large dataset
  • Small batch size
Answer : b

12. Which adjustment would best stabilize training without slowing convergence too much?

  • Remove momentum
  • Reduce learning rate slightly
  • Increase batch size drastically
  • Use batch gradient descent
Answer : b

13. Which optimizer allows computing gradients after a look-ahead step?

  • Adam
  • RMSProp
  • Nesterov Accelerated Gradient
  • Vanilla SGD
Answer : c
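
The look-ahead step can be sketched as follows: Nesterov accelerated gradient first moves to where the current momentum alone would carry the parameter, then evaluates the gradient there. This is a minimal scalar sketch of one common formulation; the function names, the toy objective f(x) = x^2, and the starting values are illustrative assumptions.

```python
def nag_step(theta, velocity, grad_fn, lr=0.1, beta=0.9):
    """One Nesterov accelerated gradient update for a scalar parameter."""
    lookahead = theta - beta * velocity      # where momentum alone would carry us
    grad = grad_fn(lookahead)                # gradient taken at the look-ahead point
    velocity = beta * velocity + lr * grad
    return theta - velocity, velocity

def grad_f(x):
    """Gradient of the toy objective f(x) = x^2."""
    return 2.0 * x

theta, velocity = 1.0, 0.5
theta, velocity = nag_step(theta, velocity, grad_f)
print(theta, velocity)
```

Classical momentum would call `grad_fn(theta)` instead of `grad_fn(lookahead)`; that single line is the difference between the two methods in this formulation.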

14. What happens if the momentum coefficient is set very close to 1?

  • Training stops
  • Training becomes noiseless
  • Model may diverge
  • Convergence is guaranteed
Answer : c
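
One way to see the risk: with the classical momentum update v = beta * v + g, a gradient that keeps pointing the same way is amplified by roughly 1 / (1 - beta), so the effective step grows without bound as beta approaches 1. The sketch below checks this numerically under a constant gradient, which is a simplifying assumption; the function name is illustrative.

```python
def steady_state_velocity(beta, grad=1.0, steps=100_000):
    """Velocity reached by v <- beta * v + grad under a constant gradient."""
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad
    return v

# Compare the simulated velocity with the closed form 1 / (1 - beta).
for beta in (0.5, 0.9, 0.99, 0.999):
    print(beta, round(steady_state_velocity(beta), 2), round(1.0 / (1.0 - beta), 2))
```

An amplification of this size can overshoot a minimum unless the learning rate is reduced to compensate, which is consistent with the "may diverge" answer.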

15. Why does mini-batch gradient descent often generalize better than full batch gradient descent?

  • It uses fewer parameters
  • It introduces controlled noise in updates
  • It always converges faster
  • It removes the need for regularisation
Answer : b
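
The "controlled" part of the noise can be illustrated numerically: a mini-batch gradient is an estimate of the full-batch gradient, and its spread shrinks as the batch grows. The sketch below uses synthetic per-example gradients (an assumption, not assignment data) and only shows how batch size sets the noise level; it does not by itself demonstrate better generalization.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
# Synthetic per-example gradients: mean 1.0 with unit-variance noise.
per_example = [1.0 + rng.gauss(0.0, 1.0) for _ in range(20_000)]
full_batch_gradient = sum(per_example) / len(per_example)

def minibatch_gradient_spread(batch_size, trials=200):
    """Std. dev. of mini-batch gradient estimates across random batches."""
    estimates = []
    for _ in range(trials):
        batch = rng.sample(per_example, batch_size)
        estimates.append(sum(batch) / batch_size)
    return statistics.pstdev(estimates)

for batch_size in (10, 100, 1000):
    print(batch_size, round(minibatch_gradient_spread(batch_size), 3))
```

Smaller batches give noisier estimates around `full_batch_gradient`, so batch size acts as a dial on how much randomness each update carries.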