Deep Learning – IIT Ropar Week 10 Assignment Answers (Jan-Apr 2026)
1. After applying a 5 × 5 convolution with stride 1 and padding 2 to a 128 × 128 RGB aerial image, what will be the spatial size of the resulting feature maps?
- 124 × 124
- 128 × 128
- 130 × 130
- 64 × 64
Answer : b
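The answer follows from the standard convolution output-size formula, floor((n + 2p - k) / s) + 1, applied to each spatial dimension. A minimal sketch in Python; the function name `conv_output_size` is an illustrative choice and not part of the assignment:

```python
def conv_output_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size along one dimension after a convolution."""
    return (n + 2 * padding - kernel) // stride + 1

# Question 1: 128-pixel side, 5 x 5 kernel, stride 1, padding 2
print(conv_output_size(128, kernel=5, stride=1, padding=2))  # 128

# Without padding the same kernel would shrink the map to 124 x 124
print(conv_output_size(128, kernel=5, stride=1, padding=0))  # 124
```

Padding of 2 exactly offsets the 4 pixels a 5 × 5 kernel would otherwise remove, which is why the size is preserved.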
2. If the first CNN layer uses 16 different convolution filters, how many feature maps will be produced?
- 3
- 5
- 16
- 128
Answer : c
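Each filter produces exactly one feature map, whatever the number of input channels, because a filter spans all input channels and sums over them. The sketch below assumes the 5 × 5 filters and the 128 × 128 RGB input from question 1, plus one bias per filter; the helper name `conv_layer_summary` is hypothetical:

```python
def conv_layer_summary(in_channels, num_filters, kernel, out_size):
    """Return (output shape, parameter count) for one convolution layer."""
    # Each filter covers kernel x kernel x in_channels weights plus one bias (assumed).
    params = num_filters * (kernel * kernel * in_channels + 1)
    output_shape = (num_filters, out_size, out_size)
    return output_shape, params

shape, params = conv_layer_summary(in_channels=3, num_filters=16, kernel=5, out_size=128)
print(shape)   # (16, 128, 128)
print(params)  # 1216
```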
3. After applying a 2 × 2 max-pooling layer with stride 2, what will be the new spatial size of the feature maps?
- 64 × 64
- 128 × 128
- 62 × 62
- 32 × 32
Answer : a
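Assuming the pooling layer is applied to the 128 × 128 feature maps from question 1, a 2 × 2 window with stride 2 halves each dimension. A small pure-Python sketch of the operation (the function name `max_pool_2x2` is illustrative, and even input dimensions are assumed):

```python
def max_pool_2x2(fmap):
    """2 x 2 max pooling with stride 2 on a list-of-lists feature map."""
    rows, cols = len(fmap), len(fmap[0])
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]

fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
print(max_pool_2x2(fmap))  # [[4, 2], [2, 8]]

# A 128 x 128 map shrinks to 64 x 64
pooled = max_pool_2x2([[0] * 128 for _ in range(128)])
print(len(pooled), len(pooled[0]))  # 64 64
```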
4. Why are pooling layers particularly useful for the drone vision system?
- They increase model parameters and computational cost
- They reduce spatial size while retaining key visual features
- They remove color information from the input images
- They sharpen image boundaries using high-frequency filters
Answer : b
5. Which CNN component mainly provides robustness when objects appear slightly shifted or rotated in drone images?
- Convolution filters
- Softmax layers
- Fully connected layers
- Pooling layers
Answer : d
6. If masking a small region of an MRI scan causes a large drop in tumor probability, what does this imply?
- The region is irrelevant for classification
- The region contains important tumor-related features
- The CNN is unstable during prediction
- The CNN has memorized the training data patterns
Answer : b
7. Which visualization technique is used when image patches are masked to analyze prediction sensitivity?
- Dropout applied during training to reduce overfitting
- Filter visualization to inspect learned weights
- Occlusion mapping used by masking selected regions
- Batch normalization applied during inference
Answer : c
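Occlusion mapping can be sketched in a few lines: mask one patch at a time, re-run the model, and record how much the score drops. In the sketch below, `toy_predict` is a hypothetical stand-in for a trained CNN (it only looks at the lower-right corner of the image), and the patch size and fill value are arbitrary choices:

```python
def occlusion_map(image, predict, patch=2, fill=0.0):
    """Drop in predicted score when each patch x patch block is masked."""
    base = predict(image)
    rows, cols = len(image), len(image[0])
    drops = []
    for r in range(0, rows, patch):
        row_drops = []
        for c in range(0, cols, patch):
            masked = [row[:] for row in image]  # copy so the input is untouched
            for i in range(r, min(r + patch, rows)):
                for j in range(c, min(c + patch, cols)):
                    masked[i][j] = fill
            row_drops.append(base - predict(masked))
        drops.append(row_drops)
    return drops

# Hypothetical model: the score is the mean brightness of the lower-right 2 x 2 block.
def toy_predict(image):
    return sum(image[i][j] for i in (2, 3) for j in (2, 3)) / 4.0

image = [[1.0] * 4 for _ in range(4)]
print(occlusion_map(image, toy_predict))  # [[0.0, 0.0], [0.0, 1.0]]
```

Only masking the lower-right patch changes the score, so that is the region the toy model relies on. This is the same reasoning as in question 6.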
8. Why are first-layer CNN filters easier to interpret visually?
- They have fewer parameters than deeper layers
- They operate directly on pixel patterns
- They do not use nonlinearities during activation
- They have higher resolution feature maps
Answer : b
9. A filter that detects vertical edges will respond most strongly to which image regions?
- Flat areas with edges
- Uniform textures only
- Vertical boundaries
- Random noise patterns
Answer : c
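To illustrate, the sketch below applies a Sobel-style horizontal-gradient kernel (a common hand-crafted vertical-edge detector, used here as a stand-in for a learned filter) to a tiny image with one vertical boundary. The helper `correlate_valid` is an illustrative name:

```python
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

def correlate_valid(image, kernel):
    """Cross-correlation with no padding (the operation CNN layers compute)."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    return [
        [sum(kernel[i][j] * image[r + i][c + j] for i in range(k) for j in range(k))
         for c in range(out_cols)]
        for r in range(out_rows)
    ]

# Dark left half, bright right half: a vertical boundary between columns 2 and 3.
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]
response = correlate_valid(image, SOBEL_X)
print(response[0])  # [0, 4, 4, 0]
```

The response is zero over the flat regions and peaks where the window straddles the boundary.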
10. If certain neurons activate strongly for tumor images but not for healthy scans, what does this indicate?
- Overfitting to the training data distribution
- Learning of tumor-specific visual features
- Random noise in weights caused by initialization
- Model instability during gradient updates
Answer : b
11. Why does VGG use many stacked 3 × 3 convolution layers instead of one large filter?
- To create deep representations with fewer parameters
- To reduce training data requirements
- To eliminate pooling layers entirely
- To prevent overfitting through architectural constraints
Answer : a
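The parameter saving can be checked with simple arithmetic: three stacked 3 × 3 layers cover the same 7 × 7 receptive field as one 7 × 7 layer but use 27C² weights instead of 49C². A minimal sketch, assuming C input and output channels in every layer, stride 1, and no biases (the channel count 64 is only an example):

```python
def conv_weights(kernel, channels):
    """Weights in one kernel x kernel conv layer with `channels` in and out (biases ignored)."""
    return kernel * kernel * channels * channels

def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return 1 + layers * (kernel - 1)

C = 64  # illustrative channel count
three_3x3 = 3 * conv_weights(3, C)
one_7x7 = conv_weights(7, C)
print(stacked_receptive_field(3, 3))  # 7
print(three_3x3, one_7x7)             # 110592 200704
```

The stacked version also places a nonlinearity after each of the three layers, which is where the extra depth comes from.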
12. What is the main benefit of Inception modules using parallel filters of different sizes?
- Lower memory usage across all CNN layers
- Removal of nonlinearities between layers
- Faster training due to simpler operations
- Capturing features at multiple spatial scales
Answer : d
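A rough shape check shows how parallel branches can be combined: with stride 1 and padding of (k - 1) / 2, every branch keeps the same spatial size, so the outputs can be concatenated along the channel axis. The branch widths and the 28 × 28 input below are hypothetical, and real Inception modules also include a pooling branch and 1 × 1 reductions that this sketch omits:

```python
def same_size(n, kernel):
    """Output size of a stride-1 convolution padded with (kernel - 1) // 2."""
    padding = (kernel - 1) // 2
    return n + 2 * padding - kernel + 1

# Hypothetical branch widths: kernel size -> number of output channels
branches = {1: 64, 3: 128, 5: 32}
n = 28
sizes = {k: same_size(n, k) for k in branches}
print(sizes)                   # {1: 28, 3: 28, 5: 28}
print(sum(branches.values()))  # 224 channels after concatenation
```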
13. Why are skip connections used in ResNet architectures?
- To reduce image resolution in later layers
- To help gradients flow through deep networks
- To remove convolution layers automatically
- To improve pooling operations in the network
Answer : b
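A toy scalar example gives some intuition. If every layer computes y = w·x, the gradient through the stack is w multiplied by itself once per layer, which vanishes for small w. With a skip connection, y = x + w·x, each layer contributes a factor of 1 + w, so the identity path keeps the gradient from collapsing. This sketch ignores nonlinearities and uses an arbitrary weight of 0.1 and depth of 20:

```python
def plain_gradient(weight, depth):
    """d(output)/d(input) for `depth` stacked scalar layers y = weight * x."""
    grad = 1.0
    for _ in range(depth):
        grad *= weight
    return grad

def residual_gradient(weight, depth):
    """Same, but each layer is y = x + weight * x, so its local derivative is 1 + weight."""
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + weight
    return grad

print(plain_gradient(0.1, 20))     # about 1e-20
print(residual_gradient(0.1, 20))  # about 6.7
```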
14. If a deep CNN trains poorly but a ResNet version performs well, what is the most likely reason?
- Fewer parameters in ResNet models
- Reduced overfitting due to regularization
- Skip connections, which make very deep networks easier to optimize
- Larger filters used in ResNet layers
Answer : c
15. To recognize both small distant signs and large nearby signs, which architecture choice is most appropriate?
- Inception-style parallel convolutions
- Global average pooling across the final feature map
- Only 3 × 3 filters in all layers
- No pooling across the entire network
Answer : a