Deep Learning – IIT Ropar Week 12 Assignment Answers

Sanket Kumar

Deep Learning – IIT Ropar Week 12 Assignment Answers (Jan-Apr 2026)

1. Why does a basic encoder–decoder model struggle with long input sequences?

a. It relies on a single vector to represent the entire input
b. It updates the decoder state too frequently during decoding
c. It removes uncommon words during preprocessing
d. It predicts output tokens without using hidden states

Answer : a

2. What is the primary role of the encoder in this translation system?

a. To generate translated words step by step
b. To map the input sequence into hidden representations
c. To compute attention weights during decoding
d. To produce probability scores for output tokens

Answer : b

3. Why does the decoder generate translations sequentially?

a. Output probabilities cannot be computed in parallel
b. The encoder cannot process multiple words together
c. Attention mechanisms only work sequentially
d. Each output word depends on previous output words

Answer : d
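
To see why option d forces step-by-step generation, here is a minimal sketch of greedy decoding. The lookup table `TOY_MODEL`, the helper `toy_scores`, and the `<s>` / `</s>` markers are made up for illustration; a real decoder would condition on the full prefix and the encoder states, not only on the last token.

```python
def greedy_decode(next_token_scores, start="<s>", end="</s>", max_len=10):
    """Generate tokens one at a time; each step is fed the tokens produced so far."""
    output = [start]
    while len(output) < max_len:
        scores = next_token_scores(output)    # depends on the prefix generated so far
        token = max(scores, key=scores.get)   # greedy choice: highest-scoring token
        output.append(token)
        if token == end:
            break
    return output[1:]                         # drop the start marker


# Hypothetical stand-in for a trained decoder: scores keyed on the last token only.
TOY_MODEL = {
    "<s>": {"le": 0.9, "chat": 0.1},
    "le": {"chat": 0.8, "</s>": 0.2},
    "chat": {"dort": 0.7, "</s>": 0.3},
    "dort": {"</s>": 1.0},
}


def toy_scores(prefix):
    return TOY_MODEL[prefix[-1]]


result = greedy_decode(toy_scores)
print(result)
```

The loop cannot compute step t before step t-1 has produced a token, which is the dependency the question is asking about.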

4. Which component is mainly responsible for losing early input information?

a. The word embedding lookup table
b. The softmax layer used for output prediction
c. The fixed-length context vector from the encoder
d. The optimization algorithm used during training

Answer : c

5. Which architectural change best addresses this limitation?

a. Introducing an attention mechanism in decoding
b. Increasing the size of the output vocabulary
c. Adding more layers to the encoder network
d. Applying stronger regularization during training

Answer : a

6. What is the main purpose of attention in this summarization system?

a. To speed up training by skipping the encoder states
b. To reduce the length of the input article
c. To allow the decoder to focus on relevant encoder states
d. To eliminate the need for an encoder network

Answer : c

7. How are attention weights typically computed?

a. By copying the decoder hidden state directly
b. By averaging encoder hidden states equally
c. By randomly sampling encoder representations
d. By normalizing alignment scores using softmax

Answer : d

8. How is the context vector formed after attention weights are computed?

a. By computing a weighted sum of encoder states
b. By concatenating all decoder hidden states
c. By selecting only the last encoder state
d. By applying normalization to output probabilities

Answer : a
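
Questions 7 and 8 describe two steps of one computation: softmax over alignment scores, then a weighted sum of encoder states. The sketch below puts them together. It assumes dot-product alignment scores (other scoring functions, such as an additive one, are also used), and the vectors are toy values chosen for illustration.

```python
import math


def softmax(scores):
    """Turn raw alignment scores into positive weights that sum to 1."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(decoder_state, encoder_states):
    """Return (attention weights, context vector) for one decoding step."""
    # Assumption: alignment score = dot product of decoder and encoder state.
    scores = [dot(decoder_state, h) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector = weighted sum of the encoder states.
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy values, made up for illustration only.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
print(weights, context)
```

Encoder states 0 and 2 get equal, larger weights because they score highest against the decoder state, so they contribute most to the context vector.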

9. What does a high attention weight on a word indicate?

a. The word appears more frequently in training data
b. The word strongly influences the current output
c. The decoder has finished generating the output
d. The encoder failed to process that particular word

Answer : b

10. Why is softmax used when computing attention weights?

a. To prevent overfitting in the decoder network
b. To reduce the numerical precision during training
c. To limit the number of encoder states used
d. To convert scores into a probability distribution

Answer : d

11. Why is self-attention especially useful for long documents?

a. It automatically reduces the length of very long documents
b. It processes text strictly in a left-to-right sequential manner
c. It allows all words to attend to each other regardless of distance
d. It removes the need for word embeddings in the architecture

Answer : c

12. What is the role of Query and Key vectors in self-attention?

a. They determine how strongly one word should attend to another
b. They store the final predicted output tokens of the sequence
c. They define the size of the vocabulary used by the model
d. They replace positional encodings during sequence processing

Answer : a
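
A small sketch of how Query and Key vectors set the attention strength between every pair of tokens, which also illustrates the "regardless of distance" point from question 11. The vectors are toy values, and reusing the keys as queries is a simplification made here; in a trained model both come from learned projections of the token embeddings.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Row i holds how strongly token i attends to every token j."""
    d_k = len(keys[0])
    rows = []
    for q in queries:
        # Query-key dot product, divided by sqrt(d_k) as in scaled dot-product attention.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        rows.append(softmax(scores))
    return rows


# Toy key vectors (made up); tokens 0 and 4 share the same key.
keys = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
queries = keys  # assumption: reuse keys as queries to keep the example small
weights = attention_weights(queries, keys)
print(weights[0])
```

Token 0 attends to token 4 exactly as strongly as to itself, even though token 4 is four positions away: the weight depends on the query-key match, not on the distance between positions.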

13. Why is the dot product scaled by √d_k in attention?

a. To reduce the number of tokens processed in each attention head
b. To prevent large dot products from saturating the softmax
c. To increase the representational depth of the transformer
d. To simplify gradient computation during backpropagation

Answer : b
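
The saturation effect can be checked numerically. If query and key components have zero mean and unit variance, their dot product has variance d_k, so raw scores spread out as d_k grows; dividing by √d_k brings them back to unit variance. The scores below are hypothetical values picked to show the difference, not taken from a trained model.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


d_k = 64
# Hypothetical raw dot products with the kind of spread seen at d_k = 64.
raw = [8.0, 0.0, -8.0, 16.0]

unscaled = softmax(raw)
scaled = softmax([s / math.sqrt(d_k) for s in raw])

print("largest weight without scaling:", max(unscaled))
print("largest weight with scaling:   ", max(scaled))
```

Without scaling, almost all of the weight lands on a single position, and the softmax gradient with respect to the other scores becomes very small. With scaling, the distribution stays spread over several positions.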

14. What does it mean if a word attends strongly to several distant words?

a. The model produces unstable or random attention patterns
b. The model is overfitting to individual training examples
c. The model has entirely ignored all positional information
d. The model captures multiple meaningful contextual relationships

Answer : d

15. Why are positional encodings required in transformer models?

a. To remove the dependence on embedding layers for tokens
b. To provide information about the order of tokens
c. To replace attention weights during the decoding stage
d. To control the learning rate used during model training

Answer : b
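
As an illustration of how order information can be injected, here is a sketch of the sinusoidal encoding from the original Transformer paper. The question does not say which scheme is meant, and learned position embeddings are another common choice. The helper names and the toy embedding are assumptions made for this example.

```python
import math


def sinusoidal_encoding(position, d_model):
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(same angle)."""
    enc = []
    for i in range(0, d_model, 2):        # i plays the role of 2i in the formula
        angle = position / (10000 ** (i / d_model))
        enc.append(math.sin(angle))
        enc.append(math.cos(angle))
    return enc[:d_model]


def add_positions(token_vectors):
    """Add the position encoding to each token embedding, element by element."""
    d_model = len(token_vectors[0])
    return [
        [x + p for x, p in zip(vec, sinusoidal_encoding(pos, d_model))]
        for pos, vec in enumerate(token_vectors)
    ]


# Toy embedding (made up): the same word appearing at positions 0 and 1.
word = [0.5, -0.5, 0.25, 0.0]
first, second = add_positions([word, word])
print(first)
print(second)
```

Self-attention on its own treats the input as an unordered set, so two occurrences of the same word would be indistinguishable. After adding the encodings, `first` and `second` differ, which gives the model a signal about token order.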
