Deep Learning – IIT Ropar Week 2 Assignment Answers
Course Link : https://answergpt.in/courses/nptel-deep-learning-iit-assignment-answers-2025-july-october/
1. Consider a single perceptron with w = 1 and b = -0.5. The perceptron uses a step activation function: output 1 if w·x + b >= 0, and 0 otherwise.
Predict the output for input values 0.51 and 0.49.
- 1, 0
- 0, 1
- 1, 1
- 0, 0
Answer : a
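A quick Python sketch to verify this answer, assuming the standard step activation (output 1 when w·x + b >= 0):

```python
def perceptron(x, w=1.0, b=-0.5):
    # Step activation: fire (output 1) when w*x + b >= 0, else output 0.
    return 1 if w * x + b >= 0 else 0

print(perceptron(0.51))  # 1  (1*0.51 - 0.5 = 0.01 >= 0)
print(perceptron(0.49))  # 0  (1*0.49 - 0.5 = -0.01 < 0)
```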
2. You are given a Boolean function that is not linearly separable. Which of the following is true regarding its representation using a perceptron-based network?
- It can be represented using a single-layer perceptron if you increase the number of perceptrons.
- It requires at least one hidden layer in the network.
- It cannot be represented by any feedforward neural network.
- It can only be represented by a network with more than 2^n perceptrons.
Answer : c
3. As n increases, representing all Boolean functions using a 2-layer perceptron becomes impractical due to:
- Increase in training data size
- Exponential increase in required hidden layer neurons
- Limitation in backpropagation algorithm
- Decrease in classification accuracy
Answer : b
4. You are designing neural networks to represent Boolean functions. Consider the capabilities of single-layer and multi-layer perceptrons.
Which of the following statements are true?
- A single-layer perceptron can represent all linearly separable Boolean functions.
- XOR requires at least one hidden layer to be represented.
- A network with 2^n hidden neurons and one output neuron can represent all Boolean functions over n inputs.
- A single-layer perceptron can represent the XOR function if the learning rate is set appropriately.
Answer : a, b, c
5. You are given a neural network with 2 inputs, a hidden layer with 4 perceptrons, and one output neuron. The hidden neurons are designed to fire for specific input patterns like {-1, +1}, etc.
Which of the following are true about such a network?
- It can represent linearly non-separable functions like XOR.
- The network uses hidden neurons to convert a non-linearly separable function into linearly separable subproblems.
- Removing any one hidden neuron will not affect the network’s ability to represent XOR.
- This network must use sigmoid activation in the hidden layer to implement XOR.
Answer : a, c
6. You are designing a spam filter using a perceptron. Some input features (like the presence of the word "FREE") are not linearly separable from others. Which architecture is most appropriate for learning from such data?
- Single-layer perceptron with more training data
- Multi-layer perceptron with hidden neurons
- Removing the non-linearly separable features
- Output layer with more neurons
Answer : b
7. You are given an arbitrary Boolean function defined over 4 binary inputs. Which of the following neural network architectures is guaranteed to represent this function?
- One perceptron
- A network with 4 hidden neurons
- A network with 16 hidden neurons and one output perceptron
- A network with 5 output neurons
Answer : c
8. For a single input value x = 1.5, w = 2, b = -1, compute the output of the sigmoid neuron up to 2 decimal places.
Fill the blank: ______________
Answer : 0.88
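A short Python check of the computation, using the standard sigmoid 1/(1 + e^(-z)):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# z = w*x + b = 2*1.5 + (-1) = 2
out = sigmoid(2 * 1.5 - 1)
print(round(out, 2))  # 0.88
```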
9.

- Upwards
- Leftwards
- Downwards
- Rightwards
Answer : b
10. Which of the following statements are true?
I. The logistic function is smooth and continuous.
II. The logistic function is differentiable.
- Only Statement I is true
- Only Statement II is true
- Both statements I and II are true
- None of the above
Answer : c
11. Which of the following statements are true about learning algorithms?
I. Learning algorithms always maximize a loss function
II. Learning algorithms learn parameters from data
- Only Statement I is true
- Only Statement II is true
- Both statements I and II are true
- None of the above
Answer : b
12. Consider a neural network with 12 input features, a hidden layer with 8 neurons, and a single output neuron. All layers are fully connected, and biases are included in both the hidden and output layers.
How many gradients must be computed during backpropagation?
- 101
- 110
- 105
- 113
Answer : d
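One gradient is computed per learnable parameter (weights and biases), so the count can be verified with a quick Python tally:

```python
n_in, n_hidden, n_out = 12, 8, 1

hidden_params = n_in * n_hidden + n_hidden  # 96 weights + 8 biases
output_params = n_hidden * n_out + n_out    # 8 weights + 1 bias

total = hidden_params + output_params
print(total)  # 113
```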
13. You are evaluating a regression model on a dataset of 3 points. The actual target values and predicted outputs from your model are given below.


What is the MSE for this model on the given dataset?
- 1.00
- 0.67
- 0.33
- 2.00
Answer : b
14.

Answer : c
15.

Answer : a
16. You are comparing two models for different function learning tasks:
Model A: A multilayer network of perceptrons
Model B: A multilayer network of sigmoid neurons
Task 1: Learn a Boolean function (like XOR)
Task 2: Learn a continuous function (like sin(x))
Which of the following statements is most appropriate?
- Model A can represent both tasks with high precision
- Model A is better for Task 1, Model B is better for Task 2
- Model B can approximate both Task 1 and Task 2 outputs, but not represent Task 1 exactly
- Both models are equivalent in their representation abilities
Answer : b
17. A neural network is trained to predict customer churn based on multiple features: age, contract duration, and monthly charges. After training, you observe that the weight associated with the monthly charges feature is close to zero, while the others have larger magnitudes.
What is the most reasonable inference?
- Monthly charges had missing values in training data
- Monthly charges were not normalized correctly
- Monthly charges may not have contributed significantly to the model's prediction
- The learning rate was too high for that feature
Answer : c
18. You are building a neural network-based fraud detection system. A sigmoid neuron receives three inputs:
x1: transaction amount
x2: number of transactions in last hour
x3: time of transaction
After training, the learned weights are:
w1 = 3.2, w2 = 0.05, w3 = -0.02
Assume all input features have been scaled to a similar range (for example, between 0 and 1).
Which of the following is the most reasonable conclusion?
- The time of transaction is the most important feature
- number of transactions in last hour is the most important feature
- The transaction amount is a highly influential feature
- The sigmoid neuron is not functioning properly
Answer : c
19. You are optimizing a function f(x) = x^2 - x + 2 using gradient descent. Let the learning rate be η = 0.01, and the value of x at a step t be x_t. Which of the following gives the correct value of x at step t+1 after one update using gradient descent?
- x_{t+1} = x_t - 0.01(2x_t - 1)
- x_{t+1} = x_t + 0.01(2x_t)
- x_{t+1} = x_t - (2x_t - 1)
- x_{t+1} = x_t - 0.01(x_t - 1)
Answer : a
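A minimal Python sketch of this update, using the derivative f'(x) = 2x - 1 (the starting value 1.0 below is just an illustration):

```python
def grad_f(x):
    # Derivative of f(x) = x^2 - x + 2
    return 2 * x - 1

def gd_step(x, lr=0.01):
    # One gradient-descent update: x_{t+1} = x_t - lr * f'(x_t)
    return x - lr * grad_f(x)

print(gd_step(1.0))  # 1.0 - 0.01*(2*1.0 - 1) = 0.99
```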
20. Let f(x) = x^3 - 4x + 1. You are using gradient descent with learning rate η = 0.1.
What is the correct update rule for x at step t+1, given that x_t is the current value?
- x_{t+1} = x_t - 0.1·(3x_t^2 - 4)
- x_{t+1} = x_t - 0.1·(3x_t^2 + 4)
- x_{t+1} = x_t + 0.1·(3x_t^2 - 4)
- x_{t+1} = x_t + 0.1·(3x_t^2 + 4)
Answer : a
21. In a temperature calibration model, the function f(T, x) = T^2 + 5x + 20 models the system deviation, where T is the temperature input and x is a sensor setting. Suppose gradient descent with a learning rate of 1 is used to minimize the deviation. The process starts at (T, x) = (0, 0).
What will be the value of T after 10 iterations?
- 50
- -10
- 5
- 0
Answer : d
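A small Python simulation confirms this: the partial derivative of f with respect to T is 2T, which is 0 at the starting point, so T never moves (only x changes):

```python
def grads(T, x):
    # Partial derivatives of f(T, x) = T^2 + 5x + 20
    return 2 * T, 5.0

T, x = 0.0, 0.0
lr = 1.0
for _ in range(10):
    dT, dx = grads(T, x)
    T -= lr * dT
    x -= lr * dx

print(T)  # 0.0 (gradient w.r.t. T starts at 0 and stays 0)
```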
22.

Answer : b
23.

Answer : b
24. You train a logistic regression model for spam classification with labels 1 (spam) and 0 (not spam). After training, the model has learned a weight vector such that
w^T x = 2.5
Which of the following can be correctly inferred about the model's prediction?
- The predicted probability of class 1 is greater than 0.5
- The predicted label is 1
- The predicted label is 0
- The value of w^T x is irrelevant to prediction
Answer : a, b
25. You are designing a binary classifier using logistic regression. The model has learned the weight vector w = [-3, 4] and no bias term is used.
If a new point x=[1,1] is evaluated, what will be the model output and prediction?
- The predicted label is 1
- The predicted label is 0
- The model output cannot be determined without a bias term
- The model output is undefined for input [1, 1]
Answer : a
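A quick Python check: w^T x = -3 + 4 = 1 > 0, so the sigmoid gives a probability above 0.5 and the predicted label is 1:

```python
import math

w = [-3.0, 4.0]
x = [1.0, 1.0]

z = sum(wi * xi for wi, xi in zip(w, x))  # -3 + 4 = 1
p = 1.0 / (1.0 + math.exp(-z))            # sigmoid(1) ≈ 0.73
label = 1 if p > 0.5 else 0
print(label)  # 1
```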
26.

Answer : c
27.


Based on the curve, what can you infer about the parameters w and b?
- w is close to 0 and b is large
- w is large and b is small
- w is large and b is large
- w is small and b is negative
Answer : c
28.

Which of the following changes would make the curve transition more sharply (closer to a step function)?
- Increase b
- Increase w
- Decrease w
- Set b=0
Answer : b
29. Why is Sum of Squared Errors (SSE) considered better than Sum of Errors (SE) in many learning scenarios?
- SSE ensures that positive and negative errors do not cancel each other out
- SSE magnifies larger errors, making the model more sensitive to outliers
- The derivative of SSE with respect to prediction is simple and continuous
- SSE always leads to better accuracy than SE
- Sum of errors can be zero even when individual predictions are wrong
Answer : a, b, c, e
30. Statement I: Any linearly separable function can be represented using a single-layer perceptron.
Statement II: A single sigmoid neuron can approximate any Boolean function with zero error.
Which of the above statements is/are correct?
- Only I
- Only II
- Both I and II
- None
Answer : a
31. You are given a multi-layer perceptron with one hidden layer consisting of 8 perceptrons and a single output neuron. Each perceptron in the hidden layer outputs either 0 or 1 based on its input.
Which of the following statements is true about the function capacity of this network?
- The network is capable of implementing 2^8 Boolean functions
- The network is capable of implementing 2^64 Boolean functions
- The output neuron receives a continuous-valued input
- Each hidden neuron produces 64 possible outputs
Answer : a