Deep Learning – IIT Ropar Week 2 Assignment Answers
Course Link : https://answergpt.in/courses/nptel-deep-learning-iit-assignment-answers-2025-july-october/
1. Consider a single perceptron with w = 1 and b = -0.5. The perceptron uses a step activation function: output 1 if w·x + b >= 0, and 0 otherwise.
Predict the output for input values 0.51 and 0.49.
- 1, 0
- 0, 1
- 1, 1
- 0, 0
Answer : a
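A quick Python sketch to verify this answer, assuming the standard step activation (output 1 when w·x + b >= 0):

```python
def perceptron(x, w=1.0, b=-0.5):
    # Step activation: fire (output 1) when w*x + b >= 0, else output 0.
    return 1 if w * x + b >= 0 else 0

print(perceptron(0.51))  # 1  (1*0.51 - 0.5 = 0.01 >= 0)
print(perceptron(0.49))  # 0  (1*0.49 - 0.5 = -0.01 < 0)
```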
2. You are given a Boolean function that is not linearly separable. Which of the following is true regarding its representation using a perceptron-based network?
- It can be represented using a single-layer perceptron if you increase the number of perceptrons.
- It requires at least one hidden layer in the network.
- It cannot be represented by any feedforward neural network.
- It can only be represented by a network with more than 2^n perceptrons.
Answer : c
3. As n increases, representing all Boolean functions using a 2-layer perceptron becomes impractical due to:
- Increase in training data size
- Exponential increase in required hidden layer neurons
- Limitation in backpropagation algorithm
- Decrease in classification accuracy
Answer : b
4. You are designing neural networks to represent Boolean functions. Consider the capabilities of single-layer and multi-layer perceptrons.
Which of the following statements are true?
- A single-layer perceptron can represent all linearly separable Boolean functions.
- XOR requires at least one hidden layer to be represented.
- A network with 2^n hidden neurons and one output neuron can represent all Boolean functions over n inputs.
- A single-layer perceptron can represent the XOR function if the learning rate is set appropriately.
Answer : a, b, c
5. You are given a neural network with 2 inputs, a hidden layer with 4 perceptrons, and one output neuron. The hidden neurons are designed to fire for specific input patterns like {-1, +1}, etc.
Which of the following are true about such a network?
- It can represent linearly non-separable functions like XOR.
- The network uses hidden neurons to convert a non-linearly separable function into linearly separable subproblems.
- Removing any one hidden neuron will not affect the network’s ability to represent XOR.
- This network must use sigmoid activation in the hidden layer to implement XOR.
Answer : a, c
6. You are designing a spam filter using a perceptron. Some input features (like the presence of the word "FREE") are not linearly separable from others. Which architecture is most appropriate for learning from such data?
- Single-layer perceptron with more training data
- Multi-layer perceptron with hidden neurons
- Removing the non-linearly separable features
- Output layer with more neurons
Answer : b
7. You are given an arbitrary Boolean function defined over 4 binary inputs. Which of the following neural network architectures is guaranteed to represent this function?
- One perceptron
- A network with 4 hidden neurons
- A network with 16 hidden neurons and one output perceptron
- A network with 5 output neurons
Answer : c
8. For a single input value x = 1.5, w = 2, b = -1, compute the output of the sigmoid neuron up to 2 decimal places.
Fill the blank: ______________
Answer : 0.88
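A short Python check of the computation, using the standard sigmoid 1/(1 + e^(-z)):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# z = w*x + b = 2*1.5 + (-1) = 2
out = sigmoid(2 * 1.5 - 1)
print(round(out, 2))  # 0.88
```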
9.

- Upwards
- Leftwards
- Downwards
- Rightwards
Answer : b
10. Which of the following statements are true?
I. The logistic function is smooth and continuous.
II. The logistic function is differentiable.
- Only Statement I is true
- Only Statement II is true
- Both statements I and II are true
- None of the above
Answer : c
11. Which of the following statements are true about learning algorithms?
I. Learning algorithms always maximize a loss function
II. Learning algorithms learn parameters from data
- Only Statement I is true
- Only Statement II is true
- Both statements I and II are true
- None of the above
Answer : b
12. Consider a neural network with 12 input features, a hidden layer with 8 neurons, and a single output neuron. All layers are fully connected, and biases are included in both the hidden and output layers.
How many gradients must be computed during backpropagation?
- 101
- 110
- 105
- 113
Answer : d
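One gradient is computed per learnable parameter (weights and biases), so the count can be verified with a quick Python tally:

```python
n_in, n_hidden, n_out = 12, 8, 1

hidden_params = n_in * n_hidden + n_hidden  # 96 weights + 8 biases
output_params = n_hidden * n_out + n_out    # 8 weights + 1 bias

total = hidden_params + output_params
print(total)  # 113
```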
13. You are evaluating a regression model on a dataset of 3 points. The actual target values and predicted outputs from your model are given below.


What is the MSE for this model on the given dataset?
- 1.00
- 0.67
- 0.33
- 2.00
Answer : b
14.

Answer : c
15.

Answer : a
16. You are comparing two models for different function learning tasks:
Model A: A multilayer network of perceptrons
Model B: A multilayer network of sigmoid neurons
Task 1: Learn a Boolean function (like XOR)
Task 2: Learn a continuous function (like sin(x))
Which of the following statements is most appropriate?
- Model A can represent both tasks with high precision
- Model A is better for Task 1, Model B is better for Task 2
- Model B can approximate both Task 1 and Task 2 outputs, but not represent Task 1 exactly
- Both models are equivalent in their representation abilities
Answer : b
17. A neural network is trained to predict customer churn based on multiple features: age, contract duration, and monthly charges. After training, you observe that the weight associated with the monthly charges feature is close to zero, while the others have larger magnitudes.
What is the most reasonable inference?
- Monthly charges had missing values in training data
- Monthly charges were not normalized correctly
- Monthly charges may not have contributed significantly to the model's prediction
- The learning rate was too high for that feature
Answer : c
18. You are building a neural network-based fraud detection system. A sigmoid neuron receives three inputs:
x1: transaction amount
x2: number of transactions in last hour
x3: time of transaction
After training, the learned weights are:
w1 = 3.2, w2 = 0.05, w3 = -0.02
Assume all input features have been scaled to a similar range (for example, between 0 and 1).
Which of the following is the most reasonable conclusion?
- The time of transaction is the most important feature
- number of transactions in last hour is the most important feature
- The transaction amount is a highly influential feature
- The sigmoid neuron is not functioning properly
Answer : c
19. You are optimizing a function f(x) = x^2 - x + 2 using gradient descent. Let the learning rate be η = 0.01, and the value of x at a step t be x_t. Which of the following gives the correct value of x at step t+1 after one update using gradient descent?
- x_{t+1} = x_t - 0.01(2x_t - 1)
- x_{t+1} = x_t + 0.01(2x_t)
- x_{t+1} = x_t - (2x_t - 1)
- x_{t+1} = x_t - 0.01(x_t - 1)
Answer : a
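A minimal Python sketch of this update, using the derivative f'(x) = 2x - 1 (the starting value 1.0 below is just an illustration):

```python
def grad_f(x):
    # Derivative of f(x) = x^2 - x + 2
    return 2 * x - 1

def gd_step(x, lr=0.01):
    # One gradient-descent update: x_{t+1} = x_t - lr * f'(x_t)
    return x - lr * grad_f(x)

print(gd_step(1.0))  # 1.0 - 0.01*(2*1.0 - 1) = 0.99
```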
20. Let f(x) = x^3 - 4x + 1. You are using gradient descent with learning rate η = 0.1.
What is the correct update rule for x at step t+1, given that x_t is the current value?
- x_{t+1} = x_t - 0.1·(3x_t^2 - 4)
- x_{t+1} = x_t - 0.1·(3x_t^2 + 4)
- x_{t+1} = x_t + 0.1·(3x_t^2 - 4)
- x_{t+1} = x_t + 0.1·(3x_t^2 + 4)
Answer : a
21. In a temperature calibration model, the function f(T, x) = T^2 + 5x + 20 models the system deviation, where T is the temperature input and x is a sensor setting. Suppose gradient descent with a learning rate of 1 is used to minimize the deviation. The process starts at (T, x) = (0, 0).
What will be the value of T after 10 iterations?
- 50
- -10
- 5
- 0
Answer : d
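A small Python simulation confirms this: the partial derivative of f with respect to T is 2T, which is 0 at the starting point, so T never moves (only x changes):

```python
def grads(T, x):
    # Partial derivatives of f(T, x) = T^2 + 5x + 20
    return 2 * T, 5.0

T, x = 0.0, 0.0
lr = 1.0
for _ in range(10):
    dT, dx = grads(T, x)
    T -= lr * dT
    x -= lr * dx

print(T)  # 0.0 (gradient w.r.t. T starts at 0 and stays 0)
```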
22.

Answer : b
23.

Answer : b
24. You train a logistic regression model for spam classification with labels 1 (spam) and 0 (not spam). After training, the model has learned a weight vector such that
w^T x = 2.5
Which of the following can be correctly inferred about the model's prediction?
- The predicted probability of class 1 is greater than 0.5
- The predicted label is 1
- The predicted label is 0
- The value of w^T x is irrelevant to prediction
Answer : a, b
25. You are designing a binary classifier using logistic regression. The model has learned the weight vector w = [-3, 4] and no bias term is used.
If a new point x=[1,1] is evaluated, what will be the model output and prediction?
- The predicted label is 1
- The predicted label is 0
- The model output cannot be determined without a bias term
- The model output is undefined for input [1, 1]
Answer : a
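A quick Python check: w^T x = -3 + 4 = 1 > 0, so the sigmoid gives a probability above 0.5 and the predicted label is 1:

```python
import math

w = [-3.0, 4.0]
x = [1.0, 1.0]

z = sum(wi * xi for wi, xi in zip(w, x))  # -3 + 4 = 1
p = 1.0 / (1.0 + math.exp(-z))            # sigmoid(1) ≈ 0.73
label = 1 if p > 0.5 else 0
print(label)  # 1
```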
26.

Answer : c
27.


Based on the curve, what can you infer about the parameters w and b?
- w is close to 0 and b is large
- w is large and b is small
- w is large and b is large
- w is small and b is negative
Answer : c
28.

Which of the following changes would make the curve transition more sharply (closer to a step function)?
- Increase b
- Increase w
- Decrease w
- Set b=0
Answer : b
29. Why is Sum of Squared Errors (SSE) considered better than Sum of Errors (SE) in many learning scenarios?
- SSE ensures that positive and negative errors do not cancel each other out
- SSE magnifies larger errors, making the model more sensitive to outliers
- The derivative of SSE with respect to prediction is simple and continuous
- SSE always leads to better accuracy than SE
- Sum of errors can be zero even when individual predictions are wrong
Answer : a, b, c, e
30. Statement I: Any linearly separable function can be represented using a single-layer perceptron.
Statement II: A single sigmoid neuron can approximate any Boolean function with zero error.
Which of the above statements is/are correct?
- Only I
- Only II
- Both I and II
- None
Answer : a
31. You are given a multi-layer perceptron with one hidden layer consisting of 8 perceptrons and a single output neuron. Each perceptron in the hidden layer outputs either 0 or 1 based on its input.
Which of the following statements is true about the function capacity of this network?
- The network is capable of implementing 2^8 Boolean functions
- The network is capable of implementing 2^64 Boolean functions
- The output neuron receives a continuous-valued input
- Each hidden neuron produces 64 possible outputs
Answer : a