Which of the following is true?

A) In batch gradient descent we update the weights and biases of the neural network after a forward pass over each training example.

**B) In batch gradient descent we update the weights and biases of our neural network after a forward pass over all the training examples.**

C) Each step of stochastic gradient descent takes more time than each step of batch gradient descent.

D) None of these three options is correct.

**Answer: B**
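To make the distinction concrete, here is a minimal sketch using a one-parameter linear model y ≈ w·x with squared-error loss. The model, data, and learning rate are illustrative assumptions, not part of the question. Batch gradient descent accumulates the gradient over every example and then makes one update; stochastic gradient descent updates after each example.

```python
def batch_gd_epoch(w, xs, ys, lr):
    """One epoch of batch GD: a single update after a pass over ALL examples."""
    grad = 0.0
    for x, y in zip(xs, ys):
        grad += 2 * (w * x - y) * x  # d/dw of (w*x - y)^2
    grad /= len(xs)
    return w - lr * grad, 1  # new weight, number of updates made


def sgd_epoch(w, xs, ys, lr):
    """One epoch of SGD: one update after EACH example."""
    updates = 0
    for x, y in zip(xs, ys):
        w -= lr * 2 * (w * x - y) * x
        updates += 1
    return w, updates


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by w = 2
w_batch, n_batch = batch_gd_epoch(0.0, xs, ys, lr=0.05)
w_sgd, n_sgd = sgd_epoch(0.0, xs, ys, lr=0.05)
print(n_batch, n_sgd)  # 1 4
```

Each SGD step touches only one example, so it is cheaper than a batch step that touches all of them, which is why option C is false.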

In a neural network, which one of the following techniques is NOT useful to reduce overfitting?

A) Dropout

B) Regularization

C) Batch normalization

**D) Adding more layers**

**Answer: D**

For an image recognition problem (such as recognizing a cat in a photo), which neural network architecture has been found to be better suited for the task?

A) Multi-layer perceptron

B) Recurrent neural network

**C) Convolutional neural network**

D) Perceptron

**Answer: C**

While training a neural network with batch gradient descent, you notice after the first few epochs that the loss does not decrease. The reasons for this could be:

1. The learning rate is low.

2. The network is stuck in a local minimum.

3. The network has too many units in the hidden layer.

**A) 1 or 2**

B) 1 or 3

C) 2 or 3

D) 1 only

**Answer: A**

What is the sequence of steps followed in training a perceptron?

1. For a sample input, compute an output

2. Initialize weights of perceptron randomly

3. Go to the next batch of the dataset

4. If the prediction does not match the desired output, change the weights

A) 2,1,4,3

B) 1,4,3,2

C) 1,2,3,4

D) 2,3,4,1

**Answer: A**
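A minimal sketch of the 2, 1, 4, 3 order, assuming a single-layer perceptron with a step activation, the classic perceptron update rule w ← w + η(t − o)x, and the Boolean AND function as toy data (all illustrative choices). The sketch processes one example at a time, so each "batch" in step 3 is a single sample.

```python
import random


def predict(w, b, x):
    """Step-activation perceptron output (1 or 0)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(data, lr=0.1, max_epochs=1000, seed=0):
    rng = random.Random(seed)
    n = len(data[0][0])
    # Step 2: initialize the weights (and bias) randomly.
    w = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    b = rng.uniform(-0.5, 0.5)
    for _ in range(max_epochs):
        errors = 0
        for x, target in data:
            # Step 1: compute an output for this sample input.
            out = predict(w, b, x)
            # Step 4: if the prediction does not match the target, change the weights.
            if out != target:
                w = [wi + lr * (target - out) * xi for wi, xi in zip(w, x)]
                b += lr * (target - out)
                errors += 1
            # Step 3: the loop moves on to the next sample.
        if errors == 0:  # a full pass with no mistakes: training has converged
            break
    return w, b


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print([predict(w, b, x) for x, _ in AND])  # [0, 0, 0, 1]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees the loop stops after finitely many mistakes.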

A 4-input neuron has a bias of 0 and weights 1, 2, 3 and 4. The transfer function is given by f(v) = max(0, v). The inputs are 4, 10, 5 and 20 respectively. The output will be:

A) 238

**B) 119**

**Answer: B**
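The weighted sum is v = 4·1 + 10·2 + 5·3 + 20·4 = 4 + 20 + 15 + 80 = 119, and since v > 0 the transfer function passes it through unchanged. A short sketch of the same computation (the function name is illustrative):

```python
def relu_neuron(inputs, weights, bias=0.0):
    v = sum(x * w for x, w in zip(inputs, weights)) + bias  # weighted sum plus bias
    return max(0.0, v)  # transfer function f(v) = max(0, v)


print(relu_neuron([4, 10, 5, 20], [1, 2, 3, 4]))  # 119.0
```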

The VC dimension of hypothesis space H1 is larger than the VC dimension of hypothesis space H2. Which of the following can be inferred from this?

**A) The number of examples required for learning a hypothesis in H1 is larger than the number of examples required for H2.**

B) The number of examples required for learning a hypothesis in H1 is smaller than the number of examples required for H2.

C) No relation to the number of samples required for PAC learning.

**Answer: A**

In ensemble learning, you aggregate the predictions of weak learners, so that the ensemble gives a better prediction than any of the individual models. Which of the following statements is/are true for weak learners used in an ensemble model?

1. They don’t usually overfit.

2. They have a high bias, so they cannot solve complex learning problems

3. They usually overfit.

**A) 1 and 2**

B) 1 and 3

C) 2 and 3

D) Only 1

**Answer: A**

The Bayes Optimal Classifier

A) is an ensemble of some selected hypotheses in the hypothesis space.

**B) is an ensemble of all the hypotheses in the hypothesis space.**

C) is the hypothesis that gives the best result on test instances.

D) none of the above

**Answer: B**
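The Bayes optimal classifier lets every hypothesis vote, weighted by its posterior, and predicts argmax over v of Σ_h P(v | h) P(h | D). A minimal sketch, assuming three deterministic hypotheses with illustrative posteriors 0.4, 0.3 and 0.3: the single most probable hypothesis h1 predicts "+", yet the posterior-weighted vote of the whole space predicts "−".

```python
from collections import defaultdict


def bayes_optimal(posteriors, predictions):
    """posteriors[h] = P(h | D); predictions[h] = class predicted by h (deterministic)."""
    votes = defaultdict(float)
    for h, p in posteriors.items():
        votes[predictions[h]] += p  # P(v | h) is 1 for h's predicted class, 0 otherwise
    return max(votes, key=votes.get)


posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
predictions = {"h1": "+", "h2": "-", "h3": "-"}
print(bayes_optimal(posteriors, predictions))  # -
```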

For a particular learning task, the required error parameter ϵ changes from 0.1 to 0.01. How many times as many samples will be required for PAC learning?

A) Same

B) 2 times

**C) 10 times**

D) 1000 times

**Answer: C**
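For a consistent learner over a finite hypothesis space H, a standard PAC bound is m ≥ (1/ϵ)(ln|H| + ln(1/δ)). Since m scales as 1/ϵ, shrinking ϵ from 0.1 to 0.01 multiplies the bound by 10. The sketch checks this numerically; |H| = 2^10 and δ = 0.05 are arbitrary illustrative values, and the ratio does not depend on them.

```python
import math


def pac_sample_bound(h_size, epsilon, delta):
    """Sample bound for a consistent learner over a finite hypothesis space H."""
    return (math.log(h_size) + math.log(1 / delta)) / epsilon


ratio = pac_sample_bound(2**10, 0.01, 0.05) / pac_sample_bound(2**10, 0.1, 0.05)
print(round(ratio, 6))  # 10.0
```

VC-dimension-based bounds contain an extra log(1/ϵ) term, so under those the increase is somewhat more than tenfold; the answer above assumes the finite-|H| bound.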

Data scientists often use multiple algorithms for prediction and combine their outputs (known as “Ensemble Learning”) to get a more robust, better-generalizing result that outperforms all the individual models. In which of the following cases is this most likely to be true?

A) Base models having a higher correlation.

**B) Base models having the lower correlation.**

C) Use “Weighted average” instead of “Voting” methods of an ensemble.

D) Base models coming from the same algorithm

**Answer: B**
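One way to see why lower correlation helps: assume each of B base models f_b has prediction variance σ² and every pair of models has correlation ρ (an idealizing assumption). The variance of their average is then

$$
\operatorname{Var}\left(\frac{1}{B}\sum_{b=1}^{B} f_b\right)
= \frac{1}{B^2}\left(B\sigma^2 + B(B-1)\rho\sigma^2\right)
= \rho\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
$$

As B grows the second term vanishes but ρσ² remains, so averaging many highly correlated models barely reduces variance, while weakly correlated models gain the most.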

Suppose the VC dimension of a hypothesis space is 4. Which of the following are true?

A) No set of 4 points can be shattered by the hypothesis space.

**B) At least one set of 4 points can be shattered by the hypothesis space.**

C) All sets of 4 points can be shattered by the hypothesis space.

**D) No set of 5 points can be shattered by the hypothesis space.**

**Answer: B, D**

The computational complexity of classes of learning problems depends on which of the following?

A) The size or complexity of the hypothesis space considered by a learner

B) The accuracy to which the target concept must be approximated

C) The probability that the learner will output a successful hypothesis

**D) All of the above**

**Answer: D**

Consider a circle in 2D whose center is at the origin. What is its VC dimension?

A) 1

**B) 2**

C) 3

D) 4

**Answer: B**

Which among the following prevents overfitting when we perform bagging?

**A) The use of sampling with replacement as the sampling technique**

B) The use of weak classifiers

C) The use of classification algorithms which are not prone to overfitting

D) The practice of validation performed on every classifier trained

**Answer: A**

VC dimension for conjunctions of n Boolean literals is:

**A) At least n**

B) At most n

C) Can’t say

D) None

**Answer: A**
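The "at least n" part can be checked by brute force: take the n points that are all ones except for a single 0 in position i. Any subset S of them is labeled positive by the conjunction of the positive literals x_j for j not in S, so the set is shattered. A sketch that enumerates every conjunction (each variable absent, positive, or negated) for small n; the helper names are illustrative:

```python
from itertools import product


def conjunction(literals):
    """literals maps variable index -> required value (True for x_i, False for NOT x_i)."""
    return lambda x: all(x[i] == v for i, v in literals.items())


def all_conjunctions(n):
    # Each variable is absent, positive, or negated: 3^n conjunctions in total.
    for choice in product((None, True, False), repeat=n):
        yield conjunction({i: v for i, v in enumerate(choice) if v is not None})


def is_shattered(points, n):
    realized = {tuple(h(p) for p in points) for h in all_conjunctions(n)}
    return len(realized) == 2 ** len(points)


def witness_set(n):
    # Point i is all ones (True) except for a 0 (False) in position i.
    return [tuple(j != i for j in range(n)) for i in range(n)]


for n in range(1, 6):
    print(n, is_shattered(witness_set(n), n))  # each line ends with True
```

This only exhibits a shattered set of size n, which is exactly what "at least n" requires.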

Suppose you run the K-means clustering algorithm on a given dataset. On which of the following factors do the final clusters depend?

I. The value of K

II. The initial cluster seeds chosen

III. The distance function used.

A) I only

B) II only

C) I and II only

**D) I, II and III**

**Answer: D (I, II and III)**
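A minimal 1-D sketch of Lloyd's K-means illustrating factor II: with the same K = 2 and the same distance (absolute difference), two different seed choices converge to different final clusters. The data and seeds are made up for illustration; changing K or the distance function (factors I and III) alters the assignment step directly.

```python
def kmeans_1d(points, seeds, max_iter=100):
    """Lloyd's algorithm on 1-D points with absolute-difference distance."""
    centers = list(seeds)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center (ties -> lower index).
        clusters = [[] for _ in centers]
        for p in points:
            k = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[k].append(p)
        # Update step: each center moves to the mean of its points (kept if empty).
        new_centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return clusters


data = [0, 1, 5, 6, 10, 11]
print(kmeans_1d(data, seeds=[0, 1]))    # [[0, 1], [5, 6, 10, 11]]
print(kmeans_1d(data, seeds=[10, 11]))  # [[0, 1, 5, 6], [10, 11]]
```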

Which of the following statements are true about the different types of linkages?

**A) Single linkage suffers from chaining.**

B) Average linkage suffers from crowding.

C) In single-linkage clustering, the similarity between two clusters depends on all the elements in the two clusters.

**D) Complete linkage avoids chaining but suffers from crowding.**

**Answer: A and D**
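Option C is false because single linkage looks only at the closest pair of members. A small sketch of the three linkage criteria (Euclidean distance via Python's `math.dist`, available from Python 3.8; the points are illustrative) shows this: moving a far-away member of one cluster leaves the single-link distance unchanged. That same reliance on one close pair is what lets single linkage chain clusters together.

```python
from itertools import product
from math import dist


def single_link(a, b):
    """Distance between the closest pair of members."""
    return min(dist(p, q) for p, q in product(a, b))


def complete_link(a, b):
    """Distance between the farthest pair of members."""
    return max(dist(p, q) for p, q in product(a, b))


def average_link(a, b):
    """Mean distance over all cross-cluster pairs."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


a = [(0, 0), (1, 0)]
b = [(3, 0), (4, 0)]
print(single_link(a, b), complete_link(a, b), average_link(a, b))  # 2.0 4.0 3.0

a_moved = [(-100, 0), (1, 0)]  # move the far member of a
print(single_link(a_moved, b))  # 2.0: unchanged, only the closest pair matters
```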

Consider agglomerative hierarchical clustering that has proceeded to a stage where every cluster has at least two points, using either single-link or complete-link merging. Is it possible for a point to be closer to points in other clusters than to points in its own cluster under either of these methods? Mark the methods where this can happen.

A) It is not possible in either method.

B) Only in single-link clustering

C) Only in complete-link clustering

**D) In both single-link and complete-link clustering**

**Answer: D**

Choose ALL the statements that are true for hierarchical agglomerative clustering

A) The number of clusters needs to be pre-specified.

**B) The output of the clustering algorithm depends on the choice of the similarity metric.**

**C) The number of merge operations depends on the number of clusters desired.**

D) The number of merge operations depends on the characteristics of the data set.

**Answer: B and C**

We would like to cluster the natural numbers from 1 to 1024 into two clusters using hierarchical agglomerative clustering, with Euclidean distance as the distance measure. Ties are broken by merging the clusters in which the lowest natural number resides. For example, if the distance between clusters A and B is the same as the distance between clusters C and D, we choose A and B as the next clusters to merge if min{A, B} < min{C, D}, where {A, B} is the set of natural numbers assigned to clusters A and B. In complete-linkage clustering, the distance between two clusters is the distance between their farthest members. For the complete-linkage method, specify the number of elements assigned to each of the two clusters obtained by cutting the dendrogram at the root.

A) 1,1023

**B) 512,512**

C) 1022,2

D) None of these

**Answer: B**
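The answer can be reproduced with a short simulation. Because the points lie on a line, every cluster stays a contiguous run of integers, and the complete-link distance between two runs is (largest member of the right run) minus (smallest member of the left run). A non-adjacent pair of runs is always strictly farther apart than the left run is from the run in between, so only adjacent pairs need checking, and the tie rule reduces to taking the leftmost minimal pair. The function name is illustrative.

```python
def complete_linkage_root_split(n):
    """Sizes of the two clusters left when merging 1..n stops at two clusters."""
    # Each cluster is a contiguous interval (lo, hi) of the integers 1..n.
    clusters = [(i, i) for i in range(1, n + 1)]
    while len(clusters) > 2:
        best = None
        for i in range(len(clusters) - 1):
            dist = clusters[i + 1][1] - clusters[i][0]  # farthest members
            if best is None or dist < best[0]:  # strict < keeps the leftmost pair on ties
                best = (dist, i)
        _, i = best
        clusters[i:i + 2] = [(clusters[i][0], clusters[i + 1][1])]
    return [hi - lo + 1 for lo, hi in clusters]


print(complete_linkage_root_split(1024))  # [512, 512]
```

Complete linkage first forms all adjacent pairs, then blocks of 4, 8, and so on, so for a power of two the root splits the numbers exactly in half.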

Which of the following is not a clustering approach?

A) Partitioning

B) Hierarchical

C) Density-based

**D) Bagging**

**Answer: D**

Which of the following options is a measure of internal evaluation of a clustering algorithm?

A) Rand index

**B) Davies-Bouldin index**

C) Jaccard index

D) F-measure

**Answer: B**
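Internal indices score a clustering using only the data and the cluster assignments, whereas the Rand index, Jaccard index, and F-measure compare the result against a reference labeling. Below is a from-scratch sketch of the Davies-Bouldin index, assuming Euclidean distance and the mean distance to the centroid as each cluster's scatter s_i (one common form of the definition). For each cluster it takes the largest ratio (s_i + s_j) / d(c_i, c_j) over the other clusters, then averages; lower values indicate better-separated clusters.

```python
from math import dist


def davies_bouldin(clusters):
    """clusters: list of lists of points. Lower values mean better-separated clusters."""
    centroids = [tuple(sum(c) / len(pts) for c in zip(*pts)) for pts in clusters]
    # Scatter s_i: mean distance of a cluster's points to its centroid.
    scatter = [sum(dist(p, m) for p in pts) / len(pts) for pts, m in zip(clusters, centroids)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case (largest) similarity ratio against any other cluster.
        total += max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                     for j in range(k) if j != i)
    return total / k


print(davies_bouldin([[(0, 0), (2, 0)], [(10, 0), (12, 0)]]))  # 0.2
```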

We are given the following four data points in two dimensions: x1 = (2, 2), x2 = (8, 6), x3 = (6, 8), x4 = (2, 4). We want to cluster the data points into two clusters C1 and C2 using the K-means algorithm with Euclidean distance. To initialize the algorithm, we set C1 = {x1, x3} and C2 = {x2, x4}. After two iterations of the K-means algorithm, the cluster memberships are:

A) C1 = {x1, x2} and C2 = {x3, x4}

**B) C1 = {x1, x4} and C2 = {x2, x3}**

C) C1 = {x1, x3} and C2 = {x2, x4}

D) None of these.

**Answer: B**
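The two iterations can be replayed with a small sketch. It assumes one "iteration" means recomputing the centroids from the current assignment and then reassigning every point to its nearest centroid; labels 0 and 1 stand for C1 and C2. In the first iteration the centroids are (4, 5) and (5, 5), which moves x3 to C2 and x4 to C1; in the second they are (2, 3) and (7, 7), and no point changes cluster.

```python
from math import dist


def kmeans_from_assignment(points, labels, iterations):
    """Run K-means from an initial assignment (labels[i] is point i's cluster).

    Assumes no cluster ever becomes empty, which holds for this example.
    """
    k = max(labels) + 1
    for _ in range(iterations):
        # Update step: recompute each centroid as the mean of its assigned points.
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
        # Assignment step: move each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
    return labels


x = [(2, 2), (8, 6), (6, 8), (2, 4)]  # x1, x2, x3, x4
initial = [0, 1, 0, 1]                # C1 = {x1, x3}, C2 = {x2, x4}
print(kmeans_from_assignment(x, initial, iterations=2))  # [0, 1, 1, 0]
```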

With respect to k-means clustering, which of the following are the correct descriptions of the expectation (E) and maximization (M) steps respectively?

**A) E-step: assign points to nearest cluster center, M-step: estimate model parameters that maximize the likelihood for the given assignment of points.**

B) E-step: estimate model parameters that maximize the likelihood for the given assignment of points, M-step: assign points to the nearest cluster center.

C) Neither A nor B.

D) Both A and B

**Answer: A**