What are the similarities and differences between a binary logistic classifier and a neural network composed of a single unit with a sigmoid activation function?
How do the different kinds of activation functions differ? What are their advantages and drawbacks?
What's the difference between a multinomial logistic regression classifier and a single-layer neural net?
What additional benefit does adding a hidden layer to a neural net provide?
What's the purpose of 'pooling' and how can it be performed?
How should the parameters of a neural network be initialized? Is there a difference compared to logistic regression?
How can the gradient of the loss function be computed for the last layer of the network? How can this be done for the other layers?
What's the mathematical foundation of backpropagation?
How does a computation graph work?
What is overfitting and how can it be avoided?
What options are there for handling pretrained embeddings in a neural language model?