Fastbook Chapter 4 Questionnaire Answers
- How is a greyscale image represented on a computer? How about a color image?
- How are the files and folders in the `MNIST_SAMPLE` dataset structured? Why?
- Explain how the "pixel similarity" approach to classifying digits works.
- What is a list comprehension? Create one now that selects odd numbers from a list and doubles them.
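A minimal sketch of one possible answer to the list-comprehension question, selecting the odd numbers and doubling them (the input list here is an arbitrary example):

```python
nums = [1, 2, 3, 4, 5, 6]
# Keep only odd numbers, then double each one kept.
doubled_odds = [n * 2 for n in nums if n % 2 == 1]
print(doubled_odds)  # → [2, 6, 10]
```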
- What is a "rank 3 tensor"?
- What is the difference between tensor rank and shape? How do you get the rank from the shape?
- What are RMSE and L1 norm?
- How can you apply a calculation on thousands of numbers at once, many thousands of times faster than a Python loop?
- Create a 3x3 tensor or array containing the numbers from 1 to 9. Double it. Select the bottom right 4 numbers.
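One way to work the 3x3 exercise, sketched with PyTorch (NumPy would work identically):

```python
import torch

t = torch.arange(1, 10).reshape(3, 3)  # the numbers 1..9 as a 3x3 tensor
t = t * 2                              # double every element
print(t)
print(t[1:, 1:])                       # the bottom-right 2x2 block (4 numbers)
```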
- What is broadcasting?
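A small broadcasting example, assuming PyTorch: the rank-1 tensor is stretched (without copying) to match the matrix's shape, so the addition runs elementwise with no Python loop.

```python
import torch

m = torch.ones(3, 3)             # shape (3, 3)
v = torch.tensor([1., 2., 3.])   # shape (3,)
# v is broadcast across each row of m.
print(m + v)
```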
- Are metrics generally calculated using the training set, or the validation set? Why?
- What is SGD?
- Why does SGD use mini-batches?
- What are the 7 steps in SGD for machine learning?
- How do we initialize the weights in a model?
- What is "loss"?
- Why can't we always use a high learning rate?
- What is a "gradient"?
- Do you need to know how to calculate gradients yourself?
- Why can't we use accuracy as a loss function?
- Draw the sigmoid function. What is special about its shape?
- What is the difference between loss and metric?
- What is the function to calculate new weights using a learning rate?
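A minimal sketch of the weight-update step, assuming PyTorch autograd and a toy loss: the new weights are `w ← w − lr·grad`.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
loss = (w ** 2).sum()      # a toy loss
loss.backward()            # gradients: d(loss)/dw = 2*w → [2., 4.]
lr = 0.1
with torch.no_grad():      # update outside the autograd graph
    w -= lr * w.grad       # the SGD step: w ← w − lr·grad
    w.grad.zero_()         # reset gradients for the next step
print(w)                   # values are now [0.8, 1.6]
```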
- What does the `DataLoader` class do?
- Write pseudo-code showing the basic steps taken each epoch for SGD.
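The per-epoch steps can be sketched in Python; the names `dl`, `model`, `loss_func`, `params`, and `lr` are assumptions, not fixed API:

```python
def train_epoch(dl, model, loss_func, params, lr):
    for xb, yb in dl:                  # loop over mini-batches
        preds = model(xb)              # 1. predict
        loss = loss_func(preds, yb)    # 2. measure the loss
        loss.backward()                # 3. compute gradients
        for p in params:
            p.data -= lr * p.grad      # 4. step the weights
            p.grad.zero_()             # 5. reset gradients
```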
- Create a function which, if passed two arguments `[1,2,3,4]` and `'abcd'`, returns `[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]`. What is special about that output data structure?
- What does `view` do in PyTorch?
- What are the "bias" parameters in a neural network? Why do we need them?
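The pairing function asked about above can be sketched with Python's built-in `zip`; the name `pair_up` is just an illustration:

```python
def pair_up(a, b):
    # zip pairs elements positionally; list() materializes the tuples.
    return list(zip(a, b))

print(pair_up([1, 2, 3, 4], 'abcd'))
# → [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
```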
- What does the `@` operator do in Python?
- What does the `backward` method do?
- Why do we have to zero the gradients?
- What information do we have to pass to `Learner`?
- Show Python or pseudo-code for the basic steps of a training loop.
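A full training loop is the per-epoch steps repeated over several epochs. A minimal sketch, assuming PyTorch and illustrative names (`model`, `dl`, `params`, etc.):

```python
import torch

def train_model(model, dl, loss_func, params, lr, epochs):
    for _ in range(epochs):              # repeat over the whole dataset
        for xb, yb in dl:
            loss = loss_func(model(xb), yb)
            loss.backward()
            with torch.no_grad():
                for p in params:
                    p -= lr * p.grad     # SGD step
                    p.grad.zero_()       # reset for the next batch
```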
- What is "ReLU"? Draw a plot of it for values from `-2` to `+2`.
- What is an "activation function"?
- What's the difference between `F.relu` and `nn.ReLU`?
- The universal approximation theorem shows that any function can be approximated as closely as needed using just one nonlinearity. So why do we normally use more?
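A quick check of the `F.relu` vs `nn.ReLU` question: `F.relu` is a plain function, while `nn.ReLU()` is a module you can place inside `nn.Sequential`; numerically they compute the same thing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
print(F.relu(x))      # negatives clipped to 0: [0., 0., 0., 1., 2.]
print(nn.ReLU()(x))   # the module form gives the same values
```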