Questionnaire

See Lesson 2

  1. Provide an example of where the bear classification model might work poorly in production, due to structural or style differences in the training data.

    • when the data used to train the model differs structurally or stylistically from the data seen in production; for example, a bear classifier trained on well-composed internet photos may fail on grainy night-time video frames from a park camera.
  2. Where do text models currently have a major deficiency?

    • in generating correct responses: they produce fluent, plausible text but have no way to guarantee that it is factually accurate, so confident-sounding wrong answers are common.
  3. What are possible negative societal implications of text generation models?

    • automating disinformation campaigns using DL text models to generate simulated human responses.
  4. In situations where a model might make mistakes, and those mistakes could be harmful, what is a good alternative to automating a process?

    • have humans oversee and verify the predictions made by ML models before acting on them (a human-in-the-loop process)
  5. What kind of tabular data is deep learning particularly good at?

    • data containing high-cardinality categorical columns (columns with many distinct values, such as zip codes or product IDs)
  6. What's a key downside of directly using a deep learning model for recommendation systems?

    • such models tend to predict items a user would have bought or watched anyway, even with no recommendation, rather than surfacing items the user would buy or watch only because of the recommendation.
  7. What are the steps of the Drivetrain Approach?

    • Determine objectives
    • Determine levers (inputs that can be controlled)
    • Determine data that can be collected to control the levers
    • Build the models
  8. How do the steps of the Drivetrain Approach map to a recommendation system?

    • Objective: drive additional sales by recommending items that users would like but would not have bought without the recommendation
    • Levers: the ranking of recommendations
    • Data: randomized experiments on a wide range of recommendations to a wide variety of customers
    • Models:
      • two models of purchase probability, conditional on seeing or not seeing a recommendation:
        1. a model of purchases given a recommendation
        2. a model of purchases given no recommendation
      • a utility function: the difference between the two probabilities
        • low
          • when the customer is familiar with the item and has rejected it (both probabilities low)
          • when the customer would buy the item even without the recommendation (both probabilities high)
        • high
          • when the customer would buy the item only if it is recommended (large gap between the two probabilities)
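
    A toy sketch of that utility calculation (the item names and probabilities here are made up; in practice they would come from the two trained models):

    ```python
    # Rank items by the incremental effect of recommending them:
    # utility = P(buy | recommended) - P(buy | not recommended)
    candidates = {
        'already_rejected': {'p_rec': 0.02, 'p_no_rec': 0.01},  # both low  -> low utility
        'would_buy_anyway': {'p_rec': 0.91, 'p_no_rec': 0.89},  # both high -> low utility
        'persuadable':      {'p_rec': 0.60, 'p_no_rec': 0.10},  # large gap -> high utility
    }
    utility = {item: p['p_rec'] - p['p_no_rec'] for item, p in candidates.items()}
    best = max(utility, key=utility.get)
    print(best, utility[best])  # persuadable 0.5
    ```
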
  9. Create an image recognition model using data you curate, and deploy it on the web.

    • TODO
  10. What is DataLoaders?

    • a thin fastai class that stores the DataLoader objects you pass to it (typically train and valid) and specifies how data is fed to a model
  11. What four things do we need to tell fastai to create DataLoaders?

    • what kinds of data we are working with
    • how to get the items
    • how to label items
    • how to split the validation set from the training set
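
    These four things map directly onto DataBlock arguments. A minimal sketch, assuming bear images stored in one folder per class under a hypothetical bears/ directory:

    ```python
    from fastai.vision.all import *

    path = Path('bears')  # hypothetical folder: bears/grizzly/..., bears/black/..., bears/teddy/...
    bears = DataBlock(
        blocks=(ImageBlock, CategoryBlock),               # kinds of data: images in, categories out
        get_items=get_image_files,                        # how to get the items
        get_y=parent_label,                               # how to label: use the parent folder name
        splitter=RandomSplitter(valid_pct=0.2, seed=42),  # how to split off the validation set
        item_tfms=Resize(128))                            # resize so images can be collated into batches
    dls = bears.dataloaders(path)
    ```
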
  12. What does the splitter parameter to DataBlock do?

    • it tells the DataBlock how to split the items into a training set and a validation set (e.g., RandomSplitter holds out a random fraction, 20% by default)
  13. How do we ensure a random split always gives the same validation set?

    • set the random seed to a fixed number; the same pseudo-random sequence is then generated on every run, so the items are split into training and validation sets identically each time (see the sketch below)
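
    A quick check of that behavior with fastai's RandomSplitter (toy items, not real data):

    ```python
    from fastai.data.transforms import RandomSplitter

    items = list(range(100))
    split_a = RandomSplitter(valid_pct=0.2, seed=42)(items)
    split_b = RandomSplitter(valid_pct=0.2, seed=42)(items)
    # Same seed -> identical validation indices on every run.
    assert list(split_a[1]) == list(split_b[1])
    ```
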
  14. What letters are often used to signify the independent and dependent variables?

    • x for the independent variable (the input), y for the dependent variable (the target)
  15. What's the difference between the crop, pad, and squish resize approaches? When might you choose one over the others?

    • crop (the default): keep the full width or height and crop the other dimension to make the image square
    • pad: add padding (e.g., zeros, shown as black bars) along the smaller dimension to make the image square
    • squish: scale the larger dimension down so the whole image fits into the square, distorting its aspect ratio
    • squish suits cases where all the image content must be kept and the model can tolerate the distortion; crop works when partial views of the image still predict the label; pad avoids both losses, but the padded borders carry no information, wasting computation and reducing effective resolution.
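
    In fastai the three approaches are selected with ResizeMethod; a sketch, reusing the hypothetical bears DataBlock from question 11:

    ```python
    bears_crop   = bears.new(item_tfms=Resize(128))  # crop is the default
    bears_pad    = bears.new(item_tfms=Resize(128, ResizeMethod.Pad, pad_mode='zeros'))
    bears_squish = bears.new(item_tfms=Resize(128, ResizeMethod.Squish))
    dls = bears_squish.dataloaders(path)
    dls.valid.show_batch(max_n=4, nrows=1)  # inspect what the strategy does to a few images
    ```
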
  16. What is data augmentation? Why is it needed?

    • randomly varying the input images on each pass through the data, e.g. rotation, flipping, perspective warping, and brightness or contrast changes
    • it effectively enlarges the dataset and reduces overfitting, since the model never sees exactly the same image twice across epochs (see the sketch after question 17)
  17. What is the difference between item_tfms and batch_tfms?

    • item_tfms - transforms applied to each item individually, on the CPU, as it is read; typically used to resize every image to a common size so items can be collated into a batch.
    • batch_tfms - transforms applied to a whole batch at once on the GPU, so they run fast; most data augmentation happens here.
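
    A sketch of the usual augmentation setup, again building on the hypothetical bears DataBlock: a per-item random crop on the CPU, then batch-level augmentation on the GPU:

    ```python
    bears_aug = bears.new(
        item_tfms=RandomResizedCrop(224, min_scale=0.5),  # per item, CPU: random crop to 224px
        batch_tfms=aug_transforms(mult=2))                # per batch, GPU: flips, rotations, warps, lighting
    dls = bears_aug.dataloaders(path)
    dls.train.show_batch(max_n=8, nrows=2, unique=True)   # one image shown with 8 different augmentations
    ```
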
  18. What is a confusion matrix?

    • a table whose rows are the actual label categories and whose columns are the predicted ones; each cell counts how often that actual/predicted pair occurred, showing exactly which categories the model confuses
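
    Given a trained fastai Learner (hypothetically named learn), it can be plotted from an interpretation object:

    ```python
    interp = ClassificationInterpretation.from_learner(learn)
    interp.plot_confusion_matrix()      # rows: actual labels, columns: predicted labels
    interp.plot_top_losses(5, nrows=1)  # also useful: the most confidently wrong examples
    ```
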
  19. What does export save?

    • a self-contained inference model: the architecture, the trained parameters, and the definition of how to transform the input data (taken from the DataLoaders), so predictions can be made without the original training code or data (see the sketch after question 20)
  20. What is it called when we use a model for getting predictions, instead of training?

    • inference
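
    A sketch of the full round trip, assuming a trained Learner named learn and a hypothetical image path:

    ```python
    learn.export('export.pkl')              # save architecture + weights + data-transform definitions

    learn_inf = load_learner('export.pkl')  # later, possibly on a different machine
    pred, pred_idx, probs = learn_inf.predict('images/grizzly.jpg')  # hypothetical file
    print(f'{pred}: {probs[pred_idx]:.4f}')
    ```
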
  21. What are IPython widgets?

    • GUI components (buttons, sliders, file-upload controls, and so on) that add interactive functionality to a Jupyter notebook
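
    A minimal sketch of the lesson's notebook GUI, assuming the exported model above and the ipywidgets 7.x FileUpload API (in ipywidgets 8, the uploaded bytes live under btn_upload.value[0].content instead of btn_upload.data[-1]):

    ```python
    from fastai.vision.all import *
    import ipywidgets as widgets

    learn_inf = load_learner('export.pkl')

    btn_upload = widgets.FileUpload()
    btn_run = widgets.Button(description='Classify')
    lbl_pred = widgets.Label()

    def on_click_classify(change):
        img = PILImage.create(btn_upload.data[-1])  # ipywidgets 7.x API; see note above
        pred, pred_idx, probs = learn_inf.predict(img)
        lbl_pred.value = f'Prediction: {pred}; probability: {probs[pred_idx]:.4f}'

    btn_run.on_click(on_click_classify)
    widgets.VBox([widgets.Label('Select your bear!'), btn_upload, btn_run, lbl_pred])
    ```
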
  22. When might you want to use CPU for deployment? When might GPU be better?

    • CPU: when inferences arrive one at a time, as in most web apps, a CPU is usually fast enough, cheaper, and far simpler to deploy and manage
    • GPU: better when many inputs can be batched and run through the model in parallel, e.g. a high-volume service that queues requests into batches
  23. What are the downsides of deploying your app to a server, instead of to a client (or edge) device such as a phone or PC?

    • slower responses due to the network round trip, the need for a network connection at all, and centralized server resources that must be provisioned and scaled with demand
  24. What are three examples of problems that could occur when rolling out a bear warning system in practice?

    • the data seen in production differs from the training data:
      • the system receives video frames, not still images
      • it must work at night, while the training images were taken in daylight
      • camera frames may be low-resolution, unlike curated internet pictures
    • results must be returned fast enough to be useful in practice
  25. What is "out-of-domain data"?

    • data encountered in production that is substantially different from anything the model saw during training
  26. What is "domain shift"?

    • when the type of data the model sees in production changes gradually over time, drifting away from the data it was trained on (e.g., customer behavior shifting)
  27. What are the three steps in the deployment process?

    • manual process: run the model in parallel with the existing process, with humans checking all of its predictions
    • limited scope deployment: a careful, supervised rollout with a narrow scope (e.g., one location or time window)
    • gradual expansion: widen the rollout step by step, with good reporting systems in place to catch problems

Further Research

  1. Consider how the Drivetrain Approach maps to a project or problem you're interested in.
  2. When might it be best to avoid certain types of data augmentation?
  3. For a project you're interested in applying deep learning to, consider the thought experiment "What would happen if it went really, really well?"
  4. Start a blog, and write your first blog post. For instance, write about what you think deep learning might be useful for in a domain you're interested in.