Tag Archives: PyTorch

Topic 23 – Introduction to Computer Vision

Why do I need to learn about computer vision?

Computer vision has become more and more interesting. Image recognition, autonomous driving, and disease detection are examples of breakthrough achievements in the field.

Nowadays a key skill that is often required from a software developer is the ability to use computer vision algorithms and tools to solve real-world problems related to images and videos.

What can I do after finishing learning about applied computer vision?

You will be to create software that could recognize recognize a face or transform a picture of young person to old person.

That sounds fun! What should I do now?

Please read
– this Rafael C. Gonzalez and Richard E. Woods (2018). Digital Image Processing. 4th Edition. Pearson book, and
– this Richard Szeliski (2022). Computer Vision: Algorithms and Applications. Springer book.

At the same time, please
– audit these Deep Learning Specialization courses and
– read this Francois Chollet (2021). Deep Learning with Python. Manning Publications book, and
– this Michael A. Nielsen (2015). Neural Networks and Deep Learning. Determination Press book.

After that please read this David Foster (2023). Generative Deep Learning – Teaching Machines To Paint, Write, Compose, and Play. O’Reilly Media book.

After that please read this Ian Goodfellow et al. (2016). Deep Learning. The MIT Press book.

After that please audit this TinyML and Efficient Deep Learning Computing course.

Terminology Review:

  • Digital Image: f(x, y)
  • Intensity (Gray Level): ℓ = f(x, y)
  • Gray Scale: ℓ = 0 is considered black and ℓ = L – 1 is considered white.
  • Quantization: Digitizing the amplitude values.
  • Sampling: Digitizing the coordinate values.
  • Representing Digital Images: Matrix or Vector.
  • Pixel or Picture Element: An element of matrix or vector.
  • Deep Learning.
  • Artificial Neural Networks.
  • Filter: 2-dimensional matrix commonly square in size containing weights shared all over the input space.
  • The Convolution Operation: Element-wise multiply, and add the outputs.
  • Stride: Filter step size.
  • Padding.
  • Upsampling: Nearest Neighbors, Linear Interpolation, Bilinear Interpolation.
  • Max Pooling, Average Pooling, Min Pooling.
  • Convolutional Layers.
  • Feature Maps.
  • Convolutional Neural Networks (CNNs).
  • Object Detection.
  • Face Recognition.
  • YOLO Algorithm.
  • Latent Variable.
  • Autoencoders.
  • Variational Autoencoders.
  • Generators.
  • Discriminators.
  • Binary Cross Entropy Loss Function, Log Loss Function.
  • Generative Adversarial Networks (GANs).
  • Mode Collapse.
  • Earth Mover’s Distance.
  • Wasserstein Loss (W-Loss).
  • 1-Lipschitz Continuous Function.
  • CycleGAN.
  • Neural Style Transfer.
  • Magnitude-based Pruning.
  • K-Means-based Weight Quantization.
  • Linear Quantization.

After finishing learning about computer vision please click Topic 24 – Introduction to Nature Language Processing to continue.