Topic 23 – Introduction to Computer Vision

Why do I need to learn about computer vision?

Computer vision has become an increasingly active field, with notable achievements in image recognition, autonomous driving, and disease detection.

Nowadays, a key skill for software developers is the ability to use computer vision algorithms and tools to solve real-world problems involving images and videos.

What can I do after finishing learning about applied computer vision?

You will be able to create software that can recognize a face or transform a picture of a young person into an older person.

That sounds fun! What should I do now?

First, please take a quick look at the following two books to grasp the core concepts and methods in computer vision:

After that, please audit the course and read the book below to solidify your knowledge and gain hands-on experience with computer vision algorithms:

After that, please audit the following courses to grasp the core concepts of generative adversarial networks and gain hands-on experience with them:

After that, please audit the following courses and read the book below to grasp the core concepts of generative models, including diffusion models, and to gain hands-on experience with these models:

After that, please audit this course to learn how to efficiently represent, compress, and train large generative models: TinyML and Efficient Deep Learning Computing.

Terminology Review:

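  • Digital Image: A two-dimensional function f(x, y), where x and y are spatial coordinates.
  • Intensity (Gray Level): The value ℓ = f(x, y) of the image at the point (x, y).
  • Gray Scale: With L gray levels, ℓ = 0 is considered black and ℓ = L – 1 is considered white.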
  • Quantization: Digitizing the amplitude values.
  • Sampling: Digitizing the coordinate values.
  • Representing Digital Images: Matrix or Vector.
  • Pixel or Picture Element: An element of the matrix or vector.
  • Deep Learning.
  • Artificial Neural Networks.
  • Filter: A 2-dimensional matrix of weights, usually square, that is shared across every position of the input.
  • The Convolution Operation: Multiply the filter element-wise with each input patch, then sum the products.
  • Stride: The step size by which the filter moves across the input.
  • Padding.
  • Upsampling: Nearest Neighbors, Linear Interpolation, Bilinear Interpolation.
  • Max Pooling, Average Pooling, Min Pooling.
  • Convolutional Layers.
  • Feature Maps.
  • Convolutional Neural Networks (CNNs).
  • Object Localization.
  • Bounding Box.
  • Landmark Detection.
  • Sliding Windows Detection.
  • Bounding Box Predictions.
  • Intersection over Union.
  • Non-max Suppression Algorithm.
  • Anchor Box Algorithm.
  • Object Detection.
  • YOLO Algorithm.
  • Semantic Segmentation.
  • Transpose Convolution.
  • U-Net.
  • Face Verification.
  • Face Recognition.
  • One-shot Learning.
  • Siamese Network.
  • Triplet Loss.
  • Neural Style Transfer.
  • Content Cost Function.
  • Style Cost Function.
  • 1D Convolution.
  • 3D Convolution.
  • Latent Variable.
  • Autoencoders.
  • Variational Autoencoders.
  • Generators.
  • Discriminators.
  • Binary Cross Entropy Loss Function, Log Loss Function.
  • Generative Adversarial Networks (GANs).
  • Deep Convolutional Generative Adversarial Networks.
  • Mode Collapse.
  • Earth Mover’s Distance.
  • Wasserstein Loss (W-Loss).
  • 1-Lipschitz Continuous Function.
  • Wasserstein GANs.
  • Conditional GANs.
  • Pixel Distance.
  • Feature Distance.
  • Fréchet Inception Distance (FID).
  • Inception Score (IS).
  • Autoregressive Models.
  • Variational Autoencoders (VAEs).
  • Flow Models.
  • StyleGAN.
  • Pix2Pix.
  • CycleGAN.
  • Diffusion Models.
  • Magnitude-based Pruning.
  • K-Means-based Weight Quantization.
  • Linear Quantization.
  • Neural Architecture Search.
  • Knowledge Distillation.
  • Self and Online Distillation.
  • Network Augmentation.
  • Loop Reordering, Loop Tiling, Loop Unrolling, SIMD (Single Instruction, Multiple Data) Programming, Multithreading, CUDA Programming.
  • Data Parallelism.
  • Pipeline Parallelism.
  • Tensor Parallelism.
  • Hybrid Parallelism.
  • Automated Parallelism.
  • Gradient Pruning: Sparse Communication, Deep Gradient Compression, PowerSGD.
  • Gradient Quantization: 1-Bit SGD, Threshold Quantization, TernGrad.
  • Delayed Gradient Averaging.
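
To make a few of the terms above concrete (filter, the convolution operation, stride, padding, and max pooling), here is a minimal sketch in plain Python. The function names `conv2d` and `max_pool2d` and the sample values are assumptions chosen for illustration, not part of any library, and real projects would normally use a deep learning framework instead.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide a 2D filter over a 2D image (the 'convolution' used in CNNs).

    Both arguments are lists of lists of numbers. As in most deep learning
    code, the filter is not flipped, so this is strictly a cross-correlation.
    """
    k_h, k_w = len(kernel), len(kernel[0])

    # Padding: surround the image with a border of zeros.
    width = len(image[0]) + 2 * padding
    padded = [[0] * width for _ in range(padding)]
    for row in image:
        padded.append([0] * padding + list(row) + [0] * padding)
    padded += [[0] * width for _ in range(padding)]

    # Stride: the filter moves `stride` pixels between output positions.
    out_h = (len(padded) - k_h) // stride + 1
    out_w = (width - k_w) // stride + 1

    output = []
    for i in range(out_h):
        out_row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Element-wise multiply the patch by the filter, then sum.
            total = 0
            for u in range(k_h):
                for v in range(k_w):
                    total += padded[top + u][left + v] * kernel[u][v]
            out_row.append(total)
        output.append(out_row)
    return output


def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + u][j * stride + v]
                for u in range(size)
                for v in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Illustrative 4x4 image and 2x2 filter (made-up values).
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [1, 0, 2, 3],
]
kernel = [[1, 0], [0, -1]]

feature_map = conv2d(image, kernel)            # 3x3 feature map
pooled = max_pool2d(feature_map, size=2, stride=1)  # 2x2 after pooling
```

With no padding and a stride of 1, a 4x4 image and a 2x2 filter give a 3x3 feature map; adding padding of 1 and a stride of 2 also gives 3x3 here, because the padded image is 6x6.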

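Two of the object detection terms in the list, Intersection over Union and non-max suppression, are easier to remember with a small example. The sketch below is one common greedy formulation in plain Python; the function names, the (x1, y1, x2, y2) box format, and the 0.5 threshold are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if there is one.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    intersection = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy non-max suppression; returns the indices of the boxes to keep."""
    # Visit boxes from the highest confidence score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Illustrative detections (made-up values): the first two overlap heavily.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

Here the second box is suppressed because its IoU with the higher-scoring first box is above the threshold, while the third box does not overlap either of them and is kept.
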
After finishing computer vision, please click on Topic 24 – Introduction to Natural Language Processing to continue.