Why do I need to learn about computer vision?
Computer vision has become an increasingly important field, powering achievements such as image recognition, autonomous driving, and disease detection.
Nowadays, a key skill for software developers is the ability to use computer vision algorithms and tools to solve real-world problems involving images and videos.
What can I do after finishing learning about applied computer vision?
You will be able to create software that can recognize a face or transform a picture of a young person into a picture of that person at an older age.
That sounds fun! What should I do now?
First, please take a quick look at the following two books to grasp the core concepts and methods in computer vision:
- Rafael C. Gonzalez and Richard E. Woods (2018). Digital Image Processing. 4th Edition. Pearson
- Richard Szeliski (2022). Computer Vision: Algorithms and Applications. Springer
After that, please audit the following course and read the book below to solidify your knowledge and gain hands-on experience with computer vision algorithms:
- Convolutional Neural Networks
- Francois Chollet (2021). Deep Learning with Python. Manning Publications
After that, please audit the following courses to grasp the core concepts of generative adversarial networks and gain hands-on experience with them:
- Build Basic Generative Adversarial Networks
- Build Better Generative Adversarial Networks
- Apply Generative Adversarial Networks
After that, please audit the following courses and read the book below to grasp the core concepts of generative models, including diffusion models, and to gain hands-on experience with these models:
- How Diffusion Models Work
- Deep Generative Models (Stanford University CS236, 2023)
- David Foster (2023). Generative Deep Learning – Teaching Machines To Paint, Write, Compose, and Play. O’Reilly Media
After that, please audit this course to learn how to efficiently represent, compress, and train large generative models: TinyML and Efficient Deep Learning Computing.
Terminology Review:
- Digital Image: f(x, y)
- Intensity (Gray Level): ℓ = f(x, y)
- Gray Scale: ℓ = 0 is considered black and ℓ = L – 1 is considered white.
- Quantization: Digitizing the amplitude values.
- Sampling: Digitizing the coordinate values.
- Representing Digital Images: Matrix or Vector.
- Pixel or Picture Element: An element of the image matrix or vector.
- Deep Learning.
- Artificial Neural Networks.
- Filter: A 2-dimensional matrix of weights, commonly square, shared across the entire input space.
- The Convolution Operation: Slide the filter over the input; at each position, multiply element-wise and sum the results.
- Stride: Filter step size.
- Padding.
- Upsampling: Nearest Neighbors, Linear Interpolation, Bilinear Interpolation.
- Max Pooling, Average Pooling, Min Pooling.
- Convolutional Layers.
- Feature Maps.
- Convolutional Neural Networks (CNNs).
- Object Localization.
- Bounding Box.
- Landmark Detection.
- Sliding Windows Detection.
- Bounding Box Predictions.
- Intersection over Union.
- Non-max Suppression Algorithm.
- Anchor Box Algorithm.
- Object Detection.
- YOLO Algorithm.
- Semantic Segmentation.
- Transpose Convolution.
- U-Net.
- Face Verification.
- Face Recognition.
- One-shot Learning.
- Siamese Network.
- Triplet Loss.
- Neural Style Transfer.
- Content Cost Function.
- Style Cost Function.
- 1D Convolution.
- 3D Convolution.
- Latent Variable.
- Autoencoders.
- Variational Autoencoders.
- Generators.
- Discriminators.
- Binary Cross Entropy Loss Function, Log Loss Function.
- Generative Adversarial Networks (GANs).
- Deep Convolutional Generative Adversarial Networks.
- Mode Collapse.
- Earth Mover’s Distance.
- Wasserstein Loss (W-Loss).
- 1-Lipschitz Continuous Function.
- Wasserstein GANs.
- Conditional GANs.
- Pixel Distance.
- Feature Distance.
- Fréchet Inception Distance (FID).
- Inception Score (IS).
- Autoregressive Models.
- Variational Autoencoders (VAEs).
- Flow Models.
- StyleGAN.
- Pix2Pix.
- CycleGAN.
- Diffusion Models.
- Magnitude-based Pruning.
- K-Means-based Weight Quantization.
- Linear Quantization.
- Neural Architecture Search.
- Knowledge Distillation.
- Self and Online Distillation.
- Network Augmentation.
- Loop Reordering, Loop Tiling, Loop Unrolling, SIMD (Single Instruction, Multiple Data) Programming, Multithreading, CUDA Programming.
- Data Parallelism.
- Pipeline Parallelism.
- Tensor Parallelism.
- Hybrid Parallelism.
- Automated Parallelism.
- Gradient Pruning: Sparse Communication, Deep Gradient Compression, PowerSGD.
- Gradient Quantization: 1-Bit SGD, Threshold Quantization, TernGrad.
- Delayed Gradient Averaging.
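To make the convolution-related terms above concrete (the convolution operation, stride, padding, and max pooling), here is a minimal pure-Python sketch. The function names and the zero-padding scheme are illustrative choices, not from any particular library; real frameworks implement the same arithmetic far more efficiently.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Apply a 2-D filter to a 2-D input (cross-correlation, as in CNNs)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    # Padding: surround the input with zeros so border pixels are covered.
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for i in range(h):
        for j in range(w):
            padded[i + padding][j + padding] = image[i][j]
    # Output size follows the standard formula: (n + 2p - f) // s + 1.
    out_h = (h + 2 * padding - kh) // stride + 1
    out_w = (w + 2 * padding - kw) // stride + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise multiply the filter with the patch, then sum.
            out[i][j] = sum(
                padded[i * stride + a][j * stride + b] * kernel[a][b]
                for a in range(kh) for b in range(kw)
            )
    return out

def max_pool(image, size=2, stride=2):
    """Max pooling: keep the largest value in each window."""
    out_h = (len(image) - size) // stride + 1
    out_w = (len(image[0]) - size) // stride + 1
    return [[max(image[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)] for i in range(out_h)]
```

For example, convolving a 4×4 image with a 3×3 filter at stride 1 and no padding yields a 2×2 feature map, matching the output-size formula above.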
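The object-detection terms Intersection over Union and Non-max Suppression listed above can also be sketched directly. This is an illustrative pure-Python version assuming boxes in (x1, y1, x2, y2) corner format; the names are mine, not from a specific detection library.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes; drop lower-scoring overlapping ones."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard remaining boxes that overlap the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

Algorithms like YOLO use exactly this post-processing step to collapse multiple overlapping bounding-box predictions for the same object into one.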
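Two of the loss functions named above, the triplet loss (used with Siamese networks for face recognition) and binary cross entropy (used to train GAN discriminators), have short standard formulas. Here is a hedged pure-Python sketch over plain lists of numbers; in practice these operate on batches of embedding tensors.

```python
import math

def sq_dist(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(||a - p||^2 - ||a - n||^2 + margin, 0): pull the positive
    closer to the anchor than the negative, by at least the margin."""
    return max(sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin, 0.0)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean log loss; eps guards against log(0)."""
    n = len(y_true)
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for y, p in zip(y_true, y_pred)) / n
```

Note the triplet loss is zero once the negative is sufficiently farther from the anchor than the positive, which is why triplet mining (choosing hard triplets) matters during training.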
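The efficient-deep-learning terms above, Magnitude-based Pruning and Linear Quantization, can likewise be illustrated in a few lines. This sketch works on a flat list of weights and uses an affine (asymmetric) quantization scheme; real toolkits apply these per layer or per channel, and the function names are illustrative.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:n_prune]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned

def linear_quantize(weights, n_bits=8):
    """Affine linear quantization: w ≈ scale * (q - zero_point)."""
    qmin, qmax = 0, 2 ** n_bits - 1
    wmin, wmax = min(weights), max(weights)
    scale = (wmax - wmin) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # all weights identical; avoid division by zero
    zero_point = round(qmin - wmin / scale)
    # Quantize to integers, clamping to the representable range.
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    # Dequantize to see the approximation error.
    dequant = [scale * (qi - zero_point) for qi in q]
    return q, dequant
```

Pruning and quantization are complementary: pruning removes weights entirely, while quantization stores the survivors in fewer bits, and both shrink model size at a small cost in accuracy.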
After finishing computer vision, please click on Topic 24 – Introduction to Natural Language Processing to continue.