Why do I need to learn about computer vision?
Computer vision has become an increasingly active field, with notable achievements in image recognition, autonomous driving, and disease detection.
Nowadays, a key skill for software developers is the ability to use computer vision algorithms and tools to solve real-world problems involving images and videos.
What can I do after finishing learning about applied computer vision?
You will be able to create software that can recognize a face or transform a picture of a young person into an older person.
That sounds fun! What should I do now?
First, please take a quick look at the following two books to grasp the core concepts and methods in computer vision:
After that, please audit the course and read the book below to solidify your knowledge and gain hands-on experience with computer vision algorithms:
After that, please audit the following courses to grasp the core concepts of generative adversarial networks and gain hands-on experience with them:
After that, please audit the following courses and read the book below to grasp the core concepts of generative models, including diffusion models, and to gain hands-on experience with these models:
After that, please audit this course to learn how to efficiently represent, compress, and train large generative models: TinyML and Efficient Deep Learning Computing.
Terminology Review:
- Digital Image: f(x, y)
- Intensity (Gray Level): ℓ = f(x, y)
- Gray Scale: ℓ = 0 is considered black and ℓ = L – 1 is considered white.
- Quantization: Digitizing the amplitude values.
- Sampling: Digitizing the coordinate values.
- Representing Digital Images: Matrix or Vector.
- Pixel or Picture Element: An element of the matrix or vector.
- ∞×∞
- Computer Vision Tasks: Image Classification, Object Segmentation, Style Transfer, Image Colorization, Image Reconstruction, Image Super-Resolution, Generating Images.
- Deep Learning.
- Artificial Neural Networks.
- ∞×∞
- Filter: A 2-dimensional matrix, usually square, whose weights are shared across the entire input space.
- The Convolution Operation: At each filter position, multiply the filter and the input patch element-wise, then sum the products.
- Stride: Filter step size.
- Padding.
- Upsampling: Nearest Neighbors, Linear Interpolation, Bilinear Interpolation.
- Max Pooling, Average Pooling, Min Pooling.
- Convolutional Layers.
- Feature Maps.
- Convolutional Neural Networks (CNNs), ResNet.
- Receptive Field, Strided Convolution Layer, Grouped Convolution Layer.
- ∞×∞
- Object Localization.
- Bounding Box.
- Landmark Detection.
- Sliding Windows Detection.
- Bounding Box Predictions.
- Intersection over Union.
- Non-max Suppression Algorithm.
- Anchor Box Algorithm.
- ∞×∞
- Object Detection.
- YOLO Algorithm.
- ∞×∞
- Semantic Segmentation.
- Transpose Convolution.
- U-Net.
- ∞×∞
- Face Verification.
- Face Recognition.
- One-shot Learning.
- Siamese Network.
- Triplet Loss.
- ∞×∞
- Neural Style Transfer.
- Content Cost Function.
- Style Cost Function.
- ∞×∞
- 1D Convolution.
- 3D Convolution.
- ∞×∞
- Latent Variable.
- Autoencoders.
- Variational Autoencoders.
- Generators.
- Discriminators.
- Binary Cross Entropy Loss Function, Log Loss Function.
- Generative Adversarial Networks (GANs).
- Deep Convolutional Generative Adversarial Networks.
- Mode Collapse.
- Earth Mover’s Distance.
- Wasserstein Loss (W-Loss).
- 1-Lipschitz Continuous Function.
- Wasserstein GANs.
- Conditional GANs.
- Pixel Distance.
- Feature Distance.
- Fréchet Inception Distance (FID).
- Inception Score (IS).
- Autoregressive Models.
- Variational Autoencoders (VAEs).
- Flow Models.
- StyleGAN.
- Pix2Pix.
- CycleGAN.
- ∞×∞
- Diffusion Models.
- ∞×∞
- Tokenizer.
- Embeddings.
- Self-Attention.
- Multi-Head Attention.
- Attention Masking.
- Transformer Block.
- Positional Encoding.
- Vision Transformer.
- Contrastive Language-Image Pre-training (CLIP) Models.
- Visual Language Models: Flamingo.
- ∞×∞
- Magnitude-based Pruning.
- K-Means-based Weight Quantization.
- Linear Quantization.
- ∞×∞
- Neural Architecture Search.
- ∞×∞
- Knowledge Distillation.
- Self and Online Distillation.
- Network Augmentation.
- ∞×∞
- Loop Reordering, Loop Tiling, Loop Unrolling.
- SIMD (Single Instruction, Multiple Data) Programming.
- Multithreading.
- CUDA Programming.
- ∞×∞
- Data Parallelism.
- Pipeline Parallelism.
- Tensor Parallelism.
- Hybrid Parallelism.
- Automated Parallelism.
- Gradient Pruning: Sparse Communication, Deep Gradient Compression, PowerSGD.
- Gradient Quantization: 1-Bit SGD, Threshold Quantization, TernGrad.
- Delayed Gradient Averaging.
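
To make a few of the terms above concrete (Filter, The Convolution Operation, Stride, Padding, and Max Pooling), here is a minimal sketch in plain Python. It is only an illustration under simple assumptions: images are nested lists of numbers, padding is zero padding, and the function names `pad2d`, `conv2d`, and `max_pool2d` are made up for this example rather than taken from any library.

```python
def pad2d(image, padding):
    """Surround a 2-D image (list of lists) with `padding` rings of zeros."""
    if padding == 0:
        return [list(row) for row in image]
    width = len(image[0]) + 2 * padding
    padded = [[0] * width for _ in range(padding)]
    for row in image:
        padded.append([0] * padding + list(row) + [0] * padding)
    padded.extend([0] * width for _ in range(padding))
    return padded


def conv2d(image, kernel, stride=1, padding=0):
    """Slide `kernel` over `image`; at each position, multiply element-wise and sum.

    As in most deep learning libraries, the kernel is not flipped, so
    strictly speaking this computes a cross-correlation.
    """
    x = pad2d(image, padding)
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(x) - kh) // stride + 1
    out_w = (len(x[0]) - kw) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride  # stride = filter step size
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += x[top + a][left + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


def max_pool2d(image, size=2, stride=2):
    """Keep the largest value inside each size x size window."""
    out_h = (len(image) - size) // stride + 1
    out_w = (len(image[0]) - size) // stride + 1
    return [
        [
            max(
                image[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# A toy 4 x 4 image and a 2 x 2 filter (values chosen only for illustration).
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3],
]
kernel = [
    [1, 0],
    [0, -1],
]

feature_map = conv2d(image, kernel)             # 3 x 3 output
strided_map = conv2d(image, kernel, stride=2)   # 2 x 2 output
padded_map = conv2d(image, kernel, padding=1)   # 5 x 5 output
pooled = max_pool2d(image)                      # 2 x 2 output
```

For an input of size n, a filter of size k, padding p, and stride s, each output dimension is floor((n + 2p - k) / s) + 1, which is why the three calls above produce 3 × 3, 2 × 2, and 5 × 5 feature maps.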
After finishing computer vision, please click on Topic 24 – Introduction to Natural Language Processing to continue.