Why do I need to learn about computer vision?
Computer vision has become increasingly important. Image recognition, autonomous driving, and disease detection are examples of breakthrough achievements in the field.
Software developers are now often expected to use computer vision algorithms and tools to solve real-world problems involving images and videos.
What can I do after finishing learning about applied computer vision?
You will be able to create software that can recognize a face or transform a picture of a young person into a picture of an old person.
That sounds fun! What should I do now?
Please read
– this Rafael C. Gonzalez and Richard E. Woods (2018). Digital Image Processing. 4th Edition. Pearson book, and
– this Richard Szeliski (2022). Computer Vision: Algorithms and Applications. Springer book.
At the same time, please
– audit this Convolutional Neural Networks course and
– read this Francois Chollet (2021). Deep Learning with Python. Manning Publications book.
After that please audit
– this Build Basic Generative Adversarial Networks course and
– this Build Better Generative Adversarial Networks course and
– this Apply Generative Adversarial Networks course and
– this How Diffusion Models Work course.
After that please read this David Foster (2023). Generative Deep Learning – Teaching Machines To Paint, Write, Compose, and Play. O’Reilly Media book.
After that please read this Ian Goodfellow et al. (2016). Deep Learning. The MIT Press book.
After that please audit this TinyML and Efficient Deep Learning Computing course.
Terminology Review:
- Digital Image: f(x, y)
- Intensity (Gray Level): ℓ = f(x, y)
- Gray Scale: ℓ = 0 is considered black and ℓ = L – 1 is considered white.
- Quantization: Digitizing the amplitude values.
- Sampling: Digitizing the coordinate values.
- Representing Digital Images: Matrix or Vector.
- Pixel or Picture Element: An element of matrix or vector.
- Deep Learning.
- Artificial Neural Networks.
- Filter: A 2-dimensional matrix, commonly square, containing weights that are shared across the whole input space.
- The Convolution Operation: Multiply the filter element-wise with each input window, then sum the products.
- Stride: Filter step size.
- Padding.
- Upsampling: Nearest Neighbors, Linear Interpolation, Bilinear Interpolation.
- Max Pooling, Average Pooling, Min Pooling.
- Convolutional Layers.
- Feature Maps.
- Convolutional Neural Networks (CNNs).
- Object Localization.
- Bounding Box.
- Landmark Detection.
- Sliding Windows Detection.
- Bounding Box Predictions.
- Intersection over Union.
- Non-max Suppression Algorithm.
- Anchor Box Algorithm.
- Object Detection.
- YOLO Algorithm.
- Semantic Segmentation.
- Transpose Convolution.
- U-Net.
- Face Verification.
- Face Recognition.
- One-shot Learning.
- Siamese Network.
- Triplet Loss.
- Neural Style Transfer.
- Content Cost Function.
- Style Cost Function.
- 1D Convolution.
- 3D Convolution.
- Latent Variable.
- Autoencoders.
- Variational Autoencoders.
- Generators.
- Discriminators.
- Binary Cross Entropy Loss Function, Log Loss Function.
- Generative Adversarial Networks (GANs).
- Deep Convolutional Generative Adversarial Networks.
- Mode Collapse.
- Earth Mover’s Distance.
- Wasserstein Loss (W-Loss).
- 1-Lipschitz Continuous Function.
- Wasserstein GANs.
- Conditional GANs.
- Pixel Distance.
- Feature Distance.
- Fréchet Inception Distance (FID).
- Inception Score (IS).
- Autoregressive Models.
- Variational Autoencoders (VAEs).
- Flow Models.
- StyleGAN.
- Pix2Pix.
- CycleGAN.
- Diffusion Models.
- Magnitude-based Pruning.
- K-Means-based Weight Quantization.
- Linear Quantization.
- Neural Architecture Search.
- Knowledge Distillation.
- Self and Online Distillation.
- Network Augmentation.
- Loop Reordering, Loop Tiling, Loop Unrolling, SIMD (Single Instruction, Multiple Data) Programming, Multithreading, CUDA Programming.
- Data Parallelism.
- Pipeline Parallelism.
- Tensor Parallelism.
- Hybrid Parallelism.
- Automated Parallelism.
- Gradient Pruning: Sparse Communication, Deep Gradient Compression, PowerSGD.
- Gradient Quantization: 1-Bit SGD, Threshold Quantization, TernGrad.
- Delayed Gradient Averaging.
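To make the filter, convolution, stride, and padding terms from the list above more concrete, here is a minimal sketch in plain Python. The function name `conv2d`, the sample image, and the averaging filter are illustrative assumptions, not taken from any of the books or courses listed. Like most deep learning frameworks, the sketch does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide `kernel` over `image`; at each position multiply element-wise and sum."""
    if padding > 0:
        # Zero padding: surround the image with a border of zeros.
        width = len(image[0]) + 2 * padding
        border = [[0.0] * width for _ in range(padding)]
        middle = [[0.0] * padding + list(row) + [0.0] * padding for row in image]
        image = border + middle + [list(row) for row in border]

    kh, kw = len(kernel), len(kernel[0])
    # Output size follows from the (padded) input size, filter size, and stride.
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1

    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride  # stride = filter step size
            total = 0.0
            for u in range(kh):
                for v in range(kw):
                    total += image[top + u][left + v] * kernel[u][v]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 gray-level image and a 2x2 averaging filter.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
kernel = [[0.25, 0.25], [0.25, 0.25]]

feature_map = conv2d(image, kernel, stride=2)
print(feature_map)  # [[2.5, 4.5], [10.5, 12.5]]
```

With a stride of 2 and no padding, the 4x4 input shrinks to a 2x2 feature map. Calling `conv2d(image, kernel, stride=1, padding=1)` instead yields a 5x5 output, which shows how padding and stride together control the output size.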
After finishing learning about computer vision, please click Topic 24 – Introduction to Natural Language Processing to continue.