Category Archives: Machine Learning

Topic 24 – Introduction to Natural Language Processing

Why do I need to learn about natural language processing?

Natural language processing (NLP) has become increasingly interesting, with breakthrough achievements such as speech recognition, speech synthesis, machine translation, and chatbots.

Nowadays, a key skill for software developers is the ability to use NLP algorithms and tools to solve real-world problems involving text, audio, and speech.

What can I do after finishing learning about natural language processing?

You will be able to create software that can recognize speech, convert text to speech, translate a sentence from English to French, or answer a customer’s question.

That sounds fun! What should I do now?

First, please take a quick look at the following two books to grasp the core concepts and classical methods in natural language processing:

After that, please audit this course, Sequence Models, to grasp the core concepts of sequence models and gain hands-on experience with them.

After that, please watch these videos to learn about audio signal processing for machine learning.

After that, please audit the courses below to learn how to understand and generate natural language using deep learning models:

After that, please read the book below to learn how to use large language models to build NLP applications:

Terminology Review:

  • Natural Language Processing.
  • Text Classification (e.g. Spam Detection).
  • Named Entity Recognition.
  • Chatbots.
  • Speech Processing.
  • Speech Recognition.
  • Speech Synthesis.
  • Machine Translation.
  • Corpus: A body of texts.
  • Token: A word, a number, or a punctuation mark.
  • Collocation: Compounds (e.g. disk drive), phrasal verbs (e.g. make up), and other stock phrases (e.g. bacon and eggs).
  • Unigram: A single word.
  • Bigram: A pair of words that commonly occur together.
  • Trigram: A sequence of three words that commonly occur together.
  • N-gram: A sequence of n words that commonly occur together.
  • Hypothesis Testing.
  • t-Test.
  • Likelihood Ratios.
  • Language Model: A statistical model of word sequences.
  • Naive Bayes.
  • Hidden Markov Models.
  • Bag-of-Words Model.
  • Term Frequency–Inverse Document Frequency (TF–IDF).
  • Bag-of-n-Grams.
  • One-Hot Representation: Given a vocabulary of n words, each word is represented by a vector of length n in which all entries are zero except for a single entry set to 1 (see the sketch after this list).
  • Word Embedding (Featurized Representation): The transformation from words to dense vectors.
  • Euclidean Distance, Dot Product Similarity, Cosine Similarity.
  • Embedding Matrix.
  • Neural Language Model.
  • Word2Vec: Skip-Gram Model, Continuous Bag-of-Words (CBOW) Model.
  • Negative Sampling.
  • GloVe, Global Vectors.
  • Recurrent Neural Networks.
  • Backpropagation Through Time.
  • Recurrent Neural Net Language Model (RNNLM).
  • Gated Recurrent Unit (GRU).
  • Long Short-Term Memory (LSTM).
  • Bidirectional RNN.
  • Deep RNNs.
  • Sequence to Sequence Model.
  • Teacher Forcing.
  • Image Captioning.
  • Greedy Search.
  • Beam Search, Length Normalization.
  • BLEU (BiLingual Evaluation Understudy) Score.
  • ROUGE (Recall-Oriented Understudy for Gisting Evaluation) Score.
  • F1 Score.
  • Minimum Bayes-Risk.
  • Attention Mechanism.
  • Self-Attention (Scaled Dot-Product Attention): Queries, Keys, and Values.
  • Positional Encoding.
  • Masked Self-Attention.
  • Multi-Head Attention.
  • Residual Dropout.
  • Label Smoothing.
  • Transformer Encoder.
  • Transformer Decoder.
  • Transformer Encoder-Decoder.
  • Cross-Attention.
  • Byte Pair Encoding.
  • BERT (Bidirectional Encoder Representations from Transformers).
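
To make the one-hot and embedding representations above concrete, here is a minimal sketch in Python with NumPy. The four-word vocabulary and the randomly initialized embedding matrix are illustrative assumptions; real embeddings are learned with methods such as Word2Vec or GloVe.

    import numpy as np

    # Toy vocabulary (an illustrative assumption, not from the course material).
    vocab = ["king", "queen", "man", "woman"]
    word_to_index = {word: i for i, word in enumerate(vocab)}

    def one_hot(word):
        """Vector of length n: all zeros except a single 1 at the word's index."""
        vector = np.zeros(len(vocab))
        vector[word_to_index[word]] = 1.0
        return vector

    # Embedding matrix: one dense row per vocabulary word (random here, learned in practice).
    rng = np.random.default_rng(0)
    embedding_matrix = rng.normal(size=(len(vocab), 8))

    def embed(word):
        """Look up a dense vector; equivalent to one_hot(word) @ embedding_matrix."""
        return embedding_matrix[word_to_index[word]]

    def cosine_similarity(u, v):
        """Cosine of the angle between two vectors; 1.0 means the same direction."""
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(one_hot("queen"))                                 # [0. 1. 0. 0.]
    print(cosine_similarity(embed("king"), embed("queen")))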

After finishing natural language processing, please click on Topic 25 – Introduction to Distributed Systems to continue.


Topic 23 – Introduction to Computer Vision

Why do I need to learn about computer vision?

Computer vision has become an increasingly interesting field, with achievements such as image recognition, autonomous driving, and disease detection.

Nowadays, a key skill for software developers is the ability to use computer vision algorithms and tools to solve real-world problems involving images and videos.

What can I do after finishing learning about computer vision?

You will be able to create software that can recognize a face or transform a picture of a young person into a picture of an older person.

That sounds fun! What should I do now?

First, please take a quick look at the following two books to grasp the core concepts and methods in computer vision:

After that, please audit the course and read the book below to solidify your knowledge and gain hands-on experience with computer vision algorithms:

After that, please audit the following courses to grasp the core concepts of generative adversarial networks and gain hands-on experience with them:

After that, please audit the following courses and read the book below to grasp the core concepts of generative models, including diffusion models, and to gain hands-on experience with these models:

After that, please audit this course to learn how to efficiently represent, compress, and train large generative models: TinyML and Efficient Deep Learning Computing.

Terminology Review:

  • Digital Image: f(x, y)
  • Intensity (Gray Level): ℓ = f(x, y)
  • Gray Scale: ℓ = 0 is considered black and ℓ = L – 1 is considered white.
  • Quantization: Digitizing the amplitude values.
  • Sampling: Digitizing the coordinate values.
  • Representing Digital Images: Matrix or Vector.
  • Pixel or Picture Element: An element of the matrix or vector.
  • Deep Learning.
  • Artificial Neural Networks.
  • Filter: A 2-dimensional matrix, commonly square, containing weights that are shared across the entire input space.
  • The Convolution Operation: Multiply element-wise and add the outputs (see the sketch after this list).
  • Stride: The filter step size.
  • Padding.
  • Upsampling: Nearest Neighbors, Linear Interpolation, Bilinear Interpolation.
  • Max Pooling, Average Pooling, Min Pooling.
  • Convolutional Layers.
  • Feature Maps.
  • Convolutional Neural Networks (CNNs).
  • Object Localization.
  • Bounding Box.
  • Landmark Detection.
  • Sliding Windows Detection.
  • Bounding Box Predictions.
  • Intersection over Union.
  • Non-max Suppression Algorithm.
  • Anchor Box Algorithm.
  • Object Detection.
  • YOLO Algorithm.
  • Semantic Segmentation.
  • Transpose Convolution.
  • U-Net.
  • Face Verification.
  • Face Recognition.
  • One-shot Learning.
  • Siamese Network.
  • Triplet Loss.
  • Neural Style Transfer.
  • Content Cost Function.
  • Style Cost Function.
  • 1D Convolution.
  • 3D Convolution.
  • Latent Variable.
  • Autoencoders.
  • Variational Autoencoders.
  • Generators.
  • Discriminators.
  • Binary Cross Entropy Loss Function, Log Loss Function.
  • Generative Adversarial Networks (GANs).
  • Deep Convolutional Generative Adversarial Networks.
  • Mode Collapse.
  • Earth Mover’s Distance.
  • Wasserstein Loss (W-Loss).
  • 1-Lipschitz Continuous Function.
  • Wasserstein GANs.
  • Conditional GANs.
  • Pixel Distance.
  • Feature Distance.
  • Fréchet Inception Distance (FID).
  • Inception Score (IS).
  • Autoregressive Models.
  • Variational Autoencoders (VAEs).
  • Flow Models.
  • StyleGAN.
  • Pix2Pix.
  • CycleGAN.
  • Diffusion Models.
  • Magnitude-based Pruning.
  • K-Means-based Weight Quantization.
  • Linear Quantization.
  • Neural Architecture Search.
  • Knowledge Distillation.
  • Self and Online Distillation.
  • Network Augmentation.
  • Loop Reordering, Loop Tiling, Loop Unrolling, SIMD (Single Instruction, Multiple Data) Programming, Multithreading, CUDA Programming.
  • Data Parallelism.
  • Pipeline Parallelism.
  • Tensor Parallelism.
  • Hybrid Parallelism.
  • Automated Parallelism.
  • Gradient Pruning: Sparse Communication, Deep Gradient Compression, PowerSGD.
  • Gradient Quantization: 1-Bit SGD, Threshold Quantization, TernGrad.
  • Delayed Gradient Averaging.
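
As a minimal sketch of the convolution operation above, here is a plain NumPy version (the stride argument, "valid" padding, and the Sobel filter in the example are my assumptions; deep learning frameworks provide optimized implementations):

    import numpy as np

    def convolve2d(image, kernel, stride=1):
        """'Valid' (no padding) 2-D convolution: slide the filter over the image,
        multiply element-wise, and sum the results into one output value.
        Like most deep learning libraries, this is cross-correlation
        (the kernel is not flipped)."""
        kh, kw = kernel.shape
        out_h = (image.shape[0] - kh) // stride + 1
        out_w = (image.shape[1] - kw) // stride + 1
        output = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                patch = image[i * stride : i * stride + kh,
                              j * stride : j * stride + kw]
                output[i, j] = np.sum(patch * kernel)
        return output

    # A 3x3 vertical-edge (Sobel) filter applied to a toy 5x5 image.
    image = np.arange(25, dtype=float).reshape(5, 5)
    sobel_x = np.array([[-1.0, 0.0, 1.0],
                        [-2.0, 0.0, 2.0],
                        [-1.0, 0.0, 1.0]])
    print(convolve2d(image, sobel_x))  # 3x3 feature map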

After finishing computer vision, please click on Topic 24 – Introduction to Natural Language Processing to continue.


Topic 22 – Introduction to Machine Learning

Why do I need to learn about machine learning?

Machine learning has been used to solve many important and difficult problems, including speech recognition, speech synthesis, image recognition, autonomous driving, and chatbots. Today, a key skill for software developers is the ability to use machine learning algorithms to solve real-world problems.

What can I do after finishing learning about machine learning?

You will be able to create software that can recognize a license plate number from an image or estimate the probability of breast cancer for a patient.

That sounds useful! What should I do now?

First, please audit these courses to learn the core concepts of machine learning and gain hands-on experience with them:

After that, please read the following books to reinforce your theoretical understanding and practical competence in machine learning:

After that, please audit this course and read its readings to learn the core approaches and algorithms for building artificial intelligence systems: MIT 6.034 – Artificial Intelligence, Fall 2010 (Readings).

After that, please read the following books to study the mathematical foundations underlying machine learning algorithms:

After that, please audit the following courses and read the book below to learn the core concepts and algorithms of reinforcement learning:

Supervised Learning Terminology Review:

  • Artificial Intelligence.
  • Machine Learning.
  • Deep Learning.
  • Linear Regression: Y = θX + ε.
  • Cost Function: Measures how good or bad your model is.
  • Mean Squared Error (MSE): Measures the average of the squares of the errors.
  • Gradient Descent, Learning Rate.
  • Batch Gradient Descent.
  • The R-Squared Test measures the proportion of the total variance in the output (y) that can be explained by the variation in x. It can be used to evaluate how well a model fits the given data.
  • Stochastic Gradient Descent.
  • Mini-Batch Gradient Descent.
  • Overfitting: A model gives accurate predictions for training data but not for new data.
  • Regularization: Ridge Regression, Lasso Regression, Elastic Net, Early Stopping.
  • Normalization.
  • Logistic Regression.
  • Sigmoid Function.
  • Binary Cross Entropy Loss Function, Log Loss Function.
  • One Hot Encoding.
  • The Softmax Function takes an N-dimensional vector of arbitrary real values and produces another N-dimensional vector with real values in the range (0, 1) that sum to 1.0 (see the sketches after this list).
  • Softmax Regression.
  • Gradient Ascent.
  • Newton’s Method.
  • Support Vector Machines.
  • Decision Trees.
  • Parametric vs. Non-parametric Models.
  • K-Nearest Neighbors.
  • Locally Weighted Regression.
  • McCulloch-Pitts Neuron.
  • A Linear Threshold Unit with threshold T calculates the weighted sum of its inputs, then outputs 0 if the sum is less than T and 1 if the sum is greater than T (also sketched after this list).
  • Perceptron.
  • Artificial Neural Networks.
  • Backpropagation.
  • Activation Functions: Rectified Linear Unit (ReLU), Leaky ReLU, Sigmoid, Hyperbolic Tangent.
  • Batch Normalization.
  • Learning Rate Decay.
  • Exponentially Weighted Averages.
  • Gradient Descent Optimization Algorithms: Momentum, Adagrad, Adadelta, RMSprop, Adam.
  • Regularization: Dropout.
  • The Joint Probability Table.
  • Bayesian Networks.
  • Naive Bayes Inference.
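
As a minimal sketch of the softmax function described in the list above (subtracting the maximum before exponentiating is a standard numerical-stability trick, added here as an assumption rather than something the definition requires):

    import numpy as np

    def softmax(z):
        """Map an N-dimensional real vector to values in (0, 1) that sum to 1.0.
        Subtracting the max avoids overflow without changing the result."""
        exp_z = np.exp(z - np.max(z))
        return exp_z / np.sum(exp_z)

    logits = np.array([2.0, 1.0, 0.1])
    probs = softmax(logits)
    print(probs)        # approximately [0.659 0.242 0.099]
    print(probs.sum())  # 1.0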
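
And a small sketch of the linear threshold unit described in the list above (the weights and threshold are arbitrary values chosen for illustration):

    def linear_threshold_unit(inputs, weights, threshold):
        """Output 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
        weighted_sum = sum(x * w for x, w in zip(inputs, weights))
        return 1 if weighted_sum > threshold else 0

    print(linear_threshold_unit([1, 0, 1], [0.5, 0.2, 0.3], 0.7))  # 1, since 0.8 > 0.7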

Unsupervised Learning Terminology Review:

  • K-Means.
  • Principal Component Analysis.
  • User-Based Collaborative Filtering.
  • Item-based Collaborative Filtering.
  • Matrix Factorization.

Reinforcement Learning Terminology Review:

  • k-armed Bandit Problem.
  • Sample-Average Method.
  • Greedy Action.
  • Exploration and Exploitation.
  • ϵ-Greedy Action Selection.
  • Bandit Algorithm.
  • Exponential Recency-Weighted Average.
  • Optimistic Initial Values.
  • Upper-Confidence-Bound Action Selection.
  • Rewards.
  • Agent, Actions, World or Environment.
  • History, States, Terminal State, Environment State, Agent State, Information State.
  • Fully Observable Environments.
  • Partially Observable Environments.
  • Policy, Value Function, Model.
  • Value Based RL Agent, Policy Based RL Agent, Actor Critic RL Agent.
  • Model Free RL Agent, Model Based RL Agent.
  • Learning Problem and Planning Problem.
  • Prediction and Control.
  • Markov Property.
  • State Transition Matrix.
  • Markov Process.
  • Episodic Tasks.
  • Continuing Tasks.
  • Horizon (H): Number of time steps in each episode, can be infinite.
  • Markov Reward Process.
  • Discount Factor, Discount Rate: 0 ≤ γ ≤ 1.
  • Return.
  • Discounted Return: Discounted sum of rewards from time step t to horizon H.
  • State-Value Function.
  • Bellman Equation for Markov Reward Processes.
  • Markov Decision Process.
  • Policy: Mapping from states to actions. Deterministic policy: π(s) = a. Stochastic policy: π(a|s) = P(aₜ = a | sₜ = s).
  • State Value Function – Vπ(s): The expected return starting from state s following policy π.
  • Bellman Expectation Equation for Vπ.
  • Action Value Function (also known as the State-Action Value Function or the Quality Function) – Qπ(s, a): The expected return starting from state s, taking action a, and then following policy π.
  • Bellman Expectation Equation for Qπ.
  • Optimal State Value Function.
  • Optimal Action Value Function.
  • Bellman Optimality Equation for v*.
  • Bellman Optimality Equation for q*.
  • Optimal Policies.
  • Dynamic Programming.
  • Iterative Policy Evaluation.
  • Policy Improvement.
  • Policy Improvement Theorem.
  • Policy Iteration.
  • Value Iteration.
  • Synchronous Dynamic Programming.
  • Asynchronous Dynamic Programming.
  • Generalized Policy Iteration.
  • Bootstrapping: Updating estimates on the basis of other estimates.
  • Monte-Carlo Policy Evaluation.
  • First-Visit Monte-Carlo Policy Evaluation.
  • Every-Visit Monte-Carlo Policy Evaluation.
  • Incremental Mean.
  • Incremental Monte-Carlo Updates.
  • Temporal-Difference Learning.
  • Forward-View TD(λ).
  • Eligibility Traces.
  • Backward-View TD(λ).
  • On-Policy Learning.
  • Off-Policy Learning.
  • ϵ-Greedy Exploration.
  • ϵ-greedy Policies: Most of the time they choose an action that has maximal estimated action value, but with probability ϵ they instead select an action at random.
  • Monte-Carlo Policy Iteration. Policy evaluation: Monte-Carlo policy evaluation, Q = qπ. Policy improvement: ϵ-greedy policy improvement.
  • Monte-Carlo Control. Policy evaluation: Monte-Carlo policy evaluation, Q ≈ qπ. Policy improvement: ϵ-greedy policy improvement.
  • Exploring Starts: Specify that the episodes start in a state–action pair, and that every pair has a nonzero probability of being selected as the start.
  • Monte Carlo Control Exploring Starts.
  • Greedy in the Limit with Infinite Exploration (GLIE) Monte-Carlo Control.
  • ϵ-soft Policies: Policies for which π(a|s) ≥ ϵ/|A(s)| for all states and actions, for some ϵ > 0.
  • On-Policy First-Visit MC Control.
  • SARSA: State (S), Action (A), Reward (R), State (S’), Action (A’).
  • On-Policy Control with SARSA. Policy evaluation: SARSA evaluation, Q ≈ qπ. Policy improvement: ϵ-greedy policy improvement.
  • Forward-View SARSA(λ).
  • Backward-View SARSA(λ).
  • Target Policy.
  • Behavior Policy.
  • Importance Sampling: Use samples from one distribution to estimate the expectation of a different distribution.
  • Importance Sampling for Off-Policy Monte-Carlo.
  • Importance Sampling for Off-Policy TD.
  • Q-Learning: The next action is chosen using the behaviour policy, but Q is updated using the alternative successor action (see the sketch after this list).
  • Off-Policy Control with Q-Learning.
  • Expected SARSA.
  • Value Function Approximation.
  • Function Approximators.
  • Differentiable Function Approximators.
  • Feature Vectors.
  • State Aggregation.
  • Coarse Coding.
  • Tile Coding.
  • Continuous States.
  • Incremental Prediction Algorithms.
  • Control with Value Function Approximation. Policy evaluation: Approximate policy evaluation, q(·, ·, w) ≈ qπ. Policy improvement: ϵ-greedy policy improvement.
  • Learning the State-Action Value Function: Replay Buffer: the 10,000 most recent tuples (s, a, R(s), s’). x = (s, a) → Q(θ) → y = R(s) + γ max Q(s’, a’; θ). Loss = [R(s) + γ max Q(s’, a’; θ)] − Q(s, a; θ).
  • Expected SARSA with Function Approximation.
  • Target Network: A separate neural network for generating the y targets, with the same architecture as the original Q-Network. Loss = [R(s) + γ max TargetQ(s’, a’; θ′)] − Q(s, a; θ). The TargetQ-Network is used to generate the y targets, and every C time steps its weights are updated from the weights of the Q-Network.
  • Soft Updates: θ′ ← 0.001θ + 0.999θ′, where θ′ and θ represent the weights of the target network and the current network, respectively.
  • Deep Q-learning.
  • Linear Least Squares Prediction Algorithms.
  • Least Squares Policy Iteration. Policy evaluation: Least squares Q-Learning. Policy improvement: Greedy policy improvement.
  • Average Reward.
  • Discounted Returns, Returns for Average Reward.
  • Stochastic Policies.
  • Softmax Policies.
  • Gaussian Policies.
  • Policy Objective Functions: Start State Objective, Average Reward Objective and Average Value Objective.
  • Score Function.
  • Policy Gradient Theorem.
  • Monte-Carlo Policy Gradient (REINFORCE).
  • Action-Value Actor-Critic: Critic updates w by linear TD(0). Actor updates θ by policy gradient.
  • The Tabular Dyna-Q Algorithm.
  • The Dyna-Q+ Algorithm.
  • Forward Search.
  • Simulation-Based Search.
  • Monte-Carlo Tree Search.
  • Temporal-Difference Search.
  • Dyna-2.
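
As a minimal sketch of tabular Q-learning with ϵ-greedy action selection, as mentioned in the list above (the toy chain environment, learning rate, discount factor, and exploration rate are illustrative assumptions):

    import random

    # Toy deterministic chain: states 0..4, actions 0 (left) and 1 (right);
    # reaching state 4 yields reward 1 and ends the episode (an illustrative assumption).
    N_STATES, N_ACTIONS, GOAL = 5, 2, 4

    def step(state, action):
        next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
        reward = 1.0 if next_state == GOAL else 0.0
        return next_state, reward, next_state == GOAL

    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount factor, exploration rate

    for episode in range(500):
        state, done = 0, False
        while not done:
            # ϵ-greedy behaviour policy: explore with probability ϵ, otherwise act greedily.
            if random.random() < epsilon:
                action = random.randrange(N_ACTIONS)
            else:
                action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
            next_state, reward, done = step(state, action)
            # Off-policy update: the target uses the greedy successor action (max),
            # not necessarily the action the behaviour policy will take next.
            target = reward + gamma * max(Q[next_state])
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state

    print(Q[0])  # Q-values at the start state; moving right should dominate.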

Probabilistic Machine Learning Terminology Review:

  • Probabilistic Machine Learning.
  • Non-Probabilistic Machine Learning.
  • Algorithmic Machine Learning.
  • Array Programming.
  • Frequentist and Bayesian Approaches.

After finishing machine learning, please click on Topic 23 – Introduction to Computer Vision to continue.