Category Archives: Machine Learning

Computer Science Curriculum, Machine Learning, Natural Language Processing, Software Engineering Curriculum

Topic 24 – Introduction to Nature Language Processing

October 8, 2023 admin Leave a comment

Why do I need to learn about nature language processing?

Natural language processing (NLP) has become more and more interesting. Speech recognition, speech synthesis, autonomous driving and chat bots are examples of breakthrough achievements in the field.

Nowadays a key skill of software developer is the ability to use nature language processing algorithms and tools to solve real-world problems related to text, audio, natural language sentences and speech.

What can I do after finishing learning about nature language processing?

You will be to create software that could recognize speech, translate text to speech, translate a sentence from English to French, answer a customer’s question.

That sounds fun! What should I do now?

Please take a quick look at
– this Daniel Jurafsky and James H. Martin (2014). Speech and Language Processing. Pearson book, and
– this Christopher D. Manning and Hinrich Schiitze (1999). Foundations of Statistical Natural Language Processing. MIT Press book first.

After that please audit this Sequence Models course.

After that please audit these Natural Language Processing Specialization courses and this Stanford CS224N – NLP with Deep Learning, Winter 2023 course (Lecture Notes).

After that please read this Lewis Tunstall et al. (2022). Natural Language Processing with Transformers – Building Language Applications with Hugging Face. O’Reilly Media book.

After that please watch these videos to learn about audio signal processing for machine learning.

Terminology Review:

Natural Language Processing.
Text Classification (e.g. Spam Detection).
Named Entity Recognition.
Chatbots.
Speech Processing.
Speech Recognition.
Speech Synthesis.
Machine Translation.
Corpus: A body of texts.
Token: a word or a number or a punctuation mark.
Collocation: compounds (e.g. disk drive), phrasal verbs (e.g. make up), and other stock phrases (e.g. bacon and eggs).
Unigram: word.
Bigrams: pairs of words that occur commonly.
Trigrams: 3 words that occur commonly.
N-grams: n words that occur commonly.
Hypothesis Testing.
t-Test.
Likelihood Ratios.
Language Model: statistical model of word sequences.
Naive Bayes.
Hidden Markov Models.
Bag-of-Words Model.
Term Frequency–Inverse Document Frequency (TF–IDF).
Bag-of-n-Grams.
One-Hot Representation: You have a vocabulary of n words and you represent each word using a vector that is n bits long, in which all bits are zero except for one bit that is set to 1.
Word Embedding (Featurized Representation) is the transformation from words to dense vector.
Euclidean Distance, Dot Product Similarity, Cosine Similarity.
Embedding Matrix.
Neural Language Model.
Word2Vec: Skip-Gram Model, Bag-of-Words Model.
Negative Sampling.
GloVe, Global Vectors.
Recurrent Neural Networks.
Backpropagation Through Time.
Recurrent Neural Net Language Model (RNNLM).
Gated Recurrent Unit (GRU).
Long Short Term Memory (LSTM).
Bidirectional RNN.
Deep RNNs.
Sequence to Sequence Model.
Teacher Forcing.
Image Captioning.
Greedy Search.
Beam Search, Length Normalization.
BLEU (BiLingual Evaluation Understudy) Score.
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) Score.
F1 Score.
Minimum Bayes-Risk.
Attention Mechanism.
Self-Attention (Scaled and Dot-Product Attention): Queries, Keys and Values.
Positional Encoding.
Masked Self-Attention.
Multi-Head Attention.
Residual Dropout.
Label Smoothing.
Transformer Encoder.
Transformer Decoder.
Transformer Encoder-Decoder.
Cross-Attention.
Byte Pair Encoding.
BERT (Bidirectional Encoder Representations from Transformers).

After finishing learning about natural language processing please click Topic 25 – Introduction to Distributed Systems to continue.

Computer Science Curriculum, Computer Vision, Machine Learning, Software Engineering Curriculum

Topic 23 – Introduction to Computer Vision

October 8, 2023 admin Leave a comment

Why do I need to learn about computer vision?

Computer vision has become more and more interesting. Image recognition, autonomous driving, and disease detection are examples of breakthrough achievements in the field.

Nowadays a key skill that is often required from a software developer is the ability to use computer vision algorithms and tools to solve real-world problems related to images and videos.

What can I do after finishing learning about applied computer vision?

You will be to create software that could recognize recognize a face or transform a picture of young person to old person.

That sounds fun! What should I do now?

Please read
– this Rafael C. Gonzalez and Richard E. Woods (2018). Digital Image Processing. 4th Edition. Pearson book, and
– this Richard Szeliski (2022). Computer Vision: Algorithms and Applications. Springer book.

At the same time, please
– audit this Convolutional Neural Networks course and
– read this Francois Chollet (2021). Deep Learning with Python. Manning Publications book.

After that please audit
– this Build Basic Generative Adversarial Networks course and
– this Build Better Generative Adversarial Networks course and
– this Apply Generative Adversarial Networks course and
– this How Diffusion Models Work course.

After that please read this David Foster (2023). Generative Deep Learning – Teaching Machines To Paint, Write, Compose, and Play. O’Reilly Media book.

After that please read this Ian Goodfellow et al. (2016). Deep Learning. The MIT Press book.

After that please audit this TinyML and Efficient Deep Learning Computing course.

Terminology Review:

Digital Image: f(x, y)
Intensity (Gray Level): ℓ = f(x, y)
Gray Scale: ℓ = 0 is considered black and ℓ = L – 1 is considered white.
Quantization: Digitizing the amplitude values.
Sampling: Digitizing the coordinate values.
Representing Digital Images: Matrix or Vector.
Pixel or Picture Element: An element of matrix or vector.
Deep Learning.
Artificial Neural Networks.
Filter: 2-dimensional matrix commonly square in size containing weights shared all over the input space.
The Convolution Operation: Element-wise multiply, and add the outputs.
Stride: Filter step size.
Padding.
Upsampling: Nearest Neighbors, Linear Interpolation, Bilinear Interpolation.
Max Pooling, Average Pooling, Min Pooling.
Convolutional Layers.
Feature Maps.
Convolutional Neural Networks (CNNs).
Object Localization.
Bounding Box.
Landmark Detection.
Sliding Windows Detection.
Bounding Box Predictions.
Intersection over Union.
Non-max Suppression Algorithm.
Anchor Box Algorithm.
Object Detection.
YOLO Algorithm.
Semantic Segmentation.
Transpose Convolution.
U-Net.
Face Verification.
Face Recognition.
One-shot Learning.
Siamese Network.
Triplet Loss.
Neural Style Transfer.
Content Cost Function.
Style Cost Function.
1D Convolution.
3D Convolution.
Latent Variable.
Autoencoders.
Variational Autoencoders.
Generators.
Discriminators.
Binary Cross Entropy Loss Function, Log Loss Function.
Generative Adversarial Networks (GANs).
Deep Convolutional Generative Adversarial Networks.
Mode Collapse.
Earth Mover’s Distance.
Wasserstein Loss (W-Loss).
1-Lipschitz Continuous Function.
Wasserstein GANs.
Conditional GANs.
Pixel Distance.
Feature Distance.
Fréchet Inception Distance (FID).
Inception Score (IS).
Autoregressive Models.
Variational Autoencoders (VAEs).
Flow Models.
StyleGAN.
Pix2Pix.
CycleGAN.
Diffusion Models.
Magnitude-based Pruning.
K-Means-based Weight Quantization.
Linear Quantization.
Neural Architecture Search.
Knowledge Distillation.
Self and Online Distillation.
Network Augmentation.
Loop Reordering, Loop Tiling, Loop Unrolling, SIMD (Single Instruction, Multiple Data) Programming, Multithreading, CUDA Programming.
Data Parallelism.
Pipeline Parallelism.
Tensor Parallelism.
Hybrid Parallelism.
Automated Parallelism.
Gradient Pruning: Sparse Communication, Deep Gradient Compression, PowerSGD.
Gradient Quantization: 1-Bit SGD, Threshold Quantization, TernGrad.
Delayed Gradient Averaging.

After finishing learning about computer vision please click Topic 24 – Introduction to Nature Language Processing to continue.

Computer Science Curriculum, Machine Learning, Software Engineering Curriculum

Topic 22 – Introduction to Machine Learning

April 4, 2019 admin Leave a comment

Why do I need to learn about machine learning?

Machine learning has been solving many important difficult problems. A few of them include speech recognition, speech synthesis, image recognition, autonomous driving and chat bots.
Nowadays a key skill of software developer is the ability to use machine learning algorithms solve real-world problems.

What can I do after finishing learning about machine learning?

You will be to create software that could recognize car plate number from an image, identify probability of breast cancer for a patient.

That sounds useful! What should I do now?

Please audit
– these Machine Learning Specialization (Coursera) courses and
– this Applied Machine Learning in Python (Coursera) course.

At the same time, please read
– this Aurelien Geron (2022). Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow. O’Reilly Media book and
– this Brett Lantz (2023). Machine Learning with R. Packt Publishing book.

At the same time, please audit
– this Neural Networks and Deep Learning course and
– this Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization course and
– this Structuring Machine Learning Projects course.

After that please read this Michael A. Nielsen (2015). Neural Networks and Deep Learning. Determination Press book.

After that please watch
– this MIT 6.034 – Artificial Intelligence, Fall 2010 course videos (Readings).

After that please read
– this Tom M. Mitchell (1997). Machine Learning. McGraw-Hill Education book, and
– this Christopher M. Bishop (2006). Pattern Recognition and Machine Learning. Springer book.

After that please audit this RL Course by David Silver course (Slides) and these Reinforcement Learning Specialization (Coursera) courses, and read this Richard S. Sutton and Andrew G. Barto (2018). Reinforcement Learning. The MIT Press book at the same time.

Supervised Learning Terminology Review:

Artificial Intelligence.
Machine Learning.
Deep Learning.
Linear Regression: Y = θᵀX + Ε.
Cost Function measures how good/bad your model is.
Mean Square Error (MSE) measures the average of the squares of the errors.
Gradient Descent, Learning Rate.
Batch Gradient Descent.
The R-Squared Test measures the proportion of the total variance in the output (y) that can be explained by the variation in x. It can be used to evaluate how good a “fit” some model is on the given data.
Stochastic Gradient Descent.
Mini-Batch Gradient Descent.
Overfitting: machine learning model gives accurate predictions for training data but not for new data.
Regularization: Ridge Regression, Lasso Regression, Elastic Net, Early Stopping.
Normalization.
Logistic Regression.
Sigmoid Function.
Binary Cross Entropy Loss Function, Log Loss Function.
One Hot Encoding.
The Softmax function takes an N-dimensional vector of arbitrary real values and produces another N-dimensional vector with real values in the range (0, 1) that add up to 1.0.
Softmax Regression.
Support Vector Machines.
Decision Trees.
K-Nearest Neighbors.
McCulloch-Pitts Neuron.
Linear Threshold Unit with threshold T calculates the weighted sum of its inputs, and then outputs 0 if this sum is less than T, and 1 if the sum is greater than T.
Perceptron.
Artificial Neural Networks.
Backpropagation.
Activation Functions: Rectified Linear Unit (ReLU), Leaky ReLU, Sigmoid, Hyperbolic Tangent.
Batch Normalization.
Learning Rate Decay.
Exponentially Weighted Averages.
Gradient Descent Optimization Algorithms: Momentum, Adagrad, Adadelta, RMSprop, Adam.
Regularization: Dropout.
The Joint Probability Table.
Bayesian Networks.
Naive Bayes Inference.

Unsupervised Learning Terminology Review:

K-Means.
Principal Component Analysis.
User-Based Collaborative Filtering.
Item-based Collaborative Filtering.
Matrix Factorization.

Reinforcement Learning Terminology Review:

k-armed Bandit Problem.
Sample-Average Method.
Greedy Action.
Exploration and Exploitation.
ϵ-Greedy Action Selection.
Bandit Algorithm.
Exponential Recency-Weighted Average.
Optimistic Initial Values.
Upper-Confidence-Bound Action Selection.
Rewards.
Agent, Actions, World or Environment.
History, States, Terminal State, Environment State, Agent State, Information State.
Fully Observable Environments.
Partially Observable Environments.
Policy, Value Function, Model.
Value Based RL Agent, Policy Based RL Agent, Actor Critic RL Agent.
Model Free RL Agent, Model Based RL Agent.
Learning Problem and Planning Problem.
Prediction and Control.
Markov Property.
State Transition Matrix.
Markov Process.
Episodic Tasks.
Continuing Tasks.
Horizon (H): Number of time steps in each episode, can be infinite.
Markov Reward Process.
Discount Factor, Discount Rate: 0 ≤ γ ≤ 1.
Return.
Discounted Return: Discounted sum of rewards from time step t to horizon H.
State-Value Function.
Bellman Equation for Markov Reward Processes.
Markov Decision Process.
Policy: Mapping from states to actions. Deterministic policy: π (s) = a. Stochastic policy: π (a|s) = P(aₜ=a|sₜ=s).
State Value Function – Vπ(s): The expected return starting from state s following policy π.
Bellman Expectation Equation for Vπ.
Action Value Function (also known as State-Action Value Fucntion or the Quality Function) – Qπ(s, a): The expected return starting from state $s$ , taking action $a$ , then following policy $π$ .
Bellman Expectation Equation for Qπ.
Optimal State Value Function.
Optimal Action Value Function.
Bellman Optimality Equation for v*.
Bellman Optimality Equation for q*.
Optimal Policies.
Dynamic Programming.
Iterative Policy Evaluation.
Policy Improvement.
Policy Improvement Theorem.
Policy Iteration.
Value Iteration.
Synchronous Dynamic Programming.
Asynchronous Dynamic Programming.
Generalized Policy Iteration.
Bootstrapping: Updating estimates on the basis of other estimates.
Monte-Carlo Policy Evaluation.
First-Visit Monte-Carlo Policy Evaluation.
Every-Visit Monte-Carlo Policy Evaluation.
Incremental Mean.
Incremental Monte-Carlo Updates.
Temporal-Difference Learning.
Forward-View TD(λ).
Eligibility Traces.
Backward-View TD(λ).
On-Policy Learning.
Off-Policy Learning.
ϵ-Greedy Exploration.
ϵ-greedy Policies: Most of the time they choose an action that has maximal estimated action value, but with probability ϵ they instead select an action at random.
Monte-Carlo Policy Iteration. Policy evaluation: Monte-Carlo policy evaluation, Q = qπ. Policy improvement: ϵ-greedy policy improvement.
Monte-Carlo Control. Policy evaluation: Monte-Carlo policy evaluation, Q ≈ qπ. Policy improvement: ϵ-greedy policy improvement.
Exploring Starts: Specify that the episodes start in a state–action pair, and that every pair has a nonzero probability of being selected as the start.
Monte Carlo Control Exploring Starts.
Greedy in the Limit with Innite Exploration (GLIE) Monte-Carlo Control.
ϵ-soft Policies: Policies for which π(a|s) ≥ ϵ/|A(s)| for all states and actions, for some ϵ > 0.
On-Policy First-Visit MC Control.
SARSA: State (S), Action (A), Reward (R), State (S’), Action (A’).
On-Policy Control with SARSA. Policy evaluation: SARSA evaluation, Q ≈ qπ. Policy improvement: ϵ-greedy policy improvement.
Forward-View SARSA (λ).
Backward-View SARSA (λ).
Target Policy.
Behavior Policy.
Importance Sampling: Use samples from one distribution to estimate the expectation of a different distribution.
Importance Sampling for Off-Policy Monte-Carlo.
Importance Sampling for Off-Policy TD.
Q-Learning: Next action is chosen using behaviour policy. Q is updated using alternative successor action.
Off-Policy Control with Q-Learning.
Expected SARSA.
Value Function Approximation.
Function Approximators.
Differentiable Function Approximators.
Feature Vectors.
State Aggregation.
Coarse Coding.
Tile Coding.
Continuous States.
Incremental Prediction Algorithms.
Control with Value Function Approximation. Policy evaluation: Approximate policy evaluation, q(.,., w) ≈ qπ. Policy improvement: ϵ-greedy policy improvement.
Learning State Action Value function: Replay Buffer: 10,000 tuples most recent (s, a, R(s), s’). x = (s, a) → Q(θ) → y = R(s) + γmaxQ(s’, a’, θ). Loss = [R(s) + γmaxQ(s’, a’; θ)] − Q(s, a; θ).
Expected SARSA with Function Approximation.
Target Network: A separate neural network for generating the y targets. It has the same architecture as the original Q-Network. Loss = [R(s) + γmaxTargetQ(s’, a’; θ′)] − Q(s, a; θ). Every C time steps we will use the TargetQ-Network to generate the y targets and update the weights of the TargetQ-Network using the weights of the Q-Network.
Soft Updates: $θ^{'}$ $θ^{'} \leftarrow τ θ + (1 - τ) θ^{'}$ , where $θ^{'}$ and $θ$ represent the weights of the target network and the current network, respectively.
Deep Q-learning.
Linear Least Squares Prediction Algorithms.
Least Squares Policy Iteration. Policy evaluation: Least squares Q-Learning. Policy improvement: Greedy policy improvement.
Average Reward.
Discounted Returns, Returns for Average Reward.
Stochastic Policies.
Softmax Policies.
Gaussian Policies.
Policy Objective Functions: Start State Objective, Average Reward Objective and Average Value Objective.
Score Function.
Policy Gradient Theorem.
Monte-Carlo Policy Gradient (REINFORCE).
Action-Value Actor-Critic: Critic updates w by linear TD(0). Actor updates θ by policy gradient.
The Tabular Dyna-Q Algorithm.
The Dyna-Q+ Algorithm.
Forward Search.
Simulation-Based Search.
Monte-Carlo Tree Search.
Temporal-Difference Search.
Dyna-2.

After finishing learning about machine learning please click Topic 23 – Introduction to Computer Vision to continue.

Software Development

Category Archives: Machine Learning

Topic 24 – Introduction to Nature Language Processing

Topic 23 – Introduction to Computer Vision

Topic 22 – Introduction to Machine Learning

Software development and software engineering research