Topic 22 – Introduction to Machine Learning

Why do I need to learn about machine learning?

Machine learning has been used to solve many important and difficult problems, including speech recognition, speech synthesis, image recognition, autonomous driving, and chatbots. Today, a key skill for software developers is the ability to use machine learning algorithms to solve real-world problems.

What can I do after finishing learning about machine learning?

You will be able to create software that can recognize a car's license plate number from an image or estimate the probability that a patient has breast cancer.

That sounds useful! What should I do now?

First, please audit these courses to learn the core concepts of machine learning and gain hands-on experience with them:

After that, please read the following books to reinforce your theoretical understanding and practical competence in machine learning:

After that, please audit this course and read its readings to learn the core approaches and algorithms for building artificial intelligence systems: MIT 6.034 – Artificial Intelligence, Fall 2010 (Readings).

After that, please read the following books to study the mathematical foundations underlying machine learning algorithms:

After that, please audit the following courses and read the book below to learn the core concepts and algorithms of reinforcement learning:

Supervised Learning Terminology Review:

Artificial Intelligence.
Machine Learning.
Deep Learning.
Linear Regression: Y = θᵀX + Ε.
Cost Function measures how good/bad your model is.
Mean Square Error (MSE) measures the average of the squares of the errors.
Gradient Descent, Learning Rate.
Batch Gradient Descent.
The R-Squared Test measures the proportion of the total variance in the output (y) that can be explained by the variation in x. It can be used to evaluate how good a “fit” some model is on the given data.
Stochastic Gradient Descent.
Mini-Batch Gradient Descent.
Overfitting: A machine learning model gives accurate predictions for the training data but not for new data.
Regularization: Ridge Regression, Lasso Regression, Elastic Net, Early Stopping.
Normalization.
Logistic Regression.
Sigmoid Function.
Binary Cross Entropy Loss Function, Log Loss Function.
One Hot Encoding.
The Softmax function takes an N-dimensional vector of arbitrary real values and produces another N-dimensional vector with real values in the range (0, 1) that add up to 1.0.
Softmax Regression.
Gradient Ascent.
Newton’s Method.
Support Vector Machines.
Decision Trees.
Parametric vs. Non-parametric Models.
K-Nearest Neighbors.
Locally Weighted Regression.
McCulloch-Pitts Neuron.
A Linear Threshold Unit with threshold T calculates the weighted sum of its inputs, and then outputs 0 if this sum is less than T, and 1 if the sum is greater than or equal to T.
Perceptron.
Artificial Neural Networks.
Backpropagation.
Activation Functions: Rectified Linear Unit (ReLU), Leaky ReLU, Sigmoid, Hyperbolic Tangent.
Batch Normalization.
Learning Rate Decay.
Exponentially Weighted Averages.
Gradient Descent Optimization Algorithms: Momentum, Adagrad, Adadelta, RMSprop, Adam.
Regularization: Dropout.
The Joint Probability Table.
Bayesian Networks.
Naive Bayes Inference.
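
A few of the definitions in the list above (mean squared error, the R-squared test, the softmax function, and the linear threshold unit) can be sketched in plain Python. This is only a minimal illustration, not a reference implementation: the function names and sample numbers are assumptions made for this example, and the threshold unit assumes the common convention of outputting 1 when the weighted sum equals T.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Proportion of the variance in y_true explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot


def softmax(values):
    """Map N real values to N values in (0, 1) that add up to 1."""
    shift = max(values)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def linear_threshold_unit(inputs, weights, threshold):
    """Output 1 if the weighted sum reaches the threshold, otherwise 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0


if __name__ == "__main__":
    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.0, 2.0, 3.0, 5.0]
    print("MSE:", mean_squared_error(y_true, y_pred))
    print("R-squared:", r_squared(y_true, y_pred))
    print("Softmax:", softmax([1.0, 2.0, 3.0]))
    print("LTU:", linear_threshold_unit([1, 0, 1], [0.5, 0.5, 0.5], 1.0))
```

In practice a library such as NumPy or scikit-learn would normally be used for these calculations; the plain-Python version is meant only to make each definition concrete.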

Unsupervised Learning Terminology Review:

K-Means.
Principal Component Analysis.
User-Based Collaborative Filtering.
Item-based Collaborative Filtering.
Matrix Factorization.

Reinforcement Learning Terminology Review:

k-armed Bandit Problem.
Sample-Average Method.
Greedy Action.
Exploration and Exploitation.
ϵ-Greedy Action Selection.
Bandit Algorithm.
Exponential Recency-Weighted Average.
Optimistic Initial Values.
Upper-Confidence-Bound Action Selection.
Rewards.
Agent, Actions, World or Environment.
History, States, Terminal State, Environment State, Agent State, Information State.
Fully Observable Environments.
Partially Observable Environments.
Policy, Value Function, Model.
Value Based RL Agent, Policy Based RL Agent, Actor Critic RL Agent.
Model Free RL Agent, Model Based RL Agent.
Learning Problem and Planning Problem.
Prediction and Control.
Markov Property.
State Transition Matrix.
Markov Process.
Episodic Tasks.
Continuing Tasks.
Horizon (H): Number of time steps in each episode, can be infinite.
Markov Reward Process.
Discount Factor, Discount Rate: 0 ≤ γ ≤ 1.
Return.
Discounted Return: Discounted sum of rewards from time step t to horizon H.
State-Value Function.
Bellman Equation for Markov Reward Processes.
Markov Decision Process.
Policy: Mapping from states to actions. Deterministic policy: π(s) = a. Stochastic policy: π(a|s) = P(aₜ=a|sₜ=s).
State Value Function – Vπ(s): The expected return starting from state s following policy π.
Bellman Expectation Equation for Vπ.
Action Value Function (also known as State-Action Value Function or the Quality Function) – Qπ(s, a): The expected return starting from state s, taking action a, then following policy π.
Bellman Expectation Equation for Qπ.
Optimal State Value Function.
Optimal Action Value Function.
Bellman Optimality Equation for v*.
Bellman Optimality Equation for q*.
Optimal Policies.
Dynamic Programming.
Iterative Policy Evaluation.
Policy Improvement.
Policy Improvement Theorem.
Policy Iteration.
Value Iteration.
Synchronous Dynamic Programming.
Asynchronous Dynamic Programming.
Generalized Policy Iteration.
Bootstrapping: Updating estimates on the basis of other estimates.
Monte-Carlo Policy Evaluation.
First-Visit Monte-Carlo Policy Evaluation.
Every-Visit Monte-Carlo Policy Evaluation.
Incremental Mean.
Incremental Monte-Carlo Updates.
Temporal-Difference Learning.
Forward-View TD(λ).
Eligibility Traces.
Backward-View TD(λ).
On-Policy Learning.
Off-Policy Learning.
ϵ-Greedy Exploration.
ϵ-greedy Policies: Most of the time they choose an action that has maximal estimated action value, but with probability ϵ they instead select an action at random.
Monte-Carlo Policy Iteration. Policy evaluation: Monte-Carlo policy evaluation, Q = qπ. Policy improvement: ϵ-greedy policy improvement.
Monte-Carlo Control. Policy evaluation: Monte-Carlo policy evaluation, Q ≈ qπ. Policy improvement: ϵ-greedy policy improvement.
Exploring Starts: Specify that the episodes start in a state–action pair, and that every pair has a nonzero probability of being selected as the start.
Monte Carlo Control Exploring Starts.
Greedy in the Limit with Infinite Exploration (GLIE) Monte-Carlo Control.
ϵ-soft Policies: Policies for which π(a|s) ≥ ϵ/|A(s)| for all states and actions, for some ϵ > 0.
On-Policy First-Visit MC Control.
SARSA: State (S), Action (A), Reward (R), State (S’), Action (A’).
On-Policy Control with SARSA. Policy evaluation: SARSA evaluation, Q ≈ qπ. Policy improvement: ϵ-greedy policy improvement.
Forward-View SARSA (λ).
Backward-View SARSA (λ).
Target Policy.
Behavior Policy.
Importance Sampling: Use samples from one distribution to estimate the expectation of a different distribution.
Importance Sampling for Off-Policy Monte-Carlo.
Importance Sampling for Off-Policy TD.
Q-Learning: Next action is chosen using behaviour policy. Q is updated using alternative successor action.
Off-Policy Control with Q-Learning.
Expected SARSA.
Value Function Approximation.
Function Approximators.
Differentiable Function Approximators.
Feature Vectors.
State Aggregation.
Coarse Coding.
Tile Coding.
Continuous States.
Incremental Prediction Algorithms.
Control with Value Function Approximation. Policy evaluation: Approximate policy evaluation, q(.,., w) ≈ qπ. Policy improvement: ϵ-greedy policy improvement.
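Learning the State-Action Value Function: Replay Buffer: the 10,000 most recent tuples (s, a, R(s), s’). x = (s, a) → Q(θ) → y = R(s) + γ max Q(s’, a’; θ), where the maximum is taken over a’. Error = [R(s) + γ max Q(s’, a’; θ)] − Q(s, a; θ); the loss that is minimized is typically the square of this error.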
Expected SARSA with Function Approximation.
Target Network: A separate neural network for generating the y targets. It has the same architecture as the original Q-Network. Error = [R(s) + γ max TargetQ(s’, a’; θ′)] − Q(s, a; θ), where the maximum is taken over a’. The TargetQ-Network generates the y targets, and every C time steps its weights are updated using the weights of the Q-Network.
Soft Updates: θ′ ← τθ + (1 − τ)θ′, where θ′ and θ represent the weights of the target network and the current network, respectively.
Deep Q-learning.
Linear Least Squares Prediction Algorithms.
Least Squares Policy Iteration. Policy evaluation: Least squares Q-Learning. Policy improvement: Greedy policy improvement.
Average Reward.
Discounted Returns, Returns for Average Reward.
Stochastic Policies.
Softmax Policies.
Gaussian Policies.
Policy Objective Functions: Start State Objective, Average Reward Objective and Average Value Objective.
Score Function.
Policy Gradient Theorem.
Monte-Carlo Policy Gradient (REINFORCE).
Action-Value Actor-Critic: Critic updates w by linear TD(0). Actor updates θ by policy gradient.
The Tabular Dyna-Q Algorithm.
The Dyna-Q+ Algorithm.
Forward Search.
Simulation-Based Search.
Monte-Carlo Tree Search.
Temporal-Difference Search.
Dyna-2.
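
Several of the items above that come with a definition (ϵ-greedy action selection, the discounted return, the Q-learning update, and soft updates of a target network) can be sketched in a few lines of plain Python. This is a hedged illustration under simplifying assumptions: the function names, the list-of-lists Q-table, and the sample numbers are hypothetical choices made for this example, and terminal states are not handled.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return q_values.index(max(q_values))  # ties go to the lowest index


def discounted_return(rewards, gamma):
    """Discounted sum of rewards: r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def q_learning_update(q, state, action, reward, next_state, alpha, gamma):
    """One tabular Q-learning step (assumes next_state is not terminal).

    The target uses the maximum over successor actions, regardless of
    which action the behaviour policy will actually take next.
    """
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


def soft_update(target_weights, online_weights, tau):
    """theta' <- tau * theta + (1 - tau) * theta', element by element."""
    return [tau * w + (1.0 - tau) * tw
            for w, tw in zip(online_weights, target_weights)]


if __name__ == "__main__":
    q_table = [[0.0, 0.0], [1.0, 2.0]]  # q_table[state][action]
    q_learning_update(q_table, state=0, action=1, reward=1.0,
                      next_state=1, alpha=0.5, gamma=0.9)
    print("Updated Q-table:", q_table)
    print("Greedy action:", epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))
    print("Discounted return:", discounted_return([1.0, 1.0, 1.0], gamma=0.5))
    print("Soft update:", soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1))
```

A deep Q-learning agent would replace the table with a neural network and apply `soft_update` to the network weights; the tabular form is used here only because it keeps each step easy to check by hand.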

Probabilistic Machine Learning Terminology Review:

Probabilistic Machine Learning.
Non-Probabilistic Machine Learning.
Algorithmic Machine Learning.
Array Programming.
Frequentist and Bayesian Approaches.

After finishing machine learning, please click on Topic 23 – Introduction to Computer Vision to continue.
