Why do I need to learn about machine learning?
Machine learning has been used to solve many important and difficult problems, including speech recognition, speech synthesis, image recognition, autonomous driving, and chatbots. Today, a key skill for software developers is the ability to use machine learning algorithms to solve real-world problems.
What can I do after finishing learning about machine learning?
You will be able to create software that can, for example, recognize a car's license plate number in an image or estimate the probability that a patient has breast cancer.
That sounds useful! What should I do now?
First, please audit these courses to learn the core concepts of machine learning and gain hands-on experience with them:
After that, please read the following books to reinforce your theoretical understanding and practical competence in machine learning:
After that, please audit this course and read its readings to learn the core approaches and algorithms for building artificial intelligence systems: MIT 6.034 – Artificial Intelligence, Fall 2010 (Readings).
After that, please read the following books to study the mathematical foundations underlying machine learning algorithms:
After that, please audit the following courses and read the book below to learn the core concepts and algorithms of reinforcement learning:
Supervised Learning Terminology Review:
- Artificial Intelligence.
- Machine Learning.
- Deep Learning.
- Linear Regression: Y = θᵀX + ε.
- Cost Function measures how good/bad your model is.
- Mean Square Error (MSE) measures the average of the squares of the errors.
- Gradient Descent, Learning Rate.
- Batch Gradient Descent.
- The R-Squared Test measures the proportion of the total variance in the output (y) that can be explained by the variation in x. It can be used to evaluate how good a “fit” some model is on the given data (these regression terms are illustrated in the first sketch after this list).
- Stochastic Gradient Descent.
- Mini-Batch Gradient Descent.
- Overfitting: machine learning model gives accurate predictions for training data but not for new data.
- Regularization: Ridge Regression, Lasso Regression, Elastic Net, Early Stopping.
- Normalization.
- Logistic Regression.
- Sigmoid Function.
- Binary Cross Entropy Loss Function, Log Loss Function.
- One Hot Encoding.
- The Softmax function takes an N-dimensional vector of arbitrary real values and produces another N-dimensional vector with real values in the range (0, 1) that add up to 1.0 (see the softmax sketch after this list).
- Softmax Regression.
- Gradient Ascent.
- Newton’s Method.
- Support Vector Machines.
- Decision Trees.
- Parametric vs. Non-parametric Models.
- K-Nearest Neighbors.
- Locally Weighted Regression.
- McCulloch-Pitts Neuron.
- Linear Threshold Unit with threshold T calculates the weighted sum of its inputs, and then outputs 0 if this sum is less than T, and 1 if the sum is greater than T (see the threshold-unit sketch after this list).
- Perceptron.
- Artificial Neural Networks.
- Backpropagation.
- Activation Functions: Rectified Linear Unit (ReLU), Leaky ReLU, Sigmoid, Hyperbolic Tangent.
- Batch Normalization.
- Learning Rate Decay.
- Exponentially Weighted Averages.
- Gradient Descent Optimization Algorithms: Momentum, Adagrad, Adadelta, RMSprop, Adam.
- Regularization: Dropout.
- The Joint Probability Table.
- Bayesian Networks.
- Naive Bayes Inference.
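
To make the regression terms above concrete, here is a minimal NumPy sketch of linear regression trained with batch gradient descent on the MSE cost, followed by an R-squared check of the fit. The data, learning rate, and iteration count are arbitrary illustrative choices, not values taken from any of the courses or books above:

```python
import numpy as np

# Toy data: y = 3x + 2 plus Gaussian noise (illustrative values only).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 1, size=100)

# Design matrix with a bias column, so the model is y ≈ Xθ.
X = np.column_stack([np.ones_like(x), x])
theta = np.zeros(2)

learning_rate = 0.01
n_iterations = 5000

for _ in range(n_iterations):
    errors = X @ theta - y
    # MSE cost: mean of squared errors; its gradient w.r.t. θ is (2/m) Xᵀ(Xθ − y).
    gradient = 2.0 / len(y) * X.T @ errors
    # Batch gradient descent: one update per pass over the full training set.
    theta -= learning_rate * gradient

predictions = X @ theta
mse = np.mean((predictions - y) ** 2)
# R-squared: proportion of the variance in y explained by the model.
r_squared = 1.0 - np.sum((y - predictions) ** 2) / np.sum((y - np.mean(y)) ** 2)
print(f"theta = {theta}, MSE = {mse:.3f}, R^2 = {r_squared:.3f}")
```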
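The softmax definition above can be checked with a short sketch; the input scores are arbitrary illustrative values, and subtracting the maximum before exponentiating is a common numerical-stability trick, not part of the definition itself:

```python
import numpy as np

def softmax(z):
    """Map an N-dimensional vector of arbitrary reals to values in (0, 1) that sum to 1."""
    shifted = z - np.max(z)   # subtract the max for numerical stability
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

scores = np.array([2.0, 1.0, -1.0, 3.0])   # arbitrary example scores
probs = softmax(scores)
print(probs, probs.sum())                   # values in (0, 1) that add up to 1.0
```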
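The linear threshold unit entry above can likewise be illustrated directly; the weights and threshold below are arbitrary choices, picked so that the unit computes logical AND:

```python
import numpy as np

def linear_threshold_unit(inputs, weights, threshold):
    """Output 1 if the weighted sum of the inputs exceeds the threshold T, else 0."""
    weighted_sum = np.dot(weights, inputs)
    return 1 if weighted_sum > threshold else 0

# With weights (1, 1) and threshold T = 1.5 the unit fires only when both inputs are 1.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", linear_threshold_unit(np.array([a, b]), np.array([1.0, 1.0]), 1.5))
```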
Unsupervised Learning Terminology Review:
- K-Means.
- Principal Component Analysis.
- User-Based Collaborative Filtering.
- Item-based Collaborative Filtering.
- Matrix Factorization.
Reinforcement Learning Terminology Review:
- k-armed Bandit Problem.
- Sample-Average Method.
- Greedy Action.
- Exploration and Exploitation.
- ϵ-Greedy Action Selection (see the bandit sketch after this list).
- Bandit Algorithm.
- Exponential Recency-Weighted Average.
- Optimistic Initial Values.
- Upper-Confidence-Bound Action Selection.
- Rewards.
- Agent, Actions, World or Environment.
- History, States, Terminal State, Environment State, Agent State, Information State.
- Fully Observable Environments.
- Partially Observable Environments.
- Policy, Value Function, Model.
- Value Based RL Agent, Policy Based RL Agent, Actor Critic RL Agent.
- Model Free RL Agent, Model Based RL Agent.
- Learning Problem and Planning Problem.
- Prediction and Control.
- Markov Property.
- State Transition Matrix.
- Markov Process.
- Episodic Tasks.
- Continuing Tasks.
- Horizon (H): The number of time steps in each episode, which can be infinite.
- Markov Reward Process.
- Discount Factor, Discount Rate: 0 ≤ γ ≤ 1.
- Return.
- Discounted Return: Discounted sum of rewards from time step t to horizon H.
- State-Value Function.
- Bellman Equation for Markov Reward Processes.
- Markov Decision Process.
- Policy: Mapping from states to actions. Deterministic policy: π(s) = a. Stochastic policy: π(a|s) = P(aₜ=a|sₜ=s).
- State Value Function – Vπ(s): The expected return starting from state s following policy π.
- Bellman Expectation Equation for Vπ.
- Action Value Function (also known as State-Action Value Function or the Quality Function) – Qπ(s, a): The expected return starting from state s, taking action a, then following policy π.
- Bellman Expectation Equation for Qπ.
- Optimal State Value Function.
- Optimal Action Value Function.
- Bellman Optimality Equation for v*.
- Bellman Optimality Equation for q*.
- Optimal Policies.
- Dynamic Programming.
- Iterative Policy Evaluation.
- Policy Improvement.
- Policy Improvement Theorem.
- Policy Iteration.
- Value Iteration.
- Synchronous Dynamic Programming.
- Asynchronous Dynamic Programming.
- Generalized Policy Iteration.
- Bootstrapping: Updating estimates on the basis of other estimates.
- Monte-Carlo Policy Evaluation.
- First-Visit Monte-Carlo Policy Evaluation.
- Every-Visit Monte-Carlo Policy Evaluation.
- Incremental Mean.
- Incremental Monte-Carlo Updates.
- Temporal-Difference Learning.
- Forward-View TD(λ).
- Eligibility Traces.
- Backward-View TD(λ).
- On-Policy Learning.
- Off-Policy Learning.
- ϵ-Greedy Exploration.
- ϵ-greedy Policies: Most of the time they choose an action that has maximal estimated action value, but with probability ϵ they instead select an action at random.
- Monte-Carlo Policy Iteration. Policy evaluation: Monte-Carlo policy evaluation, Q = qπ. Policy improvement: ϵ-greedy policy improvement.
- Monte-Carlo Control. Policy evaluation: Monte-Carlo policy evaluation, Q ≈ qπ. Policy improvement: ϵ-greedy policy improvement.
- Exploring Starts: Specify that the episodes start in a state–action pair, and that every pair has a nonzero probability of being selected as the start.
- Monte Carlo Control Exploring Starts.
- Greedy in the Limit with Infinite Exploration (GLIE) Monte-Carlo Control.
- ϵ-soft Policies: Policies for which π(a|s) ≥ ϵ/|A(s)| for all states and actions, for some ϵ > 0.
- On-Policy First-Visit MC Control.
- SARSA: State (S), Action (A), Reward (R), State (S’), Action (A’).
- On-Policy Control with SARSA. Policy evaluation: SARSA evaluation, Q ≈ qπ. Policy improvement: ϵ-greedy policy improvement.
- Forward-View SARSA (λ).
- Backward-View SARSA (λ).
- Target Policy.
- Behavior Policy.
- Importance Sampling: Use samples from one distribution to estimate the expectation of a different distribution (see the sketch after this list).
- Importance Sampling for Off-Policy Monte-Carlo.
- Importance Sampling for Off-Policy TD.
- Q-Learning: The next action is chosen using the behavior policy, but Q is updated using the alternative (greedy) successor action (see the Q-learning sketch after this list).
- Off-Policy Control with Q-Learning.
- Expected SARSA.
- Value Function Approximation.
- Function Approximators.
- Differentiable Function Approximators.
- Feature Vectors.
- State Aggregation.
- Coarse Coding.
- Tile Coding.
- Continuous States.
- Incremental Prediction Algorithms.
- Control with Value Function Approximation. Policy evaluation: Approximate policy evaluation, q(.,., w) ≈ qπ. Policy improvement: ϵ-greedy policy improvement.
- Learning the State-Action Value Function: Replay Buffer: the 10,000 most recent tuples (s, a, R(s), s’). x = (s, a) → Q(θ) → y = R(s) + γ maxₐ′ Q(s’, a’; θ). Loss = [R(s) + γ maxₐ′ Q(s’, a’; θ) − Q(s, a; θ)]².
- Expected SARSA with Function Approximation.
- Target Network: A separate neural network for generating the y targets, with the same architecture as the original Q-Network. Loss = [R(s) + γ maxₐ′ TargetQ(s’, a’; θ′) − Q(s, a; θ)]². Every C time steps we use the TargetQ-Network to generate the y targets and update the weights of the TargetQ-Network using the weights of the Q-Network.
- Soft Updates: θ′ ← 0.001θ + 0.999θ′, where θ′ and θ represent the weights of the target network and the current network, respectively (see the target-network sketch after this list).
- Deep Q-learning.
- Linear Least Squares Prediction Algorithms.
- Least Squares Policy Iteration. Policy evaluation: Least squares Q-Learning. Policy improvement: Greedy policy improvement.
- Average Reward.
- Discounted Returns, Returns for Average Reward.
- Stochastic Policies.
- Softmax Policies.
- Gaussian Policies.
- Policy Objective Functions: Start State Objective, Average Reward Objective and Average Value Objective.
- Score Function.
- Policy Gradient Theorem.
- Monte-Carlo Policy Gradient (REINFORCE).
- Action-Value Actor-Critic: Critic updates w by linear TD(0). Actor updates θ by policy gradient.
- The Tabular Dyna-Q Algorithm.
- The Dyna-Q+ Algorithm.
- Forward Search.
- Simulation-Based Search.
- Monte-Carlo Tree Search.
- Temporal-Difference Search.
- Dyna-2.
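
To tie the bandit terms above together (sample-average method, greedy action, exploration and exploitation, ϵ-greedy action selection), here is a minimal sketch of an agent on a k-armed bandit. The arm means, ϵ, and number of steps are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

true_means = np.array([0.2, 0.5, 0.8])   # hidden reward means of a 3-armed bandit (illustrative)
k = len(true_means)
epsilon = 0.1                             # probability of exploring instead of exploiting
Q = np.zeros(k)                           # sample-average action-value estimates
N = np.zeros(k)                           # number of times each action has been selected

for t in range(10_000):
    # ϵ-greedy action selection: explore with probability ϵ, otherwise take the greedy action.
    if rng.random() < epsilon:
        action = int(rng.integers(k))
    else:
        action = int(np.argmax(Q))
    reward = rng.normal(true_means[action], 1.0)   # noisy reward drawn around the arm's true mean
    # Incremental sample-average update: Q(a) ← Q(a) + (1/N(a)) [R − Q(a)].
    N[action] += 1
    Q[action] += (reward - Q[action]) / N[action]

print("Estimated values:", np.round(Q, 2))
print("Greedy action:", int(np.argmax(Q)))   # should settle on the arm with the highest true mean
```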
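The importance-sampling entry can be illustrated in a few lines: samples drawn from a behavior distribution b are reweighted by π(x)/b(x) to estimate an expectation under a target distribution π. The two Gaussians and the test function below are arbitrary illustrative choices:

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)

# Target distribution π = N(1, 1); behavior distribution b = N(0, 2).
# Goal: estimate E_π[x^2] using only samples drawn from b.
samples = rng.normal(0.0, 2.0, size=100_000)
weights = gaussian_pdf(samples, 1.0, 1.0) / gaussian_pdf(samples, 0.0, 2.0)  # π(x) / b(x)

estimate = np.mean(weights * samples ** 2)
print(f"Importance-sampling estimate of E_π[x^2]: {estimate:.3f} (exact value: 2.0)")
```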
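For Q-learning, the sketch below uses a tiny deterministic chain environment invented purely for illustration; the discount factor, step size, and ϵ are arbitrary choices. Note how the behavior policy is ϵ-greedy while the update bootstraps from the greedy successor action:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative chain environment: states 0..4, actions 0 = left, 1 = right;
# reaching state 4 yields reward +1 and ends the episode.
n_states, n_actions = 5, 2
gamma, alpha, epsilon = 0.9, 0.1, 0.1

def step(state, action):
    next_state = max(state - 1, 0) if action == 0 else min(state + 1, n_states - 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward, next_state == n_states - 1

def epsilon_greedy(q_values):
    # ϵ-greedy behavior policy: explore with probability ϵ, otherwise act greedily
    # (ties between greedy actions are broken at random).
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(rng.choice(np.flatnonzero(q_values == q_values.max())))

Q = np.zeros((n_states, n_actions))

for _ in range(500):
    state, done = 0, False
    while not done:
        action = epsilon_greedy(Q[state])
        next_state, reward, done = step(state, action)
        # Q-learning: the action came from the behavior policy, but the update
        # bootstraps from the greedy (maximal) successor action.
        target = reward + (0.0 if done else gamma * np.max(Q[next_state]))
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

print(np.round(Q, 2))   # moving right should dominate in every state
```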
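Finally, the replay buffer, target network, and soft-update entries fit together as sketched below. This is a deliberately simplified illustration: a table of weights θ stands in for the Q-network, the environment is the same invented chain as in the previous sketch, and only the buffer size (10,000) and τ = 0.001 come from the entries above; the batch size and other constants are arbitrary:

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)

# Same illustrative chain environment as in the Q-learning sketch above.
n_states, n_actions = 5, 2
gamma, lr, epsilon, tau = 0.9, 0.1, 0.1, 0.001

def step(state, action):
    next_state = max(state - 1, 0) if action == 0 else min(state + 1, n_states - 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward, next_state == n_states - 1

def epsilon_greedy(q_values):
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(rng.choice(np.flatnonzero(q_values == q_values.max())))

theta = np.zeros((n_states, n_actions))    # "Q-network" weights (a table stands in for a network here)
theta_target = theta.copy()                # target-network weights θ′
replay_buffer = deque(maxlen=10_000)       # the 10,000 most recent (s, a, r, s′, done) tuples

for episode in range(500):
    state, done = 0, False
    while not done:
        action = epsilon_greedy(theta[state])
        next_state, reward, done = step(state, action)
        replay_buffer.append((state, action, reward, next_state, done))
        state = next_state

        # Sample a mini-batch from the replay buffer and move θ toward the targets
        # generated by the target network: y = R(s) + γ max_a′ Q(s′, a′; θ′).
        batch = [replay_buffer[i] for i in rng.integers(len(replay_buffer), size=32)]
        for s, a, r, s2, d in batch:
            y = r + (0.0 if d else gamma * np.max(theta_target[s2]))
            theta[s, a] += lr * (y - theta[s, a])                    # step on the squared TD error
        theta_target = tau * theta + (1 - tau) * theta_target        # soft update: θ′ ← τθ + (1 − τ)θ′

print(np.round(theta, 2))   # moving right should have the higher estimate in every state
```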
Probabilistic Machine Learning Terminology Review:
- Probabilistic Machine Learning.
- Non-Probabilistic Machine Learning.
- Algorithmic Machine Learning.
- Array Programming.
- Frequentist and Bayesian Approaches.
After finishing machine learning, please click on Topic 23 – Introduction to Computer Vision to continue.