**Why do I need to learn about machine learning?**

Machine learning is being used to solve many important and difficult problems, including speech recognition, speech synthesis, image recognition, autonomous driving and chatbots.

Nowadays, a key skill for a software developer is the ability to use machine learning algorithms to solve real-world problems.

**What can I do after finishing learning about machine learning?**

You will be able to create software that can recognize a car's license plate number from an image or estimate the probability of breast cancer for a patient.

**That sounds useful! What should I do now?**

Please audit

– these Machine Learning Specialization (Coursera) courses and

– this Applied Machine Learning in Python (Coursera) course.

At the same time, please read

– this Aurelien Geron (2022). Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow. O’Reilly Media book and

– this Brett Lantz (2023). Machine Learning with R. Packt Publishing book.

At the same time, please audit

– this Neural Networks and Deep Learning course and

– this Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization course and

– this Structuring Machine Learning Projects course.

After that please read this Michael A. Nielsen (2015). Neural Networks and Deep Learning. Determination Press book.

After that please watch

– this MIT 6.034 – Artificial Intelligence, Fall 2010 course (Readings).

After that please read

– this Tom M. Mitchell (1997). Machine Learning. McGraw-Hill Education book, and

– this Christopher M. Bishop (2006). Pattern Recognition and Machine Learning. Springer book.

After that please audit this RL Course by David Silver course (Slides) and these Reinforcement Learning Specialization (Coursera) courses, and read this Richard S. Sutton and Andrew G. Barto (2018). Reinforcement Learning. The MIT Press book at the same time.

**Supervised Learning Terminology Review:**

- Artificial Intelligence.
- Machine Learning.
- Deep Learning.
- Linear Regression: **Y** = **θ**ᵀ**X** + **Ε**.
- *Cost Function* measures how good/bad your model is.
- Mean Square Error (MSE) measures the average of the squares of the errors.
- Gradient Descent, Learning Rate.
- Batch Gradient Descent.
- *The R-Squared Test* measures the proportion of the total variance in the output (y) that can be explained by the variation in x. It can be used to evaluate how good a “fit” some model is on the given data.
- Stochastic Gradient Descent.
- Mini-Batch Gradient Descent.
- Overfitting: machine learning model gives accurate predictions for training data but not for new data.
- Regularization: Ridge Regression, Lasso Regression, Elastic Net, Early Stopping.
- Normalization.
- Logistic Regression.
- Sigmoid Function.
- Binary Cross Entropy Loss Function, Log Loss Function.
- One Hot Encoding.
- The *Softmax* function takes an N-dimensional vector of arbitrary real values and produces another N-dimensional vector with real values in the range (0, 1) that add up to 1.0.
- Softmax Regression.
- Support Vector Machines.
- Decision Trees.
- K-Nearest Neighbors.
- McCulloch-Pitts Neuron.
- *Linear Threshold Unit* with threshold T calculates the weighted sum of its inputs, and then outputs 0 if this sum is less than T and 1 otherwise.
- Perceptron.
- Artificial Neural Networks.
- Backpropagation.
- Activation Functions: Rectified Linear Unit (ReLU), Leaky ReLU, Sigmoid, Hyperbolic Tangent.
- Batch Normalization.
- Learning Rate Decay.
- Exponentially Weighted Averages.
- Gradient Descent Optimization Algorithms: Momentum, Adagrad, Adadelta, RMSprop, Adam.
- Regularization: Dropout.
- The Joint Probability Table.
- Bayesian Networks.
- Naive Bayes Inference.
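
Several entries in the list above come with a definition that is short enough to write out directly. The following is a minimal sketch in plain Python, not a reference implementation: the function names are illustrative, and the linear threshold unit assumes that a weighted sum exactly equal to the threshold produces 1.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Proportion of the variance in y_true explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total
    return 1.0 - ss_res / ss_tot


def softmax(values):
    """Map arbitrary real values to numbers in (0, 1) that add up to 1."""
    shift = max(values)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def linear_threshold_unit(inputs, weights, threshold):
    """Output 1 if the weighted sum reaches the threshold, else 0.

    Assumption: a sum exactly equal to the threshold counts as 1.
    """
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0


y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 8.0]
print(mean_squared_error(y_true, y_pred))
print(r_squared(y_true, y_pred))
print(softmax([1.0, 2.0, 3.0]))
print(linear_threshold_unit([1, 1], [0.6, 0.6], threshold=1.0))
```

In practice you would normally use a library such as scikit-learn or NumPy for these computations. Writing them by hand is only meant to make the definitions concrete.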

**Unsupervised Learning Terminology Review:**

- K-Means.
- Principal Component Analysis.
- User-Based Collaborative Filtering.
- Item-based Collaborative Filtering.
- Matrix Factorization.

**Reinforcement Learning Terminology Review:**

- k-armed Bandit Problem.
- Sample-Average Method.
- Greedy Action.
- Exploration and Exploitation.
- ϵ-Greedy Action Selection.
- Bandit Algorithm.
- Exponential Recency-Weighted Average.
- Optimistic Initial Values.
- Upper-Confidence-Bound Action Selection.
- Rewards.
- Agent, Actions, World or Environment.
- History, States, Terminal State, Environment State, Agent State, Information State.
- Fully Observable Environments.
- Partially Observable Environments.
- Policy, Value Function, Model.
- Value Based RL Agent, Policy Based RL Agent, Actor Critic RL Agent.
- Model Free RL Agent, Model Based RL Agent.
- Learning Problem and Planning Problem.
- Prediction and Control.
- Markov Property.
- State Transition Matrix.
- Markov Process.
- Episodic Tasks.
- Continuing Tasks.
- Horizon (H): Number of time steps in each episode, can be infinite.
- Markov Reward Process.
- Discount Factor, Discount Rate: 0 ≤ γ ≤ 1.
- Return.
- Discounted Return: Discounted sum of rewards from time step t to horizon H.
- State-Value Function.
- Bellman Equation for Markov Reward Processes.
- Markov Decision Process.
- Policy: Mapping from states to actions. Deterministic policy: π (s) = a. Stochastic policy: π (a|s) = P(aₜ=a|sₜ=s).
- State Value Function – Vπ(s): The expected return starting from state s *following* policy π.
- Bellman Expectation Equation for Vπ.
- Action Value Function (also known as State-Action Value Function or the Quality Function) – Qπ(s, a): The expected return starting from state s, *taking action a, then following policy π*.
- Bellman Expectation Equation for Qπ.
- Optimal State Value Function.
- Optimal Action Value Function.
- Bellman Optimality Equation for v*.
- Bellman Optimality Equation for q*.
- Optimal Policies.
- Dynamic Programming.
- Iterative Policy Evaluation.
- Policy Improvement.
- Policy Improvement Theorem.
- Policy Iteration.
- Value Iteration.
- Synchronous Dynamic Programming.
- Asynchronous Dynamic Programming.
- Generalized Policy Iteration.
- Bootstrapping: Updating estimates on the basis of other estimates.
- Monte-Carlo Policy Evaluation.
- First-Visit Monte-Carlo Policy Evaluation.
- Every-Visit Monte-Carlo Policy Evaluation.
- Incremental Mean.
- Incremental Monte-Carlo Updates.
- Temporal-Difference Learning.
- Forward-View TD(λ).
- Eligibility Traces.
- Backward-View TD(λ).
- On-Policy Learning.
- Off-Policy Learning.
- ϵ-Greedy Exploration.
- ϵ-greedy Policies: Most of the time they choose an action that has maximal estimated action value, but with probability ϵ they instead select an action at random.
- Monte-Carlo Policy Iteration. Policy evaluation: Monte-Carlo policy evaluation, Q = qπ. Policy improvement: ϵ-greedy policy improvement.
- Monte-Carlo Control. Policy evaluation: Monte-Carlo policy evaluation, Q ≈ qπ. Policy improvement: ϵ-greedy policy improvement.
- Exploring Starts: Specify that the episodes start in a state–action pair, and that every pair has a nonzero probability of being selected as the start.
- Monte Carlo Control Exploring Starts.
- Greedy in the Limit with Infinite Exploration (GLIE) Monte-Carlo Control.
- ϵ-soft Policies: Policies for which π(a|s) ≥ ϵ/|A(s)| for all states and actions, for some ϵ > 0.
- On-Policy First-Visit MC Control.
- SARSA: State (S), Action (A), Reward (R), State (S’), Action (A’).
- On-Policy Control with SARSA. Policy evaluation: SARSA evaluation, Q ≈ qπ. Policy improvement: ϵ-greedy policy improvement.
- Forward-View SARSA (λ).
- Backward-View SARSA (λ).
- Target Policy.
- Behavior Policy.
- Importance Sampling: Use samples from one distribution to estimate the *expectation* of a different distribution.
- Importance Sampling for Off-Policy Monte-Carlo.
- Importance Sampling for Off-Policy TD.
- Q-Learning: The next action is chosen using the behavior policy. Q is updated using an alternative successor action.
- Off-Policy Control with Q-Learning.
- Expected SARSA.
- Value Function Approximation.
- Function Approximators.
- Differentiable Function Approximators.
- Feature Vectors.
- State Aggregation.
- Coarse Coding.
- Tile Coding.
- Continuous States.
- Incremental Prediction Algorithms.
- Control with Value Function Approximation. Policy evaluation: Approximate policy evaluation, q(.,., **w**) ≈ qπ. Policy improvement: ϵ-greedy policy improvement.
- Learning the State-Action Value Function: Replay Buffer: the 10,000 most recent tuples (s, a, R(s), s’). x = (s, a) → Q(θ) → y = R(s) + γmaxQ(s’, a’; θ). Error = [R(s) + γmaxQ(s’, a’; θ)] − Q(s, a; θ); the loss is typically the mean of the squared error over a mini-batch.
- Expected SARSA with Function Approximation.
- Target Network: A separate neural network for generating the y targets. It has the same architecture as the original Q-Network. Error = [R(s) + γmaxTargetQ(s’, a’; θ′)] − Q(s, a; θ); the loss is typically the mean of the squared error over a mini-batch. The TargetQ-Network generates the y targets, and every C time steps its weights are updated using the weights of the Q-Network.
- Soft Updates: θ′ ← 0.001θ + 0.999θ′, where θ′ and θ represent the weights of the target network and the current network, respectively.
- Deep Q-learning.
- Linear Least Squares Prediction Algorithms.
- Least Squares Policy Iteration. Policy evaluation: Least squares Q-Learning. Policy improvement: Greedy policy improvement.
- Average Reward.
- Discounted Returns, Returns for Average Reward.
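
A few of the definitions above, namely ϵ-greedy policies, the discounted return and the Q-Learning update, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: Q is stored as a list of lists indexed by state and action, `alpha` is a hypothetical name for the step size, and ties between maximal actions are broken at random.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties among actions with maximal estimated value at random.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k], accumulated from the last reward back."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def q_learning_update(q, s, a, r, s_next, alpha, gamma):
    """Move Q(s, a) toward r + gamma * max over a' of Q(s_next, a')."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


q = [[0.0, 0.0], [1.0, 2.0]]  # two states, two actions
action = epsilon_greedy(q[0], epsilon=0.1)
q_learning_update(q, s=0, a=action, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
print(q)
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

The update uses the maximum over successor actions rather than the action the behavior policy will actually take next, which is what makes Q-Learning off-policy.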

- Stochastic Policies.
- Softmax Policies.
- Gaussian Policies.
- Policy Objective Functions: Start State Objective, Average Reward Objective and Average Value Objective.
- Score Function.
- Policy Gradient Theorem.
- Monte-Carlo Policy Gradient (REINFORCE).
- Action-Value Actor-Critic: Critic updates w by linear TD(0). Actor updates θ by policy gradient.
- The Tabular Dyna-Q Algorithm.
- The Dyna-Q+ Algorithm.
- Forward Search.
- Simulation-Based Search.
- Monte-Carlo Tree Search.
- Temporal-Difference Search.
- Dyna-2.
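
The Target Network and Soft Updates entries earlier in this list can also be illustrated with plain Python lists standing in for network outputs and weights. This is only a sketch: the `dones` flag, which marks terminal transitions, and the name `tau` for the 0.001 mixing factor are assumptions that do not appear in the list above.

```python
def td_targets(rewards, next_q_values, dones, gamma):
    """y = R(s) + gamma * max over a' of TargetQ(s', a').

    next_q_values holds, per transition, the target network's outputs for s'.
    Assumption: terminal transitions (done is True) do not bootstrap.
    """
    return [
        r + (0.0 if done else gamma * max(q_next))
        for r, q_next, done in zip(rewards, next_q_values, dones)
    ]


def soft_update(target_weights, online_weights, tau=0.001):
    """theta' <- tau * theta + (1 - tau) * theta' for each weight."""
    return [
        tau * w + (1.0 - tau) * w_target
        for w_target, w in zip(target_weights, online_weights)
    ]


targets = td_targets(
    rewards=[1.0, 2.0],
    next_q_values=[[0.5, 1.5], [3.0, 0.0]],
    dones=[False, True],
    gamma=0.9,
)
print(targets)
print(soft_update([1.0, 1.0], [0.0, 2.0]))
```

With a small `tau`, the target weights change slowly, so the y targets stay fairly stable while the Q-Network is being trained.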

After you finish learning about machine learning, please click Topic 23 – Introduction to Computer Vision to continue.