**Why do I need to learn about machine learning?**

Machine learning has recently solved many important and difficult problems, including speech recognition, speech synthesis, image recognition, autonomous driving, and chatbots.

Nowadays, a key skill for a software developer is the ability to use machine learning algorithms to solve real-world problems.

**What can I do after finishing learning about machine learning?**

You will be able to create software that can recognize a car's license plate number from an image or estimate the probability that a patient has breast cancer.

**That sounds useful! What should I do now?**

Please audit

– the courses of this Machine Learning Specialization (Coursera) and

– this Applied Machine Learning in Python (Coursera) course.

At the same time, please read

– this Aurelien Geron (2022). Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow. O’Reilly Media book and

– this Brett Lantz (2023). Machine Learning with R. Packt Publishing book.

After that, please watch

– this MIT 6.034 – Artificial Intelligence, Fall 2010 course (Readings).

After that, please read

– this Tom M. Mitchell (1997). Machine Learning. McGraw-Hill Education book, and

– this Christopher M. Bishop (2006). Pattern Recognition and Machine Learning. Springer book.

After that, please audit the courses of this Reinforcement Learning Specialization (Coursera).

At the same time, please read this Richard S. Sutton and Andrew G. Barto (2020). Reinforcement Learning. The MIT Press book.

**Supervised Learning Terminology Review:**

- Artificial Intelligence.
- Machine Learning.
- Deep Learning.
- Linear Regression: **Y** = **θ**ᵀ**X** + **Ε**.
- *Cost Function* measures how good/bad your model is.
- Mean Square Error (MSE) measures the average of the squares of the errors.
- Gradient Descent, Learning Rate.
- Batch Gradient Descent.
- *The R-Squared Test* measures the proportion of the total variance in the output (y) that can be explained by the variation in x. It can be used to evaluate how good a “fit” some model is on the given data.
- Stochastic Gradient Descent.
- Mini-Batch Gradient Descent.
- Overfitting: A machine learning model gives accurate predictions for training data but not for new data.
- Regularization: Ridge Regression, Lasso Regression, Elastic Net, Early Stopping.
- Normalization.
- Logistic Regression.
- Sigmoid Function.
- Binary Cross Entropy Loss Function, Log Loss Function.
- One Hot Encoding.
- The *Softmax* function takes an N-dimensional vector of arbitrary real values and produces another N-dimensional vector with real values in the range (0, 1) that add up to 1.0.
- Softmax Regression.
- Support Vector Machines.
- Decision Trees.
- K-Nearest Neighbors.
- McCulloch-Pitts Neuron.
- *Linear Threshold Unit* with threshold T calculates the weighted sum of its inputs, and then outputs 0 if this sum is less than T, and 1 if the sum is greater than T.
- Perceptron.
- Artificial Neural Networks.
- Backpropagation.
- Activation Functions: Rectified Linear Unit (ReLU), Leaky ReLU, Sigmoid, Hyperbolic Tangent.
- Batch Normalization.
- Gradient Descent Optimization Algorithms: Momentum, Adagrad, Adadelta, RMSprop, Adam.
- Regularization: Dropout.
- The Joint Probability Table.
- Bayesian Networks.
- Naive Bayes Inference.
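
Several of the definitions above (Mean Square Error, the R-Squared Test, Softmax, the Linear Threshold Unit, and Batch Gradient Descent for Linear Regression) can be sketched in a few lines of plain Python. This is only a minimal illustration under some assumptions, not a reference implementation: the function names are made up for this example, the regression has a single input feature, and the threshold unit outputs 1 when the weighted sum is exactly equal to T (the definition above leaves that case open).

```python
import math


def mse(y_true, y_pred):
    """Mean Square Error: the average of the squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R-squared: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def softmax(values):
    """Turn arbitrary real values into positive numbers that add up to 1."""
    shift = max(values)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def linear_threshold_unit(inputs, weights, threshold):
    """Output 1 if the weighted sum reaches the threshold, otherwise 0.

    Assumption: a sum exactly equal to the threshold gives 1.
    """
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0


def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Batch gradient descent on the MSE cost for the model y = w * x + b."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Every step uses the whole training set, hence "batch".
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Usage: fit noiseless points on the line y = 2x + 1, then score the fit.
xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
w, b = fit_line(xs, ys)
predictions = [w * x + b for x in xs]
print(w, b, mse(ys, predictions), r_squared(ys, predictions))
print(softmax([1.0, 2.0, 3.0]))
```

With weights [1, 1] and threshold 2, `linear_threshold_unit` behaves like a logical AND of two binary inputs, which is the kind of function a McCulloch-Pitts neuron can represent.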

**Unsupervised Learning Terminology Review:**

- K-Means.
- Principal Component Analysis.
- User-Based Collaborative Filtering.
- Item-based Collaborative Filtering.
- Matrix Factorization.

**Reinforcement Learning Terminology Review:**

- k-armed Bandit Problem.
- Bandit Algorithm.
- Exponential Recency-Weighted Average.
- Optimistic Initial Values.
- Upper-Confidence-Bound Action Selection.
- Agent.
- World.
- States, Terminal State.
- Actions.
- Rewards.
- Markov Decision Processes: Agent (π) >> Action (a) >> World >> State (s), Reward >> Agent (π). Model: (current state, action, reward of current state, next state) = (s, a, R(s), s’).
- Episodes.
- Continuing Tasks.
- Horizon (H): Number of time steps in each episode, can be infinite.
- Expected Return: Expected sum of rewards from time step t to horizon H.
- Discounted Return: Discounted sum of rewards from time step t to horizon H.
- Discount Factor, Discount Rate: 0 ≤ γ ≤ 1.
- Policy: Mapping from states to actions: π(s) = a (deterministic) or π(a|s) = P(aₜ=a|sₜ=s) (stochastic).
- State Value Function – Vπ(s): The expected return starting from state s, then *following* policy π.
- State-Action Value Function, also known as the quality function – Qπ(s, a): The expected return starting from state s, *taking action a, then following policy π*.
- Bellman Equations.
- Optimal Value Functions.
- Optimal Policies.
- Bellman Optimality Equations.
- Policy Evaluation: (MDP, π) → Linear System Solver, Dynamic Programming → Vπ.
- Iterative Policy Evaluation.
- Policy Control, Policy Improvement.
- Policy Improvement Theorem.
- Greedy Policy.
- Policy Iteration: (MDP) → Dynamic Programming → Vπ-optimal.
- Value Iteration: MDP → (Qopt, πopt).
- Asynchronous Dynamic Programming.
- Generalized Policy Iteration.
- Bootstrapping: Updating estimates on the basis of other estimates.
- First-Visit Monte Carlo Prediction.
- Exploring Starts.
- Monte Carlo Control Exploring Starts.
- On-Policy Methods.
- ϵ-greedy Policies: Most of the time they choose an action that has maximal estimated action value, but with probability ϵ they instead select an action at random.
- ϵ-soft Policies: Policies for which π(a|s) ≥ ϵ/|A(s)| for all states and actions, for some ϵ > 0.
- On-Policy First-Visit MC Control.
- Off-Policy Learning.
- Target Policy.
- Behavior Policy.
- Importance Sampling.
- Off-Policy Monte Carlo Prediction.
- Off-Policy Monte Carlo Control.
- Temporal-Difference Learning.
- SARSA: On-Policy TD Control.
- Q-Learning: Off-Policy TD Control.
- Function Approximation.
- Continuous States.
- Learning the State-Action Value Function: Replay Buffer: the 10,000 most recent tuples (s, a, R(s), s’). x = (s, a) → Q(θ) → y = R(s) + γ max Q(s’, a’; θ), where the maximum is taken over the next actions a’. Error = [R(s) + γ max Q(s’, a’; θ)] − Q(s, a; θ); the loss is typically the mean of the squared errors over a batch of tuples.
- Target Network: A separate neural network for generating the y targets. It has the same architecture as the original Q-Network. Error = [R(s) + γ max TargetQ(s’, a’; θ′)] − Q(s, a; θ). The TargetQ-Network generates the y targets, and every C time steps its weights θ′ are updated using the weights θ of the Q-Network.
- Soft Updates: θ′ ← 0.001θ + 0.999θ′, where θ′ and θ represent the weights of the target network and the current network, respectively.
- Deep Reinforcement Learning, Deep Q-learning.
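
A few of the terms above (Discounted Return, ϵ-greedy Policies, the Q-Learning target, and Soft Updates) can be illustrated with a small Python sketch. It assumes a tabular setting, with Q stored as a list of lists indexed by state and action instead of a neural network, and all function and parameter names are hypothetical, chosen only for this example.

```python
import random


def discounted_return(rewards, gamma):
    """Discounted sum of rewards: r0 + gamma * r1 + gamma**2 * r2 + ..."""
    g = 0.0
    for r in reversed(rewards):  # work backwards from the last time step
        g = r + gamma * g
    return g


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_learning_update(q, s, a, reward, s_next, alpha, gamma, terminal=False):
    """Tabular Q-learning step (alpha is the learning rate).

    Moves Q(s, a) toward the target y = reward + gamma * max over a' of
    Q(s', a'). For a terminal next state the target is just the reward.
    """
    target = reward if terminal else reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return target


def soft_update(target_weights, online_weights, tau=0.001):
    """Soft update: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * tw
            for tw, w in zip(target_weights, online_weights)]


# Usage: two states with two actions each, all estimates starting at zero.
q_table = [[0.0, 0.0], [0.0, 0.0]]
action = epsilon_greedy(q_table[0], epsilon=0.1)
q_learning_update(q_table, s=0, a=action, reward=1.0, s_next=1,
                  alpha=0.5, gamma=0.9)
print(q_table)
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

In Deep Q-learning the table is replaced by a Q-Network, and `soft_update` would be applied to the network weights; with tau = 0.001 the target weights move only slightly toward the current weights at each step, which keeps the y targets from changing abruptly.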

After finishing learning about machine learning, please click Topic 23 – Introduction to Computer Vision to continue.