**Why do I need to learn about machine learning?**

Machine learning has recently driven major progress on many important, difficult problems. A few of them include speech recognition, speech synthesis, image recognition, autonomous driving and chat bots.

Nowadays, a key skill for a software developer is the ability to use machine learning algorithms to solve real-world problems.

**What can I do after finishing learning about machine learning?**

You will be able to create software that can recognize a car's license plate number from an image or estimate the probability of breast cancer for a patient.

**That sounds useful! What should I do now?**

Please audit

– this Machine Learning Specialization (Coursera) courses and

– this Applied Machine Learning in Python (Coursera) course.

At the same time, please read

– this Aurelien Geron (2022). Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow. O’Reilly Media book and

– this Brett Lantz (2019). Machine Learning with R – Expert Techniques for Predictive Modeling. Packt Publishing book, and

– this Michael A. Nielsen (2015). Neural Networks and Deep Learning. Determination Press book.

After that please watch

– this MIT 6.034 – Artificial Intelligence, Fall 2010 course (Readings).

After that, at the same time, please audit

– this Reinforcement Learning Specialization (Coursera) courses and read

– this Richard S. Sutton and Andrew G. Barto (2020). Reinforcement Learning. The MIT Press.

After that please read

– this Tom M. Mitchell (1997). Machine Learning. McGraw-Hill Education book, and

– this Christopher M. Bishop (2006). Pattern Recognition and Machine Learning. Springer book.

**Supervised Learning Terminology Review:**

- Artificial Intelligence.
- Machine Learning.
- Deep Learning.
- Linear Regression: **Y** = **θ**ᵀ**X** + **Ε**.
- *Cost Function* measures how good/bad your model is.
- Mean Square Error (MSE) measures the average of the squares of the errors.
- Gradient Descent, Learning Rate.
- Batch Gradient Descent.
- *The R-Squared Test* measures the proportion of the total variance in the output (y) that can be explained by the variation in x. It can be used to evaluate how good a “fit” some model is on the given data.
- Stochastic Gradient Descent.
- Mini-Batch Gradient Descent.
- Overfitting: a machine learning model gives accurate predictions for training data but not for new data.
- Regularization: Ridge Regression, Lasso Regression, Elastic Net, Early Stopping.
- Logistic Regression.
- Sigmoid Function.
- Binary Cross Entropy Loss Function, Log Loss Function.
- One Hot Encoding.
- The *Softmax* function takes an N-dimensional vector of arbitrary real values and produces another N-dimensional vector with real values in the range (0, 1) that add up to 1.0.
- Softmax Regression.
- Support Vector Machines.
- Decision Trees.
- K-Nearest Neighbors.
- McCulloch-Pitts Neuron.
- *Linear Threshold Unit* with threshold T calculates the weighted sum of its inputs, and then outputs 0 if this sum is less than T, and 1 if the sum is greater than T.
- Perceptron.
- Activation Functions: Sigmoid, Hyperbolic Tangent, Rectified Linear Unit (ReLU).
- Artificial Neural Networks.
- Backpropagation.
- Gradient Descent Optimization Algorithms: Momentum, Adagrad, Adadelta, RMSprop, Adam.
- Regularization: Dropout.
- The Joint Probability Table.
- Bayesian Networks.
- Naive Bayes Inference.
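
To make the linear regression, MSE, batch gradient descent and R-squared terms above more concrete, here is a minimal sketch in plain Python. The function names, the toy data set (generated from y = 1 + 2x), the learning rate and the step count are all assumptions chosen for illustration; a real project would more likely rely on a library such as scikit-learn.

```python
def predict(theta, x):
    """Linear model with one feature: y = theta[0] + theta[1] * x."""
    return theta[0] + theta[1] * x


def mse(theta, xs, ys):
    """Mean squared error: the average of the squared prediction errors."""
    return sum((predict(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def batch_gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Minimize the MSE, using the whole training set for every update."""
    theta = [0.0, 0.0]
    n = len(xs)
    for _ in range(steps):
        errors = [predict(theta, x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to theta[0] and theta[1].
        grad_intercept = 2.0 / n * sum(errors)
        grad_slope = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        theta[0] -= learning_rate * grad_intercept
        theta[1] -= learning_rate * grad_slope
    return theta


def r_squared(theta, xs, ys):
    """Share of the variance in y that the fitted model explains."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(theta, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data that follows y = 1 + 2x exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
theta = batch_gradient_descent(xs, ys)
print(theta, mse(theta, xs, ys), r_squared(theta, xs, ys))
```

On this noise-free data the fitted parameters should end up very close to an intercept of 1 and a slope of 2, with an MSE near 0 and an R-squared near 1. A learning rate that is too large would make the updates diverge instead.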

**Unsupervised Learning Terminology Review:**

- K-Means.
- Principal Component Analysis.
- User-Based Collaborative Filtering.
- Item-based Collaborative Filtering.
- Matrix Factorization.
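
As an illustration of K-Means from the list above, the following sketch alternates the two classic steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The function names and the toy points are hypothetical, and starting from the first k points is a simplifying assumption (practical implementations usually use random or k-means++ initialization).

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, iterations=100):
    """Cluster points into k groups; returns (centroids, assignments)."""
    # Assumption: naive initialization with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, assignments = k_means(points, k=2)
print(centroids, assignments)
```

The points near (1, 1) should end up in one cluster and the points near (9, 9) in the other. K-Means only finds a local optimum, so a different initialization can give a different clustering on less clean data.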

**Reinforcement Learning Terminology Review:**

- k-armed Bandit Problem.
- Bandit Algorithm.
- Exponential Recency-Weighted Average.
- Optimistic Initial Values.
- Upper-Confidence-Bound Action Selection.
- Agent.
- World.
- States, Terminal State.
- Actions.
- Rewards.
- Markov Decision Processes: Agent (π) >> Action (a) >> World >> State (s), Reward >> Agent (π). Model: (current state, action, reward of current state, next state) = (s, a, R(s), s’).
- Episodes.
- Continuing Tasks.
- Horizon (H): Number of time steps in each episode, can be infinite.
- Expected Return: Expected sum of rewards from time step t to horizon H.
- Discounted Return: Discounted sum of rewards from time step t to horizon H.
- Discount Factor, Discount Rate: 0 ≤ γ ≤ 1.
- Policy: Mapping from states to actions: π (s) = a or π (a|s) = P(aₜ=a|sₜ=s).
- State Value Function – Vπ(s): The expected return starting from state s *following* policy π.
- State-Action Value Function, also known as the quality function – Qπ(s, a): The expected return starting from state s, *taking action a, then following policy* π.
- Bellman Equations.
- Optimal Policies.
- Optimal Value Functions.
- Bellman Optimality Equations.
- Policy Evaluation: (MDP, π) → Linear System Solver, Dynamic Programming → Vπ.
- Iterative Policy Evaluation.
- Policy Control, Policy Improvement.
- Policy Improvement Theorem.
- Greedy Policy.
- Policy Iteration: (MDP) → Dynamic Programming → Vπ-optimal.
- Value Iteration: MDP → (Qopt, πopt).
- Asynchronous Dynamic Programming.
- Generalized Policy Iteration.
- Bootstrapping: Updating estimates on the basis of other estimates.
- First-Visit Monte Carlo Prediction.
- Exploring Starts.
- Monte Carlo Control.
- Model-Based Value Iteration.
- Model-free Monte Carlo.
- SARSA.
- Function Approximation.
- Continuous States.
- Learning the State-Action Value Function: Replay Buffer: the 10,000 most recent tuples (s, a, R(s), s’). x = (s, a) → Q(θ) → y = R(s) + γ max Q(s’, a’; θ), where the max is taken over the next actions a’. Loss = ([R(s) + γ max Q(s’, a’; θ)] − Q(s, a; θ))².
- Target Network: A separate neural network for generating the y targets. It has the same architecture as the original Q-Network. Loss = ([R(s) + γ max TargetQ(s’, a’; θ′)] − Q(s, a; θ))². The TargetQ-Network generates the y targets, and every C time steps its weights are updated using the weights of the Q-Network.
- Soft Updates: θ′ ← 0.001θ + 0.999θ′, where θ′ and θ represent the weights of the target network and the current network, respectively.
- Deep Reinforcement Learning, Deep Q-learning.
- ϵ-greedy Policy: With probability 1 − ϵ (for example, 0.95), pick the greedy action (exploitation). With probability ϵ (for example, 0.05), pick an action at random (exploration).
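
The last few Deep Q-learning terms can be sketched as three small helper functions: ϵ-greedy action selection, the y target, and the soft update of the target network weights. This is only an illustrative sketch in plain Python. The function names, the flat lists standing in for Q-values and network weights, and the rule that the target is just the reward when the next state is terminal are all assumptions, not something defined in the list above.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # exploration
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploitation


def td_target(reward, next_q_values, gamma, terminal):
    """y = R(s) + gamma * max over a' of Q(s', a')."""
    if terminal:
        # Assumption: no future return after a terminal state.
        return reward
    return reward + gamma * max(next_q_values)


def soft_update(target_weights, online_weights, tau=0.001):
    """theta' <- tau * theta + (1 - tau) * theta', element by element."""
    return [
        tau * w + (1.0 - tau) * w_target
        for w, w_target in zip(online_weights, target_weights)
    ]


# Hypothetical numbers, only to show how the helpers are called.
rng = random.Random(0)
action = epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.05, rng=rng)
y = td_target(reward=1.0, next_q_values=[0.5, 2.0], gamma=0.9, terminal=False)
new_target = soft_update(target_weights=[1.0, 1.0], online_weights=[0.0, 2.0])
print(action, y, new_target)
```

With tau = 0.001 the target weights move only 0.1% of the way toward the current network on each update, which is what keeps the y targets from shifting too quickly during training.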

After finishing learning about machine learning please click Topic 23 – Introduction to Computer Vision to continue.