Category Archives: Software Engineering Curriculum

Topic 26 – Introduction to Cloud Computing

Why do I need to learn about cloud computing?

Because you will develop software systems that usually leverage cloud services for quick deployment, scalable computation, or storage.

What can I do after finishing learning cloud computing?

You will be able to

  • deploy software systems to public clouds,
  • build your private cloud,
  • develop software using cloud platforms,
  • develop software using cloud services,
  • leverage cloud services for training and deploying machine learning models,
  • leverage cloud services for big data analytics and reporting.

What should I do now?

Please read
– this Dan C. Marinescu (2022). Cloud Computing – Theory and Practice. Morgan Kaufmann book, and
– this Andreas Wittig and Michael Wittig (2022). Amazon Web Services in Action. Manning Publications book, and
– this Nick Marshall et al. (2019). Mastering VMware vSphere 6.7. Sybex book, and
– this Rakesh Gupta (2020). Salesforce Platform App Builder Certification – A Practical Study Guide. Apress book, and
– this Philip Weinmeister (2019). Practical Salesforce Development Without Code – Building Declarative Solutions on the Salesforce Platform. Apress book, and
– this Tomasz Wiktorski (2019). Data-Intensive Systems – Principles and Fundamentals using Hadoop and Spark. Springer book.

Terminology Review:

  • Software as a Service
  • Multitenancy
  • Infrastructure as a Service
  • Virtual Machines
  • Software-Defined Networking
  • Infrastructure as Code (IaC)
  • Platform as a Service
  • Containers as a Service
  • Function as a Service (Serverless Computing)
  • File Storage
  • Block Storage
  • Object Storage
  • Direct-Attached Storage (DAS)
  • Network-Attached Storage (NAS)
  • Storage Area Network (SAN)
  • GFS
  • Bigtable
  • MapReduce

After finishing learning about cloud computing please click Topic 27 – Introduction to Block Chain to continue.

 

Topic 25 – Introduction to Distributed Systems

Why do I need to learn about distributed systems?

Distributed systems provide the foundation for understanding the theories and techniques behind cloud computing and block chain technology.

Architectures, protocols and algorithms introduced in distributed systems are necessary for creating complicated software too.

What can I do after finishing learning distributed systems?

You will be able to design software that can

  • tolerate faults,
  • shard data,
  • handle a massive number of requests, and
  • perform expensive computation.

You will be prepared to learn about cloud computing and block chain technology.

What should I do now?

Please watch this Distributed Systems, UC Santa Cruz Baskin School of Engineering, 2021 course to get familiar with core concepts and protocols.

After that please watch this MIT 6.824, Distributed Systems, Spring 2020 course to learn how to design a large-scale distributed system.

At the same time you can read
– this Maarten van Steen and Andrew S. Tanenbaum (2023). Distributed Systems. Maarten van Steen book, and
– this Martin Kleppmann (2017). Designing Data-Intensive Applications – The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O’Reilly Media book to solidify your knowledge.

Terminology Review:

  • Fault Tolerance
  • Consistency
  • System Models
  • Failure Detectors
  • Communication
  • Ordering
  • State Machine Replication
  • Primary-Backup Replication
  • Bully Algorithm
  • Ring Election
  • Multi-Leader Replication
  • Leaderless Replication
  • Cristian’s Algorithm
  • Berkeley Algorithm
  • Lamport Clocks (see the small sketch after this list)
  • Vector Clocks
  • Version Vectors
  • Chain Replication
  • Consensus Algorithms
  • FLP
  • Raft
  • Paxos
  • Viewstamped Replication
  • Zab
  • Consistent Hashing
  • Distributed Transactions
  • ACID
  • Two-Phase Commit
  • Three-Phase Commit
  • Serializability
  • Two-Phase Locking
  • Distributed Locks
  • CAP
  • Consistency Models
  • Linearizability
  • Distributed Architectures
  • Distributed Programming
  • Hadoop
  • Spark
  • TensorFlow
  • PyTorch
  • Kubernetes
  • Bitcoin
  • Smart Contracts
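
Most of the terms above name protocols and algorithms rather than formulas. As one small, concrete illustration, here is a minimal sketch of the Lamport clock update rules; the class and method names are made up for illustration.

```python
class LamportClock:
    """Lamport logical clock: increment on local events and sends; on receive,
    take the maximum of the local and received timestamps, then add one."""

    def __init__(self) -> None:
        self.time = 0

    def local_event(self) -> int:
        self.time += 1
        return self.time

    def send(self) -> int:
        self.time += 1          # a send is itself an event
        return self.time        # this timestamp travels with the message

    def receive(self, message_time: int) -> int:
        self.time = max(self.time, message_time) + 1
        return self.time


p1, p2 = LamportClock(), LamportClock()
t = p1.send()            # p1 sends a message with timestamp 1
p2.local_event()         # p2 has a local event (timestamp 1)
print(p2.receive(t))     # p2 receives the message: max(1, 1) + 1 = 2
```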

After finishing learning about distributed systems please click Topic 26 – Introduction to Cloud Computing to continue.

 

Topic 20 – Discrete Mathematics

Why do I need to learn about discrete mathematics?

Discrete mathematics is a fundamental tool for understanding many theories and techniques behind artificial intelligence, machine learning, deep learning, data mining, security, digital image processing and natural language processing.

The problem-solving techniques and computational thinking introduced in discrete mathematics are necessary for creating complicated software too.

What can I do after finishing learning discrete mathematics?

You will be prepared to learn modern theories and techniques to create modern security, machine learning, data mining, image processing or natural language processing software.

What should I do now?

Please read
– this Kenneth H. Rosen (2012). Discrete Mathematics and Its Applications. McGraw-Hill book and
– this Alfred V. Aho and Jeffrey D. Ullman (1994). Foundations of Computer Science  book (free online version).

Alternatively, please watch this MIT 6.042J – Mathematics for Computer Science, Fall 2010 course (Textbook).

Terminology Review:

  • Statement: An assertion that is either true or false.
  • Mathematical Statements.
  • Mathematical Proof: A convincing argument about the accuracy of a statement.
  • If p, then q: p is the hypothesis, q is the conclusion.
  • Proposition: A true statement.
  • Theorem: An important proposition.
  • Lemmas: Supporting propositions.
  • Logic: A language for reasoning that contains a collection of rules that we use when doing logical reasoning.
  • Propositional Logic: A logic about truth and falsity of statements.
  • Logic Connectives: Not (Negation), And (Conjunction), Or (Disjunction), If then (Implication), If and only if (Equivalence).
  • Truth Table.
  • Contrapositive of Proposition: The contrapositive of p → q is the proposition ¬q → ¬p.
  • Modus Ponens: If both P → Q and P hold, then Q can be concluded.
  • Predicate: A property of some objects or a relationship among objects represented by the variables.
  • Quantifier: Tells how many objects have a certain property.
  • Mathematical Induction: Base Case, Inductive Case.
  • Recursion: A Base Case, A Recursive Step.
  • Sum Example: Annuity.
  • Set.
  • Subset.
  • Set Operations: A ∪ B, A ∩ B, A ⊂ U: A’ = {x : x ∈ U and x ∉ A}, A \ B = A ∩ B’ = {x : x ∈ A and x ∉ B}.
  • Cartesian Product: A × B = {(a, b) : a ∈ A and b ∈ B}.
  • A binary relation (or just relation) from X to Y is a subset R ⊆ X × Y. To describe the relation R, we  may list the collection of all ordered pairs (x, y) such that x is related to y by R.
  • A mapping or function f ⊂ A × B from a set A to a set B is a special type of relation in which for each element a ∈ A there is a unique element b ∈ B such that (a, b) ∈ f.
  • Equivalence Relation.
  • Equivalence Class.
  • Partition.
  • A state machine is a binary relation on a set, the elements of the set are called states, the relation is called the transition relation, and an arrow in the graph of the transition relation is called a transition.
  • Greatest Common Divisor.
  • Division Algorithm.
  • Prime Numbers.
  • The Fundamental Theorem of Arithmetic: Let n be an integer such that n > 1. Then n can be factored as a product of prime numbers. n = p₁p₂ ∙ ∙ ∙ pₖ
  • Congruence: a is congruent to b modulo n if n | (a – b), written a ≡ b (mod n).
  • Fermat’s Little Theorem.
  • Stirling’s Approximation.
  • Probability.
  • Example: The Monty Hall Problem.
  • The Four Step Method: (1) Find the Sample Space (Set of possible outcomes), (2) Define Events of Interest (Subset of the sample space),  (3) Determine Outcome Probabilities, (4) Compute Event Probabilities.
  • A tree diagram is a graphical tool that can help us work through the four step approach when the number of outcomes is not too large or the problem is nicely structured.
  • Example: The Strange Dice.
  • Conditional Probability: P(A|B) = P (A ∩ B) / P(B).
  • A conditional probability P(B|A) is called a posteriori if event B precedes event A in time.
  • Example: Medical Testing.
  • Independence: P(B|A) = P(B)  or P(A∩B) = P(A) · P(B).
  • Mutual Independence: The probability of each event is the same no matter which of the other events has occurred.
  • Pairwise Independence: Any two events are independent.
  • Example: The Birthday Problem.
  • The birthday paradox refers to the counterintuitive fact that only 23 people are needed for the probability of a shared birthday to exceed 50%; with 70 people the probability is about 99.9% (see the short Python check after this list).
  • Bernoulli Random Variable (Indicator Random Variable): f: Ω → {0, 1}.
  • Binomial Random Variable: A number of successes in an experiment consisting of n trials. P(X = x) = [n! / (x!(n − x)!)] pˣ(1 − p)ⁿ⁻ˣ.
  • Expectation (Average, Mean): E[R] = Sum(R(w) · P(w)) = Sum(x · P(X = x)).
  • Median: P(R < x) ≤ 1/2 and P(R > x) < 1/2.
  • Example: Splitting the Pot.
  • Mean Time to Failure: If a system independently fails at each time step with probability p, then the expected number of steps up to the first failure is 1/p.
  • Linearity of Expectation.
  • Example: The Hat Check Problem.
  • Example: Benchmark: E(Z/R) = 1.2 does NOT mean that E(Z) = 1.2E(R).
  • Variance: var(X) = E[(X−E[X])²].
  • Kurtosis: E[(X−E[X])⁴].
  • Markov’s Theorem: P(R ≥ x) ≤ E(R)/x (for R ≥ 0, x > 0).
  • Chebyshev’s Theorem: P(|R – E(R)| ≥ x) ≤ var(R)/x². Bounds the probability of deviation from the mean.
  • The Chernoff Bound: P(T ≥ c·E(T)) ≤ exp(−z·E(T)), where z = c·ln c − c + 1, T = Sum(Tᵢ), 0 ≤ Tᵢ ≤ 1.
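
The four-step method and the birthday figures above can be checked with a few lines of Python. This is a minimal sketch, assuming 365 equally likely birthdays; the function name is made up for illustration.

```python
def birthday_collision_probability(n: int) -> float:
    """Exact probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (365 - k) / 365   # the k-th person avoids the first k birthdays
    return 1.0 - p_all_distinct

for n in (23, 70):
    print(n, round(birthday_collision_probability(n), 4))
# Prints approximately: 23 0.5073 and 70 0.9992, matching the figures above.
```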

After finishing learning about discrete mathematics please click Topic 21 – Introduction to Computation and Programming using Python to continue.

 

Topic 2 – Introduction to Computer Networks

Why do I need to learn about computer networks?

Because you will develop software systems that usually connect with other software systems via various networks.

What can I do after finishing learning computer networks?

You will be able to set up various software systems such as Domain Name System, Active Directory System, Electronic Mail, File Transfer Protocol System, Remote Desktop Services, File Services, HTTP Services.

You will be prepared to learn about network programming, game development, web application development, and distributed systems and blockchain.

What should I do now?

Please audit this The Bits and Bytes of Computer Networking course and complete all the quizzes.

Alternatively, you can read
– this Andrew S. Tanenbaum and David J. Wetherall (2021). Computer Networks. Pearson Education book, and
– this James F. Kurose and Keith W. Ross (2021). Computer Networking: A Top-Down Approach. Pearson book.

After that please read
– this Brian Svidergol et al. (2018). Mastering Windows Server 2016. Wiley book, and
– this Larry L. Peterson and Bruce S. Davie (2021). Computer Networks: A Systems Approach. Morgan Kaufmann book.

Terminology Review:

  • Computer Networking.
  • Computer Networks, Peer-to-Peer Systems, Local Area Networks, Wide Area Networks, Virtual Private Networks, ISP Networks, The Internet.
  • Network Software, Distributed Systems, World Wide Web, Network Protocols.
  • The OSI Reference Model: The Physical Layer, The Data Link Layer, The Network Layer, The Transport Layer, The Session Layer, The Presentation Layer, The Application Layer.
  • The TCP/IP Reference Model: The Link Layer, The Internet Layer, The Transport Layer, The Application Layer.
  • The TCP/IP 5-Layer Model: The Physical Layer, The Data Link Layer, The Network Layer, The Transport Layer, The Application Layer.
  • Network Interface Cards, RJ45 Ports and Plugs, Cables, Hubs, Switches, Routers, Servers, Clients, Nodes.
  • Bit, Octet (Byte), Modulation, Line Coding, Twisted Pair Cables, Simplex Communication, Duplex Communication, Full-Duplex, Half-Duplex.
  • Collision Domain, Ethernet, Carrier-Sense Multiple Access with Collision Detection (CSMA/CD), MAC Address.
  • Unicast, Broadcast, Multicast.
  • Data Packet, Ethernet Frame, Virtual LAN (VLAN), VLAN Header.
  • First-in-First-Out (FIFO).
  • IPv4 Addresses, IPv4 Datagrams, IPv4 Address Classes, Address Resolution Protocol (ARP), Subnet Masks, CIDR (Classless Inter-Domain Routing).
  • Routing Tables, Autonomous System, Interior Gateway Protocols,  Exterior Gateway Protocols, Distance Vector Routing Protocols, Link State Routing Protocols, Core Internet Routers, Border Gateway Protocol (BGP), Non-Routable Address Space.
  • Multiplexing, Demultiplexing, Ports.
  • TCP Segment, TCP Control Flags, Three-way Handshake, Four-way Handshake, Transmission Control Protocol (TCP), TCP Socket, TCP Socket States.
  • Connection-Oriented Protocols, Connectionless Protocols.
  • User Datagram Protocol (UDP).
  • Firewall.
  • Network Address Translation.
  • Frames, Packets, Messages.
  • Network Socket.
  • Transport Service Primitives: LISTEN, CONNECT, SEND, RECEIVE, DISCONNECT.
  • Domain Name System (DNS).
  • Electronic Mail, SMTP Protocol.
  • File Transfer Protocol System.
  • Remote Desktop Services.
  • File Services.
  • HTTP Services.
  • Time Services.
  • Short Message Service (SMS).
  • Public Switched Telephone Network (PSTN), Plain Old Telephone Service (POTS), Modems, Dial-up (Phone Lines), Usenet.
  • Broadband, T-Carrier Technologies, Digital Subscriber Line (DSL, Phone Lines), Asymmetric Digital Subscriber Line (ADSL), Symmetric Digital Subscriber Line (SDSL), High Bit-Rate Digital Subscriber Line (HDSL), Digital Subscriber Line Access Multiplexers (DSLAM).
  • Cable Broadband (Television Lines), Cable Modems, Cable Modem Termination System (CMTS).
  • Fiber to the X (FTTX), Fiber to the Neighborhood (FTTN), Fiber to the Building (FTTB), Fiber to the Home (FTTH), Fiber to the Premises (FTTP), Optical Network Terminator.
  • Point to Point Protocol (PPP), Network Control Protocol (NCP), Link Control Protocol (LCP), Point to Point Protocol over Ethernet (PPPoE).

After finishing learning about computer networks please click Topic 3 – Introduction to Programming to continue.

 

Topic 24 – Introduction to Natural Language Processing

Why do I need to learn about natural language processing?

Natural language processing (NLP) has become more and more interesting. Speech recognition, speech synthesis, machine translation and chatbots are examples of breakthrough achievements in the field.

Nowadays a key skill of a software developer is the ability to use natural language processing algorithms and tools to solve real-world problems related to text, audio, natural language sentences and speech.

What can I do after finishing learning about natural language processing?

You will be able to create software that can recognize speech, synthesize speech from text, translate a sentence from English to French, or answer a customer’s question.

That sounds fun! What should I do now?

Please read
– this Daniel Jurafsky and James H. Martin (2014). Speech and Language Processing. Pearson book, and
– this Christopher D. Manning and Hinrich Schütze (1999). Foundations of Statistical Natural Language Processing. MIT Press book first.

After that please audit these Natural Language Processing Specialization courses and this Stanford CS224N – NLP with Deep Learning, Winter 2023 course (Lecture Notes).

Terminology Review:

  • Natural Language Processing.
  • Text Classification (e.g. Spam Detection).
  • Named Entity Recognition.
  • Chatbots.
  • Speech Processing.
  • Speech Recognition.
  • Speech Synthesis.
  • Machine Translation.
  • Corpus: A body of texts.
  • Token: a word or a number or a punctuation mark.
  • Collocation: compounds (e.g. disk drive), phrasal verbs (e.g. make up), and other stock phrases (e.g. bacon and eggs).
  • Unigram: word.
  • Bigrams: pairs of words that occur commonly.
  • Trigrams: 3 words that occur commonly.
  • N-grams: n words that occur commonly.
  • Hypothesis Testing.
  • t-Test.
  • Likelihood Ratios.
  • Language Model: statistical model of word sequences.
  • Naive Bayes.
  • Hidden Markov Models.
  • Bag-of-Words Model.
  • Term Frequency–Inverse Document Frequency (TF–IDF).
  • Bag-of-n-Grams.
  • One-Hot Representation: You have a vocabulary of n words and you represent each word using a vector that is n bits long, in which all bits are zero except for one bit that is set to 1.
  • Word Embedding (Featurized Representation): the transformation from words to dense vectors (see the short sketch after this list).
  • Euclidean Distance, Dot Product Similarity, Cosine Similarity.
  • Embedding Matrix.
  • Neural Language Model.
  • Word2Vec: Skip-Gram Model, Bag-of-Words Model.
  • Negative Sampling.
  • GloVe, Global Vectors.
  • Recurrent Neural Networks.
  • Backpropagation Through Time.
  • Recurrent Neural Net Language Model (RNNLM).
  • Gated Recurrent Unit (GRU).
  • Long Short Term Memory (LSTM).
  • Bidirectional RNN.
  • Deep RNNs.
  • Sequence to Sequence Model.
  • Teacher Forcing.
  • Image Captioning.
  • Greedy Search.
  • Beam Search, Length Normalization.
  • BLEU (BiLingual Evaluation Understudy) Score.
  • ROUGE (Recall-Oriented Understudy for Gisting Evaluation) Score.
  • F1 Score.
  • Minimum Bayes-Risk.
  • Attention Mechanism.
  • Self-Attention (Scaled Dot-Product Attention): Queries, Keys and Values.
  • Positional Encoding.
  • Masked Self-Attention.
  • Multi-Head Attention.
  • Residual Dropout.
  • Label Smoothing.
  • Transformer Encoder.
  • Transformer Decoder.
  • Transformer Encoder-Decoder.
  • Cross-Attention.
  • Byte Pair Encoding.
  • BERT (Bidirectional Encoder Representations from Transformers).
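
To make the contrast between the one-hot representation and word embeddings above concrete, here is a minimal NumPy sketch; the three-word vocabulary and the 4-dimensional embedding values are made-up assumptions (real embeddings come from Word2Vec, GloVe, or similar training).

```python
import numpy as np

vocab = ["king", "queen", "apple"]                 # toy vocabulary (assumption)

def one_hot(word: str) -> np.ndarray:
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1.0
    return v

# Hypothetical dense embeddings, invented for illustration only.
embedding = {
    "king":  np.array([0.80, 0.65, 0.10, 0.00]),
    "queen": np.array([0.75, 0.70, 0.12, 0.05]),
    "apple": np.array([0.05, 0.10, 0.90, 0.80]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(one_hot("king"), one_hot("queen")))      # 0.0: one-hot vectors encode no similarity
print(cosine(embedding["king"], embedding["queen"]))  # close to 1: related words
print(cosine(embedding["king"], embedding["apple"]))  # much smaller: unrelated words
```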

After finishing learning about natural language processing please click Topic 25 – Introduction to Distributed Systems to continue.

 

 

Topic 23 – Introduction to Computer Vision

Why do I need to learn about computer vision?

Computer vision has become more and more interesting. Image recognition, autonomous driving, and disease detection are examples of breakthrough achievements in the field.

Nowadays a key skill that is often required from a software developer is the ability to use computer vision algorithms and tools to solve real-world problems related to images and videos.

What can I do after finishing learning about computer vision?

You will be able to create software that can recognize a face or transform a picture of a young person into a picture of an old person.

That sounds fun! What should I do now?

Please read
– this Rafael C. Gonzalez and Richard E. Woods (2018). Digital Image Processing. 4th Edition. Pearson book, and
– this Richard Szeliski (2022). Computer Vision: Algorithms and Applications. Springer book.

At the same time, please
– audit these Deep Learning Specialization courses and
– read this Francois Chollet (2021). Deep Learning with Python. Manning Publications book, and
– this Michael A. Nielsen (2015). Neural Networks and Deep Learning. Determination Press book.

After that please read this Ian Goodfellow et al. (2016). Deep Learning. The MIT Press book.

Terminology Review:

  • Digital Image: f(x, y)
  • Intensity (Gray Level): ℓ = f(x, y)
  • Gray Scale: ℓ = 0 is considered black and ℓ = L – 1 is considered white.
  • Quantization: Digitizing the amplitude values.
  • Sampling: Digitizing the coordinate values.
  • Representing Digital Images: Matrix or Vector.
  • Pixel or Picture Element: An element of matrix or vector.
  • Deep Learning.
  • Artificial Neural Networks.
  • Filter: a 2-dimensional matrix, commonly square, containing weights that are shared across the input space.
  • The Convolution Operation: slide the filter over the input, multiply element-wise, and add the outputs (see the short sketch after this list).
  • Stride: Filter step size.
  • Convolutional Layers.
  • Feature Maps.
  • Pooling.
  • Convolutional Neural Networks (CNNs).
  • Object Detection.
  • Face Recognition.
  • YOLO Algorithm.
  • Latent Variable.
  • Autoencoders.
  • Variational Autoencoders.
  • Generators.
  • Discriminators.
  • Generative Adversarial Networks (GANs).
  • CycleGAN.
  • Neural Style Transfer.
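
The filter, convolution, and stride definitions above can be made concrete with a few lines of NumPy. This is a minimal sketch of a “valid” cross-correlation (what deep learning libraries usually call convolution); the input values and the 2×2 filter are made-up assumptions.

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide the kernel over the image, multiply element-wise, and sum each patch."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)   # element-wise multiply, then add
    return out

image = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 "image"
edge_filter = np.array([[1.0, -1.0],
                        [1.0, -1.0]])              # toy vertical-edge filter
print(conv2d(image, edge_filter, stride=1))        # 3x3 feature map
print(conv2d(image, edge_filter, stride=2))        # 2x2 feature map (larger stride, smaller map)
```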

After finishing learning about computer vision please click Topic 24 – Introduction to Natural Language Processing to continue.

 

 

Topic 22 – Introduction to Machine Learning

Why do I need to learn about machine learning?

Machine learning has solved many important difficult problems recently. A few of them include speech recognition, speech synthesis, image recognition, autonomous driving and chatbots.
Nowadays a key skill of a software developer is the ability to use machine learning algorithms to solve real-world problems.

What can I do after finishing learning about machine learning?

You will be able to create software that can recognize a car’s plate number from an image or estimate the probability of breast cancer for a patient.

That sounds useful! What should I do now?

Please audit
– these Machine Learning Specialization (Coursera) courses and
– this Applied Machine Learning in Python (Coursera) course.

At the same time, please read
– this Aurelien Geron (2022). Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow. O’Reilly Media book and
– this Brett Lantz (2023). Machine Learning with R. Packt Publishing book.

After that please watch
– this MIT 6.034 – Artificial Intelligence, Fall 2010 course (Readings).

After that please read
– this Tom M. Mitchell (1997). Machine Learning. McGraw-Hill Education book, and
– this Christopher M. Bishop (2006). Pattern Recognition and Machine Learning. Springer book.

After that please audit these Reinforcement Learning Specialization (Coursera) courses.
At the same time, please read this Richard S. Sutton and Andrew G. Barto (2020). Reinforcement Learning. The MIT Press book.

Supervised Learning Terminology Review:

  • Artificial Intelligence.
  • Machine Learning.
  • Deep Learning.
  • Linear Regression: Y = θX + Ε.
  • Cost Function measures how good/bad your model is.
  • Mean Square Error (MSE) measures the average of the squares of the errors.
  • Gradient Descent, Learning Rate.
  • Batch Gradient Descent.
  • The R-Squared Test measures the proportion of the total variance in the output (y) that can be explained by the variation in x. It can be used to evaluate how good a “fit” some model is on the given data.
  • Stochastic Gradient Descent.
  • Mini-Batch Gradient Descent.
  • Overfitting: machine learning model gives accurate predictions for training data but not for new data.
  • Regularization: Ridge Regression, Lasso Regression, Elastic Net, Early Stopping.
  • Logistic Regression.
  • Sigmoid Function.
  • Binary Cross Entropy Loss Function, Log Loss Function.
  • One Hot Encoding.
  • The Softmax function takes an N-dimensional vector of arbitrary real values and produces another N-dimensional vector with real values in the range (0, 1) that add up to 1.0 (see the short sketch after this list).
  • Softmax Regression.
  • Support Vector Machines.
  • Decision Trees.
  • K-Nearest Neighbors.
  • McCulloch-Pitts Neuron.
  • Linear Threshold Unit with threshold T calculates the weighted sum of its inputs, and then outputs 0 if this sum is less than T, and 1 if the sum is greater than T.
  • Perceptron.
  • Activation Functions: Sigmoid, Hyperbolic Tangent, Rectified Linear Unit (ReLU).
  • Artificial Neural Networks.
  • Backpropagation.
  • Gradient Descent Optimization Algorithms: Momentum, Adagrad, Adadelta, RMSprop, Adam.
  • Regularization: Dropout.
  • The Joint Probability Table.
  • Bayesian Networks.
  • Naive Bayes Inference.
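
A minimal NumPy sketch of the softmax definition above; the input scores are arbitrary example values, and subtracting the maximum is a common numerical-stability trick rather than part of the definition.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Map an N-dimensional vector of real values to probabilities that sum to 1."""
    shifted = z - np.max(z)        # improves numerical stability; result is unchanged
    exps = np.exp(shifted)
    return exps / np.sum(exps)

z = np.array([2.0, 1.0, 0.1])      # arbitrary example scores
p = softmax(z)
print(p)                           # approximately [0.659 0.242 0.099]
print(p.sum())                     # 1.0
```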

Unsupervised Learning Terminology Review:

  • K-Means.
  • Principal Component Analysis.
  • User-Based Collaborative Filtering.
  • Item-based Collaborative Filtering.
  • Matrix Factorization.

Reinforcement Learning Terminology Review:

  • k-armed Bandit Problem.
  • Bandit Algorithm.
  • Exponential Recency-Weighted Average.
  • Optimistic Initial Values.
  • Upper-Confidence-Bound Action Selection.
  • Agent.
  • World.
  • States, Terminal State.
  • Actions.
  • Rewards.
  • Markov Decision Processes: Agent (π) → Action (a) → World → State (s), Reward → Agent (π). Model: (current state, action, reward of current state, next state) = (s, a, R(s), s’).
  • Episodes.
  • Continuing Tasks.
  • Horizon (H): Number of time steps in each episode, can be infinite.
  • Expected Return: Sum of rewards from time step t to horizon H.
  • Discounted Return: Discounted sum of rewards from time step t to horizon H.
  • Discount Factor, Discount Rate: 0 ≤ γ ≤ 1.
  • Policy: Mapping from states to actions: π (s) = a or π (a|s) = P(aₜ=a|sₜ=s).
  • State Value Function – Vπ(s): The expected return starting from state s following policy π.
  • State-Action Value Function (also known as the quality function) – Qπ(s, a): The expected return starting from state s, taking action a, then following policy π.
  • Bellman Equations.
  • Optimal Value Functions.
  • Optimal Policies.
  • Bellman Optimality Equations.
  • Policy Evaluation: (MDP, π) → Linear System Solver, Dynamic Programming → Vπ.
  • Iterative Policy Evaluation.
  • Policy Control, Policy Improvement.
  • Policy Improvement Theorem.
  • Greedy Policy.
  • Policy Iteration: (MDP) → Dynamic Programming → Vπ-optimal.
  • Value Iteration: MDP → (Qopt, πopt).
  • Asynchronous Dynamic Programming.
  • Generalized Policy Iteration.
  • Bootstrapping: Updating estimates on the basis of other estimates.
  • First-Visit Monte Carlo Prediction.
  • Exploring Starts.
  • Monte Carlo Control Exploring Starts.
  • On-Policy Methods.
  • ϵ-greedy Policies: Most of the time they choose an action that has maximal estimated action value, but with probability ϵ they instead select an action at random.
  • ϵ-soft Policies: Policies for which π(a|s) ≥ ϵ/|A(s)| for all states and actions, for some ϵ > 0.
  • On-Policy First-Visit MC Control.
  • Off-Policy Learning.
  • Target Policy.
  • Behavior Policy.
  • Importance Sampling.
  • Off-Policy Monte Carlo Prediction.
  • Off-Policy Monte Carlo Control.
  • Temporal-Difference Learning.
  • SARSA: On-Policy TD Control.
  • Q-Learning: Off-Policy TD Control
  • Function Approximation.
  • Continuous States.
  • Learning the State-Action Value Function: Replay Buffer: the 10,000 most recent (s, a, R(s), s’) tuples. x = (s, a) → Q(θ) → y = R(s) + γ·max Q(s’, a’; θ). Loss = [R(s) + γ·max Q(s’, a’; θ)] − Q(s, a; θ).
  • Target Network: A separate neural network for generating the y targets. It has the same architecture as the original Q-Network. Loss = [R(s) + γ·max TargetQ(s’, a’; θ′)] − Q(s, a; θ). Every C time steps we use the TargetQ-Network to generate the y targets and update the weights of the TargetQ-Network using the weights of the Q-Network.
  • Soft Updates: θ′ ← 0.001·θ + 0.999·θ′, where θ′ and θ represent the weights of the target network and the current network, respectively (see the sketch after this list).
  • Deep Reinforcement Learning, Deep Q-learning.
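
To make the target-network and soft-update formulas above concrete, here is a minimal NumPy sketch. The constant 0.001 and the form of the update follow the text; representing the networks as plain weight arrays, the discount factor, and the terminal-state handling are illustrative assumptions.

```python
import numpy as np

TAU = 0.001   # soft-update rate from the formula above

def soft_update(target_weights, q_weights, tau=TAU):
    """theta_target <- tau * theta_q + (1 - tau) * theta_target, applied per layer."""
    return [tau * q + (1.0 - tau) * t for q, t in zip(q_weights, target_weights)]

def td_targets(rewards, next_q_values, gamma=0.99, done=None):
    """y = R(s) + gamma * max_a' TargetQ(s', a'); no bootstrapping at terminal states."""
    max_next = next_q_values.max(axis=1)
    if done is not None:
        max_next = max_next * (1.0 - done)
    return rewards + gamma * max_next

# Toy usage with made-up numbers:
q_net_weights = [np.ones((2, 2))]
target_weights = [np.zeros((2, 2))]
target_weights = soft_update(target_weights, q_net_weights)
print(target_weights[0][0, 0])                  # 0.001: the target network drifts slowly toward Q

rewards = np.array([1.0, 0.0])
next_q = np.array([[0.5, 2.0],
                   [0.1, 0.3]])
done = np.array([0.0, 1.0])
print(td_targets(rewards, next_q, done=done))   # [2.98, 0.0]
```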

After finishing learning about machine learning please click Topic 23 – Introduction to Computer Vision to continue.

 

Topic 21 – Introduction to Computation and Programming using Python

Why do I need to learn about computation and programming using Python?

Computational thinking and Python are fundamental tools for understanding many modern theories and techniques such as artificial intelligence, machine learning, deep learning, data mining, security, digital image processing and natural language processing.

What can I do after finishing learning about computation and programming using Python?

You will be prepared to learn modern theories and techniques to create modern security, machine learning, data mining, image processing or natural language processing software.

That sounds useful! What should I do now?

Please read this John V. Guttag (2013). Introduction to Computation and Programming using Python. 2nd Edition. The MIT Press book.

Alternatively, please watch
– this 6.0001 Introduction to Computer Science and Programming in Python. Fall 2016 course (Lecture Notes) and

– this MIT 6.0002 Introduction to Computational Thinking and Data Science, Fall 2016 course (Lecture Notes).

Terminology Review:

  • Big O notation.
  • Monte Carlo Simulation.
  • Random Walk (see the short simulation sketch after this list).
  • K-means Clustering.
  • k-Nearest Neighbors Algorithm.
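
As a small taste of these ideas, here is a minimal sketch that uses Monte Carlo simulation to estimate the mean distance from the origin of a simple 2-D random walk; the step counts and the number of trials are arbitrary assumptions.

```python
import random

def random_walk_distance(steps: int) -> float:
    """Distance from the origin after a random walk on a 2-D grid."""
    x = y = 0
    for _ in range(steps):
        dx, dy = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x += dx
        y += dy
    return (x * x + y * y) ** 0.5

def monte_carlo_mean_distance(steps: int, trials: int = 10_000) -> float:
    return sum(random_walk_distance(steps) for _ in range(trials)) / trials

for steps in (10, 100, 1000):
    print(steps, round(monte_carlo_mean_distance(steps), 2))
# The mean distance grows roughly like the square root of the number of steps.
```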

After finishing reading the book please click Topic 22 – Introduction to Machine Learning to continue.

 

Topic 19 – Probability & Statistics

Why do I need to learn about probability and statistics?

Probability and statistics are fundamental tools for understanding many modern theories and techniques such as artificial intelligence, machine learning, deep learning, data mining, security, digital image processing and natural language processing.

What can I do after finishing learning about probability and statistics?

You will be prepared to learn modern theories and techniques to create modern security, machine learning, data mining, image processing or natural language processing software.

That sounds useful! What should I do now?

Please read
– this Dimitri P. Bertsekas and John N. Tsitsiklis (2008). Introduction to Probability. Athena Scientific book, or
– this Hossein Pishro-Nik (2014). Introduction to Probability, Statistics, and Random Processes. Kappa Research, LLC book.

Alternatively, please read these notes, then watch
– this MIT 6.041SC – Probabilistic Systems Analysis and Applied Probability, Fall 2011 course (Lecture Notes), and
– this MIT RES.6-012 – Introduction to Probability, Spring 2018 course (Lecture Notes).

Probability and statistics are quite difficult topics, so you may need to go through them two or three times using different sources to actually master the concepts.

Terminology Review:

  • Sample Space (Ω): Set of possible outcomes.
  • Event: Subset of the sample space.
  • Probability Law: Law specified by giving the probabilities of all possible outcomes.
  • Probability Model = Sample Space + Probability Law.
  • Probability Axioms: Nonnegativity: P(A) ≥ 0; Normalization: P(Ω)=1; Additivity: If A ∩ B = Ø, then P(A ∪ B)= P(A)+ P(B).
  • Conditional Probability: P(A|B) = P (A ∩ B) / P(B).
  • Multiplication Rule.
  • Total Probability Theorem.
  • Bayes’ Rule: Given P(Aᵢ) (initial “beliefs” ) and P (B|Aᵢ). P(Aᵢ|B) = ? (revise “beliefs”, given that B occurred).
  • Independence of Two Events: P(B|A) = P(B)  or P(A ∩ B) = P(A) · P(B).
  • Discrete Uniform Law: P(A) = Number of elements of A / Total number of sample points = |A| / |Ω|
  • Basic Counting Principle: r stages, nᵢ choices at stage i, number of choices = n₁ n₂ · · · nᵣ
  • Permutations: Number of ways of ordering elements. No repetition for n slots: [n] [n−1] [n−2] · · · [2] [1].
  • Combinations: number of k-element subsets of a given n-element set.
  • Binomial Probabilities: P(any sequence) = p^(# heads) · (1 − p)^(# tails).
  • Random Variable: a function from the sample space to the real numbers. It is not random. It is not a variable. It is a function: f: Ω → ℝ.
  • Discrete Random Variable.
  • Bernoulli Random Variable (Indicator Random Variable): f: Ω → {0, 1}.
  • Probability Mass Function: P(X = 𝑥) or Pₓ(𝑥): A function from the sample space to [0..1] that produces the likelihood that the value of X equals to 𝑥. PMF gives probabilities. 0 ≤ PMF ≤ 1. All the values of PMF must sum to 1.
  • Geometric Random Variable: X = Number of coin tosses until first head.
  • Geometric Probability Mass Function: (1 − p)ᵏ⁻¹p.
  • Binomial Random Variable: X = Number of heads (e.g. 2) in n (e.g. 4) independent coin tosses.
  • Binomial Probability Mass Function: C(n, k) · pᵏ(1 − p)ⁿ⁻ᵏ.
  • Expectation: E[X] = Sum of xpₓ(x).
  • Let Y=g(X): E[Y] = E[g(X)] = Sum of g(x)pₓ(x). Caution: E[g(X)] ≠ g(E[X]) in general.
  • Variance: var(X) = E[(X−E[X])²].
  • var(aX)=a²var(X).
  • X and Y are independent: var(X+Y) = var(X) + var(Y). Caution: var(X+Y) ≠ var(X) + var(Y) in general.
  • Standard Deviation: Square root of var(X).
  • Conditional Probability Mass Function: P(X=x|A).
  • Conditional Expectation: E[X|A].
  • Joint Probability Mass Function: Pₓᵧ(x,y) = P(X=x, Y=y) = P((X=x) and (Y=y)).
  • Marginal Probability Mass Function: Pₓ(x) = Σᵧ Pₓᵧ(x, y).
  • Total Expectation Theorem: E[X|Y = y].
  • Independent Random Variables: P(X=x, Y=y) = P(X=x) · P(Y=y).
  • Expectation of Multiple Random Variables: E[X + Y + Z] = E[X] + E[Y] + E[Z].
  • Binomial Random Variable: X = Sum of Bernoulli Random Variables.
  • The Hat Problem.
  • Continuous Random Variables.
  • Probability Density Function: P(a ≤ X ≤ b) or Pₓ(𝑥). (a ≤ X ≤ b) means X function produces a real number value within the [a, b] range. Programming language: X(outcome) = 𝑥, where a ≤ 𝑥 ≤ b. PDF does NOT give probabilities. PDF does NOT have to be less than 1. PDF gives probabilities per unit length. The total area under PDF must be 1.
  • Continuous Uniform Random Variable.
  • Cumulative Distribution Function: P(X ≤ b). (X ≤ b) means X function produces a real number value within the [-∞, b] range. Programming language: X(outcome) = 𝑥, where 𝑥 ≤ b.
  • Normal Random Variable, Gaussian Distribution, Normal Distribution.
  • Joint Probability Density Function.
  • Conditional Probability Density Function.
  • Marginal Probability Density Function.
  • Derived Distributions.
  • Convolution: A mathematical operation on two functions (f and g) that produces a third function.
  • Covariance.
  • Correlation Coefficient.
  • Conditional Expectation: E[X | Y = y] = Sum of xpₓ|ᵧ(x|y). If Y is unknown then E[X | Y] is a random variable, i.e. a function of Y. So E[X | Y] also has its expectation and variance.
  • Law of Iterated Expectations: E[E[X | Y]] = E[X].
  • Conditional Variance: var(X | Y) is a function of Y.
  • Law of Total Variance: var(X) = E[var(X | Y)] + var(E[X | Y]).
  • Bernoulli Process:  A sequence of independent Bernoulli trials. At each trial, i: P(Xᵢ=1)=p, P(Xᵢ=0)=1−p.
  • Poisson Process.
  • Markov Chain.
  • Markov’s Inequality: P(X ≥ a) ≤ E(X)/a (for X ≥ 0, a > 0).
  • Chebyshev’s Inequality: P(|X – E(X)| ≥ a) ≤ var(X)/a².
  • The Law of Large Numbers.
  • Central Limit Theorem.
  • Model Building: X = a·S + W, where W: noise, know S, assume W, observe X, find a.
  • Inferring: Know a, assume W, observe X, find S.
  • Hypothesis Testing: Know a, observe X, find S. S can take one of few possible values.
  • Estimation: Know a, observe X, find S. S can take unlimited possible values.
  • Bayesian Inference can be used for both Hypothesis Testing and Estimation, leverages Bayes rule. Output is posterior distribution. Single answer can be Maximum a posteriori probability (MAP) or Conditional Expectation.
  • Least Mean Squares Estimation of Θ based on X.
  • Classical Inference can be used for both Hypothesis Testing and Estimation, leverages Maximum Likelihood Estimation.
  • Maximum Likelihood Estimation: Given data, the maximum likelihood estimate (MLE) for the parameter p is the value of p that maximizes the likelihood P(data | p). P(data | p) is the likelihood function. For continuous distributions, we use the probability density function to define the likelihood.
  • Log Likelihood: the natural log of the likelihood function (see the short sketch after this list).
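
As a concrete instance of the maximum likelihood definitions above, here is a minimal sketch that estimates a Bernoulli parameter p from made-up coin flips by maximizing the log likelihood over a grid and compares the result with the closed-form answer (the sample mean). The data and the grid resolution are illustrative assumptions.

```python
import math

data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]    # made-up coin flips: 7 heads, 3 tails

def log_likelihood(p: float, data) -> float:
    """log P(data | p) for independent Bernoulli(p) observations."""
    heads = sum(data)
    tails = len(data) - heads
    return heads * math.log(p) + tails * math.log(1.0 - p)

# Grid search over p in (0, 1); the maximizer is the maximum likelihood estimate.
candidates = [i / 1000 for i in range(1, 1000)]
p_mle = max(candidates, key=lambda p: log_likelihood(p, data))

print(p_mle)                   # approximately 0.7
print(sum(data) / len(data))   # closed-form MLE for a Bernoulli parameter: the sample mean
```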

After finishing learning about probability and statistics please click Topic 20 – Discrete Mathematics to continue.

 

Topic 18 – Linear Algebra

Why do I need to learn about linear algebra?

Linear algebra is a fundamental tool for understanding many modern theories and techniques such as artificial intelligence, machine learning, deep learning, data mining, security, digital image processing, and natural language processing.

What can I do after finishing learning about linear algebra?

You will be prepared to learn modern theories and techniques to create modern security, machine learning, data mining, image processing or natural language processing software.

That sounds useful! What should I do now?

Please read this David C. Lay et al. (2022). Linear Algebra and Its Applications. Pearson Education book.

Alternatively, please watch this MIT 18.06 – Linear Algebra, Spring 2005 course. While watching this course please do read Lecture Notes, and this Gilbert Strang (2016). Introduction to Linear Algebra. Wellesley-Cambridge Press book for better understanding some complex topics.

Terminology Review:

  • Linear Equations.
  • Row Picture.
  • Column Picture.
  • Triangular matrix is a square matrix where all the values above or below the diagonal are zero.
  • Lower Triangular Matrix.
  • Upper Triangular Matrix.
  • Diagonal matrix is a matrix in which the entries outside the main diagonal are all zero.
  • Tridiagonal Matrix.
  • Identity Matrix.
  • Transpose of a Matrix.
  • Symmetric Matrix.
  • Pivot Columns.
  • Pivot Variables.
  • Augmented Matrix.
  • Echelon Form.
  • Reduced Row Echelon Form.
  • Elimination Matrices.
  • Inverse Matrix.
  • Factorization into A = LU.
  • Free Columns.
  • Free Variables.
  • Gauss-Jordan Elimination.
  • Vector Spaces.
  • Rank of a Matrix.
  • Permutation Matrix.
  • Subspaces.
  • Column space, C(A) consists of all combinations of the columns of A and is a vector space in ℝᵐ.
  • Nullspace, N(A) consists of all solutions x of the equation Ax = 0 and lies in ℝⁿ.
  • Row space, C(Aᵀ) consists of all combinations of the row vectors of A and form a subspace of ℝⁿ. We equate this with C(Aᵀ), the column space of the transpose of A.
  • The left nullspace of A, N(Aᵀ) is the nullspace of Aᵀ. This is a subspace of ℝᵐ.
  • Linearly Dependent Vectors.
  • Linearly Independent Vectors.
  • Linear Span of Vectors.
  • A basis for a vector space is a sequence of vectors with two properties:
    • They are independent.
    • They span the vector space.
  • Given a space, every basis for that space has the same number of vectors; that number is the dimension of the space.
  • Dimension of a Vector Space.
  • Dot Product.
  • Orthogonal Vectors.
  • Orthogonal Subspaces.
  • Row space of A is orthogonal to the nullspace of A.
  • Matrix Spaces.
  • Rank-One Matrix.
  • Orthogonal Complements.
  • Projection matrix: P = A(AᵀA)⁻¹Aᵀ. Properties of a projection matrix: Pᵀ = P and P² = P. Projection component: Pb = A(AᵀA)⁻¹Aᵀb = Ax̂, where x̂ = (AᵀA)⁻¹Aᵀb.
  • Linear regression, least squares, and normal equations: Instead of solving Ax = b we solve Ax̂ = p or AᵀAx̂ = Aᵀb (see the short sketch after this list).
  • Linear Regression.
  • Orthogonal Matrix.
  • Orthogonal Basis.
  • Orthonormal Vectors.
  • Orthonormal Basis.
  • Orthogonal Subspaces.
  • Gram–Schmidt process.
  • Determinant: A number associated with any square matrix letting us know whether the matrix is invertible, the formula for the inverse matrix, the volume of the parallelepiped whose edges are the column vectors of A. The determinant of a triangular matrix is the product of the diagonal entries (pivots).
  • The big formula for computing the determinant.
  • The cofactor formula rewrites the big formula for the determinant of an n by n matrix in terms of the determinants of smaller matrices.
  • Formula for Inverse Matrix.
  • Cramer’s Rule.
  • Eigenvectors are vectors for which Ax is parallel to x: Ax = λx. λ is an eigenvalue of A, det(A − λI)= 0.
  • Diagonalizing a matrix: AS = SΛ 🡲 S⁻¹AS = Λ 🡲 A = SΛS⁻¹. S: matrix of n linearly independent eigenvectors. Λ: matrix of eigenvalues on diagonal.
  • Matrix exponential eᴬᵗ.
  • Markov matrices: All entries are non-negative and each column adds to 1.
  • Symmetric matrices: Aᵀ = A.
  • Positive definite matrices: all eigenvalues are positive, or all pivots are positive, or all upper-left determinants are positive.
  • Similar matrices: A and B = M⁻¹AM.
  • Singular value decomposition (SVD) of a matrix: A = UΣVᵀ, where U is orthogonal, Σ is diagonal, and V is orthogonal.
  • Linear Transformations: T(v + w) = T(v)+ T(w) and T(cv)= cT(v) . For any linear transformation T we can find a matrix A so that T(v) = Av.
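
A minimal NumPy sketch of the normal equations and projection formulas above: fitting a line through three made-up data points by solving AᵀAx̂ = Aᵀb, then cross-checking with the library least-squares solver. The data values are illustrative assumptions.

```python
import numpy as np

# Made-up data: fit y ≈ c + d·t through the points (0, 6), (1, 0), (2, 0).
t = np.array([0.0, 1.0, 2.0])
b = np.array([6.0, 0.0, 0.0])

A = np.column_stack([np.ones_like(t), t])     # columns: [1, t]

# Normal equations: A^T A x_hat = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
p = A @ x_hat                                 # projection of b onto the column space of A

print(x_hat)                                  # [ 5. -3.]  -> best-fit line y = 5 - 3t
print(p)                                      # [ 5.  2. -1.]

# Cross-check against numpy's least-squares solver.
print(np.linalg.lstsq(A, b, rcond=None)[0])   # same x_hat
```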

After finishing learning about linear algebra please click Topic 19 – Probability & Statistics to continue.