Category Archives: Software Engineering Curriculum

Topic 27 – Introduction to Blockchain

Why do I need to learn about blockchain?

Blockchain offers an interesting and unique solution for applications that require distributed consensus.

Today, a key skill for software developers is the ability to use blockchain-based platforms and tools to solve real-world problems involving distributed agreements.

What can I do after finishing learning about blockchain?

You will be able to create decentralized applications (dApps) using platforms like Ethereum or Hyperledger and smart contract programming languages like Solidity.

That sounds fun! What should I do now?

First, please read this book to learn about the core protocols and algorithms in cryptography: Bruce Schneier (1996). Applied Cryptography – Protocols, Algorithms and Source Code in C. Wiley.

After that, please read this book to learn about the core concepts of Bitcoin: Arvind Narayanan et al. (2016). Bitcoin and Cryptocurrency Technologies – A Comprehensive Introduction. Princeton University Press.

After that, please read the books below to learn programming with Bitcoin:

After that, please read this book to learn programming with Ethereum: Andreas M. Antonopoulos and Gavin Wood (2018). Mastering Ethereum. O’Reilly Media.

After that, please read this book to learn programming blockchain using Hyperledger Fabric: Matt Zand et al. (2021). Hands-On Smart Contract Development with Hyperledger Fabric V2. O’Reilly Media.

After that, please audit this course to gain some ideas about the application of blockchain: MIT 15.S12 Blockchain and Money, Fall 2018 (Lecture Slides).

Terminology Review:

  • Public Keys.
  • Private Keys.
  • Digital Signatures.
  • Digital Signature Scheme.
  • Cryptographic Hash Functions.
  • Merkle Tree: Binary Data Tree with Hashes.
  • Bitcoin: Digital money ecosystem.
  • bitcoin: Unit of currency.
  • Bitcoin Users.
  • Bitcoin Wallets.
  • Bitcoin Addresses.
  • Bitcoin Transactions.
  • Blockchain Explorer.
  • Bitcoin Mining, Miners.
  • The Chain of Transactions.
  • Bitcoin Core: The reference implementation of the bitcoin system.
  • Bitcoin Exchanges.
  • Bitcoin Network.
  • Double‐Spending Attacks.
  • Block Chain: Timestamped Append-Only Log.
  • Sybil Attack: Copies of nodes that a malicious adversary can create to look like there are a lot of different participants.
  • Proof of Work: Find a number, or nonce, such that H(nonce || prev_hash || tx || tx || … || tx) < target.
  • 51‐Percent Attack.
  • Account-Based Ledger: The ledger keeps track of account balances.
  • Unspent Transaction Output: A transaction output that can be used as input in a new transaction.
  • Transaction-Based Ledger: The ledger keeps track of individual transaction outputs.
  • Coinbase Transactions.
  • Bitcoin Scripting Language.
  • Turing Incompleteness.
  • Stateless Verification.
  • Candidate Block.
  • Genesis Block.
  • Ethereum: The world computer.
  • Ether.
  • Externally Owned Accounts (EOAs).
  • Contract Accounts.
  • Solidity.
  • Smart Contracts.
  • Ethereum Clients.
  • Ethereum Networks.
  • Permissionless Blockchain.
  • Permissioned Blockchain.
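
The proof-of-work condition in the list above, H(nonce || prev_hash || tx || … || tx) < target, can be sketched in a few lines of Python. This is only a minimal illustration, not Bitcoin's real block format: it assumes H is SHA-256, that || is plain byte concatenation, and that the nonce is encoded as 8 big-endian bytes. The names block_hash, mine, and difficulty_bits are hypothetical and chosen for this example.

```python
import hashlib


def block_hash(nonce: int, prev_hash: bytes, txs: list[bytes]) -> int:
    """Return H(nonce || prev_hash || tx || ... || tx) as an integer.

    Assumption: H is SHA-256 and || is byte concatenation, with the nonce
    encoded as 8 big-endian bytes. Real Bitcoin block headers differ.
    """
    data = nonce.to_bytes(8, "big") + prev_hash + b"".join(txs)
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def mine(prev_hash: bytes, txs: list[bytes], difficulty_bits: int) -> int:
    """Try nonces 0, 1, 2, ... until the hash falls below the target."""
    target = 1 << (256 - difficulty_bits)  # a smaller target is a harder puzzle
    nonce = 0
    while block_hash(nonce, prev_hash, txs) >= target:
        nonce += 1
    return nonce


prev = hashlib.sha256(b"genesis").digest()      # stand-in for a previous block hash
txs = [b"alice->bob:1", b"bob->carol:2"]        # made-up transactions
nonce = mine(prev, txs, difficulty_bits=12)
print("nonce:", nonce)
print("valid:", block_hash(nonce, prev, txs) < (1 << (256 - 12)))
```

Finding the nonce takes many hash evaluations on average, while checking it takes one. That asymmetry is what makes the work costly to produce and cheap to verify.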

After finishing blockchain, please click on Topic 28 – Introduction to AI Agent Development to continue.

 

 

Topic 26 – Introduction to Cloud Computing

Why do I need to learn about cloud computing?

Because you will develop software systems that often leverage cloud services for quick deployment, scalable computation, and storage.

What can I do after finishing learning cloud computing?

You will be able to

  • deploy software systems to public clouds,
  • build your private cloud,
  • develop software using cloud platforms,
  • develop software using cloud services,
  • leverage cloud services for training and deploying machine learning models,
  • leverage cloud services for big data analytics and reporting.

What should I do now?

First, please read this book to learn about the core concepts of cloud computing: Dan C. Marinescu (2022). Cloud Computing – Theory and Practice. Morgan Kaufmann.

After that, please read this book to gain hands-on experience with Amazon cloud services: Andreas Wittig and Michael Wittig (2022). Amazon Web Services in Action. Manning Publications.

After that, please read this book to gain hands-on experience with VMware private cloud products: Nick Marshall et al. (2019). Mastering VMware vSphere 6.7. Sybex.

After that, please read the books below to gain hands-on experience with Salesforce platform features:

After that, please read this book to gain hands-on experience with Hadoop and Spark systems: Tomasz Wiktorski (2019). Data-Intensive Systems – Principles and Fundamentals using Hadoop and Spark. Springer.

Terminology Review:

  • Software as a Service
  • Multitenancy
  • Infrastructure as a Service
  • Virtual Machines
  • Software-Defined Networking
  • Infrastructure as Code (IaC)
  • Platform as a Service
  • Containers as a Service
  • Function as a Service (Serverless Computing)
  • File Storage
  • Block Storage
  • Object Storage
  • Direct-Attached Storage (DAS)
  • Network-Attached Storage (NAS)
  • Storage Area Network (SAN)
  • GFS
  • Bigtable
  • MapReduce

After finishing cloud computing, please click on Topic 27 – Introduction to Blockchain to continue.

 

Topic 25 – Introduction to Distributed Systems

Why do I need to learn about distributed systems?

Distributed systems provide the foundation for understanding the theories and techniques behind cloud computing and blockchain technology.

The architectures, protocols, and algorithms introduced in distributed systems are also necessary for creating complex software.

What can I do after finishing learning distributed systems?

You will be able to design software that can

  • tolerate faults,
  • shard data,
  • handle a massive number of requests, and
  • perform expensive computations.

You will also be prepared to learn about cloud computing and blockchain technology.

What should I do now?

First, please audit this course to familiarize yourself with the core concepts and protocols of distributed systems: Distributed Systems, UC Santa Cruz Baskin School of Engineering, 2021.

Afterward, please audit the course and read the books below to learn how to design large-scale distributed systems:

Terminology Review:

  • Fault Tolerance
  • Consistency
  • System Models
  • Failure Detectors
  • Communication
  • Ordering
  • State Machine Replication
  • Primary-Backup Replication
  • Bully Algorithm
  • Ring Election
  • Multi-Leader Replication
  • Leaderless Replication
  • Cristian’s Algorithm
  • Berkeley Algorithm
  • Lamport Clocks
  • Vector Clocks
  • Version Vectors
  • Chain Replication
  • Consensus
  • FLP
  • Raft
  • Paxos
  • Viewstamped Replication
  • Zab
  • Consistent Hashing
  • Distributed Transactions
  • ACID
  • Two-Phase Commit
  • Three-Phase Commit
  • Serializability
  • Two-Phase Locking
  • Distributed Locks
  • CAP
  • Consistency Models
  • Linearizability
  • Distributed Architectures
  • Distributed Programming
  • Hadoop
  • Spark
  • TensorFlow
  • PyTorch
  • Kubernetes
  • Bitcoin
  • Smart Contracts

After finishing distributed systems, please click on Topic 26 – Introduction to Cloud Computing to continue.

 

Topic 20 – Discrete Mathematics

Why do I need to learn about discrete mathematics?

Discrete mathematics is a fundamental tool for understanding many theories and techniques behind artificial intelligence, machine learning, deep learning, data mining, security, digital image processing, and natural language processing.

The problem-solving techniques and computational thinking introduced in discrete mathematics are also necessary for creating complicated software.

What can I do after finishing learning discrete mathematics?

You will be equipped with the core concepts of logic, set theory, number theory, combinatorics, graph theory, Boolean algebra, and discrete probability.

These concepts will prepare you to learn modern theories and techniques for developing software in security, machine learning, data mining, image processing, and natural language processing.

What should I do now?

Please read the following books to grasp the core concepts of discrete mathematics:

Alternatively, if you want to learn the topic through interactive explanations, please audit the course and read its textbook: MIT 6.042J – Mathematics for Computer Science, Fall 2010 (Textbook).

Terminology Review:

  • Statement: An assertion that is either true or false.
  • Mathematical Statements.
  • Mathematical Proof: A convincing argument about the accuracy of a statement.
  • Implication: If p, then q. p is the hypothesis. q is the conclusion.
  • Proposition: A true statement.
  • Theorem: An important proposition.
  • Lemmas: Supporting propositions.
  • Logic: A language for reasoning that contains a collection of rules that we use when doing logical reasoning.
  • Propositional Logic: A logic about truth and falsity of statements.
  • Logic Connectives: Not (Negation), And (Conjunction), Or (Disjunction), If then (Implication), If and only if (Equivalence).
  • Truth Table.
  • Contrapositive of Proposition: The contrapositive of p → q is the proposition ¬q → ¬p.
  • Modus Ponens: If both P → Q and P hold, then Q can be concluded.
  • Predicate: A property of some objects or a relationship among objects represented by the variables.
  • Quantifier: Tells how many objects have a certain property.
  • Mathematical Induction: Base Case, Inductive Case.
  • Recursion: A Base Case, A Recursive Step.
  • Sum Example: Annuity.
  • Set.
  • Subset.
  • Set Operations: A ∪ B, A ∩ B, A ⊂ U: A’ = {x : x ∈ U and x ∉ A}, A \ B = A ∩ B’ = {x : x ∈ A and x ∉ B}.
  • Cartesian Product: A × B = {(a, b) : a ∈ A and b ∈ B}.
  • A binary relation (or just relation) from X to Y is a subset R ⊆ X × Y. To describe the relation R, we may list the collection of all ordered pairs (x, y) such that x is related to y by R.
  • A mapping or function f ⊂ A × B from a set A to a set B is a special type of relation in which for each element a ∈ A there is a unique element b ∈ B such that (a, b) ∈ f.
  • Equivalence Relation.
  • Equivalence Class.
  • Partition.
  • A state machine is a binary relation on a set, the elements of the set are called states, the relation is called the transition relation, and an arrow in the graph of the transition relation is called a transition.
  • Greatest Common Divisor.
  • Division Algorithm.
  • Prime Numbers.
  • The Fundamental Theorem of Arithmetic: Let n be an integer such that n > 1. Then n can be factored as a product of prime numbers. n = p₁p₂ ∙ ∙ ∙ pₖ
  • Congruence: a is congruent to b modulo n if n | (a – b), written a ≡ b (mod n).
  • Fermat’s Little Theorem.
  • Stirling’s Approximation.
  • Probability.
  • Example: The Monty Hall Problem.
  • The Four Step Method: (1) Find the Sample Space (Set of possible outcomes), (2) Define Events of Interest (Subset of the sample space),  (3) Determine Outcome Probabilities, (4) Compute Event Probabilities.
  • A tree diagram is a graphical tool that can help us work through the four step approach when the number of outcomes is not too large or the problem is nicely structured.
  • Example: The Strange Dice.
  • Conditional Probability: P(A|B) = P (A ∩ B) / P(B).
  • A conditional probability P(B|A) is called a posteriori if event B precedes event A in time.
  • Example: Medical Testing.
  • Independence: P(B|A) = P(B)  or P(A∩B) = P(A) · P(B).
  • Mutual Independence: The probability of each event is the same no matter which of the other events has occurred.
  • Pairwise Independence: Any two events are independent.
  • Example: The Birthday Problem.
  • The birthday paradox refers to the counterintuitive fact that only 23 people are needed for the probability that at least two of them share a birthday to exceed 50%; for 70 people, P ≈ 99.9%.
  • Bernoulli Random Variable (Indicator Random Variable): f: Ω → {1, 0}.
  • Binomial Random Variable: The number of successes in an experiment consisting of n trials. P(X = x) = [n! / (x! · (n − x)!)] · pˣ(1 − p)ⁿ⁻ˣ.
  • Expectation (Average, Mean): E(R) = Sum(R(w) · P(w)) = Sum(x · P(R = x)).
  • Median: The value x such that P(R < x) ≤ 1/2 and P(R > x) < 1/2.
  • Example: Splitting the Pot.
  • Mean Time to Failure: If a system independently fails at each time step with probability p, then the expected number of steps up to the first failure is 1/p.
  • Linearity of Expectation.
  • Example: The Hat Check Problem.
  • Example: Benchmark: E(Z/R) = 1.2 does NOT mean that E(Z) = 1.2E(R).
  • Variance: var(X) = E[(X−E[X])²].
  • Kurtosis: E[(X−E[X])⁴] / (var(X))².
  • Markov’s Theorem: P(R ≥ x) ≤ E(R)/x (R ≥ 0, x > 0).
  • Chebyshev’s Theorem: P(|R – E(R)| ≥ x) ≤ var(R)/x². It bounds the probability of deviation from the mean.
  • The Chernoff Bound: P(T ≥ c·E(T)) ≤ e^(−z·E(T)) for c ≥ 1, where z = c·ln(c) − c + 1, T = Sum(Tᵢ), the Tᵢ are mutually independent, and 0 ≤ Tᵢ ≤ 1.
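
The birthday problem from the list above is easy to check numerically. The sketch below assumes birthdays are independent and uniformly distributed over 365 days; the function name birthday_collision_probability is made up for this example. It multiplies out the probability that all birthdays are distinct and subtracts it from 1.

```python
def birthday_collision_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday.

    Assumes birthdays are independent and uniform over `days` days.
    """
    if people > days:
        return 1.0  # pigeonhole principle: a shared birthday is certain
    p_all_distinct = 1.0
    for k in range(people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


for n in (10, 22, 23, 70):
    print(n, round(birthday_collision_probability(n), 4))
```

With 22 people the probability is still below one half, with 23 it is roughly 0.507, and with 70 it is above 0.999.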

After finishing discrete mathematics, please click on Topic 21 – Introduction to Computational Thinking to continue.

 

Topic 2 – Introduction to Computer Networks

Why do I need to learn about computer networks?

Because you will develop software systems that usually connect with other software systems via various networks.

What can I do after finishing learning computer networks?

You will be able to set up various software systems such as Domain Name System, Active Directory System, Electronic Mail, File Transfer Protocol System, Remote Desktop Services, File Services, and HTTP Services.

You will be prepared to learn about network programming, game development, web application development, distributed systems, cloud computing, and blockchain.

What should I do now?

Please audit this course and complete all the quizzes, to learn about computer networking concepts: The Bits and Bytes of Computer Networking.

Alternatively, please read the books below to grasp the core concepts of computer networks:

After that, please read the two books below to get hands-on experience with networking labs on Windows Server:

After that, please read this book to learn about the problems and their corresponding solutions for building networks: Larry L. Peterson and Bruce S. Davie (2021). Computer Networks: A Systems Approach. Morgan Kaufmann.

Terminology Review:

  • Computer Networking.
  • Computer Networks, Peer-to-Peer Systems, Local Area Networks, Wide Area Networks, Virtual Private Networks, ISP Networks, The Internet.
  • Network Software, Distributed Systems, World Wide Web, Network Protocols.
  • The OSI Reference Model: The Physical Layer, The Data Link Layer, The Network Layer, The Transport Layer, The Session Layer, The Presentation Layer, The Application Layer.
  • The TCP/IP Reference Model: The Link Layer, The Internet Layer, The Transport Layer, The Application Layer.
  • The TCP/IP 5-Layer Model: The Physical Layer, The Data Link Layer, The Network Layer, The Transport Layer, The Application Layer.
  • Network Interface Cards, RJ45 Ports and Plugs, Cables, Hubs, Switches, Routers, Servers, Clients, Nodes.
  • Bit, Octet (Byte), Modulation, Line Coding, Twisted Pair Cables, Simplex Communication, Duplex Communication, Full-Duplex, Half-Duplex.
  • Collision Domain, Ethernet, Carrier-Sense Multiple Access with Collision Detection (CSMA/CD), MAC Address.
  • Unicast, Broadcast, Multicast.
  • Data Packet, Ethernet Frame, Virtual LAN (VLAN), VLAN Header.
  • First-in-First-Out (FIFO).
  • ∞×∞
  • IPv4 Addresses, IPv4 Datagrams, IPv4 Address Classes, Subnet Masks, CIDR (Classless Inter-Domain Routing).
  • Address Resolution Protocol (ARP).
  • Routing Tables, Autonomous System, Interior Gateway Protocols,  Exterior Gateway Protocols, Distance Vector Routing Protocols, Link State Routing Protocols, Core Internet Routers, Border Gateway Protocol (BGP), Non-Routable Address Space.
  • IP Security Protocol.
  • ∞×∞
  • Multiplexing, Demultiplexing, Ports.
  • TCP Segment, TCP Control Flags, Three-way Handshake, Four-way Handshake, Transmission Control Protocol (TCP), TCP Socket, TCP Socket States.
  • Connection-Oriented Protocols, Connectionless Protocols.
  • User Datagram Protocol (UDP).
  • Network Address Translation.
  • Frames, Packets, Messages.
  • Network Sockets.
  • Transport Service Primitives: LISTEN, CONNECT, SEND, RECEIVE, DISCONNECT.
  • ∞×∞
  • Public Switched Telephone Network (PSTN), Plain Old Telephone Service (POTS), Modems, Dial-up (Phone Lines), Usenet.
  • Broadband, T-Carrier Technologies, Digital Subscriber Line (DSL, Phone Lines), Asymmetric Digital Subscriber Line (ADSL), Symmetric Digital Subscriber Line (SDSL), High Bit-Rate Digital Subscriber Line (HDSL), Digital Subscriber Line Access Multiplexers (DSLAM).
  • Cable Broadband (Television Lines), Cable Modems, Cable Modem Termination System (CMTS).
  • Fiber to the X (FTTX), Fiber to the Neighborhood (FTTN), Fiber to the Building (FTTB), Fiber to the Home (FTTH), Fiber to the Premises (FTTP), Optical Network Terminator.
  • Point to Point Protocol (PPP), Network Control Protocol (NCP), Link Control Protocol (LCP), Point to Point Protocol over Ethernet (PPPoE).
  • **********
  • Network Drivers.
  • Firewalls.
  • **********
  • Domain Name System (DNS).
  • Electronic Mail, SMTP Protocol.
  • FTP (File Transfer Protocol), FTPS (File Transfer Protocol Secure), SFTP (SSH File Transfer Protocol).
  • Remote Desktop Services.
  • File Services.
  • HTTP Services.
  • Time Services.
  • Short Message Service (SMS).

After finishing computer networks, please click on Topic 3 – Introduction to Programming to continue.

 

Topic 24 – Introduction to Natural Language Processing

Why do I need to learn about natural language processing?

Natural language processing (NLP) has become increasingly interesting, with breakthrough achievements such as speech recognition, speech synthesis, autonomous driving, and chatbots.

Nowadays, a key skill for software developers is the ability to use NLP algorithms and tools to solve real-world problems involving text, audio, natural language sentences, and speech.

What can I do after finishing learning about natural language processing?

You will be able to create software that can recognize speech, convert text to speech, translate a sentence from English to French, or answer a customer’s question.

That sounds fun! What should I do now?

First, please take a quick look at the following two books to grasp the core concepts and classical methods in natural language processing:

After that, please audit this course, Sequence Models, to obtain the core concepts and hands-on experience with sequence models.

After that, please watch these videos to learn about audio signal processing for machine learning.

After that, please audit the courses below to learn how to understand and generate natural language using deep learning models:

After that, please read the book below to learn how to use large language models to build NLP applications:

Terminology Review:

  • Natural Language Processing.
  • Text Classification (e.g. Spam Detection).
  • Named Entity Recognition.
  • Chatbots.
  • Speech Processing.
  • Speech Recognition.
  • Speech Synthesis.
  • Machine Translation.
  • Corpus: A body of texts.
  • Token: a word or a number or a punctuation mark.
  • Collocation: compounds (e.g. disk drive), phrasal verbs (e.g. make up), and other stock phrases (e.g. bacon and eggs).
  • Unigram: word.
  • Bigrams: pairs of words that occur commonly.
  • Trigrams: 3 words that occur commonly.
  • N-grams: n words that occur commonly.
  • Hypothesis Testing.
  • t-Test.
  • Likelihood Ratios.
  • Language Model: statistical model of word sequences.
  • Naive Bayes.
  • Hidden Markov Models.
  • Bag-of-Words Model.
  • Term Frequency–Inverse Document Frequency (TF–IDF).
  • Bag-of-n-Grams.
  • One-Hot Representation: You have a vocabulary of n words and you represent each word using a vector that is n bits long, in which all bits are zero except for one bit that is set to 1.
  • Word Embedding (Featurized Representation): The transformation from words to dense vectors.
  • Euclidean Distance, Dot Product Similarity, Cosine Similarity.
  • Embedding Matrix.
  • Neural Language Model.
  • Word2Vec: Skip-Gram Model, Continuous Bag-of-Words (CBOW) Model.
  • Negative Sampling.
  • GloVe, Global Vectors.
  • Recurrent Neural Networks.
  • Backpropagation Through Time.
  • Recurrent Neural Net Language Model (RNNLM).
  • Gated Recurrent Unit (GRU).
  • Long Short Term Memory (LSTM).
  • Bidirectional RNN.
  • Deep RNNs.
  • Sequence to Sequence Model.
  • Teacher Forcing.
  • Image Captioning.
  • Greedy Search.
  • Beam Search, Length Normalization.
  • BLEU (BiLingual Evaluation Understudy) Score.
  • ROUGE (Recall-Oriented Understudy for Gisting Evaluation) Score.
  • F1 Score.
  • Minimum Bayes-Risk.
  • Attention Mechanism.
  • Self-Attention (Scaled Dot-Product Attention): Queries, Keys and Values.
  • Positional Encoding.
  • Masked Self-Attention.
  • Multi-Head Attention.
  • Residual Dropout.
  • Label Smoothing.
  • Transformer Encoder.
  • Transformer Decoder.
  • Transformer Encoder-Decoder.
  • Cross-Attention.
  • Byte Pair Encoding.
  • BERT (Bidirectional Encoder Representations from Transformers).
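
The one-hot representation and cosine similarity entries in the list above can be illustrated with a short sketch. The three-word vocabulary and the dense embedding values below are invented for illustration only; real embeddings are learned from a corpus. The helper names one_hot and cosine_similarity are likewise hypothetical.

```python
import math


def one_hot(word: str, vocabulary: list[str]) -> list[int]:
    """Vector of len(vocabulary) bits, all zero except at the word's index."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


vocabulary = ["king", "queen", "apple"]  # toy vocabulary (assumption)
print(one_hot("queen", vocabulary))  # [0, 1, 0]

# Two different one-hot vectors are orthogonal, so their similarity is 0.
print(cosine_similarity(one_hot("king", vocabulary), one_hot("queen", vocabulary)))

# Hypothetical dense embeddings, made up for this example.
embedding = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.1],
    "apple": [0.1, 0.0, 0.9],
}
print(cosine_similarity(embedding["king"], embedding["queen"]))
print(cosine_similarity(embedding["king"], embedding["apple"]))
```

One-hot vectors carry no notion of similarity between words, which is one motivation for dense word embeddings: in the made-up embedding, "king" and "queen" score much higher than "king" and "apple".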

After finishing natural language processing, please click on Topic 25 – Introduction to Distributed Systems to continue.

 

 

Topic 23 – Introduction to Computer Vision

Why do I need to learn about computer vision?

Computer vision has become an increasingly interesting field, with achievements such as image recognition, autonomous driving, and disease detection.

Nowadays, a key skill for software developers is the ability to use computer vision algorithms and tools to solve real-world problems involving images and videos.

What can I do after finishing learning about applied computer vision?

You will be able to create software that can recognize a face or transform a picture of a young person into an older person.

That sounds fun! What should I do now?

First, please take a quick look at the following two books to grasp the core concepts and methods in computer vision:

After that, please audit the course and read the book below to solidify your knowledge and gain hands-on experience with computer vision algorithms:

After that, please audit the following courses to grasp the core concepts of generative adversarial networks and gain hands-on experience with them:

After that, please audit the following courses and read the book below to grasp the core concepts of generative models, including diffusion models, and to gain hands-on experience with these models:

After that, please audit this course to learn how to efficiently represent, compress, and train large generative models: TinyML and Efficient Deep Learning Computing.

Terminology Review:

  • Digital Image: f(x, y)
  • Intensity (Gray Level): ℓ = f(x, y)
  • Gray Scale: ℓ = 0 is considered black and ℓ = L – 1 is considered white.
  • Quantization: Digitizing the amplitude values.
  • Sampling: Digitizing the coordinate values.
  • Representing Digital Images: Matrix or Vector.
  • Pixel or Picture Element: An element of matrix or vector.
  • Deep Learning.
  • Artificial Neural Networks.
  • Filter: 2-dimensional matrix commonly square in size containing weights shared all over the input space.
  • The Convolution Operation: Element-wise multiply, and add the outputs.
  • Stride: Filter step size.
  • Padding.
  • Upsampling: Nearest Neighbors, Linear Interpolation, Bilinear Interpolation.
  • Max Pooling, Average Pooling, Min Pooling.
  • Convolutional Layers.
  • Feature Maps.
  • Convolutional Neural Networks (CNNs).
  • Object Localization.
  • Bounding Box.
  • Landmark Detection.
  • Sliding Windows Detection.
  • Bounding Box Predictions.
  • Intersection over Union.
  • Non-max Suppression Algorithm.
  • Anchor Box Algorithm.
  • Object Detection.
  • YOLO Algorithm.
  • Semantic Segmentation.
  • Transpose Convolution.
  • U-Net.
  • Face Verification.
  • Face Recognition.
  • One-shot Learning.
  • Siamese Network.
  • Triplet Loss.
  • Neural Style Transfer.
  • Content Cost Function.
  • Style Cost Function.
  • 1D Convolution.
  • 3D Convolution.
  • Latent Variable.
  • Autoencoders.
  • Variational Autoencoders.
  • Generators.
  • Discriminators.
  • Binary Cross Entropy Loss Function, Log Loss Function.
  • Generative Adversarial Networks (GANs).
  • Deep Convolutional Generative Adversarial Networks.
  • Mode Collapse.
  • Earth Mover’s Distance.
  • Wasserstein Loss (W-Loss).
  • 1-Lipschitz Continuous Function.
  • Wasserstein GANs.
  • Conditional GANs.
  • Pixel Distance.
  • Feature Distance.
  • Fréchet Inception Distance (FID).
  • Inception Score (IS).
  • Autoregressive Models.
  • Variational Autoencoders (VAEs).
  • Flow Models.
  • StyleGAN.
  • Pix2Pix.
  • CycleGAN.
  • Diffusion Models.
  • Magnitude-based Pruning.
  • K-Means-based Weight Quantization.
  • Linear Quantization.
  • Neural Architecture Search.
  • Knowledge Distillation.
  • Self and Online Distillation.
  • Network Augmentation.
  • Loop Reordering, Loop Tiling, Loop Unrolling, SIMD (Single Instruction, Multiple Data) Programming, Multithreading, CUDA Programming.
  • Data Parallelism.
  • Pipeline Parallelism.
  • Tensor Parallelism.
  • Hybrid Parallelism.
  • Automated Parallelism.
  • Gradient Pruning: Sparse Communication, Deep Gradient Compression, PowerSGD.
  • Gradient Quantization: 1-Bit SGD, Threshold Quantization, TernGrad.
  • Delayed Gradient Averaging.
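
The filter, convolution operation, stride, and padding entries near the top of the list above can be made concrete with a small pure-Python sketch. It assumes a single-channel image stored as a list of rows and zero padding; the function name conv2d is chosen for this example. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this computes a cross-correlation.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide `kernel` over `image`; at each position multiply element-wise and add.

    `image` and `kernel` are lists of rows of numbers. Zero padding is assumed.
    """
    if padding > 0:
        width = len(image[0]) + 2 * padding
        top = [[0] * width for _ in range(padding)]
        middle = [[0] * padding + list(row) + [0] * padding for row in image]
        bottom = [[0] * width for _ in range(padding)]
        image = top + middle + bottom
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for a in range(k_h):
                for b in range(k_w):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
print(conv2d(image, kernel))             # [[6, 8], [12, 14]]
print(conv2d(image, kernel, stride=2))   # [[6]]
print(conv2d(image, kernel, padding=1))  # 4 x 4 output
```

The same kernel weights are reused at every position, which is the weight sharing mentioned in the Filter entry. A larger stride shrinks the output, while padding enlarges it.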

After finishing computer vision, please click on Topic 24 – Introduction to Natural Language Processing to continue.

 

 

Topic 22 – Introduction to Machine Learning

Why do I need to learn about machine learning?

Machine learning has been used to solve many important and difficult problems, including speech recognition, speech synthesis, image recognition, autonomous driving, and chatbots. Today, a key skill for software developers is the ability to use machine learning algorithms to solve real-world problems.

What can I do after finishing learning about machine learning?

You will be able to create software that can recognize a car’s license plate number from an image or estimate the probability of breast cancer for a patient.

That sounds useful! What should I do now?

First, please audit these courses to learn the core concepts of machine learning and gain hands-on experience with them:

After that, please read the following books to reinforce your theoretical understanding and practical competence in machine learning:

After that, please audit this course and read its readings to learn the core approaches and algorithms for building artificial intelligence systems: MIT 6.034 – Artificial Intelligence, Fall 2010 (Readings).

After that, please read the following books to study the mathematical foundations underlying machine learning algorithms:

After that, please audit the following courses and read the book below to learn the core concepts and algorithms of reinforcement learning:

Supervised Learning Terminology Review:

  • Artificial Intelligence.
  • Machine Learning.
  • Deep Learning.
  • Linear Regression: Y = θX + Ε.
  • Cost Function measures how good/bad your model is.
  • Mean Square Error (MSE) measures the average of the squares of the errors.
  • Gradient Descent, Learning Rate.
  • Batch Gradient Descent.
  • The R-Squared Test measures the proportion of the total variance in the output (y) that can be explained by the variation in x. It can be used to evaluate how good a “fit” some model is on the given data.
  • Stochastic Gradient Descent.
  • Mini-Batch Gradient Descent.
  • Overfitting: machine learning model gives accurate predictions for training data but not for new data.
  • Regularization: Ridge Regression, Lasso Regression, Elastic Net, Early Stopping.
  • Normalization.
  • Logistic Regression.
  • Sigmoid Function.
  • Binary Cross Entropy Loss Function, Log Loss Function.
  • One Hot Encoding.
  • The Softmax function takes an N-dimensional vector of arbitrary real values and produces another N-dimensional vector with real values in the range (0, 1) that add up to 1.0.
  • Softmax Regression.
  • Gradient Ascent.
  • Newton’s Method.
  • Support Vector Machines.
  • Decision Trees.
  • Parametric vs. Non-parametric Models.
  • K-Nearest Neighbors.
  • Locally Weighted Regression.
  • McCulloch-Pitts Neuron.
  • Linear Threshold Unit with threshold T calculates the weighted sum of its inputs, and then outputs 0 if this sum is less than T, and 1 if the sum is greater than T.
  • Perceptron.
  • Artificial Neural Networks.
  • Backpropagation.
  • Activation Functions: Rectified Linear Unit (ReLU), Leaky ReLU, Sigmoid, Hyperbolic Tangent.
  • Batch Normalization.
  • Learning Rate Decay.
  • Exponentially Weighted Averages.
  • Gradient Descent Optimization Algorithms: Momentum, Adagrad, Adadelta, RMSprop, Adam.
  • Regularization: Dropout.
  • The Joint Probability Table.
  • Bayesian Networks.
  • Naive Bayes Inference.
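
The softmax definition in the list above (an N-dimensional vector of arbitrary real values mapped to N values in the range (0, 1) that add up to 1.0) can be sketched as follows. Subtracting the maximum before exponentiating is a common numerical-stability trick added here as an assumption; it does not change the result.

```python
import math


def softmax(values: list[float]) -> list[float]:
    """Map N real values to N values in (0, 1) that sum to 1."""
    shift = max(values)  # avoids overflow in exp(); the output is unchanged
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probabilities])  # [0.659, 0.242, 0.099]
print(sum(probabilities))
```

Larger inputs receive larger probabilities, and the ordering of the inputs is preserved, which is why softmax is used as the output of softmax regression.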

Unsupervised Learning Terminology Review:

  • K-Means.
  • Principal Component Analysis.
  • User-Based Collaborative Filtering.
  • Item-based Collaborative Filtering.
  • Matrix Factorization.

    Reinforcement Learning Terminology Review:

    • k-armed Bandit Problem.
    • Sample-Average Method.
    • Greedy Action.
    • Exploration and Exploitation.
    • ϵ-Greedy Action Selection.
      • Bandit Algorithm.
      • Exponential Recency-Weighted Average.
      • Optimistic Initial Values.
      • Upper-Confidence-Bound Action Selection.
      • Rewards.
      • Agent, Actions, World or Environment.
      • History, States, Terminal State, Environment State, Agent State, Information State.
      • Fully Observable Environments.
      • Partially Observable Environments.
      • Policy,  Value Function, Model.
      • Value Based RL Agent, Policy Based RL Agent, Actor Critic RL Agent.
      • Model Free RL Agent, Model Based RL Agent.
      • Learning Problem and Planning Problem.
      • Prediction and Control.
      • Markov Property.
      • State Transition Matrix.
      • Markov Process.
      • Episodic Tasks.
      • Continuing Tasks.
      • Horizon (H): Number of time steps in each episode, can be infinite.
      • Markov Reward Process.
      • Discount Factor, Discount Rate: 0 ≤ γ ≤ 1.
      • Return.
      • Discounted Return: Discounted sum of rewards from time step t to horizon H.
      • State-Value Function.
      • Bellman Equation for Markov Reward Processes.
      • Markov Decision Process.
      • Policy: Mapping from states to actions. Deterministic policy: π (s) = a. Stochastic policy: π (a|s) = P(aₜ=a|sₜ=s).
      • State Value Function – Vπ(s): The expected return starting from state s following policy π.
      • Bellman Expectation Equation for Vπ.
      • Action Value Function (also known as State-Action Value Fucntion or the Quality Function) – Qπ(s, a): The expected return starting from state , taking action , then following policy .
      • Bellman Expectation Equation for Qπ.
      • Optimal State Value Function.
      • Optimal Action Value Function.
      • Bellman Optimality Equation for v*.
      • Bellman Optimality Equation for q*.
      • Optimal Policies.
      • Dynamic Programming.
      • Iterative Policy Evaluation.
      • Policy Improvement.
      • Policy Improvement Theorem.
      • Policy Iteration.
      • Value Iteration.
      • Synchronous Dynamic Programming.
      • Asynchronous Dynamic Programming.
      • Generalized Policy Iteration.
      • Bootstrapping: Updating estimates on the basis of other estimates.
      • Monte-Carlo Policy Evaluation.
      • First-Visit Monte-Carlo Policy Evaluation.
      • Every-Visit Monte-Carlo Policy Evaluation.
      • Incremental Mean.
      • Incremental Monte-Carlo Updates.
      • Temporal-Difference Learning.
      • Forward-View TD(λ).
      • Eligibility Traces.
      • Backward-View TD(λ).
      • On-Policy Learning.
      • Off-Policy Learning.
      • ϵ-Greedy Exploration.
      • ϵ-greedy Policies: Most of the time they choose an action that has maximal estimated action value, but with probability ϵ they instead select an action at random.
      • Monte-Carlo Policy Iteration. Policy evaluation: Monte-Carlo policy evaluation, Q = qπ. Policy improvement: ϵ-greedy policy improvement.
      • Monte-Carlo Control. Policy evaluation: Monte-Carlo policy evaluation, Q ≈ qπ. Policy improvement: ϵ-greedy policy improvement.
      • Exploring Starts: Specify that the episodes start in a state–action pair, and that every pair has a nonzero probability of being selected as the start.
      • Monte Carlo Control Exploring Starts.
      • Greedy in the Limit with Infinite Exploration (GLIE) Monte-Carlo Control.
      • ϵ-soft Policies: Policies for which π(a|s) ≥ ϵ/|A(s)| for all states and actions, for some ϵ > 0.
      • On-Policy First-Visit MC Control.
      • SARSA: State (S), Action (A), Reward (R), State (S’), Action (A’).
      • On-Policy Control with SARSA. Policy evaluation: SARSA evaluation, Q ≈ qπ. Policy improvement: ϵ-greedy policy improvement.
      • Forward-View SARSA (λ).
      • Backward-View SARSA (λ).
      • Target Policy.
      • Behavior Policy.
      • Importance Sampling: Use samples from one distribution to estimate the expectation of a different distribution.
      • Importance Sampling for Off-Policy Monte-Carlo.
      • Importance Sampling for Off-Policy TD.
      • Q-Learning: Next action is chosen using behaviour policy. Q is updated using alternative successor action.
      • Off-Policy Control with Q-Learning.
      • Expected SARSA.
      • Value Function Approximation.
      • Function Approximators.
      • Differentiable Function Approximators.
      • Feature Vectors.
      • State Aggregation.
      • Coarse Coding.
      • Tile Coding.
      • Continuous States.
      • Incremental Prediction Algorithms.
      • Control with Value Function Approximation. Policy evaluation: Approximate policy evaluation, q(.,., w) ≈ qπ. Policy improvement: ϵ-greedy policy improvement.
      • Learning State Action Value function: Replay Buffer: the 10,000 most recent tuples (s, a, R(s), s’). x = (s, a) → Q(θ) → y = R(s) + γmaxQ(s’, a’; θ). Loss = [R(s) + γmaxQ(s’, a’; θ)] − Q(s, a; θ).
      • Expected SARSA with Function Approximation.
      • Target Network: A separate neural network for generating the y targets. It has the same architecture as the original Q-Network. Loss = [R(s) + γmaxTargetQ(s’, a’; θ′)] − Q(s, a; θ). Every C time steps we will use the TargetQ-Network to generate the y targets and update the weights of the TargetQ-Network using the weights of the Q-Network.
      • Soft Updates: θ′ ← 0.001θ + 0.999θ′, where θ′ and θ represent the weights of the target network and the current network, respectively.
      • Deep Q-learning.
      • Linear Least Squares Prediction Algorithms.
      • Least Squares Policy Iteration. Policy evaluation: Least squares Q-Learning. Policy improvement: Greedy policy improvement.
      • Average Reward.
      • Discounted Returns, Returns for Average Reward.
      • Stochastic Policies.
      • Softmax Policies.
      • Gaussian Policies.
      • Policy Objective Functions: Start State Objective, Average Reward Objective and Average Value Objective.
      • Score Function.
      • Policy Gradient Theorem.
      • Monte-Carlo Policy Gradient (REINFORCE).
      • Action-Value Actor-Critic: Critic updates w by linear TD(0). Actor updates θ by policy gradient.
      • The Tabular Dyna-Q Algorithm.
      • The Dyna-Q+ Algorithm.
      • Forward Search.
      • Simulation-Based Search.
      • Monte-Carlo Tree Search.
      • Temporal-Difference Search.
      • Dyna-2.
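
To make a few of the terms above concrete (ϵ-greedy exploration, behavior policy, and off-policy control with Q-learning), here is a minimal sketch of tabular Q-learning. The five-state corridor environment, the function names, and the hyperparameter values are all assumptions made for illustration; they do not come from the books or courses listed in this topic.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4
GOAL = N_STATES - 1   # reaching state 4 ends the episode
LEFT, RIGHT = 0, 1


def step(state, action):
    """Deterministic toy environment: reward 1.0 only on reaching GOAL."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon explore; otherwise pick a maximal action
    (ties are broken at random)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, max_steps=200, seed=0):
    """Tabular Q-learning; returns the learned table Q[state][action]."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Behavior policy: epsilon-greedy with respect to the current Q.
            action = epsilon_greedy(Q[state], epsilon, rng)
            next_state, reward, done = step(state, action)
            # The target uses the greedy successor action (max over a'),
            # not the action the behavior policy will actually take next.
            target = reward if done else reward + gamma * max(Q[next_state])
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
            if done:
                break
    return Q


def greedy_policy(Q):
    """Greedy (target) policy for every non-terminal state."""
    return [max((LEFT, RIGHT), key=lambda a: Q[s][a]) for s in range(GOAL)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Replacing `max(Q[next_state])` with the value of the action actually chosen in the next state would turn this sketch into SARSA, which is the on-policy counterpart.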

      Probabilistic Machine Learning Terminology Review:

      • Probabilistic Machine Learning.
      • Non-Probabilistic Machine Learning.
      • Algorithmic Machine Learning.
      • Array Programming.
      • Frequentist and Bayesian Approaches.

      After finishing machine learning, please click on Topic 23 – Introduction to Computer Vision to continue.

       

      Topic 21 – Introduction to Computational Thinking

      Why do I need to learn about computational thinking?

      Computational thinking is a fundamental tool for understanding, implementing, and evaluating modern theories in artificial intelligence, machine learning, deep learning, data mining, security, digital image processing, and natural language processing.

      What can I do after finishing learning about computational thinking?

      You will be able to:

      • use a programming language to express computations,
      • apply systematic problem-solving strategies such as decomposition, pattern recognition, abstraction, and algorithmic thinking to turn an ambiguous problem statement into a computational solution method,
      • apply algorithmic and problem-reduction techniques,
      • use randomness and simulations to address problems that cannot be solved with closed-form solutions,
      • use computational tools, including basic statistical, visualization, and machine learning tools, to model and understand data.

      These skills foster abstract thinking that enables you not only to use technology effectively but also to understand what is possible, recognize inherent trade-offs, and account for computational constraints that shape the software you design.

      You will also be prepared to learn how to design and build compilers, operating systems, database management systems, and distributed systems.

      That sounds useful! What should I do now?

      First, please read this book to learn how to apply computational methods such as simulation, randomized algorithms, and statistical analysis to solve problems such as modeling disease spread, simulating physical systems, analyzing biological data, optimizing transportation, and designing communication networks: John V. Guttag (2021). Introduction to Computation and Programming using Python. 3rd Edition. The MIT Press.

      Alternatively, if you want to gain the same concepts through interactive explanations, please audit the following courses:

      After that, please read chapters 5 and 6 of the following book to learn about the theory of computing and how a machine performs computations: Robert Sedgewick and Kevin Wayne (2016). Computer Science – An Interdisciplinary Approach. Addison-Wesley Professional.

      Alternatively, if you want to gain the same concepts through interactive explanations, please audit the following courses: Computer Science: Algorithms, Theory, and Machines.

      After that, please read the following book to learn what is going on “under the hood” of a computer system: Randal E. Bryant and David R. O’Hallaron (2015). Computer Systems. A Programmer’s Perspective. Pearson.

      After that, please audit this course to learn how to build scalable and high-performance software systems: MIT 6.172 Performance Engineering of Software Systems, Fall 2018 (Lecture Notes).

      Terminology Review:

      • Algorithms.
      • Fixed Program Computer, Stored Program Computer.
      • Computer Architecture.
      • Hardware or Computer Architecture Primitives, Programming Language Primitives, Theoretical or Computability Primitives.
      • Mathematical Abstraction of a Computing Machine (Turing Machine, Abstract Device), Turing’s Primitives.
      • Programming Languages.
      • Expressions, Syntax, Static Semantics, Semantics, Variables, Bindings.
      • Programming vs. Math.
      • Programs.
      • Big O notation.
      • Optimization Models: Knapsack Problem.
      • Graph-Theoretic Models: Shortest Path Problems.
      • Simulation Models: Monte Carlo Simulation, Random Walk.
      • Statistical Models.
      • K-means Clustering.
      • k-Nearest Neighbors Algorithm.
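
As a small illustration of the "Optimization Models: Knapsack Problem" entry above, here is a sketch of the 0/1 knapsack problem solved with dynamic programming. The item names, values, and weights are made up for the example, and the function name `knapsack` is an assumption rather than code taken from any of the books listed.

```python
def knapsack(items, capacity):
    """Solve the 0/1 knapsack problem for integer weights.

    items: list of (name, value, weight) tuples.
    Returns (best total value, tuple of chosen item names).
    """
    # best[c] holds the best (value, chosen names) using total weight <= c.
    best = [(0, ())] * (capacity + 1)
    for name, value, weight in items:
        # Go through capacities from high to low so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            candidate = best[c - weight][0] + value
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - weight][1] + (name,))
    return best[capacity]


if __name__ == "__main__":
    # Hypothetical items: (name, value, weight)
    items = [("a", 60, 1), ("b", 100, 2), ("c", 120, 3)]
    print(knapsack(items, 5))
```

The table has capacity + 1 entries and is updated once per item, so the running time grows with (number of items) × capacity, which is a useful example to analyze with Big O notation.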

      After finishing computational thinking, please click on Topic 22 – Introduction to Machine Learning to continue.

       

      Topic 19 – Probability & Statistics

      Why do I need to learn about probability and statistics?

      Probability and statistics are fundamental tools for understanding many modern theories and techniques such as artificial intelligence, machine learning, deep learning, data mining, security, digital image processing, and natural language processing.

      What can I do after finishing learning about probability and statistics?

      You will be prepared to learn modern theories and techniques to create modern security, machine learning, data mining, image processing or natural language processing software.

      That sounds useful! What should I do now?

      Please read one of the following books to grasp the core concepts of probability and statistics:

      Alternatively, please read these notes first, and then audit the courses below if you would like to learn through interactive explanations:

      Perhaps probability and statistics are among the most difficult topics in mathematics, so you may need to study them two or three times using different sources to truly master the concepts. For example, you may audit the course and read the books below to gain additional examples and intuition about the concepts:

      Learning probability and statistics requires patience. However, the rewards will be worthwhile: you will be able to master AI algorithms more quickly and with greater confidence.

      Terminology Review:

      • Sample Space (Ω): Set of possible outcomes.
      • Event: Subset of the sample space.
      • Probability Law: Law specified by giving the probabilities of all possible outcomes.
      • Probability Model = Sample Space + Probability Law.
      • Probability Axioms: Nonnegativity: P(A) ≥ 0; Normalization: P(Ω)=1; Additivity: If A ∩ B = Ø, then P(A ∪ B)= P(A)+ P(B).
      • Conditional Probability: P(A|B) = P (A ∩ B) / P(B).
      • Multiplication Rule.
      • Total Probability Theorem.
      • Bayes’ Rule: Given P(Aᵢ) (initial “beliefs”) and P(B|Aᵢ), find P(Aᵢ|B) (revised “beliefs”, given that B occurred): P(Aᵢ|B) = P(Aᵢ)·P(B|Aᵢ) / Σⱼ P(Aⱼ)·P(B|Aⱼ).
      • The Monty Hall Problem: 3 doors, behind which are two goats and a car.
      • The Spam Detection Problem: “Lottery” word in spam emails.
      • Independence of Two Events: P(B|A) = P(B)  or P(A ∩ B) = P(A) · P(B).
      • The Birthday Problem: P(Same Birthday of 23 People) > 50%.
      • The Naive Bayes Model: “Naive” means features independence assumption.
      • Discrete Uniform Law: P(A) = Number of elements of A / Total number of sample points = |A| / |Ω|
      • Basic Counting Principle: r stages, nᵢ choices at stage i, number of choices = n₁ n₂ · · · nᵣ
      • Permutations: Number of ways of ordering n elements. No repetition for n slots: n · (n − 1) · (n − 2) · · · 1 = n!.
      • Combinations: number of k-element subsets of a given n-element set.
      • Binomial Probabilities: P(any particular sequence) = p^(number of heads) · (1 − p)^(number of tails).
      • Random Variable: A function from the sample space to the real numbers. It is not random. It is not a variable. It is a function: f: Ω → ℝ. A random variable is used to model the whole experiment at once.
      • Discrete Random Variables.
      • Probability Mass Function: P(X = 𝑥) or Pₓ(𝑥): A function from the possible values of X to [0, 1] that gives the probability that X equals 𝑥. PMF gives probabilities. 0 ≤ PMF ≤ 1. All the values of PMF must sum to 1. PMF is used to model a random variable.
      • Bernoulli Random Variable (Indicator Random Variable): f: Ω → {1, 0}. Only 2 outcomes: 1 and 0. p(1) = p and p(0) = 1 – p.
      • Binomial Random Variable: X = Number of successes in n trials. X = Number of heads in n independent coin tosses.
      • Binomial Probability Mass Function: Pₓ(k) = C(n, k)·pᵏ(1 − p)ⁿ⁻ᵏ, where C(n, k) is the number of k-element subsets of an n-element set.
      • Geometric Random Variable: X = Number of coin tosses until first head.
      • Geometric Probability Mass Function: Pₓ(k) = (1 − p)ᵏ⁻¹p.
      • Expectation: E[X] = Sum of xpₓ(x).
      • Let Y=g(X): E[Y] = E[g(X)] = Sum of g(x)pₓ(x). Caution: E[g(X)] ≠ g(E[X]) in general.
      • Variance: var(X) = E[(X−E[X])²].
      • var(aX)=a²var(X).
      • X and Y are independent: var(X+Y) = var(X) + var(Y). Caution: var(X+Y) ≠ var(X) + var(Y) in general.
      • Standard Deviation: Square root of var(X).
      • Conditional Probability Mass Function: P(X=x|A).
      • Conditional Expectation: E[X|A].
      • Joint Probability Mass Function: Pₓᵧ(x,y) = P(X=x, Y=y) = P((X=x) and (Y=y)).
      • Marginal Distribution: Distribution of one variable while ignoring the other.
      • Marginal Probability Mass Function: Pₓ(x) = Σᵧ Pₓᵧ(x, y).
      • Total Expectation Theorem: E[X] = Σᵧ Pᵧ(y)·E[X | Y = y].
      • Independent Random Variables: P(X=x, Y=y) = P(X=x)·P(Y=y).
      • Expectation of Multiple Random Variables: E[X + Y + Z] = E[X] + E[Y] + E[Z].
      • Binomial Random Variable: X = Sum of Bernoulli Random Variables.
      • The Hat Problem.
      • Continuous Random Variables.
      • Probability Density Function: fₓ(𝑥), where P(a ≤ X ≤ b) is the area under fₓ between a and b. (a ≤ X ≤ b) means the function X produces a real number value within the [a, b] range. Programming language: X(outcome) = 𝑥, where a ≤ 𝑥 ≤ b. PDF does NOT give probabilities. PDF does NOT have to be less than 1. PDF gives probabilities per unit length. The total area under PDF must be 1. PDF is used to find the probability that the random variable falls within a given range of values.
      • Cumulative Distribution Function: P(X ≤ b). (X ≤ b) means the function X produces a real number value within the (−∞, b] range. Programming language: X(outcome) = 𝑥, where 𝑥 ≤ b.
      • Continuous Uniform Random Variables: fₓ(x) = 1/(b – a) if a ≤ x ≤ b, otherwise fₓ(x) = 0.
      • Normal Random Variable, Gaussian Distribution, Normal Distribution: Fitting bell shaped data.
      • Chi-Squared Distribution: Modelling communication noise.
      • Sampling from a Distribution: The process of drawing a random value (or set of values) from a probability distribution.
      • Joint Probability Density Function.
      • Marginal Probability Density Function.
      • Conditional Probability Density Function.
      • Derived Distributions.
      • Convolution: A mathematical operation on two functions (f and g) that produces a third function.
      • The Distribution of W = X + Y.
      • The Distribution of X + Y where X, Y: Independent Normal Random Variables.
      • Covariance.
      • Covariance Matrix.
      • Correlation Coefficient.
      • Conditional Expectation: E[X | Y = y] = Sum of xpₓ|ᵧ(x|y). If Y is unknown then E[X | Y] is a random variable, i.e. a function of Y. So E[X | Y] also has its expectation and variance.
      • Law of Iterated Expectations: E[E[X | Y]] = E[X].
      • Conditional Variance: var(X | Y) is a function of Y.
      • Law of Total Variance: var(X) = E[var(X | Y)] + var(E[X | Y]).
      • Bernoulli Process:  A sequence of independent Bernoulli trials. At each trial, i: P(Xᵢ=1)=p, P(Xᵢ=0)=1−p.
      • Poisson Process.
      • Markov Chain.
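
Two of the entries above, Bayes’ Rule (applied to the Monty Hall Problem) and the Birthday Problem, can be checked with a few lines of Python. This is only a sketch: the function names are assumptions, and the Monty Hall setup assumes the player picks door 1, the host then opens door 3, and the host chooses at random when both remaining doors hide goats.

```python
from fractions import Fraction


def bayes_posterior(priors, likelihoods):
    """Return P(A_i | B) for each hypothesis A_i.

    priors[i] = P(A_i), likelihoods[i] = P(B | A_i).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # Total Probability Theorem: P(B)
    return [j / total for j in joint]


def birthday_collision_probability(n, days=365):
    """P(at least two of n people share a birthday), assuming
    birthdays are independent and uniform over `days` days."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    # Hypotheses: the car is behind door 1, 2, or 3 (equally likely a priori).
    priors = [Fraction(1, 3)] * 3
    # B = "host opens door 3" after the player picked door 1.
    likelihoods = [Fraction(1, 2), Fraction(1), Fraction(0)]
    print(bayes_posterior(priors, likelihoods))
    print(birthday_collision_probability(23))
```

Under these assumptions the posterior for door 2 is 2/3, which is why switching doors is the better strategy, and 23 is the smallest group size for which the birthday probability exceeds 50%.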

      • Bar Chart, Line Charts, Scatter Plots, Histograms.
      • Mean, Median, Mode.
      • Moments of a Distribution.
      • Skewness: E[((X – μ)/σ)³].
      • Kurtosis: E[((X – μ)/σ)⁴].
      • k% Quantile: Value qₖ/₁₀₀ such that P(X ≤ qₖ/₁₀₀) = k/100.
      • Interquartile Range: IQR = Q₃ − Q₁.
      • Box-Plots: Q₁, Q₂, Q₃, IQR, min, max.
      • Kernel Density Estimation.
      • Violin Plot = Box-Plot + Kernel Density Estimation.
      • Quantile-Quantile Plots (QQ Plots).
      • Population: N.
      • Sample: n.
      • Random Sampling.
      • Population Mean: μ.
      • Sample Mean: x̄.
      • Population Proportion: p.
      • Sample Proportion: p̂.
      • Population Variance: σ².
      • Sample Variance: s².
      • Sampling Distributions.
      • Sampling from a Distribution: Drawing random values directly from a probability distribution. Purpose: Simulating or modeling real-world processes when the underlying distribution is known.
      • Markov’s Inequality: P(X ≥ a) ≤ E[X]/a (X ≥ 0, a > 0).
      • Chebyshev’s Inequality: P(|X – E(X)| ≥ a) ≤ var(X)/a².
      • Weak Law of Large Numbers: The average of the samples will get closer to the population mean as the sample size (not number of items) increases.
      • Central Limit Theorem: The distribution of sample means approximates a normal distribution as the sample size (not number of items) gets larger, regardless of the population’s distribution.
      • Sampling Distributions: Distribution of Sample Mean, Distribution of Sample Proportion, Distribution of Sample Variance.
      • Point Estimate: A single number, calculated from a sample, that estimates a parameter of the population.
      • Maximum Likelihood Estimation: Given data the maximum likelihood estimate (MLE) for the parameter p is the value of p that maximizes the likelihood P (data | p). P (data | p) is the likelihood function. For continuous distributions, we use the probability density function to define the likelihood.
      • Log likelihood: the natural log of the likelihood function.
      • Frequentists: Assume no prior belief, the goal is to find the model that most likely generated observed data.
      • Bayesians: Assume prior belief, the goal is to update prior belief based on observed data.
      • Maximum A Posteriori (MAP): Good for instances when you have limited data or strong prior beliefs. Wrong priors, wrong conclusions. MAP with uninformative priors is just MLE.
      • Margin of Error: A bound that we can confidently place on the difference between an estimate of something and the true value.
      • Significance Level: α, the probability that the event could have occurred by chance.
      • Confidence Level: 1 − α,  a measure of how confident we are in a given margin of error.
      • Confidence Interval: A 95% confidence interval (CI) of the mean is a range with an upper and lower number calculated from a sample. Because the true population mean is unknown, this range describes possible values that the mean could be. If multiple samples were drawn from the same population and a 95% CI calculated for each sample, we would expect the population mean to be found within 95% of these CIs.
      • z-score: the number of standard deviations from the mean value of the reference population.
      • Confidence Interval: Unknown σ.
      • Confidence Interval for Proportions.
      • Hypothesis: A statement about a population developed for the purpose of testing.
      • Hypothesis Testing.
      • Null Hypothesis (H₀): A statement about the value of a population parameter; it contains an equal sign.
      • Alternative Hypothesis (H₁): A statement that is accepted if the sample data provide sufficient evidence that the null hypothesis is false; it never contains an equal sign.
      • Type I Error: Reject the null hypothesis when it is true.
      • Type II Error: Do not reject the null hypothesis when it is false.
      • Significance Level, α: The maximum probability of rejecting the null hypothesis when it is true.
      • Test Statistic:  A number, calculated from samples, used to find if your data could have occurred under the null hypothesis.
      • Right-Tailed Test: The alternative hypothesis states that the true value of the parameter specified in the null hypothesis is greater than the null hypothesis claims.
      • Left-Tailed Test: The alternative hypothesis states that the true value of the parameter specified in the null hypothesis is less than the null hypothesis claims.
      • Two-Tailed Test: The alternative hypothesis does not specify a direction; it states only that the true value of the parameter differs from what the null hypothesis claims.
      • p-value: The probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct. μ₀ is assumed to be known and H₀ is assumed to be true.
      • Decision Rules: If H₀ is true then acceptable x̄ must fall in (1 − α) region.
      • Critical Value or k-value: A value on a test distribution that is used to decide whether the null hypothesis should be rejected or not.
      • Power of a Test: The probability of rejecting the null hypothesis when it is false; in other words, it is the probability of avoiding a type II error.
      • t-Distribution.
      • T-Statistic.
      • t-Tests: Unknown σ, use T-Statistic.
      • Independent Two-Sample t-Tests.
      • Paired t-Tests.
      • A/B testing: A methodology for comparing two variations (A/B) that uses t-Tests for statistical analysis and making a decision.
      • Model Building: X = a·S + W, where X: output, S: “signal”, a: parameters, W: noise. Know S, assume W, observe X, find a.
      • Inferring: X = a·S + W. Know a, assume W, observe X, find S.
      • Hypothesis Testing: X = a·S + W. Know a, observe X, find S. S can take one of few possible values.
      • Estimation: X = a·S + W. Know a, observe X, find S. S can take unlimited possible values.
      • Bayesian Inference can be used for both Hypothesis Testing and Estimation by leveraging Bayes rule. Output is posterior distribution. Single answer can be Maximum a posteriori probability (MAP) or Conditional Expectation.
      • Least Mean Squares Estimation of Θ based on X.
      • Classical Inference can be used for both Hypothesis Testing and Estimation.
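
The confidence interval, test statistic, and p-value entries above can be tied together with a short sketch. It assumes the population standard deviation σ is known, so it uses the normal distribution (z) rather than the t-distribution; the Python standard library does not include a t-distribution, so the unknown-σ case would need a statistics package. The sample values and function names are made up for illustration.

```python
import math
from statistics import NormalDist, mean


def z_confidence_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the population mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)        # critical value
    margin = z * sigma / math.sqrt(len(sample))    # margin of error
    x_bar = mean(sample)
    return x_bar - margin, x_bar + margin


def two_tailed_z_test(sample, mu0, sigma, alpha=0.05):
    """Test H0: mu = mu0 against H1: mu != mu0 when sigma is known.

    Returns (test statistic, p-value, whether H0 is rejected).
    """
    z = (mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    # Probability of a result at least this extreme in either tail, under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


if __name__ == "__main__":
    # Hypothetical sample of 25 measurements with sample mean 10.5.
    sample = [10.0, 11.0] * 12 + [10.5]
    print(z_confidence_interval(sample, sigma=1.0))
    print(two_tailed_z_test(sample, mu0=10.0, sigma=1.0))
```

With this sample the test statistic is 2.5 standard errors above μ₀ = 10, so H₀ is rejected at α = 0.05; equivalently, 10 lies outside the 95% confidence interval.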

      After finishing probability and statistics, please click on Topic 20 – Discrete Mathematics to continue.