Category Archives: Mathematics

Topic 20 – Discrete Mathematics

Why do I need to learn about discrete mathematics?

Discrete mathematics is a fundamental tool for understanding many theories and techniques behind artificial intelligence, machine learning, deep learning, data mining, security, digital imagine processing and natural language processing.

The problem-solving techniques and computation thinking introduced in discrete mathematics are necessary for creating complicated software too.

What can I do after finishing learning discrete mathematics?

You will be prepared to learn modern theories and techniques to create modern security, machine learning, data mining, image processing or natural language processing software.

What should I do now?

Please read
– this Kenneth H. Rosen (2012). Discrete Mathematics and Its Applications. McGraw-Hill book and
– this Alfred V. Aho and Jeffrey D. Ullman (1994). Foundations of Computer Science  book (free online version).

Alternatively, please watch this MIT 6.042J – Mathematics for Computer Science, Fall 2010 course (Textbook).

Terminology Review:

  • Statement: An assertion that is either true or false.
  • Mathematical Statements.
  • Mathematical Proof: A convincing argument about the accuracy of a statement.
  • If p, then q. p is hypothesis. q is conclusion.
  • Proposition: A true statement.
  • Theorem: An important proposition.
  • Lemmas: Supporting propositions.
  • Logic: A language for reasoning that contains a collection of rules that we use when doing logical reasoning.
  • Propositional Logic: A logic about truth and falsity of statements.
  • Logic Connectives: Not (Negation), And (Conjunction), Or (Disjunction), If then (Implication), If and only if (Equivalence).
  • Truth Table.
  • Contrapositive of Proposition: The contrapositive of p q is the proposition ¬q ¬p.
  • Modus Ponens: If both P  Q and P hold, then Q can be concluded.
  • Predicate: A property of some objects or a relationship among objects represented by the variables.
  • Quantifier: Tells how many objects have a certain property.
  • Mathematical Induction: Base Case, Inductive Case.
  • Recursion: A Base, An Recursive Step.
  • Sum Example: Annuity.
  • Set.
  • Subset.
  • Set Operations: A ∪ B, A ∩ B, A ⊂ U: A’ = {x : x ∈ U and x ∉ A}, A \ B = A ∩ B’ = {x : x ∈ A and x ∉ B}.
  • Cartesian Product: A × B = {(a; b) : a ∈ A and b ∈ B};
  • A binary relation (or just relation) from X to Y is a subset R ⊆ X × Y. To describe the relation R, we  may list the collection of all ordered pairs (x, y) such that x is related to y by R.
  • A mapping or function f ⊂ A × B from a set A to a set B to be the special type of relation in which for each element a ∈ A there is a unique element b ∈ B such that (a, b) ∈ f.
  • Equivalence Relation.
  • Equivalence Class.
  • Partition.
  • A state machine is a binary relation on a set, the elements of the set are called states, the relation is called the transition relation, and an arrow in the graph of the transition relation is called a transition.
  • Greatest Common Divisor.
  • Division Algorithm.
  • Prime Numbers.
  • The Fundamental Theorem of Arithmetic: Let n be an integer such that n > 1. Then n can be factored as a product of prime numbers. n = p₁p₂ ∙ ∙ ∙ pₖ
  • Congruence: a is congruent to b modulo n if n | (a – b), written a ≡ b (mod n).
  • Fermat’s Little Theorem.
  • Stirling’s Approximation.
  • Probability.
  • Example: The Monty Hall Problem.
  • The Four Step Method: (1) Find the Sample Space (Set of possible outcomes), (2) Define Events of Interest (Subset of the sample space),  (3) Determine Outcome Probabilities, (4) Compute Event Probabilities.
  • A tree diagram is a graphical tool that can help us work through the four step approach when the number of outcomes is not too large or the problem is nicely structured.
  • Example: The Strange Dice.
  • Conditional Probability: P(A|B) = P (A ∩ B) / P(B).
  • A conditional probability P(B|A) is called a posteriori if event B precedes event A in time.
  • Example: Medical Testing.
  • Independence: P(B|A) = P(B)  or P(A∩B) = P(A) · P(B).
  • Mutual Independence: The probability of each event is the same no matter which of the other events has occurred.
  • Pairwise Independence: Any two events are independent.
  • Example: The Birthday Problem.
  • The birthday paradox refers to the counterintuitive fact that only 23 people are needed for that probability to exceed 50%, for 70 people: P = 99.9%.
  • Bernoulli Random Variable (Indicator Random Variable): f: Ω {1, 0}.
  • Binomial Random Variable: A number of successes in an experiment consisting of n trails. P (X = x) = [(n!)/((x!) · (n-x)!))]pˣ(1 − p)ⁿ − ˣ
  • Expectation (Average, Mean). E = Sum(R(w) · P(w)) = Sum(x · P(X = x)).
  • Median P(R < x) ≤ 1/2 and P(R>x) < 1/2.
  • Example: Splitting the Pot.
  • Mean Time to Failure: If a system independently fails at each time step with probability p, then the expected number of steps up to the first failure is 1/p.
  • Linearity of Expectation.
  • Example: The Hat Check Problem.
  • Example: Benchmark: E(Z/R) = 1.2 does NOT mean that E(Z) = 1.2E(R).
  • Variance: var(X) = E[(X−E[X])²].
  • Kurtosis: E[(X−E[X])⁴].
  • Markov’s Theorem: P(R ≥ x) ≤ E(R)/x (R > 0, x > 0).
  • Chebyshev’s Theorem: P(|R – E(R)| ≥ x) ≤ var(R)/x². Boundary of the probability of deviation from the mean.
  • The Chernoff Bound: P(T ≥ c·E(T)) ≤ e−ᶻ·ᴱ⁽ᵀ⁾, where z = c·lnc − c + 1, T = Sum(Tᵢ),  0 ≤ Tᵢ ≤ 1.

After finishing learning about discrete mathematics please click Topic 21 – Introduction to Computation and Programming using Python to continue.

 

Topic 19 – Probability & Statistics

Why do I need to learn about probability and statistics?

Probability and statistics are fundamental tools for understanding many modern theories and techniques such as artificial intelligence, machine learning, deep learning, data mining, security, digital imagine processing and natural language processing.

What can I do after finishing learning about probability and statistics?

You will be prepared to learn modern theories and techniques to create modern security, machine learning, data mining, image processing or natural language processing software.

That sounds useful! What should I do now?

Please read
– this Dimitri P. Bertsekas and John N. Tsitsiklis (2008). Introduction to Probability. Athena Scientific book, or
– this Hossein Pishro-Nik (2014). Introduction to Probability, Statistics, and Random Processes. Kappa Research, LLC book.

Alternatively, please read these notes, then watch
– this MIT 6.041SC – Probabilistic Systems Analysis and Applied Probability, Fall 2011 course (Lecture Notes), and
– this MIT RES.6-012 – Introduction to Probability, Spring 2018 course (Lecture Notes).

Probability and statistics are quite difficult topics so you may need to learn it 2 or 3 times using different sources to actually master the concepts.

Terminology Review:

  • Sample Space (Ω): Set of possible outcomes.
  • Event: Subset of the sample space.
  • Probability Law: Law specified by giving the probabilities of all possible outcomes.
  • Probability Model = Sample Space + Probability Law.
  • Probability Axioms: Nonnegativity: P(A) ≥ 0; Normalization: P(Ω)=1; Additivity: If A ∩ B = Ø, then P(A ∪ B)= P(A)+ P(B).
  • Conditional Probability: P(A|B) = P (A ∩ B) / P(B).
  • Multiplication Rule.
  • Total Probability Theorem.
  • Bayes’ Rule: Given P(Aᵢ) (initial “beliefs” ) and P (B|Aᵢ). P(Aᵢ|B) = ? (revise “beliefs”, given that B occurred).
  • Independence of Two Events: P(B|A) = P(B)  or P(A ∩ B) = P(A) · P(B).
  • Discrete Uniform Law: P(A) = Number of elements of A / Total number of sample points = |A| / |Ω|
  • Basic Counting Principle: r stages, nᵢ choices at stage i, number of choices = n₁ n₂ · · · nᵣ
  • Permutations: Number of ways of ordering elements. No repetition for n slots: [n] [n-1] [n-2] [] [] [] [] [1].
  • Combinations: number of k-element subsets of a given n-element set.
  • Binomial Probabilities. P (any sequence) = p# ʰᵉᵃᵈˢ(1 − p)# ᵗᵃᶦˡˢ.
  • Random Variable: a function from the sample space to the real numbers. It is not random. It is not a variable. It is a function: f: Ω ℝ.
  • Discrete Random Variable.
  • Bernoulli Random Variable (Indicator Random Variable): f: Ω {1, 0}.
  • Probability Mass Function: P(X = 𝑥) or Pₓ(𝑥): A function from the sample space to [0..1] that produces the likelihood that the value of X equals to 𝑥. PMF gives probabilities. 0 ≤ PMF ≤ 1. All the values of PMF must sum to 1.
  • Geometric Random Variable: X = Number of coin tosses until first head.
  • Geometric Probability Mass Function: (1 − p)ᵏ−¹p.
  • Binomial Random Variable: X = Number of heads (e.g. 2) in n (e.g. 4) independent coin tosses.
  • Binomial Probability Mass Function: Combination of (k, n)pᵏ(1 − p)ⁿ−ᵏ.
  • Expectation: E[X] = Sum of xpₓ(x).
  • Let Y=g(X): E[Y] = E[g(X)] = Sum of g(x)pₓ(x). Caution: E[g(X)] ≠ g(E[X]) in general.
  • Variance: var(X) = E[(X−E[X])²].
  • var(aX)=a²var(X).
  • X and Y are independent: var(X+Y) = var(X) + var(Y). Caution: var(X+Y) ≠ var(X) + var(Y) in general.
  • Standard Deviation: Square root of var(X).
  • Conditional Probability Mass Function: P(X=x|A).
  • Conditional Expectation: E[X|A].
  • Joint Probability Mass Function: Pₓᵧ(x,y) = P(X=x, Y=y) = P((X=x) and (Y=y)).
  • Marginal Probability Mass Function: P(x) = Σy Pₓᵧ(x,y).
  • Total Expectation Theorem: E[X|Y = y].
  • Independent Random Variables: P(X=x, Y=y)=P(X=xP(Y=y).
  • Expectation of Multiple Random Variables: E[X + Y + Z] = E[X] + E[Y] + E[Z].
  • Binomial Random Variable: X = Sum of Bernoulli Random Variables.
  • The Hat Problem.
  • Continuous Random Variables.
  • Probability Density Function: P(a ≤ X ≤ b) or Pₓ(𝑥). (a ≤ X ≤ b) means X function produces a real number value within the [a, b] range. Programming language: X(outcome) = 𝑥, where a ≤ 𝑥 ≤ b. PDF does NOT give probabilities. PDF does NOT have to be less than 1. PDF gives probabilities per unit length. The total area under PDF must be 1.
  • Continuous Uniform Random Variable.
  • Cumulative Distribution Function: P(X ≤ b). (X ≤ b) means X function produces a real number value within the [-∞, b] range. Programming language: X(outcome) = 𝑥, where 𝑥 ≤ b.
  • Normal Random Variable, Gaussian Distribution, Normal Distribution.
  • Joint Probability Density Function.
  • Conditional Probability Density Function.
  • Marginal Probability Density Function.
  • Derived Distributions.
  • Convolution: A mathematical operation on two functions (f and g) that produces a third function.
  • Covariance.
  • Correlation Coefficient.
  • Conditional Expectation: E[X | Y = y] = Sum of xpₓ|ᵧ(x|y). If Y is unknown then E[X | Y] is a random variable, i.e. a function of Y. So E[X | Y] also has its expectation and variance.
  • Law of Iterated Expectations: E[E[X | Y]] = E[X].
  • Conditional Variance: var(X | Y) is a function of Y.
  • Law of Total Variance: var(X) =  E[var(X | Y)] +var([E[X | Y]).
  • Bernoulli Process:  A sequence of independent Bernoulli trials. At each trial, i: P(Xᵢ=1)=p, P(Xᵢ=0)=1−p.
  • Poisson Process.
  • Markov Chain.
  • Markov’s Inequality: P(X ≥ a) ≤ E(X)/a (X > 0, a > 0).
  • Chebyshev’s Inequality: P(|X – E(X)| ≥ a) ≤ var(X)/a².
  • The Law of Large Numbers.
  • Central Limit Theorem.
  • Model Building: X = a·S + W, where W: noise, know S, assume W, observe X, find a.
  • Inferring: Know a, assume W, observe X, find S.
  • Hypothesis Testing: Know a, observe X, find S. S can take one of few possible values.
  • Estimation: Know a, observe X, find S. S can take unlimited possible values.
  • Bayesian Inference can be used for both Hypothesis Testing and Estimation, leverages Bayes rule. Output is posterior distribution. Single answer can be Maximum a posteriori probability (MAP) or Conditional Expectation.
  • Least Mean Squares Estimation of Θ based on X.
  • Classical Inference can be used for both Hypothesis Testing and Estimation, leverages .
  • Maximum Likelihood Estimation: Given data the maximum likelihood estimate (MLE) for the parameter p is the value of p that maximizes the likelihood P (data | p). P (data | p) is the likelihood function. For continuous distributions, we use the probability density function to define the likelihood.
  • Log likelihood: the natural log of the likelihood function.

After finishing learning about probability and statistics please click Topic 20 – Discrete Mathematics to continue.

 

Topic 18 – Linear Algebra

Why do I need to learn about linear algebra?

Linear algebra is a fundamental tool for understanding many modern theories and techniques such as artificial intelligence, machine learning, deep learning, data mining, security, digital imagine processing, and natural language processing.

What can I do after finishing learning about linear algebra?

You will be prepared to learn modern theories and techniques to create modern security, machine learning, data mining, image processing or natural language processing software.

That sounds useful! What should I do now?

Please read this David C. Lay et al. (2022). Linear Algebra and Its Applications. Pearson Education book.

Alternatively, please watch this MIT 18.06 – Linear Algebra, Spring 2005 course. While watching this course please do read Lecture Notes, and this Gilbert Strang (2016). Introduction to Linear Algebra. Wellesley-Cambridge Press book for better understanding some complex topics.

Terminology Review:

  • Linear Equations.
  • Row Picture.
  • Column Picture.
  • Triangular matrix is a square matrix where all the values above or below the diagonal are zero.
  • Lower Triangular Matrix.
  • Upper Triangular Matrix.
  • Diagonal matrix is a matrix in which the entries outside the main diagonal are all zero.
  • Tridiagonal Matrix.
  • Identity Matrix.
  • Transpose of a Matrix.
  • Symmetric Matrix.
  • Pivot Columns.
  • Pivot Variables.
  • Augmented Matrix.
  • Echelon Form.
  • Reduced Row Echelon Form.
  • Elimination Matrices.
  • Inverse Matrix.
  • Factorization into A = LU.
  • Free Columns.
  • Free Variables.
  • Gauss-Jordan Elimination.
  • Vector Spaces.
  • Rank of a Matrix.
  • Permutation Matrix.
  • Subspaces.
  • Column space, C(A) consists of all combinations of the columns of A and is a vector space in ℝᵐ.
  • Nullspace, N(A) consists of all solutions x of the equation Ax = 0 and lies in ℝⁿ.
  • Row space, C(Aᵀ) consists of all combinations of the row vectors of A and form a subspace of ℝⁿ. We equate this with C(Aᵀ), the column space of the transpose of A.
  • The left nullspace of A, N(Aᵀ) is the nullspace of Aᵀ. This is a subspace of ℝᵐ.
  • Linearly Dependent Vectors.
  • Linearly Independent Vectors.
  • Linear Span of Vectors.
  • A basis for a vector space is a sequence of vectors with two properties:
    • They are independent.
    • They span the vector space.
  • Given a space, every basis for that space has the same number of vectors; that number is the dimension of the space.
  • Dimension of a Vector Space.
  • Dot Product.
  • Orthogonal Vectors.
  • Orthogonal Subspaces.
  • Row space of A is orthogonal to  nullspace of A.
  • Matrix Spaces.
  • Rank-One Matrix.
  • Orthogonal Complements.
  • Projection matrix: P = A(AᵀA)⁻¹Aᵀ. Properties of projection matrix: Pᵀ = P and P² = P. Projection component: Pb = A(AᵀA)⁻¹Aᵀb = (AᵀA)⁻¹(Aᵀb)A.
  • Linear regression, least squares, and normal equations: Instead of solving Ax = b we solve Ax̂ = p or AᵀAx̂ = Aᵀb.
  • Linear Regression.
  • Orthogonal Matrix.
  • Orthogonal Basis.
  • Orthonormal Vectors.
  • Orthonormal Basis.
  • Orthogonal Subspaces.
  • Gram–Schmidt process.
  • Determinant: A number associated with any square matrix letting us know whether the matrix is invertible, the formula for the inverse matrix, the volume of the parallelepiped whose edges are the column vectors of A. The determinant of a triangular matrix is the product of the diagonal entries (pivots).
  • The big formula for computing the determinant.
  • The cofactor formula rewrites the big formula for the determinant of an n by n matrix in terms of the determinants of smaller matrices.
  • Formula for Inverse Matrix.
  • Cramer’s Rule.
  • Eigenvectors are vectors for which Ax is parallel to x: Ax = λx. λ is an eigenvalue of A, det(A − λI)= 0.
  • Diagonalizing a matrix: AS = SΛ 🡲 S⁻¹AS = Λ 🡲 A = SΛS⁻¹. S: matrix of n linearly independent eigenvectors. Λ: matrix of eigenvalues on diagonal.
  • Matrix exponential eᴬᵗ.
  • Markov matrices: All entries are non-negative and each column adds to 1.
  • Symmetric matrices: Aᵀ = A.
  • Positive definite matrices: all eigenvalues are positive or all pivots are positive or all determinants are positive.
  • Similar matrices: A and B = M⁻¹AM.
  • Singular value decomposition (SVD) of a matrix: A = UΣVᵀ, where U is orthogonal, Σ is diagonal, and V is orthogonal.
  • Linear Transformations: T(v + w) = T(v)+ T(w) and T(cv)= cT(v) . For any linear transformation T we can find a matrix A so that T(v) = Av.

After finishing learning about linear algebra please click Topic 19 – Probability & Statistics to continue.

 

Topic 17 – Calculus

Why do I need to learn about calculus?

Calculus is a fundamental tool for understanding modern theories and techniques to create software such as artificial intelligence, machine learning, deep learning, data mining, security, digital imagine processing and natural language processing.

What can I do after finishing learning about calculus?

You will then be prepared to be able to learn modern theories and techniques to create security, data mining, image processing or natural language processing software.

What should I do now?

Please watch this MIT 18.01 – Single Variable Calculus, Fall 2007 course (Lecture Notes).  When you watch this course please refer to this George F. Simmons (1996). Calculus With Analytic Geometry. McGraw-Hill book when you have difficulties with understanding some lectures.

Alternatively, you can read
– this George F. Simmons (1996). Calculus With Analytic Geometry. McGraw-Hill book or
– this C. Henry Edwards David E. Penney (2008). Calculus – Early Transcendentals. Pearson book or
– this George B. Thomas et al. (2018). Thomas’ Calculus: Early Transcendentals. Pearson Education book or
– this James Stewart et al. (2020). Calculus: Early Transcendentals. Cengage Learning book.

After that please watch this MIT 18.02 – Multivariable Calculus, Fall 2007 course (Lecture Notes). You will need some Linear Algebra knowledge (specifically Inverse Matrix and Determinant) to understand Multivariable Calculus.

After that please watch this Highlights of Calculus course to review many core concepts of Calculus.

After that please watch this MIT 18.03 – Differential Equations, Spring 2006 course (Readings). When you watch this course please refer to this C. Henry Edwards and David E. Penney (2013). Elementary Differential Equations with Boundary Value Problems. Pearson Education book when you have difficulties with understanding some lectures.

What is the difference between calculus and analysis?

Calculus means a method of calculation. Calculus is about differentiation and integration.

Real analysis includes calculus, and other topics that may not be of interest to engineers but of interest to pure mathematicians such as measure theory, lebesgue integral, topology, functional analysis, complex analysis, PDE, ODE, proofs of theorems.

What does early transcendentals mean?

Transcendentals in this context refers to functions like the exponential, logarithmic, and trigonometric functions.

The early transcendentals approach means that the book introduces polynomial, rational functions, exponential, logarithmic, and trigonometric functions at the beginning, then use them as examples when developing differential calculus. This approach is good for students who do not need to take much rigorous math.

The classical approach is the late transcendentals. It means that the book develops differential calculus using only polynomials and rational functions as examples, then introduces the other functions afterwards. This approach is good for students who need to understand more rigorous definitions of the transcendental functions.

Single Variable Calculus Terminology Review:

  • Slope.
  • Derivative.
  • Rate of Change.
  • Limit.
  • Continuity.
  • Chain Rule.
  • Implicit Differentiation.
  • Linear Approximations.
  • Quadratic Approximations.
  • Critical Point.
  • Newton’s Method.
  • Mean Value Theorem.
  • Differentials.
  • Antiderivatives.
  • Differential Equations.
  • Separation of Variables.
  • First Fundamental Theorem of Calculus.
  • Indeterminate Forms.
  • L’Hospital’s Rule.
  • Improper Integrals.
  • Infinite Series.
  • Taylor’s Series.
  • Taylor’s Formula.

Multivariable Calculus Terminology Review:

  • Vectors.
  • Dot Product.
  • Cross Product.
  • Inverse Matrix.
  • Determinant.
  • Equations of Planes: ax + by + cz = d
  • Parametric Equations = as trajectory of a moving point.
  • Velocity Vector.
  • Acceleration Vector.
  • Level Curve.
  • Tangent Plane.
  • Saddle Point.
  • Functions of Several Variables.
  • Partial Derivatives.
  • Second Derivatives.
  • Second Derivative Test.
  • Differentials.
  • Gradients.
  • Directional Derivatives.
  • Lagrange Multipliers.
  • Power Series.
  • Geometric Series.
  • Euler’s Formula.

Differential Equation Terminology Review:

  • Isocline (equal slope): a line which joins neighboring points with the same gradient.
  • Direction Fields.
  • Integral Curve: The graph of a particular solution of a differential equation.
  • IVP: Initial Value Problem.
  • Euler’s Numerical Method.
  • Linear First Order ODE Standard Form: y′ + p(x)y = q(x)
  • Integrating Factor or Euler Multiplier: The method is based on (ux)’ = ux’ + u’x.
  • Substitution: to change variables to end up with a simpler equation to solve.
  • Bernoulli equations: y′ + p(x)y = q(x)yⁿ.
  • Homogeneous equations: y′ = F (y/x)
  • Autonomous equations: dx/dt = f(x). If we think of as time, the naming comes from the fact that the equation is independent of time.

After finishing learning about calculus please click Topic 18 – Linear Algebra to continue.