# Topic 20 – Discrete Mathematics

Why do I need to learn about discrete mathematics?

Discrete mathematics is a fundamental tool for understanding many theories and techniques behind artificial intelligence, machine learning, deep learning, data mining, security, digital image processing and natural language processing.

The problem-solving techniques and computational thinking introduced in discrete mathematics are also necessary for creating complex software.

What can I do after finishing learning discrete mathematics?

You will be prepared to learn modern theories and techniques to create modern security, machine learning, data mining, image processing or natural language processing software.

What should I do now?

Please watch this MIT 6.042J – Mathematics for Computer Science, Fall 2010 course (Textbook).

Terminology Review:

• Statement: An assertion that is either true or false.
• Mathematical Statements.
• Mathematical Proof: A convincing argument about the accuracy of a statement.
• If p, then q: p is the hypothesis; q is the conclusion.
• Proposition: A true statement.
• Theorem: An important proposition.
• Lemmas: Supporting propositions.
• Logic: A language for reasoning that contains a collection of rules that we use when doing logical reasoning.
• Propositional Logic: A logic about truth and falsity of statements.
• Logic Connectives: Not (Negation), And (Conjunction), Or (Disjunction), If then (Implication), If and only if (Equivalence).
• Truth Table.
• Contrapositive of Proposition: The contrapositive of p → q is the proposition ¬q → ¬p.
• Modus Ponens: If both P → Q and P hold, then Q can be concluded.
• Predicate: A property of some objects or a relationship among objects represented by the variables.
• Quantifier: Tells how many objects have a certain property.
• Mathematical Induction: Base Case, Inductive Case.
• Recursion: A base case, a recursive step.
• Sum Example: Annuity.
• Set.
• Subset.
• Set Operations: A ∪ B, A ∩ B, A ⊂ U: A’ = {x : x ∈ U and x ∉ A}, A \ B = A ∩ B’ = {x : x ∈ A and x ∉ B}.
• Cartesian Product: A × B = {(a, b) : a ∈ A and b ∈ B}.
• A binary relation (or just relation) from X to Y is a subset R ⊆ X × Y. To describe the relation R, we  may list the collection of all ordered pairs (x, y) such that x is related to y by R.
• A mapping (or function) f ⊆ A × B from a set A to a set B is a special type of relation in which for each element a ∈ A there is a unique element b ∈ B such that (a, b) ∈ f.
• Equivalence Relation.
• Equivalence Class.
• Partition.
• A state machine is a binary relation on a set, the elements of the set are called states, the relation is called the transition relation, and an arrow in the graph of the transition relation is called a transition.
• Greatest Common Divisor.
• Division Algorithm.
• Prime Numbers.
• The Fundamental Theorem of Arithmetic: Let n be an integer such that n > 1. Then n can be factored as a product of prime numbers, uniquely up to the order of the factors: n = p₁p₂ ∙ ∙ ∙ pₖ.
• Congruence: a is congruent to b modulo n if n | (a – b), written a ≡ b (mod n).
• Fermat’s Little Theorem.
• Stirling’s Approximation.
• Probability.
• Example: The Monty Hall Problem.
• The Four Step Method: (1) Find the Sample Space (Set of possible outcomes), (2) Define Events of Interest (Subset of the sample space),  (3) Determine Outcome Probabilities, (4) Compute Event Probabilities.
• A tree diagram is a graphical tool that can help us work through the four step approach when the number of outcomes is not too large or the problem is nicely structured.
• Example: The Strange Dice.
• Conditional Probability: P(A|B) = P (A ∩ B) / P(B).
• A conditional probability P(B|A) is called a posteriori if event B precedes event A in time.
• Example: Medical Testing.
• Independence: P(B|A) = P(B)  or P(A∩B) = P(A) · P(B).
• Mutual Independence: The probability of each event is the same no matter which of the other events has occurred.
• Pairwise Independence: Any two events are independent.
• Example: The Birthday Problem.
• The birthday paradox refers to the counterintuitive fact that only 23 people are needed for the probability of a shared birthday to exceed 50%; with 70 people it reaches 99.9%.
• Bernoulli Random Variable (Indicator Random Variable): f: Ω → {0, 1}.
• Binomial Random Variable: The number of successes in an experiment consisting of n trials. P(X = x) = [n! / (x! · (n − x)!)] pˣ(1 − p)ⁿ⁻ˣ.
• Expectation (Average, Mean). E = Sum(R(w) · P(w)) = Sum(x · P(X = x)).
• Median: A value x such that P(R < x) ≤ 1/2 and P(R > x) < 1/2.
• Example: Splitting the Pot.
• Mean Time to Failure: If a system independently fails at each time step with probability p, then the expected number of steps up to the first failure is 1/p.
• Linearity of Expectation.
• Example: The Hat Check Problem.
• Example: Benchmark: E(Z/R) = 1.2 does NOT mean that E(Z) = 1.2E(R).
• Variance: var(X) = E[(X−E[X])²].
• Kurtosis: E[(X−E[X])⁴].
• Markov’s Theorem: P(R ≥ x) ≤ E(R)/x (R ≥ 0, x > 0).
• Chebyshev’s Theorem: P(|R – E(R)| ≥ x) ≤ var(R)/x². Bounds the probability of deviation from the mean.
• The Chernoff Bound: P(T ≥ c·E(T)) ≤ e^(−z·E(T)), where z = c ln c − c + 1, T = Sum(Tᵢ), 0 ≤ Tᵢ ≤ 1.
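Several of the probability entries above, such as the birthday problem, can be checked numerically. The following is a minimal Python sketch (function name is my own), assuming 365 equally likely birthdays and ignoring leap years:

```python
from math import prod

def p_shared_birthday(n: int) -> float:
    """Probability that at least two of n people share a birthday."""
    # P(all n birthdays distinct) = (365/365) * (364/365) * ... * ((365-n+1)/365)
    p_distinct = prod((365 - k) / 365 for k in range(n))
    # Complement rule: P(at least one shared) = 1 - P(all distinct)
    return 1 - p_distinct
```

Evaluating this function at n = 23 gives a probability just above 50%, and at n = 70 it is about 99.9%, matching the figures quoted above.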

After finishing learning about discrete mathematics please click Topic 21 – Introduction to Computation and Programming using Python to continue.

# How to Be Creative in Software Engineering Research

#### Motivation:

You plan to do software engineering research and want to make some minor but authentic contributions.

#### Guidelines:

1. You may present a problem and corresponding existing solution using your own understanding.

One way to do this is to:

• Read papers and books about a concept (for example backpropagation or distributed transactions), then
• Write down the concept and some related terminology, then
• Try to explain the concept aloud with examples, and
• Write down and refine your transcript.

2. You may try to replicate an existing result. When doing this you may need to make minor changes due to specific technology or environment conditions. Then you can compare your result with the original result.

For example, you may compare your business workflows with existing business workflows to determine which solution may solve a specific problem faster or more reliably.

Even if you do not make any changes, the replication process may still inspire some technical ideas. You may get errors while replicating the result; try to fix these errors and document your experience.

For example, you may get errors when upgrading an existing system from Node.js 12 to Node.js 18, or when upgrading existing deep learning model code from Python 3.9 to Python 3.11. Try to fix the errors, then document your inputs, the errors and your solutions.

3. The core idea of being creative is to do something that you have not done before. You may use trial and error, but be sure that you have a hard unsolved problem first. Searching for partial solutions to a problem will inspire ideas, which may become the starting point for your minor authentic contributions.

4. Each individual’s creativity needs to be developed over time rather than according to any set formula.

# How to Pose a Software Engineering Research Question?

##### Motivation:

You begin to do software engineering research.
You want to have a research question.
You have several ideas but you wonder whether they are good enough for conducting research.

##### Suggestions:

1. Your question should contain well-defined terms.
Are you talking about something whose definition and core characteristics most people agree on?
For example, are you talking about Microservices, Event Sourcing, Relational Database, NoSQL, Unit Tests, Go Language, Speech Recognition, Speech Synthesis?

2. Your question should have a purpose and specific audience.
Why should the audience be interested in your question?
For example, are they going to upgrade an event sourcing system? Are they going to apply test automation in their project?
Do they have specific security issues with their system?
Have they gotten specific performance issues with their system?
Are they going to build a new identity management platform for their legacy system?
Do they need to accelerate the development of a portal for their legacy system?
Are they going to integrate voice search into their existing system?

# Topic 19 – Probability & Statistics

Why do I need to learn about probability and statistics?

Probability and statistics are fundamental tools for understanding many modern theories and techniques in artificial intelligence, machine learning, deep learning, data mining, security, digital image processing and natural language processing.

What can I do after finishing learning about probability and statistics?

You will be prepared to learn modern theories and techniques to create modern security, machine learning, data mining, image processing or natural language processing software.

That sounds useful! What should I do now?

Please watch:

– this MIT 6.041SC – Probabilistic Systems Analysis and Applied Probability, Fall 2011 course (Lecture Notes), and
– this MIT RES.6-012 – Introduction to Probability, Spring 2018 course (Lecture Notes).

Probability and statistics are quite difficult topics, so you may need to study them two or three times using different sources to actually master the concepts.

Terminology Review:

• Sample Space (Ω): Set of possible outcomes.
• Event: Subset of the sample space.
• Probability Law: Law specified by giving the probabilities of all possible outcomes.
• Probability Model = Sample Space + Probability Law.
• Probability Axioms: Nonnegativity: P(A) ≥ 0; Normalization: P(Ω)=1; Additivity: If A ∩ B = Ø, then P(A ∪ B)= P(A)+ P(B).
• Conditional Probability: P(A|B) = P (A ∩ B) / P(B).
• Multiplication Rule.
• Total Probability Theorem.
• Bayes’ Rule: Given P(Aᵢ) (initial “beliefs”) and P(B|Aᵢ), find P(Aᵢ|B) (revised “beliefs”, given that B occurred).
• Independence of Two Events: P(B|A) = P(B)  or P(A ∩ B) = P(A) · P(B).
• Discrete Uniform Law: P(A) = Number of elements of A / Total number of sample points = |A| / |Ω|
• Basic Counting Principle: r stages, nᵢ choices at stage i, number of choices = n₁ n₂ · · · nᵣ
• Permutations: Number of ways of ordering n elements with no repetition: n · (n − 1) · (n − 2) · · · 1 = n!.
• Combinations: number of k-element subsets of a given n-element set.
• Binomial Probabilities: P(any particular sequence) = p^(# heads) · (1 − p)^(# tails).
• Random Variable: a function from the sample space to the real numbers. It is not random. It is not a variable. It is a function: f: Ω → ℝ.
• Discrete Random Variable.
• Bernoulli Random Variable (Indicator Random Variable): f: Ω → {0, 1}.
• Probability Mass Function: P(X = 𝑥) or Pₓ(𝑥): A function that gives the probability that the value of X equals 𝑥. PMF gives probabilities: 0 ≤ PMF ≤ 1, and all the values of the PMF must sum to 1.
• Geometric Random Variable: X = Number of coin tosses until first head.
• Geometric Probability Mass Function: (1 − p)ᵏ⁻¹p.
• Binomial Random Variable: X = Number of heads (e.g. 2) in n (e.g. 4) independent coin tosses.
• Binomial Probability Mass Function: C(n, k) pᵏ(1 − p)ⁿ⁻ᵏ.
• Expectation: E[X] = Sum of xpₓ(x).
• Let Y=g(X): E[Y] = E[g(X)] = Sum of g(x)pₓ(x). Caution: E[g(X)] ≠ g(E[X]) in general.
• Variance: var(X) = E[(X−E[X])²].
• var(aX)=a²var(X).
• If X and Y are independent: var(X + Y) = var(X) + var(Y). Caution: var(X + Y) ≠ var(X) + var(Y) in general.
• Standard Deviation: Square root of var(X).
• Conditional Probability Mass Function: P(X=x|A).
• Conditional Expectation: E[X|A].
• Joint Probability Mass Function: Pₓᵧ(x,y) = P(X=x, Y=y) = P((X=x) and (Y=y)).
• Marginal Probability Mass Function: Pₓ(x) = Σᵧ Pₓᵧ(x, y).
• Total Expectation Theorem: E[X|Y = y].
• Independent Random Variables: P(X=x, Y=y) = P(X=x) · P(Y=y).
• Expectation of Multiple Random Variables: E[X + Y + Z] = E[X] + E[Y] + E[Z].
• Binomial Random Variable: X = Sum of Bernoulli Random Variables.
• The Hat Problem.
• Continuous Random Variables.
• Probability Density Function: P(a ≤ X ≤ b) or Pₓ(𝑥). (a ≤ X ≤ b) means X function produces a real number value within the [a, b] range. Programming language: X(outcome) = 𝑥, where a ≤ 𝑥 ≤ b. PDF does NOT give probabilities. PDF does NOT have to be less than 1. PDF gives probabilities per unit length. The total area under PDF must be 1.
• Continuous Uniform Random Variable.
• Cumulative Distribution Function: P(X ≤ b). (X ≤ b) means the X function produces a real number value within the (−∞, b] range. Programming language: X(outcome) = 𝑥, where 𝑥 ≤ b.
• Normal Random Variable, Gaussian Distribution, Normal Distribution.
• Joint Probability Density Function.
• Conditional Probability Density Function.
• Marginal Probability Density Function.
• Derived Distributions.
• Convolution: A mathematical operation on two functions (f and g) that produces a third function.
• Covariance.
• Correlation Coefficient.
• Conditional Expectation: E[X | Y = y] = Sum of xpₓ|ᵧ(x|y). If Y is unknown then E[X | Y] is a random variable, i.e. a function of Y. So E[X | Y] also has its expectation and variance.
• Law of Iterated Expectations: E[E[X | Y]] = E[X].
• Conditional Variance: var(X | Y) is a function of Y.
• Law of Total Variance: var(X) = E[var(X | Y)] + var(E[X | Y]).
• Bernoulli Process:  A sequence of independent Bernoulli trials. At each trial, i: P(Xᵢ=1)=p, P(Xᵢ=0)=1−p.
• Poisson Process.
• Markov Chain.
• Markov’s Inequality: P(X ≥ a) ≤ E(X)/a (X ≥ 0, a > 0).
• Chebyshev’s Inequality: P(|X – E(X)| ≥ a) ≤ var(X)/a².
• The Law of Large Numbers.
• Central Limit Theorem.
• Model Building: X = a·S + W, where W is noise; know S, assume W, observe X, find a.
• Inferring: Know a, assume W, observe X, find S.
• Hypothesis Testing: Know a, observe X, find S. S can take one of few possible values.
• Estimation: Know a, observe X, find S. S can take unlimited possible values.
• Bayesian Inference can be used for both Hypothesis Testing and Estimation; it leverages Bayes’ rule. The output is a posterior distribution. A single answer can be the Maximum a posteriori probability (MAP) estimate or the Conditional Expectation.
• Least Mean Squares Estimation of Θ based on X.
• Classical Inference can be used for both Hypothesis Testing and Estimation; it leverages the likelihood function.
• Maximum Likelihood Estimation: Given data the maximum likelihood estimate (MLE) for the parameter p is the value of p that maximizes the likelihood P (data | p). P (data | p) is the likelihood function. For continuous distributions, we use the probability density function to define the likelihood.
• Log likelihood: the natural log of the likelihood function.
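The maximum likelihood entries above can be illustrated with a small Python sketch for the coin-toss case (function names are my own). A grid search over p recovers the well-known closed-form binomial MLE, heads/tosses; in practice you would maximize analytically rather than by search, but the search makes the definition concrete:

```python
import math

def log_likelihood(p: float, heads: int, tosses: int) -> float:
    """Natural log of the binomial likelihood P(data | p), dropping the
    constant C(tosses, heads) term, which does not affect the maximizer."""
    return heads * math.log(p) + (tosses - heads) * math.log(1 - p)

def mle_estimate(heads: int, tosses: int, grid_size: int = 10_000) -> float:
    """Return the p in (0, 1) on a fine grid that maximizes the likelihood."""
    candidates = [(k + 1) / (grid_size + 1) for k in range(grid_size)]
    return max(candidates, key=lambda p: log_likelihood(p, heads, tosses))
```

For example, with 55 heads in 100 tosses the grid search returns a value within grid resolution of 0.55, the closed-form estimate.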

After finishing learning about probability and statistics please click Topic 20 – Discrete Mathematics to continue.

# Guide to Citing & Referencing

What is referencing?

When writing a piece of academic work, you must acknowledge any sources you have used. You do this by including a ‘citation’ within your text (usually a number or an author’s name) next to the material you have used. This brief citation leads your reader to a full reference to the work, which you include in your list of references at the end of your text. These references should allow anyone reading your work to identify and find the material to which you have referred. You need to be consistent in the way you reference your sources by following an established referencing system and style.