Category Archives: Software Engineering Research

Algorithms, Computer Science Curriculum, Software Engineering Curriculum, Software Engineering Research

Topic 27 – Introduction to Blockchain

September 13, 2025 admin

Why do I need to learn about blockchain?

Blockchain offers an interesting and unique solution for applications that require distributed consensus.

Today, a key skill for software developers is the ability to use blockchain-based platforms and tools to solve real-world problems involving distributed agreements.

What can I do after finishing learning about blockchain?

You will be able to create decentralized applications (dApps) using platforms like Ethereum or Hyperledger, and smart contract programming languages like Solidity.

That sounds fun! What should I do now?

First, please read this book to learn about the core protocols and algorithms in cryptography: Bruce Schneier (1996). Applied Cryptography – Protocols, Algorithms and Source Code in C. Wiley.

After that, please read this book to learn about the core concepts of Bitcoin: Arvind Narayanan et al. (2016). Bitcoin and Cryptocurrency Technologies – A Comprehensive Introduction. Princeton University Press.

After that, please read the books below to learn programming with Bitcoin:

After that, please read this book to learn programming with Ethereum: Andreas M. Antonopoulos and Gavin Wood (2018). Mastering Ethereum. O’Reilly Media.

After that, please read this book to learn programming blockchain using Hyperledger Fabric: Matt Zand et al. (2021). Hands-On Smart Contract Development with Hyperledger Fabric V2. O’Reilly Media.

After that, please audit this course to gain some ideas about the application of blockchain: MIT 15.S12 Blockchain and Money, Fall 2018 (Lecture Slides).

Terminology Review:

Public Keys.
Private Keys.
Digital Signatures.
Digital Signature Scheme.
Cryptographic Hash Functions.
Merkle Tree: Binary Data Tree with Hashes.
Bitcoin: Digital money ecosystem.
bitcoin: Unit of currency.
Bitcoin Users.
Bitcoin Wallets.
Bitcoin Addresses.
Bitcoin Transactions.
Blockchain Explorer.
Bitcoin Mining, Miners.
The Chain of Transactions.
Bitcoin Core: The reference implementation of the bitcoin system.
Bitcoin Exchanges.
Bitcoin Network.
Double‐Spending Attacks.
Block Chain: Timestamped Append-Only Log.
Sybil Attack: A malicious adversary creates many copies of nodes (fake identities) to make it look as if there are a lot of different participants.
Proof of Work: Find a number, or nonce, such that H(nonce || prev_hash || tx || tx || … || tx) < target.
51‐Percent Attack.
Account-Based Ledger: The ledger keeps track of account balances.
Unspent Transaction Output: A transaction output that can be used as input in a new transaction.
Transaction-Based Ledger: The ledger keeps track of individual transaction outputs.
Coinbase Transactions.
Bitcoin Scripting Language.
Turing Incompleteness.
Stateless Verification.
Candidate Block.
Genesis Block.
Ethereum: The world computer.
Ether.
Externally Owned Accounts (EOAs).
Contract Accounts.
Solidity.
Smart Contracts.
Ethereum Clients.
Ethereum Networks.
Permissionless Blockchain.
Permissioned Blockchain.
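The proof-of-work entry above can be sketched as a brute-force nonce search. This is a toy illustration under stated assumptions, not Bitcoin's actual mining code: it assumes SHA-256 from Python's `hashlib`, joins the nonce, previous hash, and transactions as plain text, and uses a small hypothetical target so it finishes quickly. The names `block_hash` and `mine` are illustrative.

```python
import hashlib


def block_hash(nonce: int, prev_hash: str, txs: list[str]) -> int:
    """Return H(nonce || prev_hash || tx || ... || tx) as a 256-bit integer.

    Simplified: fields are joined as UTF-8 text and hashed once with SHA-256.
    (Bitcoin itself double-hashes a binary block header instead.)
    """
    data = str(nonce) + prev_hash + "".join(txs)
    digest = hashlib.sha256(data.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def mine(prev_hash: str, txs: list[str], target: int, max_nonce: int = 10_000_000) -> int:
    """Try nonces 0, 1, 2, ... until the block hash is below the target."""
    for nonce in range(max_nonce):
        if block_hash(nonce, prev_hash, txs) < target:
            return nonce
    raise RuntimeError("no valid nonce found; raise the target (lower the difficulty)")


if __name__ == "__main__":
    target = 2 ** (256 - 16)  # hypothetical difficulty: about 16 leading zero bits
    prev = "00" * 32  # hypothetical previous block hash
    txs = ["alice->bob:1", "bob->carol:2"]  # hypothetical transactions
    nonce = mine(prev, txs, target)
    print("found nonce:", nonce)
    # Verification costs one hash, while finding the nonce took many attempts.
    assert block_hash(nonce, prev, txs) < target
```

Lowering `target` makes a valid nonce rarer, which is how difficulty is tuned, while checking a claimed nonce still takes a single hash computation.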
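A Merkle tree, listed above as a binary data tree with hashes, can be sketched in a few lines. This minimal version assumes single SHA-256 over raw bytes and, when a level has an odd number of nodes, duplicates the last hash. Bitcoin also duplicates the last hash, but it uses double SHA-256 and specific byte-order rules, so this is not a drop-in Bitcoin implementation. The function names are hypothetical.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute a Merkle root over the given leaf data.

    Each leaf is hashed first; then adjacent hashes are concatenated and
    hashed pairwise, level by level, until one hash (the root) remains.
    If a level has an odd number of hashes, the last one is duplicated.
    """
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    txs = [b"tx1", b"tx2", b"tx3"]  # hypothetical transactions
    root = merkle_root(txs)
    print(root.hex())
    # Tampering with any transaction changes the root.
    assert merkle_root([b"tx1", b"tx2", b"tx3-tampered"]) != root
```

Because the root commits to every leaf, a block only needs to carry the root in its header, and changing any single transaction changes the root.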

After finishing blockchain, please click on Topic 28 – Introduction to AI Agent Development to continue.

Software Engineering Curriculum, Software Engineering Research

Topic 26 – Introduction to Cloud Computing

February 1, 2024 admin

Why do I need to learn about cloud computing?

Because you will develop software systems that often leverage cloud services for quick deployment, scalable computation, and storage.

What can I do after finishing learning cloud computing?

You will be able to

deploy software systems to public clouds,
build your own private cloud,
develop software using cloud platforms,
develop software using cloud services,
leverage cloud services for training and deploying machine learning models,
leverage cloud services for big data analytics and reporting.

What should I do now?

First, please read this book to learn about the core concepts of cloud computing: Dan C. Marinescu (2022). Cloud Computing – Theory and Practice. Morgan Kaufmann.

After that, please read this book to gain hands-on experience with Amazon cloud services: Andreas Wittig and Michael Wittig (2023). Amazon Web Services in Action. Manning Publications.

After that, please read this book to gain hands-on experience with VMware private cloud products: Nick Marshall et al. (2018). Mastering VMware vSphere 6.7. Sybex.

After that, please read this book to gain hands-on experience with Spark: Bill Chambers and Matei Zaharia (2018). Spark: The Definitive Guide. O’Reilly Media.

After that, please read the books below to gain hands-on experience with the Salesforce platform:

Terminology Review:

Multitenancy.
Cloud Computing.
Grid Computing.
Fog Computing.
Datacenters, Public Clouds, Private Clouds, Hybrid Clouds.
Infrastructure as a Service.
Hypervisors, Virtual Machines.
Software-Defined Networking: Virtual Switches, Virtual Distributed Switches, Virtual Routers, Virtual LANs.
Direct-Attached Storage, RAID, File Systems.
Network-Attached Storage (NAS), Storage Area Network (SAN).
Block Storage Services, File Storage Services, Object Storage Services, Relational Database Services, NoSQL Storage Services, Vector Storage Services.
The Google File System.
Amazon Aurora.
Bigtable.
Spanner.
Programming in the Cloud: MapReduce, Hadoop, Spark.
Cluster Management: Borg and Omega.
Cloud-Native Applications: Microservices, Containers (Docker), Container Orchestration (Kubernetes, Helm), Event-Driven Architecture, DevOps, Identity and Access Management, Observability and Monitoring (Prometheus), Infrastructure as Code (Terraform, CloudFormation).
Serverless Computing (Function as a Service): OpenLambda, Firecracker, AWS Serverless Application Model, Front End, Compute, Workflows, Storage, Identity and Access Management, Reporting.
Platform as a Service: Salesforce Platform, Google App Engine, Azure Web Apps, AWS Elastic Beanstalk.
Software as a Service.
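The MapReduce programming model listed above can be illustrated in a single process. This sketch is not the Hadoop or Spark API; it only shows the three conceptual phases (map, shuffle by key, reduce) for the classic word-count example, with hypothetical function names. In a real cluster, the map and reduce calls would run in parallel on many machines and the shuffle would move data over the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all intermediate values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key (here, sum the counts)."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["the cloud scales", "the cluster runs the job"]  # hypothetical input
    print(word_count(docs))
```

Because each map call only sees one document and each reduce call only sees one key, both phases can be distributed independently; only the shuffle needs to bring matching keys together.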

After finishing cloud computing, please click on Topic 27 – Introduction to Blockchain to continue.

Algorithms, Computer Science Curriculum, Software Engineering Curriculum, Software Engineering Research

Topic 25 – Introduction to Distributed Systems

February 1, 2024 admin

Why do I need to learn about distributed systems?

Distributed systems provide the foundation for understanding the theories and techniques behind cloud computing and blockchain technology.

The architectures, protocols, and algorithms introduced in distributed systems are also necessary for creating complex software.

What can I do after finishing learning distributed systems?

You will be able to design software that can

tolerate faults,
shard data,
handle a massive number of requests, and
perform expensive computations.

You will also be prepared to learn about cloud computing and blockchain technology.

What should I do now?

First, please audit this course to familiarize yourself with the core concepts and protocols of distributed systems: Distributed Systems, UC Santa Cruz Baskin School of Engineering, 2021.

Afterward, please audit the course and read the books below to learn how to design large-scale distributed systems:

Terminology Review:

Fault Tolerance.
Consistency.
System Models.
Failure Detectors.
Communication.
Ordering.
State Machine Replication.
Primary-Backup Replication.
Bully Algorithm.
Ring Election.
Multi-Leader Replication.
Leaderless Replication.
Cristian’s Algorithm.
Berkeley Algorithm.
Lamport Clocks.
Vector Clocks.
Version Vectors.
Chain Replication.
Consensus.
FLP Theorem.
Raft.
Paxos.
Viewstamped Replication.
Zab.
Consistent Hashing.
Merkle Tree.
CAP.
Distributed Transactions.
ACID.
Two-Phase Commit.
Three-Phase Commit.
Serializability.
Two-Phase Locking.
Distributed Locks.
Consistency Models.
Linearizability.
Distributed Architectures.
Distributed Programming.
Spark.
TensorFlow.
PyTorch.
Ray.
Kubernetes.
Bitcoin.
Smart Contracts.
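Lamport clocks and vector clocks, listed above, follow short update rules. The sketch below assumes a fixed number of processes identified by index and uses hypothetical class names; message transport is left out, so a send is just a `tick()` whose result the caller hands to the receiver.

```python
from dataclasses import dataclass, field


@dataclass
class LamportClock:
    time: int = 0

    def tick(self) -> int:
        """Local event or message send: advance the clock by one."""
        self.time += 1
        return self.time

    def receive(self, msg_time: int) -> int:
        """Message receipt: move past both the local and the sender's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


@dataclass
class VectorClock:
    node: int  # index of this process
    n: int  # total number of processes
    clock: list[int] = field(init=False)

    def __post_init__(self) -> None:
        self.clock = [0] * self.n

    def tick(self) -> list[int]:
        """Local event or message send: increment this process's own entry."""
        self.clock[self.node] += 1
        return list(self.clock)

    def receive(self, other: list[int]) -> list[int]:
        """Element-wise max with the received vector, then count the receive event."""
        self.clock = [max(a, b) for a, b in zip(self.clock, other)]
        return self.tick()


def happened_before(a: list[int], b: list[int]) -> bool:
    """a -> b iff a <= b component-wise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


if __name__ == "__main__":
    p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
    send = p0.tick()  # p0 sends a message stamped [1, 0]
    recv = p1.receive(send)  # p1 receives it: [1, 1]
    later = p0.tick()  # independent event on p0: [2, 0]
    print(happened_before(send, recv), happened_before(later, recv))
```

Lamport timestamps respect causality (if event a happened before event b, a's timestamp is smaller) but cannot detect concurrency; vector clocks can, because two vectors that are incomparable component-wise correspond to concurrent events, as with `later` and `recv` above.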
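Consistent hashing, listed above, places both nodes and keys on a hash ring and assigns each key to the first node position found clockwise from the key. The sketch assumes MD5 (used only to spread positions evenly, not for security), 100 virtual nodes per physical node, and hypothetical class and method names.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # MD5 only spreads positions around the ring; it is not used for security.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class ConsistentHashRing:
    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place `vnodes` virtual positions for this node on the ring."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        """Walk clockwise from the key's position to the first node position."""
        if not self._ring:
            raise ValueError("ring is empty")
        idx = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


if __name__ == "__main__":
    ring = ConsistentHashRing(["A", "B", "C"])
    keys = [f"user{i}" for i in range(1000)]  # hypothetical keys
    before = {k: ring.get_node(k) for k in keys}
    ring.add_node("D")
    moved = [k for k in keys if ring.get_node(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved after adding node D")
```

The property worth noticing is that adding a node only moves keys onto the new node; keys never shuffle between existing nodes, which keeps rebalancing cheap compared with a plain `hash(key) % number_of_nodes` scheme.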

After finishing distributed systems, please click on Topic 26 – Introduction to Cloud Computing to continue.

Computer Science Curriculum, Mathematics, Software Engineering Curriculum, Software Engineering Research

Topic 20 – Discrete Mathematics

October 19, 2023 admin

Why do I need to learn about discrete mathematics?

Discrete mathematics is a fundamental tool for understanding many theories and techniques behind artificial intelligence, machine learning, deep learning, data mining, security, digital image processing, and natural language processing.

The problem-solving techniques and computational thinking introduced in discrete mathematics are also necessary for creating complex software.

What can I do after finishing learning discrete mathematics?

You will be equipped with the core concepts of logic, set theory, number theory, combinatorics, graph theory, Boolean algebra, and discrete probability.

These concepts will prepare you to learn modern theories and techniques for developing software in security, machine learning, data mining, image processing, and natural language processing.

What should I do now?

Please read the following books to grasp the core concepts of discrete mathematics:

Alternatively, if you want to learn the topic through interactive explanations, please audit the course and read its textbook: MIT 6.042J – Mathematics for Computer Science, Fall 2010 (Textbook).

Terminology Review:

Statement: An assertion that is either true or false.
Mathematical Statements.
Mathematical Proof: A convincing argument about the truth of a statement.
If p, then q: p is the hypothesis and q is the conclusion.
Proposition: A true statement.
Theorem: An important proposition.
Lemmas: Supporting propositions.
Logic: A language for reasoning that contains a collection of rules that we use when doing logical reasoning.
Propositional Logic: A logic about truth and falsity of statements.
Logical Connectives: Not (Negation), And (Conjunction), Or (Disjunction), If then (Implication), If and only if (Equivalence).
Truth Table.
Contrapositive of Proposition: The contrapositive of p ⇒ q is the proposition ¬q ⇒ ¬p.
Modus Ponens: If both P ⇒ Q and P hold, then Q can be concluded.
Predicate: A property of some objects or a relationship among objects represented by the variables.
Quantifier: Tells how many objects have a certain property.
Mathematical Induction: Base Case, Inductive Case.
Recursion: A Base Case, A Recursive Step.
Sum Example: Annuity.
Set.
Subset.
Set Operations: union A ∪ B, intersection A ∩ B, complement A’ = {x : x ∈ U and x ∉ A} (for A ⊆ U), difference A \ B = A ∩ B’ = {x : x ∈ A and x ∉ B}.
Cartesian Product: A × B = {(a, b) : a ∈ A and b ∈ B}.
A binary relation (or just relation) from X to Y is a subset R ⊆ X × Y. To describe the relation R, we may list the collection of all ordered pairs (x, y) such that x is related to y by R.
A mapping or function f ⊆ A × B from a set A to a set B is the special type of relation in which, for each element a ∈ A, there is a unique element b ∈ B such that (a, b) ∈ f.
Equivalence Relation.
Equivalence Class.
Partition.
A state machine is a binary relation on a set, the elements of the set are called states, the relation is called the transition relation, and an arrow in the graph of the transition relation is called a transition.
Greatest Common Divisor.
Division Algorithm.
Prime Numbers.
The Fundamental Theorem of Arithmetic: Let n be an integer such that n > 1. Then n can be factored as a product of prime numbers, n = p₁p₂ ⋯ pₖ, and this factorization is unique up to the order of the factors.
Congruence: a is congruent to b modulo n if n | (a – b), written a ≡ b (mod n).
Fermat’s Little Theorem.
Stirling’s Approximation.
Probability.
Example: The Monty Hall Problem.
The Four Step Method: (1) Find the Sample Space (Set of possible outcomes), (2) Define Events of Interest (Subset of the sample space), (3) Determine Outcome Probabilities, (4) Compute Event Probabilities.
A tree diagram is a graphical tool that can help us work through the four step approach when the number of outcomes is not too large or the problem is nicely structured.
Example: The Strange Dice.
Conditional Probability: P(A|B) = P (A ∩ B) / P(B).
A conditional probability P(B|A) is called a posteriori if event B precedes event A in time.
Example: Medical Testing.
Independence: P(B|A) = P(B) or P(A∩B) = P(A) · P(B).
Mutual Independence: The probability of each event is the same no matter which of the other events has occurred.
Pairwise Independence: Any two events are independent.
Example: The Birthday Problem.
The birthday paradox refers to the counterintuitive fact that only 23 people are needed for the probability that at least two of them share a birthday to exceed 50%; for 70 people, this probability is about 99.9%.
Bernoulli Random Variable (Indicator Random Variable): f: Ω ↦ {1, 0}.
Binomial Random Variable: The number of successes in an experiment consisting of n independent trials, each succeeding with probability p. P(X = x) = [n! / (x! · (n − x)!)] · pˣ(1 − p)ⁿ⁻ˣ.
Expectation (Average, Mean): E[R] = Sum(R(w) · P(w)) = Sum(x · P(R = x)).
Median: A value x such that P(R < x) ≤ 1/2 and P(R > x) < 1/2.
Example: Splitting the Pot.
Mean Time to Failure: If a system independently fails at each time step with probability p, then the expected number of steps up to the first failure is 1/p.
Linearity of Expectation.
Example: The Hat Check Problem.
Example: Benchmark: E(Z/R) = 1.2 does NOT mean that E(Z) = 1.2E(R).
Variance: var(X) = E[(X−E[X])²].
Kurtosis: E[(X−E[X])⁴].
Markov’s Theorem: P(R ≥ x) ≤ E(R)/x (R ≥ 0, x > 0).
Chebyshev’s Theorem: P(|R − E(R)| ≥ x) ≤ var(R)/x². It bounds the probability of deviating from the mean by at least x.
The Chernoff Bound: P(T ≥ c·E(T)) ≤ e^(−z·E(T)) for c ≥ 1, where z = c·ln c − c + 1, T = Sum(Tᵢ), and the Tᵢ are mutually independent with 0 ≤ Tᵢ ≤ 1.
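Several number-theory entries above (greatest common divisor, the division algorithm, prime factorization, and Fermat’s little theorem) translate directly into short functions. The sketch below uses Euclid’s algorithm and trial division with illustrative function names; trial division is only practical for small numbers.

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm: gcd(a, b) = gcd(b, a mod b), using the division algorithm."""
    while b != 0:
        a, b = b, a % b
    return abs(a)


def prime_factors(n: int) -> list[int]:
    """Factor n > 1 into primes (smallest first) by trial division."""
    if n <= 1:
        raise ValueError("n must be > 1")
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


def fermat_holds(a: int, p: int) -> bool:
    """Check a^(p-1) ≡ 1 (mod p) using fast modular exponentiation.

    Fermat's little theorem guarantees this when p is prime and p does not divide a.
    """
    return pow(a, p - 1, p) == 1


if __name__ == "__main__":
    print(gcd(252, 105))  # 252 = 2·105 + 42, 105 = 2·42 + 21, 42 = 2·21 + 0
    print(prime_factors(360))
    print(fermat_holds(3, 7), fermat_holds(2, 15))
```

Note that the theorem only runs one way: `fermat_holds(2, 561)` is also true even though 561 = 3 · 11 · 17 is composite (561 is a Carmichael number), so passing this check does not prove that a number is prime.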
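The birthday problem above can be checked exactly by computing the probability that all n birthdays are distinct and taking the complement. The sketch assumes 365 equally likely, independent birthdays (ignoring leap years and real-world birthday clustering); the function name is hypothetical.

```python
def birthday_collision_probability(n: int, days: int = 365) -> float:
    """P(at least two of n people share a birthday).

    Assumes birthdays are independent and uniform over `days` days.
    """
    if n > days:
        return 1.0  # pigeonhole principle: a collision is certain
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for n in (22, 23, 70):
        print(n, round(birthday_collision_probability(n), 4))
```

Under these assumptions, the probability first exceeds 0.5 at n = 23 (about 0.507) and is above 0.999 at n = 70.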

After finishing discrete mathematics, please click on Topic 21 – Introduction to Computational Thinking to continue.

Software Engineering Research

How to Be Creative in Software Engineering Research

July 9, 2023 admin

Motivation:

You plan to do software engineering research and want to make some minor authentic contributions.

Guidelines:

1. You may present a problem and its existing solution in your own words, based on your own understanding.

A method to do this is to

Read papers and books about a concept (for example, backpropagation or distributed transactions), then
Write down the concept and some related terminology, then
Try to explain the concept aloud in your own words, with examples, and
Record your presentation, then
Write down your transcript, then
Rephrase your transcript.

2. You may try to replicate an existing result. When doing this you may need to make minor changes due to specific technology or environment conditions. Then you can compare your result with the original result.

For example, you may compare your business workflows with existing business workflows to determine which solution solves a specific problem faster or more reliably.

Even if you do not make any changes, the replication process may still inspire some technical ideas. You may encounter errors while replicating the result. Try to fix these errors and document your experience.

For example, you may get errors when upgrading an existing system from Node.js 12 to Node.js 18, or when upgrading existing deep learning model code from Python 3.9 to Python 3.11. Try to fix the errors, then document your inputs, the errors, and your solutions.

3. The core of being creative is doing something that you have not done before. You may use trial and error, but first make sure that you have a hard, unsolved problem. Searching for partial solutions to that problem may inspire ideas that become the starting point for your minor authentic contributions.

4. Each individual’s creativity needs to be developed over time; it does not follow any set formula.

Software Engineering Research

How to Pose a Software Engineering Research Question?

May 17, 2022 admin

Motivation:

You are beginning to do software engineering research.
You want to have a research question.
You have several ideas, but you wonder whether they are good enough to base a research project on.

Suggestions:

1. Your question should contain well-defined terms.
Are you talking about something whose definition and core characteristics most people agree on?
For example, are you talking about Microservices, Event Sourcing, Relational Database, NoSQL, Unit Tests, Go Language, Speech Recognition, Speech Synthesis?

2. Your question should have a purpose and specific audience.
Why should the audience be interested in your question?
For example, are they going to upgrade an event sourcing system? Are they going to apply test automation in their project?
Do they have specific security issues with their system?
Do they have specific performance issues with their system?
Are they going to build a new identity management platform for their legacy system?
Do they need to accelerate the development of a portal for their legacy system?
Are they going to integrate voice search into their existing system?

3. Your question should have a verifiable answer.
What are the possible answers to your question? How can we compare these answers?
What is your concrete answer?
How can we replicate your answer?
How can we test your answer against the existing “standards”?

Computer Science Curriculum, Mathematics, Software Engineering Curriculum, Software Engineering Research

Topic 19 – Probability & Statistics

February 13, 2019 admin

Why do I need to learn about probability and statistics?

Probability and statistics are fundamental tools for understanding many modern theories and techniques such as artificial intelligence, machine learning, deep learning, data mining, security, digital image processing, and natural language processing.

What can I do after finishing learning about probability and statistics?

You will be prepared to learn modern theories and techniques to create modern security, machine learning, data mining, image processing or natural language processing software.

That sounds useful! What should I do now?

Please read one of the following books to grasp the core concepts of probability and statistics:

Alternatively, please read these notes first, and then audit the courses below if you would like to learn through interactive explanations:

Perhaps probability and statistics are among the most difficult topics in mathematics, so you may need to study them two or three times using different sources to truly master the concepts. For example, you may audit the course and read the books below to gain additional examples and intuition about the concepts:

Learning probability and statistics requires patience. However, the rewards will be worthwhile: you will be able to master AI algorithms more quickly and with greater confidence.

Terminology Review:

Sample Space (Ω): Set of possible outcomes.
Event: Subset of the sample space.
Probability Law: Law specified by giving the probabilities of all possible outcomes.
Probability Model = Sample Space + Probability Law.
Probability Axioms: Nonnegativity: P(A) ≥ 0; Normalization: P(Ω)=1; Additivity: If A ∩ B = Ø, then P(A ∪ B)= P(A)+ P(B).
Conditional Probability: P(A|B) = P (A ∩ B) / P(B).
Multiplication Rule.
Total Probability Theorem.
Bayes’ Rule: Given P(Aᵢ) (initial “beliefs”) and P(B|Aᵢ), find P(Aᵢ|B) (revised “beliefs”, given that B occurred): P(Aᵢ|B) = P(Aᵢ)P(B|Aᵢ) / Sum over j of P(Aⱼ)P(B|Aⱼ).
The Monty Hall Problem: 3 doors, behind which are two goats and a car.
The Spam Detection Problem: “Lottery” word in spam emails.
Independence of Two Events: P(B|A) = P(B) or P(A ∩ B) = P(A) · P(B).
The Birthday Problem: P(Same Birthday of 23 People) > 50%.
The Naive Bayes Model: “Naive” refers to the assumption that the features are independent of each other (given the class).
Discrete Uniform Law: P(A) = Number of elements of A / Total number of sample points = |A| / |Ω|
Basic Counting Principle: r stages, nᵢ choices at stage i, number of choices = n₁ n₂ · · · nᵣ
Permutations: Number of ways of ordering n elements without repetition: n · (n − 1) · (n − 2) ⋯ 1 = n!.
Combinations: number of k-element subsets of a given n-element set.
Binomial Probabilities: P(any particular sequence) = p^(number of heads) · (1 − p)^(number of tails).
Random Variable: A function from the sample space to the real numbers. It is not random. It is not a variable. It is a function: f: Ω ↦ ℝ. Random variable is used to model the whole experiment at once.
Discrete Random Variables.
Probability Mass Function: P(X = 𝑥) or Pₓ(𝑥): A function from the possible values of X to [0, 1] that gives the probability that the value of X equals 𝑥. PMF gives probabilities. 0 ≤ PMF ≤ 1. All the values of PMF must sum to 1. PMF is used to model a random variable.
Bernoulli Random Variable (Indicator Random Variable): f: Ω ↦ {1, 0}. Only 2 outcomes: 1 and 0. p(1) = p and p(0) = 1 – p.
Binomial Random Variable: X = Number of successes in n trials. X = Number of heads in n independent coin tosses.
Binomial Probability Mass Function: P(X = k) = C(n, k) pᵏ(1 − p)ⁿ⁻ᵏ, where C(n, k) is the number of combinations of k elements out of n.
Geometric Random Variable: X = Number of coin tosses until first head.
Geometric Probability Mass Function: P(X = k) = (1 − p)ᵏ⁻¹p, for k = 1, 2, 3, …
Expectation: E[X] = Sum of xpₓ(x).
Let Y=g(X): E[Y] = E[g(X)] = Sum of g(x)pₓ(x). Caution: E[g(X)] ≠ g(E[X]) in general.
Variance: var(X) = E[(X−E[X])²].
var(aX)=a²var(X).
If X and Y are independent: var(X + Y) = var(X) + var(Y). Caution: var(X + Y) ≠ var(X) + var(Y) in general.
Standard Deviation: Square root of var(X).
Conditional Probability Mass Function: P(X=x|A).
Conditional Expectation: E[X|A].
Joint Probability Mass Function: $\begin{aligned} P_{X Y} (x, y) = P (X = x, Y = y) . \end{aligned}$
Marginal Distribution: Distribution of one variable while ignoring the other.
Marginal Probability Mass Function: $P_X (x) = \sum_{y} P_{X Y} (x, y)$.
Total Expectation Theorem: E[X] = Sum over y of P(Y = y) · E[X | Y = y].
Independent Random Variables: $P (X = x, Y = y) = P (X = x) P (Y = y)$ for all x, y.
Expectation of Multiple Random Variables: E[X + Y + Z] = E[X] + E[Y] + E[Z].
Binomial Random Variable: X = Sum of Bernoulli Random Variables.
The Hat Problem.
Continuous Random Variables.
Probability Density Function: fₓ(𝑥), where P(a ≤ X ≤ b) = ∫ₐᵇ fₓ(𝑥) d𝑥. (a ≤ X ≤ b) means the function X produces a real value within the range [a, b]. Programming language: X(outcome) = 𝑥, where a ≤ 𝑥 ≤ b. PDF does NOT give probabilities. PDF does NOT have to be less than 1. PDF gives probabilities per unit length. The total area under PDF must be 1. PDF is used to define the random variable’s probability of falling within a given range of values.
Cumulative Distribution Function: P(X ≤ b). (X ≤ b) means the function X produces a real value within the range (−∞, b]. Programming language: X(outcome) = 𝑥, where 𝑥 ≤ b.
Continuous Uniform Random Variables: fₓ(x) = 1/(b − a) if a ≤ x ≤ b, otherwise fₓ(x) = 0.
Normal Random Variable, Gaussian Distribution, Normal Distribution: Fitting bell shaped data.
Chi-Squared Distribution: Modelling communication noise.
Sampling from a Distribution: The process of drawing a random value (or set of values) from a probability distribution.
Joint Probability Density Function.
Marginal Probability Density Function.
Conditional Probability Density Function.
Derived Distributions.
Convolution: A mathematical operation on two functions (f and g) that produces a third function.
The Distribution of W = X + Y.
The Distribution of X + Y where X, Y: Independent Normal Random Variables.
Covariance.
Covariance Matrix.
Correlation Coefficient.
Conditional Expectation: E[X | Y = y] = Sum of xpₓ|ᵧ(x|y). If Y is unknown then E[X | Y] is a random variable, i.e. a function of Y. So E[X | Y] also has its expectation and variance.
Law of Iterated Expectations: E[E[X | Y]] = E[X].
Conditional Variance: var(X | Y) is a function of Y.
Law of Total Variance: var(X) = E[var(X | Y)] + var(E[X | Y]).
Bernoulli Process: A sequence of independent Bernoulli trials. At each trial, i: P(Xᵢ=1)=p, P(Xᵢ=0)=1−p.
Poisson Process.
Markov Chain.
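The Monty Hall problem can be solved with the four-step method described in the discrete mathematics topic: enumerate the sample space (car location, first pick, door opened by the host), assign outcome probabilities, and add up the winning outcomes. This sketch assumes the car is placed uniformly at random, the player picks uniformly at random, and the host always opens a goat door the player did not pick, choosing uniformly when two such doors exist. Exact fractions avoid floating-point noise; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)


def monty_hall_win_probabilities() -> tuple[Fraction, Fraction]:
    """Return (P(win by staying), P(win by switching)) by exact enumeration."""
    stay = switch = Fraction(0)
    for car, pick in product(DOORS, DOORS):
        # The host opens a door that is neither the player's pick nor the car.
        host_options = [d for d in DOORS if d != pick and d != car]
        for host in host_options:
            # Outcome probability: 1/3 (car) * 1/3 (pick) * 1/len(host_options).
            p = Fraction(1, 3) * Fraction(1, 3) * Fraction(1, len(host_options))
            switched_to = next(d for d in DOORS if d != pick and d != host)
            if pick == car:
                stay += p
            if switched_to == car:
                switch += p
    return stay, switch


if __name__ == "__main__":
    stay, switch = monty_hall_win_probabilities()
    print("stay:", stay, "switch:", switch)
```

Switching wins exactly when the first pick was wrong, which happens with probability 2/3.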
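Bayes’ rule and the total probability theorem can be combined into one small function, illustrated here on the spam detection problem. The probabilities in the usage example are invented for illustration only, not measured from real email; the function name `posterior` is hypothetical.

```python
def posterior(priors: list[float], likelihoods: list[float]) -> list[float]:
    """Bayes' rule for mutually exclusive, exhaustive hypotheses A_1..A_n.

    P(A_i | B) = P(A_i) P(B | A_i) / sum_j P(A_j) P(B | A_j),
    where the denominator is P(B) from the total probability theorem.
    """
    joint = [p * lik for p, lik in zip(priors, likelihoods)]
    p_b = sum(joint)
    if p_b == 0:
        raise ValueError("P(B) is zero; the evidence is impossible under every hypothesis")
    return [j / p_b for j in joint]


if __name__ == "__main__":
    # Hypothetical numbers: 20% of emails are spam; "lottery" appears in 30% of
    # spam emails and in 1% of legitimate (ham) emails.
    p_spam, p_ham = posterior([0.2, 0.8], [0.3, 0.01])
    print(f"P(spam | 'lottery') = {p_spam:.3f}")
```

Even though only 20% of emails are spam in this made-up setting, seeing the word raises the spam probability to about 0.88, because the word is 30 times more likely in spam than in legitimate email.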

Bar Charts, Line Charts, Scatter Plots, Histograms.
Mean, Median, Mode.
Moments of a Distribution.
Skewness: E[((X – μ)/σ)³].
Kurtosis: E[((X – μ)/σ)⁴].
k% Quantile: The value qₖ/₁₀₀ such that P(X ≤ qₖ/₁₀₀) = k/100.
Interquartile Range: IQR = Q₃ − Q₁.
Box-Plots: Q₁, Q₂, Q₃, IQR, min, max.
Kernel Density Estimation.
Violin Plot = Box-Plot + Kernel Density Estimation.
Quantile-Quantile Plots (QQ Plots).
Population: N.
Sample: n.
Random Sampling.
Population Mean: μ.
Sample Mean: x̄.
Population Proportion: p.
Sample Proportion: p̂.
Population Variance: σ².
Sample Variance: s².
Sampling Distributions.
Sampling from a Distribution: Drawing random values directly from a probability distribution. Purpose: Simulating or modeling real-world processes when the underlying distribution is known.
Markov’s Inequality: P(X ≥ a) ≤ E(X)/a (X ≥ 0, a > 0).
Chebyshev’s Inequality: P(|X – E(X)| ≥ a) ≤ var(X)/a².
Weak Law of Large Numbers: The sample mean gets closer to the population mean (converges in probability) as the sample size (not number of items) increases.
Central Limit Theorem: The distribution of sample means approximates a normal distribution as the sample size (not number of items) gets larger, regardless of the population’s distribution.
Sampling Distributions: Distribution of Sample Mean, Distribution of Sample Proportion, Distribution of Sample Variance.
Point Estimate: A single number, calculated from a sample, that estimates a parameter of the population.
Maximum Likelihood Estimation: Given data the maximum likelihood estimate (MLE) for the parameter p is the value of p that maximizes the likelihood P (data | p). P (data | p) is the likelihood function. For continuous distributions, we use the probability density function to define the likelihood.
Log likelihood: the natural log of the likelihood function.
Frequentists: Assume no prior belief, the goal is to find the model that most likely generated observed data.
Bayesians: Assume prior belief, the goal is to update prior belief based on observed data.
Maximum A Posteriori (MAP): Good for instances when you have limited data or strong prior beliefs. Wrong priors, wrong conclusions. MAP with uninformative priors is just MLE.
Margin of Error: A bound that we can confidently place on the difference between an estimate of something and the true value.
Significance Level: α, the probability that the event could have occurred by chance.
Confidence Level: 1 − α, a measure of how confident we are in a given margin of error.
Confidence Interval: A 95% confidence interval (CI) of the mean is a range with an upper and lower number calculated from a sample. Because the true population mean is unknown, this range describes possible values that the mean could be. If multiple samples were drawn from the same population and a 95% CI calculated for each sample, we would expect the population mean to be found within 95% of these CIs.
z-score: the number of standard deviations from the mean value of the reference population.
Confidence Interval: Unknown σ.
Confidence Interval for Proportions.
Hypothesis: A statement about a population developed for the purpose of testing.
Hypothesis Testing.
Null Hypothesis (H₀): A statement about the value of a population parameter; it contains an equal sign.
Alternate Hypothesis (H₁): A statement that is accepted if the sample data provide sufficient evidence that the null hypothesis is false; it never contains an equal sign.
Type I Error: Reject the null hypothesis when it is true.
Type II Error: Do not reject the null hypothesis when it is false.
Significance Level, α: The maximum probability of rejecting the null hypothesis when it is true.
Test Statistic: A number, calculated from samples, used to find if your data could have occurred under the null hypothesis.
Right-Tailed Test: The alternative hypothesis states that the true value of the parameter specified in the null hypothesis is greater than the null hypothesis claims.
Left-Tailed Test: The alternative hypothesis states that the true value of the parameter specified in the null hypothesis is less than the null hypothesis claims.
Two-Tailed Test: The alternative hypothesis does not specify a direction; it only states that the null hypothesis is wrong.
p-value: The probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct. μ₀ is assumed to be known and H₀ is assumed to be true.
Decision Rules: If H₀ is true, then x̄ should fall in the acceptance region, which has probability 1 − α; if x̄ falls outside that region, reject H₀.
Critical Value or k-value: A value on a test distribution that is used to decide whether the null hypothesis should be rejected or not.
Power of a Test: The probability of rejecting the null hypothesis when it is false; in other words, it is the probability of avoiding a type II error.
t-Distribution.
T-Statistic.
t-Tests: Unknown σ, use T-Statistic.
Independent Two-Sample t-Tests.
Paired t-Tests.
A/B testing: A methodology for comparing two variations (A/B) that uses t-Tests for statistical analysis and making a decision.
Model Building: X = aS + W, where X: output, S: “signal”, a: parameters, W: noise. Know S, assume W, observe X, find a.
Inferring: X = aS + W. Know a, assume W, observe X, find S.
Hypothesis Testing: X = aS + W. Know a, observe X, find S. S can take one of a few possible values.
Estimation: X = aS + W. Know a, observe X, find S. S can take unlimited possible values.
Bayesian Inference can be used for both Hypothesis Testing and Estimation by leveraging Bayes rule. Output is posterior distribution. Single answer can be Maximum a posteriori probability (MAP) or Conditional Expectation.
Least Mean Squares Estimation of Θ based on X.
Classical Inference can be used for both Hypothesis Testing and Estimation.
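The margin of error and confidence interval entries above can be sketched with Python’s standard `statistics` module. This version assumes a reasonably large sample, so it uses the normal (z) critical value; with a small sample and unknown σ, a t critical value should be used instead, which the standard library does not provide. The function name is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample: list[float], confidence: float = 0.95) -> tuple[float, float]:
    """Approximate CI for the population mean: x̄ ± z * s / sqrt(n).

    Uses the normal (z) critical value, which is reasonable for large n.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    alpha = 1 - confidence  # significance level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = z * s / math.sqrt(n)  # margin of error
    return x_bar - margin, x_bar + margin


if __name__ == "__main__":
    data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]  # hypothetical measurements
    low, high = mean_confidence_interval(data)
    print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

For a 95% level the z critical value is about 1.96, so the interval is roughly x̄ ± 1.96 · s/√n; a wider interval comes from more variable data or a smaller sample.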
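For the independent two-sample t-test used in A/B testing, the test statistic can be computed by hand. This sketch assumes both groups have equal population variances (the pooled form) and returns the t-statistic and its degrees of freedom; turning that into a p-value needs a t-distribution table or a library such as SciPy, whose `scipy.stats.ttest_ind` uses this pooled form by default. The function name and the data are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(a: list[float], b: list[float]) -> tuple[float, int]:
    """Independent two-sample t-statistic assuming equal population variances.

    t = (mean(a) - mean(b)) / (s_p * sqrt(1/n_a + 1/n_b)),
    where s_p^2 = ((n_a - 1) s_a^2 + (n_b - 1) s_b^2) / (n_a + n_b - 2).
    Returns (t, degrees of freedom).
    """
    n_a, n_b = len(a), len(b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))  # standard error of the difference
    return (mean(a) - mean(b)) / se, df


if __name__ == "__main__":
    variant_a = [1.0, 2.0, 3.0]  # hypothetical metric values for variant A
    variant_b = [4.0, 5.0, 6.0]  # hypothetical metric values for variant B
    t, df = pooled_t_statistic(variant_a, variant_b)
    print(f"t = {t:.3f}, df = {df}")
```

With these made-up groups, the result is t ≈ −3.67 with 4 degrees of freedom; a large |t| relative to the critical value for the chosen α leads to rejecting the null hypothesis that the two variants have equal means.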

After finishing probability and statistics, please click on Topic 20 – Discrete Mathematics to continue.

Software Engineering Research

Guide to Citing & Referencing

March 13, 2016 admin

What is referencing

When writing a piece of academic work, you must acknowledge any sources you have used. You do this by including a ‘citation’ within your text (usually a number or an author’s name) next to the material you have used. This brief citation leads your reader to a full reference to the work, which you include in your list of references at the end of your text. These references should allow anyone reading your work to identify and find the material to which you have referred. You need to be consistent in the way you reference your sources by following an established referencing system and style.

Please download these 2 files for the full guide.

Please download these 2 guides for how to work with references using Microsoft Word 2007 or 2010.

If you want to use IEEE and ACM style with alphabetical (name) sequence, then please download this BibWord file, unzip it, and copy IEEE_Alphabetical.XSL and ACMNameSeq.XSL to C:\Program Files (x86)\Microsoft Office\Office12\Bibliography\Style (the directory may be different on your machine).

Whenever you update your bibliography, close your document then run BibWordExtender2.exe, click “OK”, select your Word document, select Bibliography style, click “Extend”, re-open your document, re-select the style in Word.

Software Development

Software development and software engineering research