Probability Rules
Motivations
Probability is the underlying foundation for the methods of statistical inference. It is the third part in the Big Picture of Statistics. We use probability to quantify how much we expect random samples to vary. In the final stage of statistical analysis – Inference – we attempt to draw final conclusions. During inference, having selected our sample, we know that our uncertainty is due to chance, and not due to problems with how the sample was collected. We can use probability to describe the likelihood that our sample is within a desired level of accuracy.
Fundamentals
Probability is a mathematical description of randomness and uncertainty. Probability \(P(A)\) quantifies the likelihood of the occurrence of a particular event \(A\).
- The probability of an event \(A\) is expressed as a decimal such that \(0 \leq P(A) \leq 1\).
- \(P(A)=0\) means “there is a zero percent chance of \(A\).”
- \(P(A)=1\) means “there is a hundred percent chance of \(A\).”
- Probability is not always intuitive, as the Monty Hall and Birthday problems illustrate.
Determining Probability
There are two fundamental ways in which we can determine probability:
- Theoretical (also known as Classical)
- Used for games of chance, such as flipping coins, rolling dice, spinning spinners, roulette wheels, or lotteries.
- “Classical” because their values are determined by the nature of the game (or situation) itself.
- Empirical (also known as Observational)
- A series of trials are used to determine the relative frequency.
- Used particularly to answer probability questions that arise in a situation that does not follow any pattern and cannot be predetermined (i.e. most probabilities of interest).
\begin{equation} \mathrm{Relative \ Frequency \ of \ Event \ } A = { \frac {\text{number of times } A \mathrm{\ occurred}} {\text{total number of trials}} } \end{equation}
The law of large numbers states that as the number of trials increases, the relative (empirical) frequency approaches the theoretical probability. In other words, we can estimate the theoretical probability by performing a long series of trials.
Sample Spaces
In any random experiment we can define a sample space \(S\) that will contain all the possible outcomes of the experiment. For example, in the experiment of tossing two coins we would have the sample space \(S = \{ \ HH, HT, TH, TT \ \}\). Within \(S\) we can observe events, which contain n outcomes; for example, the event “at least one tail was tossed” could be defined as \(A = \{ \ HT, TH, TT\ \}\). Then we can calculate the relative frequency of individual events, e.g. \(P(A) = 0.75\). In an experiment where every outcome is equally likely this is as simple as counting the outcomes in the event and dividing that total by the total number of outcomes in the sample space.
Additional Fundamental Concepts
Union and Intersection
Given \(A = \{ \ 1,2,3 \ \}\) and \(B = \{ \ 2,3,4 \ \}\) we have
-
Union (“or”) \begin{equation} A \cup B = \{ \ 1,2,3,4 \ \} \end{equation}
-
Intersection (“and”) \begin{equation} A \cap B = \{ \ 2,3 \ \} \end{equation}
Disjoint (Mutually Exclusive) Events
Two events A and B are disjoint or mutually exclusive if they cannot occur at the same time, meaning \(P(A \cap B) = 0\).
Independence
Two events A and B are independent if the fact that one event has occurred does not affect the probability that the other event will occur.
Two events A and B are dependent if whether or not one event occurs does affect the probability that the other event will occur.
Five Probability Rules
Range Rule
Probabilities can never be less than 0 or greater than 1. \begin{equation} 0 \leq P(A) \leq 1 \tag{1}\label{1} \end{equation}
Sum Rule
The sum of the probabilities of all possible outcomes (in sample space \(S\)) is 1. \begin{equation} P(S) = 1 \tag{2}\label{2} \end{equation}
Complement Rule
Given \(\eqref{1}\) and \(\eqref{2}\), \begin{equation} P(\text{not A}) = 1 - P(A) \tag{3} \end{equation}
Sometimes it’s easier to figure out the complement probability.
General Addition Rule
- In probability the word “or” is always associated with the operation of addition.
- Note: for disjoint events, \(P(A \cap B) = 0\) \begin{equation} P(A \cup B) = P(A) + P(B) - P(A \cap B) \tag{4} \end{equation}
Multiplication Rule For Independent Events
- In probability the word “and” is always associated with the operation of multiplication \begin{equation} P(A \cap B) = P(A) \cdot P(B) \tag{5} \end{equation}
Probability Tables
To determine all the values in the table we need to know
- one value in the Total column
- one value in the Total row
- one value out of \(A, B, \text{not A}, \text{not B}\)
Each right and bottom margin contains sums of a row or column.
B | Not B | Total | |
---|---|---|---|
A | \(P(A \cap B)\) | \(P(A) - P(A \cap B)\) | \(P(A)\) |
Not A | \(P(B) - P(A \cap B)\) | \(1 - P(A) - P(B) + P(A \cap B)\) | \(1 - P(A)\) |
Total | \(P(B)\) | \(1 - P(B)\) | \(1\) |
The two-way tables used to compare two categorical values during EDA record values of two categorical variables for a concrete sample of individuals; whereas the probability two-way table conveys data for an entire population, presumably based on relative frequencies recorded over many repetitions.