Probability vs Likelihood¶
Probability and likelihood have different meanings, though they are often misinterpreted or used interchangeably. Let's break them down and understand the distinction.
What is Probability?¶
Probability is a branch of mathematics and statistics concerned with the numerical description of how likely events are to occur. The probability of an event is a number between 0 and 1 — the closer it is to 1, the more likely the event is to occur.
Example¶
Tossing a fair (unbiased) coin has two outcomes — heads and tails. Since the coin is fair:

$$ P(\text{Heads}) = P(\text{Tails}) = 0.5 $$
Now, let's say you toss the same fair coin 10 times and observe 7 heads and 3 tails. Let's compute the probability of this outcome.
This is a Bernoulli experiment repeated n = 10 times, so the number of heads follows the Binomial distribution. The probability of getting k successes (heads) is:

$$ P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} $$

Substituting in our values (n = 10, k = 7, p = 0.5):

$$ P(X = 7) = \binom{10}{7} (0.5)^7 (0.5)^3 = 120 \times \frac{1}{1024} \approx 0.117 $$
This is how you compute the probability given that the parameter (p = 0.5) is known.
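The computation above can be sketched in a few lines of Python using the standard library's `math.comb` (the helper name `binomial_pmf` is ours, not from any library):

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of 7 heads in 10 tosses of a fair coin
prob = binomial_pmf(k=7, n=10, p=0.5)
print(round(prob, 4))  # 0.1172
```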
What if the Coin is Biased?¶
Now suppose you toss a biased coin 5 times and observe the outcomes:
- 3 heads
- 2 tails
But this time, you don’t know the coin’s probability p of landing heads. So instead of computing probability, we compute the likelihood of different values of p given the observed data.
- Probability: Given a model (parameter), how likely is the observed data?

  Example:

  $$ P(\text{3 Heads} \mid p = 0.5) $$

- Likelihood: Given the observed data, how likely is a specific value of the parameter?

  Example:

  $$ \mathcal{L}(p \mid \text{3 Heads}) $$
Here we don't know \( p \). The likelihood of the observed data (3 heads, 2 tails) as a function of \( p \) is:

$$ \mathcal{L}(p) = \binom{5}{3} p^3 (1 - p)^2 $$

The value of \( p \) that maximizes this function is called the Maximum Likelihood Estimate (MLE). In our case, the MLE is:

$$ \hat{p} = \frac{3}{5} = 0.6 $$
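Before deriving this analytically, a quick numerical sanity check: evaluate the likelihood over a fine grid of candidate values for \( p \) and pick the largest (a sketch; the grid resolution of 1/1000 is an arbitrary choice):

```python
# L(p) = p^3 * (1 - p)^2 — the binomial coefficient is a
# constant factor and does not move the location of the maximum
def likelihood(p: float) -> float:
    return p**3 * (1 - p)**2

# Evaluate over a fine grid of candidate values for p in [0, 1]
grid = [i / 1000 for i in range(1001)]
best_p = max(grid, key=likelihood)
print(best_p)  # 0.6
```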
Deriving the MLE for Likelihood¶
We are given the likelihood function (dropping the binomial coefficient, since a constant factor does not affect the location of the maximum):

$$ \mathcal{L}(p) = p^3 (1 - p)^2 $$
Let:
- \( u = p^3 \)
- \( v = (1 - p)^2 \)
Then the derivative, using the product rule \( (uv)' = u'v + uv' \), is:

$$ \frac{d\mathcal{L}}{dp} = u'v + uv' $$

Now plug in \( u' = 3p^2 \) and \( v' = -2(1 - p) \):

$$ \frac{d\mathcal{L}}{dp} = 3p^2 (1 - p)^2 - 2p^3 (1 - p) $$
Factoring out \( p^2 (1 - p) \), this can also be written equivalently as:

$$ \frac{d\mathcal{L}}{dp} = p^2 (1 - p)(3 - 5p) $$
Find the critical points by setting the derivative to zero:

$$ p^2 (1 - p)(3 - 5p) = 0 $$
Solutions:
- \( p = 0 \)
- \( p = 1 \)
- \( p = \frac{3}{5} \)
Select the maximum. Since \( p = 0 \) and \( p = 1 \) yield zero likelihood, the maximum occurs at:

$$ \hat{p} = \frac{3}{5} = 0.6 $$
This is the Maximum Likelihood Estimate (MLE) for the parameter \( p \).
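The derivation above can be double-checked symbolically. A minimal sketch, assuming SymPy is available:

```python
import sympy as sp

p = sp.symbols('p', real=True)
L = p**3 * (1 - p)**2            # likelihood, constant factor dropped

dL = sp.diff(L, p)               # derivative of L with respect to p
critical = sp.solve(sp.Eq(dL, 0), p)
print(sorted(critical))          # the roots 0, 3/5, and 1
```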
In simpler terms:¶
- Probability evaluates data given the parameter
- Likelihood evaluates parameter given the data
| Feature | Probability | Likelihood |
|---|---|---|
| Viewpoint | Model → Data | Data → Model |
| Expression | \(P(X \mid \theta)\) | \(\mathcal{L}(\theta \mid X)\) |
| Normalization | Integrates to 1 over \(X\) | Not normalized over \(\theta\) |
| Use-case | Predictions, simulations | Parameter inference (e.g. MLE) |
Conclusion¶
- Probability: Model → Data. “Given \( \theta \), how likely is \( X \)?”
- Likelihood: Data → Model. “Given observed \( X \), how plausible is \( \theta \)?”