# Probability vs. Likelihood: What’s the Difference?
In statistics and data science, probability and likelihood are related but distinct concepts. This guide breaks down their difference, provides formulas, and includes practical Python examples.
## 1. Definitions

### Probability
- Probability quantifies how likely a future event (data) is, given a known model or hypothesis.
- Mathematically: $$ P(X = x \mid \theta) $$
- Ranges from 0 to 1.
- Example: The chance of getting 2 heads in 3 fair coin flips: $$ P(X=2 \mid p=0.5) = \binom{3}{2} 0.5^2 0.5^1 = 0.375 $$
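This value is easy to verify with SciPy's binomial PMF (a minimal sketch):

```python
from scipy.stats import binom

# P(X = 2 heads | n = 3 flips, p = 0.5)
prob = binom.pmf(2, 3, 0.5)
print(f"{prob:.3f}")  # → 0.375
```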
### Likelihood
- Likelihood measures how plausible a parameter (θ) is, given observed data.
- Treat data as fixed and view the model/hypothesis as variable.
- Defined as: $$ \mathcal{L}(\theta \mid x) = P(x \mid \theta) \quad \text{(for discrete)} $$ or $$ \mathcal{L}(\theta \mid x) = f(x \mid \theta) \quad \text{(for continuous)} $$
- Crucially, likelihood is not a probability distribution over θ (does not integrate to 1).
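A quick numerical check makes this concrete. Using a binomial likelihood for, say, 7 heads in 10 coin flips (illustrative values), integrating over p gives 1/11 rather than 1:

```python
from scipy.integrate import quad
from scipy.stats import binom

# Likelihood of p given observed data: L(p | x=7, n=10) = C(10,7) p^7 (1-p)^3
likelihood = lambda p: binom.pmf(7, 10, p)

# Integrating the likelihood over p does NOT give 1 (exact value: 1/11)
area, _ = quad(likelihood, 0, 1)
print(f"{area:.4f}")  # → 0.0909
```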
## 2. Inferential Direction

- Probability: Model → Data. “Given θ, how likely is X?”
- Likelihood: Data → Model. “Given observed X, how plausible is θ?”
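The two directions can be seen with a single SciPy function, read two ways (a minimal sketch using a binomial model):

```python
import numpy as np
from scipy.stats import binom

# Probability direction: fix the model (p = 0.5), vary the data x
probs = binom.pmf(np.arange(11), 10, 0.5)
print(round(probs.sum(), 6))  # sums to 1 over all outcomes x

# Likelihood direction: fix the data (x = 7), vary the parameter p
lik = binom.pmf(7, 10, np.array([0.3, 0.5, 0.7]))
print(lik)  # same function, now read as a function of p
```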
## 3. Example: Coin Flip
Imagine we toss a coin 10 times, observing 7 heads (x = 7).
- As Probability: $$ P(X = 7 \mid p) = \binom{10}{7} p^7 (1-p)^3 $$
- As Likelihood: $$ \mathcal{L}(p \mid x=7) = P(X = 7 \mid p) $$ Viewed as a function of p, this tells us which parameter values are most plausible.
Often we maximize it to obtain the Maximum Likelihood Estimate (MLE):
\[
\hat{p}_{\text{MLE}} = \underset{p}{\arg\max}\;\mathcal{L}(p \mid x)
\]
For binomial data, this gives:
\[
\hat{p} = \frac{x}{n} = \frac{7}{10} = 0.7
\]
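This closed form comes from setting the derivative of the log-likelihood to zero, a standard calculus step worth sketching:

\[
\ell(p) = \log \binom{n}{x} + x \log p + (n - x) \log(1 - p)
\]
\[
\frac{d\ell}{dp} = \frac{x}{p} - \frac{n - x}{1 - p} = 0 \quad \Rightarrow \quad \hat{p} = \frac{x}{n}
\]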
## 4. Distinguishing Features
| Feature | Probability | Likelihood |
|---|---|---|
| Viewpoint | Model → Data | Data → Model |
| Expression | \(P(X \mid \theta)\) | \(\mathcal{L}(\theta \mid X)\) |
| Normalization | Sums/integrates to 1 over x | Not normalized over θ |
| Use-case | Predictions, simulations | Parameter inference (e.g. MLE) |
## 5. Python Example
```python
import numpy as np
from scipy.stats import binom

# Observed data: 7 heads in 10 flips
n, x_obs = 10, 7
ps = np.linspace(0, 1, 101)

# Likelihood of each candidate p given the observed data
likelihoods = binom.pmf(x_obs, n, ps)

# MLE: the p value that maximizes the likelihood
p_mle = ps[np.argmax(likelihoods)]

print(f"Observed {x_obs}/{n} heads.")
print(f"MLE for p: {p_mle:.2f}")  # → 0.70
```
## 6. Gaussian (Normal) Distribution Likelihood
For a continuous random variable (e.g., height, weight) modeled by a normal distribution:
\[
f(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
\]
For data points \( x_1, x_2, \dots, x_n \), the likelihood function for \( \mu \) and \( \sigma \) is:
\[
\mathcal{L}(\mu, \sigma \mid \mathbf{x}) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)
\]
The log-likelihood is often used:
\[
\ell(\mu, \sigma) = -\frac{n}{2} \log(2\pi) - n \log \sigma - \frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2
\]
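Setting the partial derivatives of the log-likelihood to zero gives the familiar closed-form estimates (a standard result, sketched here):

\[
\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^n (x_i - \mu) = 0 \quad \Rightarrow \quad \hat{\mu} = \frac{1}{n} \sum_{i=1}^n x_i
\]
\[
\frac{\partial \ell}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^n (x_i - \mu)^2 = 0 \quad \Rightarrow \quad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \hat{\mu})^2
\]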
### Python Example for Gaussian Likelihood
```python
import numpy as np
from scipy.stats import norm

# Observed data (example measurements)
data = np.array([5.1, 4.9, 5.0, 5.2, 4.8])

# Grid of candidate mu values (101 points so the grid contains 5.0 exactly)
mus = np.linspace(4.5, 5.5, 101)
sigma = 0.1  # assume sigma is known and fixed

# Likelihood of each mu: product of densities over the data points
likelihoods = np.array([np.prod(norm.pdf(data, loc=mu, scale=sigma)) for mu in mus])

# MLE for mu
mu_mle = mus[np.argmax(likelihoods)]
print(f"MLE for mu (mean): {mu_mle:.2f}")  # → 5.00
```
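Since the Gaussian MLE for the mean has a closed form (the sample mean), the grid search can be sanity-checked directly:

```python
import numpy as np

data = np.array([5.1, 4.9, 5.0, 5.2, 4.8])

# The closed-form MLE for mu is the sample mean
print(f"Sample mean: {data.mean():.2f}")  # → 5.00
```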
## 7. Summary
- Probability: From model to data; predicts outcomes given parameters.
- Likelihood: From data to model; infers parameters given observed outcomes.
- Though both use the same mathematical function, the variable of interest changes.
## 8. References
- Wikipedia: Likelihood function
- Wikipedia: Probability
- Stats.SE clarification: "Likelihood is … probability of observed outcomes given parameters"
- Articles from Built In, GeeksforGeeks, and other statistics blogs.