
# Probability vs. Likelihood: What’s the Difference?

In statistics and data science, probability and likelihood are related but distinct concepts. This guide breaks down their difference, provides formulas, and includes practical Python examples.


## 1. Definitions

### Probability

  • Probability quantifies how likely a future event (data) is, given a known model or hypothesis.
  • Mathematically: $$ P(X = x \mid \theta) $$
  • Ranges from 0 to 1.
  • Example: The chance of getting 2 heads in 3 fair coin flips: $$ P(X=2 \mid p=0.5) = \binom{3}{2} 0.5^2 0.5^1 = 0.375 $$
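The coin-flip arithmetic above can be checked directly with Python's standard library (a minimal sketch; `math.comb` supplies the binomial coefficient):

```python
from math import comb

# P(X = 2 | n = 3, p = 0.5): probability of exactly 2 heads in 3 fair flips
prob = comb(3, 2) * 0.5**2 * 0.5**1
print(f"{prob:.3f}")  # → 0.375
```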

### Likelihood

  • Likelihood measures how plausible a parameter (θ) is, given observed data.
  • Treats the data as fixed and views the model/parameter θ as variable.
  • Defined as: $$ \mathcal{L}(\theta \mid x) = P(x \mid \theta) \quad \text{(for discrete)} $$ or $$ \mathcal{L}(\theta \mid x) = f(x \mid \theta) \quad \text{(for continuous)} $$
  • Crucially, likelihood is not a probability distribution over θ (does not integrate to 1).
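To see that last point numerically, suppose we flip a coin n = 10 times and observe x = 7 heads; integrating the binomial likelihood over p on a grid (a sketch using a simple Riemann sum) gives an area well below 1:

```python
import numpy as np
from math import comb

# Binomial likelihood L(p | x = 7 heads in n = 10 flips), on a grid of p
n, x = 10, 7
ps = np.linspace(0, 1, 1001)
L = comb(n, x) * ps**x * (1 - ps)**(n - x)

# Approximate the integral of L over p with a Riemann sum
area = np.sum(L) * (ps[1] - ps[0])
print(f"Integral of L over p: {area:.3f}")  # → 0.091 (= 1/11), not 1
```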

## 2. Inferential Direction

  • Probability: Model → Data.
    “Given θ, how likely is X?”

  • Likelihood: Data → Model.
    “Given observed X, how plausible is θ?”


## 3. Example: Coin Flip

Imagine we toss a coin 10 times, observing 7 heads (x = 7).

  • As Probability: $$ P(X = 7 \mid p) = \binom{10}{7} p^7 (1-p)^3 $$
  • As Likelihood: $$ \mathcal{L}(p \mid x=7) = P(X = 7 \mid p) $$ This function of p tells us which values are most plausible.

Often, we maximize it (Maximum Likelihood Estimate, MLE):

$$ \hat{p}_{\text{MLE}} = \underset{p}{\arg\max}\;\mathcal{L}(p \mid x) $$

For binomial data, this gives:

$$ \hat{p} = \frac{x}{n} = \frac{7}{10} = 0.7 $$
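This closed form follows from setting the derivative of the log-likelihood to zero:

$$ \ell(p) = \log\binom{n}{x} + x \log p + (n - x)\log(1 - p), \qquad \frac{d\ell}{dp} = \frac{x}{p} - \frac{n - x}{1 - p} = 0 \;\Rightarrow\; \hat{p} = \frac{x}{n} $$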

## 4. Distinguishing Features

| Feature | Probability | Likelihood |
| --- | --- | --- |
| Viewpoint | Model → Data | Data → Model |
| Expression | $P(X \mid \theta)$ | $\mathcal{L}(\theta \mid X)$ |
| Normalization | Integrates to 1 over $x$ | Not normalized over $\theta$ |
| Use-case | Predictions, simulations | Parameter inference (e.g., MLE) |

## 5. Python Example

```python
import numpy as np
from scipy.stats import binom

# Observed data
n, x_obs = 10, 7
ps = np.linspace(0, 1, 101)  # grid of candidate values for p

# Likelihood of each candidate p, given the observed data
likelihoods = binom.pmf(x_obs, n, ps)

# MLE: the p that maximizes the likelihood
p_mle = ps[np.argmax(likelihoods)]

print(f"Observed {x_obs}/{n} heads.")
print(f"MLE for p: {p_mle:.2f}")  # → 0.70
```

## 6. Gaussian (Normal) Distribution Likelihood

For a continuous random variable (e.g., height, weight) modeled by a normal distribution:

$$ f(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) $$

For data points \( x_1, x_2, \dots, x_n \), the likelihood function for \( \mu \) and \( \sigma \) is:

$$ \mathcal{L}(\mu, \sigma \mid \mathbf{x}) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right) $$

The log-likelihood is often used:

$$ \ell(\mu, \sigma) = -\frac{n}{2} \log(2\pi) - n \log \sigma - \frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2 $$
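Setting the partial derivatives of the log-likelihood to zero yields the familiar closed-form estimates:

$$ \frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^n (x_i - \mu) = 0 \;\Rightarrow\; \hat{\mu} = \frac{1}{n} \sum_{i=1}^n x_i, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \hat{\mu})^2 $$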

### Python Example for Gaussian Likelihood

```python
import numpy as np
from scipy.stats import norm

# Observed data (example)
data = np.array([5.1, 4.9, 5.0, 5.2, 4.8])

# Grid of mu values to evaluate the likelihood on
# (101 points gives a step of 0.01, so the grid includes 5.00 exactly)
mus = np.linspace(4.5, 5.5, 101)
sigma = 0.1  # treated as known here

# Likelihood of each mu: the product of the densities of all observations
likelihoods = np.array([np.prod(norm.pdf(data, loc=mu, scale=sigma)) for mu in mus])

# MLE for mu
mu_mle = mus[np.argmax(likelihoods)]

print(f"MLE for mu (mean): {mu_mle:.2f}")  # → 5.00
```
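As a sanity check, the grid-based estimate can be compared against the closed-form MLE for μ, which is simply the sample mean:

```python
import numpy as np

# Closed-form MLE for mu: the sample mean of the observed data
data = np.array([5.1, 4.9, 5.0, 5.2, 4.8])
print(f"Sample mean: {np.mean(data):.2f}")  # → 5.00
```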

## 7. Summary

  • Probability: From model to data; predicts outcomes given parameters.
  • Likelihood: From data to model; infers parameters given observed outcomes.
  • Though both use the same mathematical function, the variable of interest changes.
