:::info
The formula booklet gives the probability mass functions for the binomial and Poisson
distributions, and the normal distribution function. You must know when to use each distribution and
how to find probabilities.
:::
1. Discrete Random Variables
1.1 Definition
Definition. A discrete random variable $X$ takes values from a countable set with
probabilities $P(X = x_i) = p_i$ satisfying:
- $p_i \geq 0$ for all $i$
- $\sum_i p_i = 1$
1.2 Expectation and variance
$$E(X) = \mu = \sum_i x_i p_i$$
$$\mathrm{Var}(X) = \sigma^2 = E(X^2) - [E(X)]^2 = \sum_i x_i^2 p_i - \mu^2$$
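As a small illustration of these two formulas, the sketch below computes $E(X)$ and $\mathrm{Var}(X)$ for the score on a fair six-sided die. The die example and the function names `expectation` and `variance` are assumptions made for illustration; they are not part of the notes.

```python
def expectation(values, probs):
    """Return E(X) = sum of x_i * p_i for a discrete random variable."""
    return sum(x * p for x, p in zip(values, probs))


def variance(values, probs):
    """Return Var(X) = E(X^2) - [E(X)]^2."""
    mean = expectation(values, probs)
    mean_of_squares = sum(x * x * p for x, p in zip(values, probs))
    return mean_of_squares - mean ** 2


# Hypothetical example: the score on a fair six-sided die.
die_values = [1, 2, 3, 4, 5, 6]
die_probs = [1 / 6] * 6

print(expectation(die_values, die_probs))  # E(X) = 7/2
print(variance(die_values, die_probs))     # Var(X) = 35/12
```

Computing $E(X^2)$ first and subtracting $\mu^2$ mirrors the second form of the variance formula, which is usually the quicker route by hand as well.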
2. The Binomial Distribution
2.1 Derivation from Bernoulli trials
A Bernoulli trial is an experiment with exactly two outcomes: success (probability $p$) and
failure (probability $1 - p$).
If we perform $n$ independent Bernoulli trials, each with the same success probability $p$, the number of successes $X$ follows a binomial
distribution: $X \sim B(n, p)$.
Derivation of the PMF. Each sequence of $k$ successes and $n - k$ failures has probability
$p^k (1-p)^{n-k}$. The number of such sequences is $\binom{n}{k}$ (choosing which $k$ of the $n$
trials are successes). Therefore:
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n$$
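The PMF translates directly into code. The following is a minimal sketch using only the Python standard library; the function name `binomial_pmf` and the values $n = 10$, $p = 0.3$ are illustrative assumptions.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Illustrative values (not from the text): 10 trials with p = 0.3.
n, p = 10, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(round(pmf[3], 4))  # P(X = 3)
print(sum(pmf))          # the probabilities sum to 1, up to rounding error
```

Checking that the probabilities sum to 1 is a quick sanity test that the function really defines a distribution.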
2.2 Proof that $E(X) = np$
Proof. Let $X_i$ be the indicator variable for the $i$-th trial: $X_i = 1$ if success, $0$ if
failure.
$X = X_1 + X_2 + \cdots + X_n$.
$E(X_i) = 1 \cdot p + 0 \cdot (1-p) = p$.
By linearity of expectation: $E(X) = \sum_{i=1}^{n} E(X_i) = np$. $\blacksquare$
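2.3 Proof that $\mathrm{Var}(X) = np(1-p)$
Proof. $E(X_i^2) = 1^2 \cdot p + 0^2 \cdot (1-p) = p$.
$\mathrm{Var}(X_i) = E(X_i^2) - [E(X_i)]^2 = p - p^2 = p(1-p)$.
Since the $X_i$ are independent: $\mathrm{Var}(X) = \sum_{i=1}^{n} \mathrm{Var}(X_i) = np(1-p)$.
$\blacksquare$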
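The two results can be checked numerically by summing over the PMF. The sketch below does this for one illustrative choice of parameters ($n = 20$, $p = 0.35$, assumed here only for the check); it is a sanity check, not a substitute for the proofs.

```python
from math import comb, isclose


def binomial_moments(n, p):
    """Return (mean, variance) of B(n, p) by summing over the PMF directly."""
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    mean_of_squares = sum(k * k * q for k, q in enumerate(pmf))
    return mean, mean_of_squares - mean ** 2


# Illustrative parameters, chosen only for this check.
n, p = 20, 0.35
mean, var = binomial_moments(n, p)

print(isclose(mean, n * p))           # compares the summed mean with np
print(isclose(var, n * p * (1 - p)))  # compares the summed variance with np(1 - p)
```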
2.4 Properties
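- The distribution is symmetric when $p = 0.5$.
- It is skewed left (negatively skewed) when $p \gt 0.5$ and skewed right (positively skewed) when $p \lt 0.5$.
- The mode is at $\lfloor (n+1)p \rfloor$. When $(n+1)p$ is an integer and $0 \lt p \lt 1$, both $(n+1)p - 1$ and $(n+1)p$ are modes.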
2.5 Direct derivation of $E(X) = np$ from the PMF
The proofs in Sections 2.2 and 2.3 use indicator variables. Here we derive the same results directly
from the probability mass function using algebraic identities.
Proof. Starting from the definition of expectation applied to the binomial PMF:
$$E(X) = \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k}$$
The $k = 0$ term vanishes, so begin the sum at $k = 1$. Apply the identity
$k\binom{n}{k} = n\binom{n-1}{k-1}$.
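One standard way to complete the argument is to substitute the identity, factor out $np$, and reindex the sum. The index $j = k - 1$ is introduced here for illustration.

```latex
\begin{aligned}
E(X) &= \sum_{k=1}^{n} n\binom{n-1}{k-1} p^{k} (1-p)^{n-k} \\
     &= np \sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} (1-p)^{(n-1)-(k-1)} \\
     &= np \sum_{j=0}^{n-1} \binom{n-1}{j} p^{j} (1-p)^{(n-1)-j} \\
     &= np\,[p + (1-p)]^{n-1} = np
\end{aligned}
```

The last step uses the binomial theorem: the sum over $j$ is the expansion of $[p + (1-p)]^{n-1}$, which equals 1.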
The Central Limit Theorem (CLT) states that the sum (or mean) of a large number of independent,
identically distributed random variables with finite variance is approximately normally distributed, regardless of the
shape of the original distribution.
This is why the normal distribution appears so widely in nature: a quantity that is the sum of
many small independent effects (height, measurement error, etc.) will be approximately normal.
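A simulation can make the theorem concrete. The sketch below averages samples from an exponential distribution with rate 1, which is strongly skewed, and checks that the sample means behave roughly like a normal variable. The sample size, number of replications, seed, and choice of distribution are all assumptions made for illustration.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 50     # number of i.i.d. variables averaged in each sample mean
REPLICATIONS = 2000  # number of sample means generated

# Exponential(1) is strongly skewed, with mean 1 and standard deviation 1.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(REPLICATIONS)
]

centre = mean(sample_means)
spread = stdev(sample_means)
# For a normal distribution about 95% of values lie within 1.96 SDs of the mean.
within = sum(abs(m - centre) < 1.96 * spread for m in sample_means) / REPLICATIONS

print(centre)  # the CLT predicts a value near 1
print(spread)  # the CLT predicts a value near 1 / sqrt(50), about 0.141
print(within)  # near 0.95 if the sample means are close to normal
```

Increasing `SAMPLE_SIZE` should bring the distribution of the sample means closer to normal, while increasing `REPLICATIONS` only reduces the noise in the three summary figures.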
If $X \sim \mathrm{Po}(\lambda)$ and $Y \sim \mathrm{Po}(\mu)$ are independent, then
$X + Y \sim \mathrm{Po}(\lambda + \mu)$.
4.6 Conditions for the Poisson model
The Poisson distribution is appropriate when all of the following hold:
- Events occur independently of one another.
- Events occur at a constant average rate $\lambda$ in a fixed interval of time, space, or
  volume.
- The probability of more than one event occurring in a sufficiently small sub-interval is
  negligible.
These are sometimes called the Poisson postulates. When they are satisfied, with $\lambda$ the average rate per unit interval, the number of events
in any interval of length $t$ follows $\mathrm{Po}(\lambda t)$.
Typical applications include: calls arriving at a call centre per hour, typing errors per page,
radioactive decays per second, and cars passing a checkpoint per minute.
:::tip Tip
Check that the average rate is constant over the interval and that events do not cluster. If events tend to occur in
bursts, the Poisson model is not appropriate.
:::
4.7 Poisson approximation to the Binomial
Practical rule. When $n \gt 50$ and $p \lt 0.1$, we may approximate $B(n, p)$ by
$\mathrm{Po}(\lambda)$ where $\lambda = np$.
Justification. The theoretical result in Section 4.2 shows that as $n \to \infty$ and $p \to 0$
with $np = \lambda$ held constant, the binomial PMF converges pointwise to the Poisson PMF. The
conditions $n \gt 50$ and $p \lt 0.1$ are practical thresholds that ensure:
- $n$ is large enough that the discrete binomial is well-approximated by a limit distribution.
- $p$ is small enough that the “rare event” assumption of the Poisson model is satisfied.

The approximation is normally used when $\lambda = np$ is also moderate ($0 \lt \lambda \lt 10$), so that neither distribution is heavily
concentrated at a single point. The thresholds on $n$ and $p$ do not guarantee this on their own, so check $\lambda$ separately.
The approximation improves as $n$ increases and $p$ decreases while $\lambda = np$ remains fixed.
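The improvement with increasing $n$ can be seen numerically. The sketch below holds $\lambda = np$ fixed and reports the largest difference between the binomial and Poisson PMFs. The helper name `max_pmf_gap`, the value $\lambda = 3$, and the cut-off `k_max = 20` are assumptions chosen for illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam ** k / factorial(k)


def max_pmf_gap(n, lam, k_max=20):
    """Largest |binomial - Poisson| PMF difference over k = 0..k_max, with p = lam / n."""
    p = lam / n
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
        for k in range(min(n, k_max) + 1)
    )


# Hold lambda = np fixed at 3 (an illustrative value) and let n grow.
for n in (30, 60, 120, 1000):
    print(n, max_pmf_gap(n, 3.0))
```

The printed gap should shrink as $n$ grows, which is the pointwise convergence described above seen at finite $n$.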
:::caution Warning
If $p$ is not small and $n$ is large, use the normal approximation (Section 3.6) instead. The two
approximations are complementary: Poisson handles the case of many trials with rare success, while
normal handles the case of many trials with moderate success probability.
:::
5. Choosing the Right Distribution
| Situation | Distribution |
| --- | --- |
| Fixed $n$ trials, success/failure | Binomial $B(n, p)$ |
| Events in continuous interval, rare events | Poisson $\mathrm{Po}(\lambda)$ |
| Continuous, bell-shaped | Normal $N(\mu, \sigma^2)$ |
6. Coding of Random Variables
6.1 Definition
A coding (or linear transformation) of a discrete random variable $X$ is a new random variable
$Y = aX + b$ where $a$ and $b$ are constants with $a \neq 0$.
Coding arises when changing units (e.g. centimetres to metres, or Celsius to Fahrenheit) or when
shifting and scaling a distribution.
6.2 Effect on expectation
Theorem. If $Y = aX + b$, then $E(Y) = aE(X) + b$.
Proof. Apply the definition of expectation to $Y$, then use linearity of the sum and $\sum_i p_i = 1$.
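Written out in the notation of Section 1.1, where $p_i = P(X = x_i)$ and $Y$ takes the value $ax_i + b$ with probability $p_i$, the proof of the expectation result can be sketched as:

```latex
\begin{aligned}
E(Y) &= \sum_i (a x_i + b)\, p_i \\
     &= a \sum_i x_i p_i + b \sum_i p_i \\
     &= a E(X) + b
\end{aligned}
```

The final line uses $\sum_i x_i p_i = E(X)$ and $\sum_i p_i = 1$.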
Note how the terms $2abE(X)$ and $b^2$ cancel between $E(Y^2)$ and $[E(Y)]^2$.
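The cancellation can be seen by expanding both quantities for $Y = aX + b$. The following is a sketch of the standard calculation, using $\mathrm{Var}(Y) = E(Y^2) - [E(Y)]^2$ from Section 1.2:

```latex
\begin{aligned}
E(Y^2) &= E(a^2X^2 + 2abX + b^2) = a^2E(X^2) + 2abE(X) + b^2 \\
[E(Y)]^2 &= [aE(X) + b]^2 = a^2[E(X)]^2 + 2abE(X) + b^2 \\
\mathrm{Var}(Y) &= E(Y^2) - [E(Y)]^2 = a^2\left(E(X^2) - [E(X)]^2\right) = a^2\,\mathrm{Var}(X)
\end{aligned}
```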
:::info
Adding a constant $b$ (a location shift) has no effect on variance. Only multiplying by
$a$ (a scale change) affects variance, and it does so by a factor of $a^2$. This is why variance is
measured in squared units of the original variable.
:::
6.4 Effect on standard deviation
Since $\mathrm{Var}(Y) = a^2\,\mathrm{Var}(X)$, taking square roots gives:
$$\mathrm{SD}(Y) = \lvert a \rvert\,\mathrm{SD}(X)$$
The absolute value ensures the standard deviation remains non-negative regardless of the sign of
$a$.
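The three coding rules can be checked on a small example. The sketch below uses a made-up distribution for $X$ and a coding with negative $a$, so that the role of the absolute value is visible. The distribution, the constants, and the helper name `moments` are assumptions for illustration.

```python
from math import isclose, sqrt


def moments(values, probs):
    """Return (mean, variance) of a discrete random variable."""
    mean = sum(x * p for x, p in zip(values, probs))
    var = sum(x * x * p for x, p in zip(values, probs)) - mean ** 2
    return mean, var


# Hypothetical distribution for X, and a coding Y = aX + b with negative a.
x_values = [0, 1, 2, 3]
probs = [0.1, 0.2, 0.3, 0.4]
a, b = -2.0, 5.0
y_values = [a * x + b for x in x_values]

mean_x, var_x = moments(x_values, probs)
mean_y, var_y = moments(y_values, probs)

print(isclose(mean_y, a * mean_x + b))             # E(Y) = aE(X) + b
print(isclose(var_y, a ** 2 * var_x))              # Var(Y) = a^2 Var(X)
print(isclose(sqrt(var_y), abs(a) * sqrt(var_x)))  # SD(Y) = |a| SD(X)
```

Here $a = -2$, so $a\,\mathrm{SD}(X)$ would be negative; only $\lvert a \rvert\,\mathrm{SD}(X)$ matches the standard deviation of $Y$.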
Problem 2
Heights of men are normally distributed with mean 175 cm and standard deviation 8 cm. Find the probability that a randomly chosen man is taller than 185 cm.
Solution 2
$X \sim N(175, 64)$. $P(X \gt 185) = P\!\left(Z \gt \dfrac{185-175}{8}\right) = P(Z \gt 1.25) = 1 - \Phi(1.25) \approx 1 - 0.8944 = 0.1056$.
Problem 3
A call centre receives an average of 4.5 calls per minute. Find the probability of receiving exactly 6 calls in a given minute, and the probability of receiving more than 8 calls.
Solution 3
$X \sim \mathrm{Po}(4.5)$.
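Both probabilities follow from the Poisson PMF $P(X = k) = e^{-\lambda}\lambda^k / k!$, with the second found from the complement $P(X \gt 8) = 1 - P(X \leq 8)$. A minimal sketch of the arithmetic, with the helper name `poisson_pmf` assumed for illustration:

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam ** k / factorial(k)


lam = 4.5  # average calls per minute, from the problem

p_exactly_6 = poisson_pmf(6, lam)
# P(X > 8) = 1 - P(X <= 8), summing the PMF for k = 0, ..., 8.
p_more_than_8 = 1 - sum(poisson_pmf(k, lam) for k in range(9))

print(round(p_exactly_6, 4))
print(round(p_more_than_8, 4))
```

To four decimal places this gives $P(X = 6) \approx 0.1281$ and $P(X \gt 8) \approx 0.0403$.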
Problem 5
Find $c$ such that $P(-c \lt Z \lt c) = 0.95$ where $Z \sim N(0,1)$.
Solution 5
$P(-c \lt Z \lt c) = 2\Phi(c) - 1 = 0.95 \implies \Phi(c) = 0.975$.
From tables: $c \approx 1.96$.
If you get this wrong, revise: Standard Normal (Section 3.4).
Problem 6
The number of emails received per hour follows $\mathrm{Po}(12)$. Find the probability of receiving between 10 and 15 emails (inclusive) in a given hour.
Solution 6
$X \sim \mathrm{Po}(12)$.
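The required probability is $P(10 \leq X \leq 15) = P(X \leq 15) - P(X \leq 9)$. The sketch below evaluates it by summing the Poisson PMF; the helper name `poisson_cdf` is an assumption for illustration.

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam), by summing the PMF from 0 to k."""
    return sum(exp(-lam) * lam ** j / factorial(j) for j in range(k + 1))


lam = 12  # mean number of emails per hour, from the problem

# "Between 10 and 15 inclusive" removes everything up to and including 9.
p_between = poisson_cdf(15, lam) - poisson_cdf(9, lam)

print(round(p_between, 3))
```

With cumulative Poisson tables the same subtraction gives approximately $0.8444 - 0.2424 = 0.6020$.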
Problem 7
A machine produces bolts with lengths $X \sim N(50, 0.04)$ cm. Bolts with length less than 49.7 cm or greater than 50.3 cm are rejected. Find the proportion of bolts rejected.
Solution 7
$\sigma = \sqrt{0.04} = 0.2$.
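The rejection limits are $1.5$ standard deviations either side of the mean, since $(50.3 - 50)/0.2 = 1.5$, so the rejected proportion is $2[1 - \Phi(1.5)]$. A minimal sketch of the calculation using `statistics.NormalDist` from the Python standard library:

```python
from statistics import NormalDist

lengths = NormalDist(mu=50, sigma=0.2)  # sigma = sqrt(0.04)

p_too_short = lengths.cdf(49.7)     # P(X < 49.7)
p_too_long = 1 - lengths.cdf(50.3)  # P(X > 50.3)
p_rejected = p_too_short + p_too_long

print(round(p_rejected, 4))
```

From tables, $\Phi(1.5) \approx 0.9332$, giving a rejected proportion of about $2(1 - 0.9332) = 0.1336$.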
Problem 10
If $X \sim \mathrm{Po}(3)$ and $Y \sim \mathrm{Po}(5)$ are independent, find $P(X + Y = 6)$.
Solution 10
By additivity: $X + Y \sim \mathrm{Po}(3+5) = \mathrm{Po}(8)$.
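The probability is then $P(X + Y = 6) = e^{-8}\,8^6/6!$. The sketch below evaluates this and, as a check on additivity, compares it with the direct sum over all ways of splitting the six events between $X$ and $Y$. The helper name `poisson_pmf` is an assumption for illustration.

```python
from math import exp, factorial, isclose


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam ** k / factorial(k)


# Route 1: use additivity, X + Y ~ Po(8).
p_direct = poisson_pmf(6, 8)

# Route 2: sum P(X = j) * P(Y = 6 - j) over j = 0, ..., 6 (independence).
p_convolution = sum(poisson_pmf(j, 3) * poisson_pmf(6 - j, 5) for j in range(7))

print(round(p_direct, 4))
print(isclose(p_direct, p_convolution))  # the two routes agree
```

To four decimal places, $P(X + Y = 6) \approx 0.1221$.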
If you get this wrong, revise: Additivity (Section 4.5).
Problem 11
Starting from the definition $E(X) = \sum_{k=0}^{n} k\binom{n}{k}p^k(1-p)^{n-k}$, derive $E(X) = np$ using the identity $k\binom{n}{k} = n\binom{n-1}{k-1}$ and the binomial theorem.
Solution 11
Problem 13
$X \sim B(80, 0.03)$. State whether the Poisson approximation is valid, giving reasons. If valid, use it to find $P(X \leq 1)$.
Solution 13
Check conditions: $n = 80 \gt 50$ and $p = 0.03 \lt 0.1$. Both conditions are satisfied, so the
Poisson approximation is valid with $\lambda = np = 80 \times 0.03 = 2.4$.
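With $\lambda = 2.4$, the approximation gives $P(X \leq 1) \approx e^{-2.4}(1 + 2.4)$. The sketch below evaluates this and, for comparison, the exact binomial value. The comparison is an added check rather than part of the question.

```python
from math import comb, exp

n, p = 80, 0.03
lam = n * p  # 2.4

# Poisson approximation: P(X <= 1) = e^(-lam) * (1 + lam).
p_poisson = exp(-lam) * (1 + lam)

# Exact binomial value: P(X = 0) + P(X = 1).
p_exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(2))

print(round(p_poisson, 4))
print(round(p_exact, 4))
```

The approximation gives about $0.3084$ against an exact value of about $0.3038$, so the two agree to two decimal places.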
Problem 16
$X \sim B(120, 0.025)$. (a) Show that the Poisson approximation is appropriate. (b) Use it to find $P(X = 5)$. (c) State why the normal approximation would not be appropriate here.
Solution 16
(a) $n = 120 \gt 50$ and $p = 0.025 \lt 0.1$, so the Poisson approximation is appropriate.
$\lambda = np = 120 \times 0.025 = 3$.
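Part (b) then follows from the Poisson PMF with $\lambda = 3$. A sketch of the arithmetic:

```latex
P(X = 5) \approx \frac{e^{-3}\, 3^{5}}{5!} = \frac{243\, e^{-3}}{120} \approx 0.1008
```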
(c) For the normal approximation we need $np \gt 5$ and $n(1-p) \gt 5$. Here $np = 3 \lt 5$, so the
normal approximation is not appropriate. The Poisson approximation is the correct choice since $p$
is small.
Problem 17
Temperatures in a city are modelled by $X \sim N(15, 9)$ in degrees Celsius. The temperature in
Fahrenheit is $F = \frac{9}{5}X + 32$. Find $E(F)$, $\mathrm{Var}(F)$, and $P(F \gt 68)$.
Solution 17
$E(F) = \frac{9}{5}E(X) + 32 = \frac{9}{5}(15) + 32 = 27 + 32 = 59^\circ\mathrm{F}$.
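The remaining parts use $\mathrm{Var}(F) = \left(\frac{9}{5}\right)^2 \mathrm{Var}(X)$ and the fact that a linear function of a normal variable is itself normal. A minimal sketch of the arithmetic using `statistics.NormalDist`; the variable names are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

a, b = 9 / 5, 32       # coding F = aX + b
mean_x, var_x = 15, 9  # X ~ N(15, 9), so SD(X) = 3

mean_f = a * mean_x + b  # E(F) = aE(X) + b
var_f = a ** 2 * var_x   # Var(F) = a^2 Var(X)
sd_f = sqrt(var_f)       # SD(F) = |a| SD(X)

# F is normal with the mean and standard deviation found above.
p_above_68 = 1 - NormalDist(mean_f, sd_f).cdf(68)

print(mean_f, round(var_f, 2), round(sd_f, 2))
print(round(p_above_68, 4))
```

This gives $\mathrm{Var}(F) = 29.16$, $\mathrm{SD}(F) = 5.4$, and $P(F \gt 68) = P(Z \gt 1.667) \approx 0.048$.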
Common Pitfalls
- Forgetting to check that solutions satisfy the original equation (especially after squaring both
  sides or dividing by variables).
- Dropping negative signs during algebraic manipulation. Substitute back to verify your answer.
- Rounding too early in multi-step calculations. Carry full precision through and round only the
  final answer.
- Confusing the domain and range of functions, or not considering restrictions (e.g., a denominator
  cannot be zero).