Definition
A C% confidence interval for a population parameter is an interval of numbers computed from a sample in such a way that, if we could generate many different random samples from the population and compute an interval from each one, C% of those intervals would contain the true value of the population parameter. A confidence interval provides an interval estimate for a population parameter. To calculate a confidence interval, you must specify the level of confidence, you must have a random sample from the population, and you must know the sampling distribution of the statistic that forms the basis for the confidence interval.
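The repeated-sampling reading of the definition can be checked with a small simulation. The sketch below is illustrative only: the population is assumed to be normal with a known standard deviation, and the function name `coverage` and all parameter values are hypothetical choices, not part of the definition.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, confidence, trials, seed=0):
    """Fraction of simulated intervals that contain the true mean mu.

    Assumes a normal population with known standard deviation sigma.
    """
    rng = random.Random(seed)
    # Critical value z with P(-z <= Z <= z) = confidence for standard normal Z.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw one random sample and build the interval around its mean.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

# With a 95% level, the printed fraction should be close to 0.95.
print(coverage(mu=50.0, sigma=10.0, n=100, confidence=0.95, trials=2000))
```

The fraction is not exactly 0.95 in any finite simulation; it fluctuates around that value and settles closer to it as the number of trials grows.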
Theoretical example
We are interested in estimating the mean of a population, μ. The population has a standard deviation of σ. We have a random sample of N = 100 observations from this population. The sample mean, x̄, has a normal distribution with mean μ and standard deviation σ/√N = σ/10. The 95% confidence interval for μ is
[ x̄ - 1.96σ/10, x̄ + 1.96σ/10 ]. We obtain the number 1.96 from a table giving values for a standard normal distribution, using the fact that the probability that a standard normal random variable lies in the interval [ -1.96, 1.96 ] is 0.95.
The idea is that x̄ provides a good estimate of the population parameter μ. A confidence interval provides another estimator of μ that takes into account the variability of x̄. Once we have calculated a particular confidence interval given an observed sample, we are 95% confident that our interval contains the true value of the population parameter.
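The value 1.96 need not come from a printed table. As a minimal sketch, assuming Python's standard-library `statistics.NormalDist`, the critical value can be recovered from the inverse of the standard normal cumulative distribution function; the helper name `critical_value` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def critical_value(confidence):
    """Return z such that P(-z <= Z <= z) equals `confidence`."""
    # The central area `confidence` leaves (1 - confidence) / 2 in each tail,
    # so z is the quantile at 0.5 + confidence / 2.
    return std_normal.inv_cdf(0.5 + confidence / 2)

z = critical_value(0.95)
# Probability that a standard normal variable lies in [-z, z].
central_mass = std_normal.cdf(z) - std_normal.cdf(-z)
print(round(z, 2), round(central_mass, 2))  # prints: 1.96 0.95
```

The same helper gives the critical value for any other confidence level, for example about 1.645 for 90%.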
Practical example
We are interested in estimating the average amount, µ, spent by teenage shoppers at an online music store in a one-month period with 90% confidence. We know that the standard deviation of purchases is $16. We collect a random sample of purchases of 64 teens. The mean value of purchases of these 64 teens is a random variable which has a normal distribution with a mean of µ and standard deviation of $16/8 = $2. In one particular sample, the 64 teens purchased an average of $56 worth of music online. The $56 is the sample average for this particular sample. A different sample of the purchases of 64 teens would have produced a different sample average.
A 90% confidence interval for μ is given by [ $56 - 1.645 x $2, $56 + 1.645 x $2 ] = [ $52.71, $59.29 ]. The number 1.645 is obtained from a standard normal table: the probability that a standard normal random variable is in the interval [ -1.645, 1.645 ] is 90%. We are 90% confident that the population average purchase by teen shoppers is between $52.71 and $59.29.
The best estimate of the population mean is the sample mean, $56. The confidence interval allows us to consider the amount of uncertainty that we have about this estimate. Although our best estimate of μ is $56, we are 90% confident that the value of μ is in the interval [ $52.71, $59.29 ]. This is an estimate that takes into account our uncertainty about our estimation.
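The arithmetic of this example can be reproduced in a few lines. The sketch below assumes, as the example does, a known population standard deviation and a normally distributed sample mean; the function name `z_interval` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence):
    """Confidence interval for a mean when the population sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.645 for 90%
    margin = z * sigma / sqrt(n)  # critical value times the standard error
    return sample_mean - margin, sample_mean + margin

# Music store example: sample mean $56, sigma $16, 64 teens, 90% confidence.
low, high = z_interval(sample_mean=56.0, sigma=16.0, n=64, confidence=0.90)
print(f"[ ${low:.2f}, ${high:.2f} ]")  # prints: [ $52.71, $59.29 ]
```

The margin is 1.645 x $2 = $3.29, so the endpoints are $56 - $3.29 and $56 + $3.29.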
Properties
- As our level of confidence increases, the width of the interval increases and the estimate becomes less precise. For example, a 90% confidence interval is wider than an 80% confidence interval computed from the same sample.
- If we keep the level of confidence the same and if the standard deviation stays the same, an increase in the sample size, N, reduces the width of the confidence interval and the estimate becomes more precise.
- The sample statistic is always contained in the interval. For the intervals described above, the sample mean is the midpoint of the interval.
- If we draw many different samples and calculate a confidence interval for each sample, then about C% of those intervals will contain the true population parameter. The remaining (100 - C)% of those intervals will NOT contain the true population parameter value.
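The first two properties follow directly from the width formula 2 x z x σ/√N for an interval with known σ. As an illustrative sketch, reusing the σ = $16 and N = 64 of the practical example (the function name `interval_width` and the other values are arbitrary choices):

```python
from math import sqrt
from statistics import NormalDist

def interval_width(sigma, n, confidence):
    """Width of a known-sigma confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * sigma / sqrt(n)

# Higher confidence, same sample size: the interval gets wider.
for confidence in (0.80, 0.90, 0.95):
    print(confidence, round(interval_width(16.0, 64, confidence), 2))

# Same confidence, larger sample: the interval gets narrower.
for n in (64, 256, 1024):
    print(n, round(interval_width(16.0, n, 0.90), 2))
```

Because the width is proportional to 1/√N, quadrupling the sample size halves the width of the interval.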
