#### Introduction

Statistical **hypothesis testing** is used to assess the strength of the evidence in a random sample against a stated null hypothesis concerning a population parameter. A null hypothesis is a conjecture about a population parameter that is stated as a mathematical equation.

For example, this null hypothesis states a conjecture about a population parameter, namely, the population mean:

$$H_0: \mu = 0$$

The alternative hypothesis states what we suspect is true about the population parameter:

$$H_a: \mu \neq 0$$

or

$$H_a: \mu > 0.$$

In the first case, $H_a: \mu \neq 0$, we say that we have a two-sided test. We'll reject $H_0$ if our sample evidence indicates that the true population mean is either bigger than 0 or smaller than 0. In the second case, we have a one-sided test. We only reject $H_0$ if the evidence indicates that the true population mean is bigger than 0.

To conduct the hypothesis test, we require a random sample of observed data. From that sample, we calculate a value for a statistic that is relevant for the hypothesis. In the example above, we could use the sample mean as our statistic because the sample mean provides an estimate for the population mean. The value of the sample mean will almost never be exactly equal to the value given by the null hypothesis, even when the null hypothesis is true. Statistical hypothesis testing provides us with a way to evaluate how much of a difference between the sample mean and the conjectured population mean we would need to observe in order to reject the null hypothesis.

There are two types of errors that can occur when conducting a hypothesis test: the researcher can reject a null hypothesis that is actually true (**Type I error**) or the researcher can fail to reject a null hypothesis that is actually false (**Type II error**). The **significance level** of the test (denoted by α) is the probability of committing a Type I error, that is, of rejecting the null hypothesis when it is actually true. The significance level is chosen prior to conducting the hypothesis test.
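The link between the significance level and the Type I error rate can be checked by simulation. The sketch below is illustrative only: it assumes a normal population with a known standard deviation, draws many samples for which the null hypothesis is true, and counts how often a two-sided test at level 0.05 rejects. The test rejects when the standardized sample mean falls beyond a cutoff taken from the standard normal distribution. The function names, sample size, number of trials, and seed are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist


def rejects_null(sample, mu0, sigma, alpha):
    """Two-sided z-test: True if the sample leads us to reject H0: mu = mu0."""
    n = len(sample)
    sample_mean = sum(sample) / n
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return abs(z) > cutoff


def simulated_type_i_rate(alpha, mu0=0.0, sigma=1.0, n=30, trials=20_000, seed=1):
    """Fraction of samples, drawn with H0 true, for which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Every sample comes from a population whose mean really is mu0,
        # so each rejection counted here is a Type I error.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if rejects_null(sample, mu0, sigma, alpha):
            rejections += 1
    return rejections / trials


rate = simulated_type_i_rate(alpha=0.05)
print(f"simulated Type I error rate: {rate:.3f}")
```

Under these assumptions, the printed rate should land close to the chosen significance level, with some simulation noise.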

The null hypothesis must be stated in such a way that the researcher can calculate the probability of observing particular samples from the population when the null hypothesis is true. The **p-value** of a test statistic is the probability of observing a value of the test statistic that is at least as extreme as the one observed, assuming that the null hypothesis is true.
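As a rough illustration of this definition, the sketch below computes a p-value for a test statistic that follows a standard normal distribution when the null hypothesis is true. That distributional assumption, the function names, and the `alternative` labels are illustrative choices, not part of the text above. For a two-sided alternative, "at least as extreme" counts both tails; for a one-sided "greater than" alternative, it counts only the upper tail.

```python
import math


def upper_tail(z):
    """P(Z >= z) for a standard normal random variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def p_value(z, alternative="two-sided"):
    """p-value of an observed z under H0, for the given alternative."""
    if alternative == "two-sided":
        # Values at least as far from 0 as the observed z, in either direction.
        return 2 * upper_tail(abs(z))
    if alternative == "greater":
        # Only values at least as large as the observed z count as extreme.
        return upper_tail(z)
    raise ValueError("alternative must be 'two-sided' or 'greater'")


z_observed = 1.96  # hypothetical observed value
print(f"two-sided p-value: {p_value(z_observed):.4f}")
print(f"one-sided p-value: {p_value(z_observed, 'greater'):.4f}")
```

For the same observed value, the one-sided p-value is half the two-sided one whenever the observed statistic lies in the direction of the alternative.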

#### Hypothesis Testing Summary

- State the null and alternative hypotheses
- Choose a test statistic that summarizes the observed data and is relevant to the null hypothesis
- Calculate the test statistic from the random sample and calculate its p-value
- Using the p-value, assess the strength of the evidence against the null hypothesis

In fixed level testing, a significance level is chosen prior to collecting the sample, and the following decision rule is used:

- If the p-value of the test statistic is less than or equal to the significance level (α), reject the null hypothesis
- If the p-value of the test statistic is greater than the significance level (α), fail to reject the null hypothesis
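The fixed level decision rule above can be written as a small function. This is only a sketch; the function name and the returned strings are hypothetical.

```python
def fixed_level_decision(p_value, alpha):
    """Apply the fixed level rule: reject H0 when p_value <= alpha."""
    if not (0.0 <= p_value <= 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("p_value must be in [0, 1] and alpha in (0, 1)")
    return "reject H0" if p_value <= alpha else "fail to reject H0"


print(fixed_level_decision(p_value=0.03, alpha=0.05))
print(fixed_level_decision(p_value=0.20, alpha=0.05))
```

Note that a p-value exactly equal to α leads to rejection, matching the "less than or equal to" wording of the rule.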

#### Example

##### Testing a mean

The mean composite score on the ACT among the students at a large Midwestern University is 24 with a standard deviation of 4. We wish to know whether the average composite ACT score for business majors is different from the average for the University. We sample 100 business majors and calculate an average score of 26.

- State the null and alternative hypotheses

$$H_0: \mu = 24 \qquad H_a: \mu \neq 24$$

and choose a significance level α.

- Choose a relevant test statistic. Since the null hypothesis concerns the mean of the distribution, one statistic we could use is the sample mean, $\bar{x}$. This statistic has a sampling distribution that is normal with a mean of μ and a standard deviation of

$$\frac{\sigma}{\sqrt{n}}.$$

Alternatively, we could also use the following statistic:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}.$$

This statistic has a sampling distribution that is normal with a mean of 0 and a standard deviation of 1. Either statistic can be used to evaluate the sample evidence.

In both cases, the value of μ is 24 when the null hypothesis is true.

- Calculate the test statistic

$$\bar{x} = 26,$$

or

$$z = \frac{26 - 24}{4 / \sqrt{100}} = \frac{2}{0.4} = 5.$$

- Find the p-value of the test statistic. The p-value can be found through use of a statistical software package, remembering that *when the null hypothesis is true*, the first statistic (the sample mean) has a normal distribution with a mean of 24 and a standard deviation of 0.4, and the second statistic (the standardized value) has a normal distribution with a mean of 0 and a standard deviation of 1. If you are forced to use a table in a book, you will need to use the second statistic since the book will only allow you to estimate p-values for a N(0,1) distribution. In this case, the p-value is approximately equal to zero.

- Assess the strength of the evidence against the null hypothesis.

Since the p-value is smaller than the significance level, we reject the null hypothesis and conclude that the average ACT composite score for business students is different from the average score for all students; because the sample mean of 26 exceeds 24, the evidence points to a higher average. The p-value is the probability of observing a value of the test statistic at least as extreme (at least as far away from the null hypothesized value) as the sample statistic *under the assumption that the null hypothesis is true*. So a p-value of nearly zero tells us that the probability of observing a value of the sample mean that is at least 2 units away from 24 (2 = 26 - 24) is nearly zero. It's highly unlikely that we would observe such an extreme value if the null hypothesis were true, so the evidence indicates that we can reject the null hypothesis.

Note that the p-value is so small that we would reject the null hypothesis at any relevant significance level.
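The numbers in this example can be checked with a few lines of code. The sketch below uses only the values given in the example (null mean 24, standard deviation 4, sample size 100, sample mean 26) and a two-sided normal p-value; the variable names are illustrative.

```python
import math

mu0 = 24.0          # population mean under the null hypothesis
sigma = 4.0         # population standard deviation
n = 100             # number of business majors sampled
sample_mean = 26.0  # observed average ACT composite score

# Standard deviation of the sampling distribution of the sample mean.
standard_error = sigma / math.sqrt(n)

# Standardized test statistic.
z = (sample_mean - mu0) / standard_error

# Two-sided p-value: P(|Z| >= |z|) for a standard normal Z.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"standard error = {standard_error}")
print(f"z = {z}")
print(f"two-sided p-value = {p_value:.2e}")
```

The p-value comes out well below one in a million, which is why the text treats it as approximately zero.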

#### Things to think about

##### Practical significance versus statistical significance

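Given the way that hypothesis testing operates, a large enough sample will lead us to reject the null hypothesis whenever the true parameter differs from the hypothesized value at all, even if the difference is too small to matter in practice.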
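To see the effect of sample size, the sketch below reuses the ACT setting (null mean 24, standard deviation 4) but supposes, purely hypothetically, that the observed sample mean is 24.1 at every sample size. A difference of 0.1 points is of little practical importance, yet the p-value shrinks as the sample grows. The function name and the chosen sample sizes are illustrative.

```python
import math


def two_sided_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value for a z-test of H0: mu = mu0 with known sigma."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))


mu0, sigma = 24.0, 4.0
sample_mean = 24.1  # hypothetical: a tiny difference from the null value

for n in (100, 10_000, 1_000_000):
    p = two_sided_p_value(sample_mean, mu0, sigma, n)
    print(f"n = {n:>9,}: p-value = {p:.3g}")
```

With 100 observations the difference is nowhere near significant at the 0.05 level, while with 10,000 or more it is, even though the size of the difference has not changed.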