Week 6 p-values Reading

Sherri Spriggs


What is a p-value?

Rare Events

Suppose you make an assumption about a property of the population (this assumption is the null hypothesis). Then you gather sample data randomly. If the sample has properties that would be very unlikely to occur if the assumption is true, then you would conclude that your assumption about the population is probably incorrect. (Remember that your assumption is just an assumption—it is not a fact and it may or may not be true. But your sample data are real, and the data are showing you a fact that seems to contradict your assumption.)

Let’s think about an example. Didi and Ali are at a birthday party for a friend. They hurry to be first in line to grab a prize from a basket that they cannot see inside because they will be blindfolded. There are 200 plastic bubbles in the basket, and Didi and Ali have been told that only one bubble contains a $100 bill. Didi is the first person to reach into the basket and pull out a bubble. Her bubble contains a $100 bill. If what they were told is true, the probability of this happening is $\frac{1}{200} = 0.005$. Because this is so unlikely, Ali is hoping that what the two of them were told is wrong, and that there are more $100 bills in the basket. A “rare event” has occurred (Didi getting the $100 bill), so Ali doubts the assumption that only one $100 bill is in the basket.
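To see why Ali's doubt is reasonable, it can help to compute the chance of Didi's result under a few different assumptions about the basket. The short Python sketch below does this. The function name and the alternative counts of 10 and 50 bills are made up for illustration; only the 200 bubbles and the single $100 bill come from the story.

```python
from fractions import Fraction

TOTAL_BUBBLES = 200  # number of bubbles in the basket, from the example


def chance_of_prize(num_bills, total=TOTAL_BUBBLES):
    """Probability that one random draw pulls out a bubble with a $100 bill."""
    return Fraction(num_bills, total)


# 1 bill is what Didi and Ali were told; 10 and 50 are hypothetical alternatives.
for bills in (1, 10, 50):
    p = chance_of_prize(bills)
    print(f"{bills:>2} bill(s) in the basket: P(first draw wins) = {float(p):.3f}")
```

Under the stated assumption of one bill, the first draw wins with probability 0.005. The more bills the basket actually holds, the less surprising Didi's result becomes, which is exactly why a rare event makes us question the original assumption.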

Using the Sample to Test the Null Hypothesis

A collection of sample data will be used to calculate the actual probability of getting the test result, called the p-value. The p-value is the probability that, if the null hypothesis is true, the results from another randomly selected sample will be as extreme as or more extreme than the results obtained from the given sample. Put another way, the p-value tells us how likely results like ours would be from chance alone if the null hypothesis were true. It is not the probability that the null hypothesis itself is true.

A large p-value calculated from the data indicates that we should not reject the null hypothesis. The smaller the p-value, the more unlikely the outcome, and the stronger the evidence is against the null hypothesis. We would reject the null hypothesis if the evidence is strongly against it.
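The idea of "as extreme or more extreme" can be made concrete with a small calculation. The sketch below uses a hypothetical coin-flip setting that is not part of this reading: the null hypothesis is that a coin is fair, the coin is flipped 20 times, and we ask how likely it is to see at least as many heads as we observed. The function name and the head counts are assumptions chosen only for illustration.

```python
from math import comb


def upper_tail_p_value(successes, trials, null_prob):
    """P(X >= successes) when X follows a Binomial(trials, null_prob) distribution.

    This is the p-value for a one-sided test in which larger counts are the
    "more extreme" results.
    """
    return sum(
        comb(trials, k) * null_prob**k * (1 - null_prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 20 flips of a coin assumed fair under the null hypothesis.
p_mild = upper_tail_p_value(12, 20, 0.5)     # 12 heads: not far from the expected 10
p_extreme = upper_tail_p_value(16, 20, 0.5)  # 16 heads: far from the expected 10

print(f"12 heads out of 20: p-value = {p_mild:.4f}")
print(f"16 heads out of 20: p-value = {p_extreme:.4f}")
```

Twelve heads gives a large p-value (about 0.25), so there is little reason to doubt that the coin is fair. Sixteen heads gives a small p-value (about 0.006), which is much stronger evidence against the null hypothesis.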

Another Example

The owner of a plant nursery claims that his small shrub heights are more than 15 cm, on average. Several of his customers do not believe him, so to persuade the customers that he is right, the owner decides to do a hypothesis test. He randomly chooses 20 of his small shrubs and measures their heights. The mean height of those shrubs is 17 cm, with a standard deviation of 0.5 cm, and he knows that the small shrub height distribution is Normal. The owner then conducts a formal hypothesis test, a one-sample t-test, to answer the research question: Is there sufficient evidence to show that the mean small shrub height is greater than 15 cm?

His Statistical Hypothesis:

$H_0$: $\mu_{height} \le 15$

$H_a$: $\mu_{height} > 15$

The null hypothesis, $H_0$, gives the assumption of the customers: that the mean height of his small shrubs is not more than 15 cm, but rather less than or equal to 15 cm.

The alternative hypothesis, $H_a$, is what the owner wants to show to be true: that the mean height of his small shrubs IS more than 15 cm.

After running his data through SPSS, he finds a p-value of 0.002. This means that IF the mean height of his small shrubs were truly less than or equal to 15 cm, there would be a 0.002 probability of finding a sample mean of 17 cm or more. If we were to take other samples, this p-value is the probability that any other sample mean would fall at least as far out as 17 cm. So, based on this hypothesis test, it is unlikely that the true mean small shrub height is 15 cm or less. In other words, it would be very unlikely to find a sample mean of 17 cm or more by chance if the true mean were 15 cm.

Using the p-value to Make Decisions

To decide whether a sample is unusual enough to reject the null in favor of the alternative, we compare the p-value to the chosen level of significance, called alpha, $\alpha$. The level of significance is the probability of rejecting the null when, in fact, the null hypothesis is true. “We” set this value ahead of time. The most common alpha values are 0.05 and 0.01. If you select alpha to be 0.05, you are accepting that, when the null is really true, you will reject it about 5% of the time. If you select alpha to be 0.01, you are accepting that about 1% of the time you will reject the null when you should have retained it. The choice of alpha depends on the situation being analyzed and on how willing you are to reject the null hypothesis when you should really retain it.
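One way to see what alpha means is to simulate many samples from a population where the null hypothesis really is true and count how often we would reject it anyway. The sketch below does this under several simplifying assumptions that are not part of the reading: it uses a one-sided z-test with the standard deviation treated as known (rather than the t-test the nursery owner ran), it rejects whenever the p-value is less than or equal to alpha, and it borrows the shrub numbers (mean 15 cm, standard deviation 0.5 cm, samples of 20) only as convenient settings. All function names are made up for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def one_sided_z_p_value(sample, null_mean, sigma):
    """Upper-tail p-value for H0: mu <= null_mean, treating sigma as known."""
    z = (mean(sample) - null_mean) / (sigma / sqrt(len(sample)))
    return 1 - NormalDist().cdf(z)


def type_i_error_rate(alpha, trials=10_000, n=20, null_mean=15.0, sigma=0.5, seed=0):
    """Fraction of samples, drawn while the null is true, that get rejected."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        # The population mean equals null_mean, so the null hypothesis is true here.
        sample = [rng.gauss(null_mean, sigma) for _ in range(n)]
        if one_sided_z_p_value(sample, null_mean, sigma) <= alpha:
            rejections += 1  # rejecting a true null: a mistake
    return rejections / trials


for alpha in (0.05, 0.01):
    rate = type_i_error_rate(alpha)
    print(f"alpha = {alpha}: rejected a true null in {rate:.3f} of the samples")
```

The fraction of mistaken rejections should come out close to the alpha that was chosen, which is the sense in which alpha is "the probability of rejecting the null when the null is true."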