
# Hypothesis Testing

Hypothesis Testing: One-Sample Tests

Hypothesis Testing
I believe the population mean age is 50 (hypothesis).

[Figure: a random sample is drawn from the population. The sample mean is X̄ = 20, which is not close to the hypothesized mean of 50, so the hypothesis is rejected.]

Five Step Model for Hypothesis Tests
Step 1: State null and alternate hypotheses

Step 2: Select a level of significance

Step 3: Identify the test statistic

Step 4: Formulate a decision rule

Step 5: Take a sample, arrive at a decision: either do not reject the null hypothesis, or reject the null and accept the alternate

Step One: The Null Hypothesis, H0
- States the (numerical) assumption to be tested, e.g. the grade point average of juniors is at least 3.0 (H0: μ ≥ 3.0).
- Begin with the assumption that the null hypothesis is TRUE (similar to the notion of innocent until proven guilty).
- Always contains the "=" sign.
- The null hypothesis may or may not be rejected.

The Alternative Hypothesis, H1
- Is the opposite of the null hypothesis, e.g. the grade point average of juniors is less than 3.0 (H1: μ < 3.0).
- Never contains the "=" sign.
- The alternative hypothesis may or may not be accepted.
- Sometimes it is easier to form the alternative hypothesis first.

State the null and alternate hypotheses for the population mean. There are three possibilities regarding means:

- H0: μ = μ0 vs H1: μ ≠ μ0
- H0: μ ≤ μ0 vs H1: μ > μ0
- H0: μ ≥ μ0 vs H1: μ < μ0

The null hypothesis always contains equality.

Step Two: Select a Level of Significance α

- The probability of rejecting the null hypothesis when it is actually true; the level of risk in so doing. Designated α (alpha).
- Typical values are 0.01, 0.05, 0.10.
- Type I error: rejecting the null hypothesis when it is actually true (probability α).
- Type II error: accepting (failing to reject) the null hypothesis when it is actually false (probability β).

Step Two: Select a Level of Significance (decision table)

| Null hypothesis | Researcher accepts H0 | Researcher rejects H0 |
|---|---|---|
| H0 is true | Correct decision | Type I error (α) |
| H0 is false | Type II error (β) | Correct decision |
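The meaning of α as a Type I error rate can be checked by simulation. The sketch below is illustrative only: the population (normal, mean 50, standard deviation 10), the sample size, and the number of repetitions are all assumptions chosen for the demonstration, not values from the slides.

```r
# Simulate the Type I error rate. H0 is true by construction (true mean = 50),
# so every rejection below is a Type I error.
set.seed(1)        # arbitrary seed, for reproducibility only
alpha  <- 0.05
n_sims <- 5000     # hypothetical number of repeated samples
n      <- 25       # hypothetical sample size

p_values <- replicate(n_sims, {
  x <- rnorm(n, mean = 50, sd = 10)   # assumed population
  t.test(x, mu = 50)$p.value          # two-sided one-sample t test of H0: mu = 50
})

# Fraction of samples in which a true H0 was rejected; should be close to alpha
type1_rate <- mean(p_values < alpha)
type1_rate
```

Lowering `alpha` reduces this rate, at the cost of a higher Type II error rate for any fixed sample size.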

Step Three: Select the test statistic
Test statistic: A value, determined from sample information, used to determine whether or not to reject the null hypothesis.

Examples: z, t, F, χ².

z distribution as a test statistic: the z value is based on the sampling distribution of X̄, which is approximately normally distributed when the sample is reasonably large (recall the Central Limit Theorem).

$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
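As a minimal sketch, the z statistic can be written as a small R function. The function name `z_statistic` and the numbers passed to it are hypothetical and serve only to illustrate the formula.

```r
# z statistic for a sample mean when the population standard deviation is known.
# xbar: sample mean, mu0: hypothesized mean, sigma: population sd, n: sample size
z_statistic <- function(xbar, mu0, sigma, n) {
  (xbar - mu0) / (sigma / sqrt(n))
}

# Hypothetical numbers: sample mean 52, hypothesized mean 50, sigma 10, n = 100
z <- z_statistic(xbar = 52, mu0 = 50, sigma = 10, n = 100)
z   # the standard error is 10 / sqrt(100) = 1, so z = 2
```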

Step Four: Formulate the decision rule
Critical value: The dividing point between the region where the null hypothesis is rejected and the region where it is not rejected.

[Figure: sampling distribution of the statistic z for a right-tailed test (alternative hypothesis H1: μ > μ0) at the 0.05 level of significance. The critical value is 1.65; the area to its left is the "do not reject" region (probability 0.95) and the area to its right is the region of rejection (probability 0.05).]

Decision Rule

Reject the null hypothesis and accept the alternate hypothesis if computed −z < critical −z (left tail) or computed z > critical z (right tail).
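The critical value for a right-tailed z test can be obtained from `qnorm()` rather than a table. The helper name `reject_right_tailed` and the z values tried below are hypothetical; this is a sketch of the rule, not a general-purpose test function.

```r
alpha  <- 0.05
z_crit <- qnorm(1 - alpha)   # right-tail critical value, about 1.645 (1.65 when rounded)

# Decision rule for a right-tailed test: reject H0 when computed z > critical z
reject_right_tailed <- function(z, alpha = 0.05) {
  z > qnorm(1 - alpha)
}

reject_right_tailed(2.0)   # TRUE: 2.0 lies in the region of rejection
reject_right_tailed(1.2)   # FALSE: 1.2 lies in the "do not reject" region
```

For a left-tailed test the same idea applies with `z < qnorm(alpha)`.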

Step Four: Formulate the decision rule (using p-value)
p-value: The probability, assuming that the null hypothesis is true, of obtaining sample results at least as extreme as those observed.

Decision Rule

If the p-value is larger than or equal to the significance level α, H0 is not rejected. If the p-value is smaller than the significance level α, H0 is rejected.
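For a z statistic, the p-value depends on the direction of the alternative hypothesis. The following sketch assumes a standard normal test statistic; the function name `p_value_z` is hypothetical, and its `alternative` labels simply mirror those used by R's built-in test functions.

```r
# p-value of a z statistic under the three possible alternatives
p_value_z <- function(z, alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  switch(alternative,
         two.sided = 2 * pnorm(-abs(z)),             # both tails
         greater   = pnorm(z, lower.tail = FALSE),   # right tail (H1: mu > mu0)
         less      = pnorm(z))                       # left tail  (H1: mu < mu0)
}

alpha <- 0.05
p <- p_value_z(2.1)   # two-sided p-value for a hypothetical z of 2.1
decision <- ifelse(p < alpha, "Reject", "Not reject")
decision
```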

Step Five: Make a decision

One-Tailed Tests of Significance
The alternate hypothesis, H1, states a direction (μ > μ0 or μ < μ0).

[Figure: sampling distribution of the statistic z for a right-tailed test (alternative hypothesis H1: μ > μ0) at the 0.05 level of significance. H0 is rejected when z falls in the region of rejection of area α = 0.05 to the right of the critical value 1.65; otherwise (probability 0.95) H0 is not rejected.]

Two-Tailed Test of Significance
No direction is specified in the alternate hypothesis H1 (μ ≠ μ0).
[Figure: regions of rejection for a two-tailed test at the 0.05 level of significance. The critical values are −1.96 and 1.96; each tail is a region of rejection with probability 0.025, and the central "do not reject" region has probability 0.95.]
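In a two-tailed test the significance level is split equally between the two tails, so the critical values are the α/2 and 1 − α/2 quantiles of the standard normal distribution. A short sketch for the typical significance levels:

```r
alphas <- c(0.10, 0.05, 0.01)      # typical significance levels
upper  <- qnorm(1 - alphas / 2)    # upper critical values; the lower ones are -upper

# Each tail holds alpha / 2, so the two tails together hold alpha
tail_area <- 2 * pnorm(-upper)

data.frame(alpha = alphas,
           lower = round(-upper, 3),
           upper = round(upper, 3),
           tail_area = tail_area)
```

At α = 0.05 this gives the ±1.96 used above.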

Selection of the test statistic for population mean
- Testing for the population mean from a large sample with population standard deviation known:

$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

- Testing for the population mean from a large sample with population standard deviation unknown. As long as the sample size n > 30, z can be approximated using

$$z = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

- Testing for a population mean: small sample, population standard deviation unknown:

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

Assumption of t tests: the data come from a normal distribution.
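One way to see why small samples call for t rather than z is to compare critical values. The sample sizes below are arbitrary illustrations.

```r
alpha  <- 0.05
n      <- c(5, 10, 30, 100)               # hypothetical sample sizes
t_crit <- qt(1 - alpha / 2, df = n - 1)   # two-tailed t critical values
z_crit <- qnorm(1 - alpha / 2)            # two-tailed z critical value

# The t critical value exceeds the z critical value and approaches it as n grows
data.frame(n = n, t_crit = round(t_crit, 3), z_crit = round(z_crit, 3))
```

Using z with a small sample would therefore make the rejection region too wide.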

Example: Suppose a car manufacturer claims a model gets 25 mpg. A consumer group asks 10 owners of this model to calculate their mpg and the mean value was 22 with a standard deviation of 1.5. Is the manufacturer's claim supported? Write a program to test this claim using R.

    ## Compute the t statistic. Note we assume mu=25 under H_0
    > xbar=22;s=1.5;n=10
    > t = (xbar-25)/(s/sqrt(n))
    > t
    [1] -6.324555
    ## use pt to get the distribution function of t
    > pt(t,df=n-1)
    [1] 6.846828e-05

This is a small p-value (0.000068). The manufacturer's claim is suspicious.
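The same conclusion can be reached with the critical-value form of the decision rule. The sketch below assumes a left-tailed alternative (H1: μ < 25, matching the one-tailed use of `pt` in the example) and a significance level of 0.05, which the example itself does not state.

```r
xbar <- 22; s <- 1.5; n <- 10; mu0 <- 25
alpha <- 0.05                            # assumed; not given in the example

t_stat <- (xbar - mu0) / (s / sqrt(n))   # computed t, about -6.32
t_crit <- qt(alpha, df = n - 1)          # left-tail critical value, about -1.83

# Left-tailed decision rule: reject H0 when computed t < critical t
decision <- ifelse(t_stat < t_crit, "Reject", "Not reject")
decision
```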

One-Sample Mean Test
Test H0: μ = μ0 vs H1: μ ≠ μ0 under a normal population N(μ, σ²). If σ² is unknown, the t-test

$$t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}$$

is recommended, where X̄ is the sample mean, s is the sample standard deviation, and n is the sample size.

t.test() in R
There is an example concerning daily energy intake in kJ for 11 women. The values are placed in a data vector:

    daily.intake<-c(5260,5470,5640,6180,6390,6515,6805,7515,7515,8230,8770)

Investigate whether the women's energy intake deviates systematically from a recommended value of 7725 kJ. Assuming that the data come from a normal distribution, the objective is to test whether this distribution might have mean 7725. This is done with t.test, as follows:

    > t.test(daily.intake,mu=7725)

            One Sample t-test

    data:  daily.intake
    t = -2.8208, df = 10, p-value = 0.01814
    alternative hypothesis: true mean is not equal to 7725
    95 percent confidence interval:
     5986.348 7520.925
    sample estimates:
    mean of x
     6753.636

The same results can be computed step by step:

    x<-c(5260,5470,5640,6180,6390,6515,6805,7515,7515,8230,8770)
    n<-length(x)
    xbar<-mean(x)
    s<-sd(x)

    tt<-(xbar-7725)/(s/sqrt(n))       # value of the t statistic: -2.820754
    p.value<-2*pt(-abs(tt),df=n-1)    # two-sided p-value: 0.01813724
                                      # (-abs() keeps this valid when tt > 0)
    alpha<-0.05
    decision<-ifelse(p.value<alpha,'Reject','Not reject')      # decision by p-value: "Reject"
    t.value<-qt(1-alpha/2,df=n-1)     # critical value
    decision<-ifelse(abs(tt)>t.value,'Reject','Not reject')    # decision by critical value

    CI<-xbar+c(-t.value*s/sqrt(n),t.value*s/sqrt(n))           # confidence interval
    # 5986.348 7520.925

Four arguments of the function t.test() are relevant in one-sample problems:

    t.test(x, mu=, alternative="", conf.level=)

- x: a numeric vector of data values
- mu: a number indicating the value of the mean under the null hypothesis
- alternative: a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less"
- conf.level: confidence level of the interval
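As an illustration of the `alternative` and `conf.level` arguments, the energy intake data can be tested one-sidedly. The choice of a "less" alternative and a 99% confidence level here is an assumption made for the example, not part of the original analysis.

```r
daily.intake <- c(5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770)

# One-sided test of H0: mu >= 7725 against H1: mu < 7725, with a 99% interval
res <- t.test(daily.intake, mu = 7725, alternative = "less", conf.level = 0.99)

res$statistic   # same t value as in the two-sided test
res$p.value     # half of the two-sided p-value, because t is negative
res$conf.int    # one-sided interval: the lower bound is -Inf
```

The returned object is a list, so individual components such as `p.value` can be used in further computation.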

One-Sample Proportion Test
Proportion: the fraction or percentage that indicates the part of the population or sample having a particular trait of interest.

$$p = \frac{\text{number of successes in the sample}}{\text{number sampled}}$$

The sample proportion is p and π is the population proportion. Test statistic for testing a single population proportion:

$$z = \frac{p - \pi}{\sqrt{\pi(1 - \pi)/n}}$$

Example: NSC
For a Christmas and New Year's week, the National Safety Council estimated that 500 people would be killed and 25,000 injured on the nation's roads. The NSC claimed that 50% of the accidents would be caused by drunk driving. A sample of 120 accidents showed that 67 were caused by drunk driving. Use these data to test the NSC's claim with α = 0.05.

- Hypotheses: H0: p = 0.5, H1: p ≠ 0.5
- Test statistic (in the numerator, p = 67/120 is the sample proportion):

$$\sigma_p = \sqrt{\frac{p_0(1 - p_0)}{n}} = \sqrt{\frac{0.5(1 - 0.5)}{120}} = 0.045644$$

$$z = \frac{p - p_0}{\sigma_p} = \frac{(67/120) - 0.5}{0.045644} = 1.278$$

- Rejection rule: reject H0 if z < −1.96 or z > 1.96.
- Conclusion: do not reject H0. For z = 1.278, the p-value is 0.201. How to get the p-value using R? (1-pnorm(1.278))*2
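The NSC calculation can be reproduced step by step in R. This is a sketch of the hand computation using the numbers from the example; the variable names are arbitrary.

```r
x <- 67; n <- 120    # drunk-driving accidents in the sample, sample size
p0 <- 0.5            # proportion claimed under H0
alpha <- 0.05

p_hat <- x / n                        # sample proportion
se0   <- sqrt(p0 * (1 - p0) / n)      # standard error under H0, about 0.045644
z     <- (p_hat - p0) / se0           # test statistic, about 1.278

p_value  <- 2 * pnorm(-abs(z))        # two-sided p-value, about 0.201
z_crit   <- qnorm(1 - alpha / 2)      # about 1.96
decision <- ifelse(abs(z) > z_crit, "Reject", "Not reject")
decision
```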

prop.test() in R
    > prop.test(67,120,p=.5,correct=F)

            1-sample proportions test without continuity correction

    data:  67 out of 120, null probability 0.5
    X-squared = 1.6333, df = 1, p-value = 0.2012
    alternative hypothesis: true p is not equal to 0.5
    95 percent confidence interval:
     0.4690452 0.6440025
    sample estimates:
            p
    0.5583333

Note that X-squared = 1.6333 = (1.278)² = z².

General form, where x is the number of successes and n is the number of trials:

    prop.test(x, n, p=, alternative="", conf.level=)
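The relationship between the chi-squared statistic reported by `prop.test()` and the z statistic from the hand calculation can be verified directly. This sketch assumes the continuity correction is switched off, as in the call above.

```r
res <- prop.test(67, 120, p = 0.5, correct = FALSE)

# z statistic from the hand calculation
z <- (67 / 120 - 0.5) / sqrt(0.5 * (1 - 0.5) / 120)

# X-squared equals z^2, and the two p-values agree
all.equal(unname(res$statistic), z^2)
all.equal(res$p.value, 2 * pnorm(-abs(z)))
```

With the default `correct = TRUE`, a continuity correction is applied and the statistic no longer equals z² exactly.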