A range of plausible values for a population parameter (e.g., mean μ, proportion p) constructed from sample data.
Contrasts with a point estimate (single value).
Provides a measure of uncertainty or confidence.
Often reported as Confidence Intervals (CIs).
Confidence Interval (CI): Basic Concept
A CI for parameter θ is a random interval [L(X),U(X)], where X is sample data.
For a (1−α)×100% CI:
Pr(L(X)≤θ≤U(X))=1−α
This probability refers to repeated sampling: the endpoints L(X) and U(X) are random, while θ is fixed.
Commonly, α=0.05 for a 95% CI.
Types of Confidence Intervals
For Mean (population standard deviation σ known): $\bar{X} \pm z^* \cdot \frac{\sigma}{\sqrt{n}}$
For Mean (population standard deviation σ unknown): $\bar{x} \pm t^* \cdot \frac{s}{\sqrt{n}}$
(where $t^*$ is the critical value from the t-distribution)
For Proportion: $\hat{p} \pm z^* \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
(where $\hat{p}$ is the sample proportion, $z^*$ is the critical value from the standard normal distribution)
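A minimal sketch of the proportion interval, using only the Python standard library. The function name `proportion_ci` and the counts (56 successes in 100 trials) are hypothetical and chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Interval p_hat ± z* · sqrt(p_hat(1 − p_hat)/n) for a proportion."""
    p_hat = successes / n
    # z* is the upper (1 − α/2) quantile of the standard normal
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical data: 56 successes out of 100 trials
low, high = proportion_ci(56, 100)
print(f"95% CI for p: [{low:.3f}, {high:.3f}]")
```

This normal approximation is generally considered reasonable only when both n·p̂ and n·(1−p̂) are not small.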
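Example: CI for Mean (Normal Case, Known $\sigma^2$)
Assumptions: Sample $X=(X_1,\dots,X_n)$ from Normal $N(\mu,\sigma^2)$, $\sigma^2$ is known.
Steps:
Compute sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Standard Error (SE): $SE = \frac{\sigma}{\sqrt{n}}$
(1−α)×100% Z-interval:
$\left[\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$
(e.g., for 95% CI, $\alpha=0.05$, $z_{\alpha/2}=z_{0.025}\approx 1.96$)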
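The steps above can be sketched as a short function. This is an illustration under the stated assumptions (Normal data, known σ); the name `z_interval` and the four sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_interval(sample, sigma, confidence=0.95):
    """Z-interval for the mean when the population sigma is known."""
    n = len(sample)
    x_bar = fmean(sample)                      # step 1: sample mean
    se = sigma / sqrt(n)                       # step 2: standard error
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)    # z_{alpha/2}, about 1.96 for 95%
    return x_bar - z * se, x_bar + z * se      # step 3: interval


# Hypothetical sample (cm) with an assumed known sigma of 10 cm
heights = [168.0, 172.0, 171.0, 169.0]
lower, upper = z_interval(heights, sigma=10.0)
print(f"95% Z-interval: [{lower:.2f}, {upper:.2f}]")
```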
Example Calculation (Unknown σ)
Survey: n=100, $\bar{x}=170$ cm, s=10 cm. Find 95% CI for population mean μ.
Use t-distribution since σ is unknown.
$SE = \frac{s}{\sqrt{n}} = \frac{10}{\sqrt{100}} = 1$
Degrees of freedom df=n−1=99.
For 95% CI, df=99, the critical t-value t∗≈1.984.
CI = $\bar{x} \pm t^* \cdot SE = 170 \pm 1.984 \cdot 1 = 170 \pm 1.984$
CI = [168.016,171.984]
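The arithmetic of this example can be checked with a few lines of Python. The critical value 1.984 is taken from the example itself rather than computed, since the standard library has no t-quantile function; the variable names are illustrative.

```python
from math import sqrt

# Values from the survey example
n, x_bar, s = 100, 170.0, 10.0
t_star = 1.984            # t critical value for df = 99, taken from the example

se = s / sqrt(n)          # standard error = 10 / 10 = 1
ci_low = x_bar - t_star * se
ci_high = x_bar + t_star * se
print(f"95% CI: [{ci_low:.3f}, {ci_high:.3f}]")  # prints "95% CI: [168.016, 171.984]"
```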
Interpretation of CI
If we repeat the experiment many times, about (1−α)×100% (e.g., 95%) of the computed intervals will contain the true parameter μ.
The parameter μ is fixed; the interval varies from sample to sample.
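The repeated-sampling interpretation can be illustrated by simulation. The sketch below draws many samples from a Normal population with a fixed, known mean and counts how often the Z-interval covers it. All parameter values (μ=170, σ=10, n=25, 2000 trials, seed 0) are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu=170.0, sigma=10.0, n=25, trials=2000, confidence=0.95, seed=0):
    """Fraction of simulated Z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)   # same width every time: sigma is known
    hits = 0
    for _ in range(trials):
        # New sample each trial, so the interval centre moves while mu stays fixed
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials


rate = coverage()
print(f"fraction of intervals containing mu: {rate:.3f}")
```

The printed fraction should land close to 0.95, although any single run will deviate somewhat because the number of trials is finite.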
Choosing Z-Score vs T-Distribution
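Use Z-score if: population standard deviation σ is known (Normal population or large n).
Use T-distribution if: σ is unknown and estimated by s, at any sample size. For large n (rule of thumb: n≥30) the t and z critical values are nearly equal, so z is often used as an approximation.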
Pivotal Quantity
A function of sample data $X_1,\dots,X_n$ and unknown parameter(s) θ, say $Q(X_1,\dots,X_n;\theta)$.
Its probability distribution is known and does not depend on the unknown parameter(s) θ.
Crucial for constructing exact CIs and hypothesis tests.
Examples of Pivotal Quantities
Normal Distribution, Known Variance $\sigma^2$:
For estimating μ, the pivotal quantity is:
$Q = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$
Normal Distribution, Unknown Variance $\sigma^2$:
For estimating μ, the pivotal quantity is:
$Q = \frac{\bar{X}-\mu}{s/\sqrt{n}} \sim t_{n-1}$ (t-distribution with n−1 degrees of freedom)
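As a worked step, the known-variance pivot can be inverted to recover the Z-interval: because the distribution of Q does not depend on μ, a probability statement about Q can be rearranged into one about μ. The derivation below is a sketch that uses the symmetric critical value $z_{\alpha/2}$ of the standard normal.

```latex
\begin{align*}
1-\alpha
  &= \Pr\!\left(-z_{\alpha/2} \le \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) \\
  &= \Pr\!\left(-z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \bar{X}-\mu \le z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) \\
  &= \Pr\!\left(\bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)
\end{align*}
```

Replacing σ by s and $z_{\alpha/2}$ by the corresponding $t_{n-1}$ critical value gives the t-interval in the same way.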
Hypothesis Test
Introduction & Role
A fundamental statistical procedure to make inferences about population parameters based on sample data.
Involves formulating hypotheses and using sample data to decide whether to reject the null hypothesis.
Role: Provides a systematic, objective framework for making data-based decisions, quantifying evidence, controlling error probabilities, and standardizing scientific inquiry.
Basic Concepts
How Hypothesis Testing Works
Null Hypothesis (H0): The default assumption or statement being tested (e.g., “no effect”, “status quo”). Usually contains equality.
Formal: $H_0: \theta = \theta_0$ (Eq 1)
Alternative Hypothesis (Ha or H1): The claim you suspect might be true, contradicting H0.
Formal Types:
$H_a: \theta \neq \theta_0$ (two-sided/two-tailed) (Eq 2)
$H_a: \theta > \theta_0$ (right-sided/right-tailed) (Eq 3)
$H_a: \theta < \theta_0$ (left-sided/left-tailed) (Eq 4)
Testing Process Overview
Collect data.
Calculate a test statistic (summarizes data relative to H0).
Make a decision: If the test statistic is “too unusual” assuming H0 is true, reject H0 in favor of H1. Otherwise, fail to reject H0.
Test Statistics
Numerical summary of sample data used for decision making.
Choice depends on parameter, assumed population distribution, and sample size.
Z-statistic (Mean test, σ known or large n):
$Z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}$ (Eq 5)
T-statistic (Mean test, σ unknown, small n):
$t = \frac{\bar{X}-\mu_0}{s/\sqrt{n}}$ (Eq 6)
Tests for a Single Mean
Z-test (Known σ)
Assumptions: σ known; Population is Normal OR n is large (CLT applies).
Hypotheses: $H_0: \mu = \mu_0$ (Eq 7)
$H_a: \mu \neq \mu_0$ (or $\mu > \mu_0$, or $\mu < \mu_0$) (Eq 8)
Test Statistic: $Z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}$ (Eq 9)
Decision Rule (Example: Two-tailed): Reject $H_0$ if $|Z| > z_{\alpha/2}$. Fail to reject $H_0$ if $|Z| \le z_{\alpha/2}$.
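The decision rule can be sketched as a small function covering the two-tailed case and both one-tailed variants. The name `z_test`, the `alternative` labels, and the numbers in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test(x_bar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """One-sample Z-test; returns (z, critical value, reject H0?)."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if alternative == "two-sided":      # Ha: mu != mu0, reject if |Z| > z_{alpha/2}
        crit = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > crit
    elif alternative == "greater":      # Ha: mu > mu0, reject if Z > z_alpha
        crit = std_normal.inv_cdf(1 - alpha)
        reject = z > crit
    elif alternative == "less":         # Ha: mu < mu0, reject if Z < -z_alpha
        crit = std_normal.inv_cdf(alpha)
        reject = z < crit
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z, crit, reject


# Hypothetical numbers: sample mean 52 vs. mu0 = 50, sigma = 8, n = 64
z, crit, reject = z_test(x_bar=52.0, mu0=50.0, sigma=8.0, n=64)
print(f"Z = {z:.2f}, critical value = {crit:.3f}, reject H0: {reject}")
```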
Z-test Example (IQ Scores)
Scenario: Claim μ>82. σ=20. Sample: n=81, $\bar{X}=90$. Use α=0.05.
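For this scenario the Z statistic can be computed directly from the stated numbers. The sketch below assumes the claim μ>82 plays the role of the alternative, so the test is right-tailed with $H_0: \mu = 82$; that framing is an assumption, not something stated in the scenario.

```python
from math import sqrt
from statistics import NormalDist

# Values from the scenario
mu0, sigma, n, x_bar, alpha = 82.0, 20.0, 81, 90.0, 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))      # (90 - 82) / (20 / 9) = 3.6
z_crit = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value, about 1.645
p_value = 1 - NormalDist().cdf(z)          # upper-tail probability under H0
reject_h0 = z > z_crit

print(f"Z = {z:.2f}, z_crit = {z_crit:.3f}, reject H0: {reject_h0}")
```

Under these assumptions Z = 3.6 exceeds the critical value, so $H_0$ would be rejected at α=0.05.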
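Calculate Test Statistic (Welch two-sample t; subscript N = new drug, O = old drug):
$t' = \frac{\bar{x}_N - \bar{x}_O}{\sqrt{\frac{s_N^2}{n_N} + \frac{s_O^2}{n_O}}} = \frac{12.6 - 20.7}{\sqrt{\frac{2.22^2}{10} + \frac{2.41^2}{10}}} = \frac{-8.1}{\sqrt{\frac{4.9284}{10} + \frac{5.8081}{10}}} = \frac{-8.1}{\sqrt{0.493 + 0.581}} = \frac{-8.1}{\sqrt{1.074}} \approx \frac{-8.1}{1.036} \approx -7.82$ (Eq 50-53)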
Calculate Degrees of Freedom:
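Let $v_N = s_N^2/n_N \approx 0.493$, $v_O = s_O^2/n_O \approx 0.581$.
$df \approx \frac{(v_N + v_O)^2}{\frac{v_N^2}{n_N - 1} + \frac{v_O^2}{n_O - 1}} = \frac{(0.493 + 0.581)^2}{\frac{0.493^2}{9} + \frac{0.581^2}{9}} = \frac{1.074^2}{\frac{0.243}{9} + \frac{0.338}{9}} = \frac{1.153}{0.027 + 0.0376} = \frac{1.153}{0.0646} \approx 17.85$. Use df=17 (rounded down). (Eq 54-56)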
Critical t-value (left-tailed, α=0.05, df=17): $-t_{0.05,17} \approx -1.740$.
Decision: Since $t' = -7.82 < -1.740$, reject $H_0$.
Conclusion: Strong statistical evidence that the new drug reduces recovery time compared to the old drug. The difference (8.1 days) is also clinically significant. 95% CI for difference: [5.9, 10.3] days.
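The test statistic and degrees of freedom above can be reproduced with a short function. This is a sketch of the Welch statistic and the Welch-Satterthwaite approximation using the summary values from the calculation; the name `welch_t` is hypothetical. Working with unrounded intermediates gives a df that differs slightly in the second decimal from the hand calculation, but it still rounds down to 17.

```python
from math import sqrt


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    v_a = sd_a ** 2 / n_a          # s_a^2 / n_a
    v_b = sd_b ** 2 / n_b          # s_b^2 / n_b
    t = (mean_a - mean_b) / sqrt(v_a + v_b)
    df = (v_a + v_b) ** 2 / (v_a ** 2 / (n_a - 1) + v_b ** 2 / (n_b - 1))
    return t, df


# New drug: mean 12.6, sd 2.22, n 10; old drug: mean 20.7, sd 2.41, n 10
t_prime, df = welch_t(12.6, 2.22, 10, 20.7, 2.41, 10)
t_crit = -1.740                    # left-tailed critical value for df = 17, from the text
print(f"t' = {t_prime:.2f}, df = {df:.2f}, reject H0: {t_prime < t_crit}")
```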