Chi-Square Tests
Chi-Square Lab: Tests of Independence & Goodness of Fit
1. Introduction
Chi-square tests are foundational tools for analyzing categorical data. Developed by Karl Pearson in 1900, they remain widely used to:
- Test independence between two categorical variables
- Assess goodness of fit between observed frequencies and a theoretical distribution
Independence Test:
\(H_0\): No association exists between variables.
\(H_a\): Variables are dependent.
Goodness of Fit:
\(H_0\): Observed frequencies match the expected distribution.
\(H_a\): Observed frequencies deviate from the expected distribution.
Assumptions:
- Independent observations (no repeated measures).
- Expected counts ≥ 5 in every cell (if violated, use Fisher’s exact test).
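Both tests compare observed counts with the counts expected under \(H_0\) through Pearson's statistic. As a reminder, in notation introduced here for illustration (\(O\) for observed counts, \(E\) for expected counts, \(n\) for the total sample size):

\[
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
\]

For a test of independence on an \(r \times c\) table, the expected count in row \(i\), column \(j\) is \(E_{ij} = (\text{row}_i \text{ total}) \times (\text{column}_j \text{ total}) / n\), with \((r-1)(c-1)\) degrees of freedom. For a goodness-of-fit test with \(k\) categories and claimed proportions \(p_i\), the expected count is \(E_i = n\,p_i\), with \(k-1\) degrees of freedom.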
2. Part I: Test of Independence
Example 1: Titanic Survival by Passenger Class
Test whether survival on the Titanic was independent of passenger class (1st, 2nd, 3rd, or Crew).
Interpretation:
- The extremely small p-value (\(p < 0.001\)) provides overwhelming evidence to reject \(H_0\).
- Survival rates differed significantly by class. First-class passengers survived far more often than independence would predict: 203 survived and 122 perished, against an expected 105 survivors.
- Third-class passengers and crew survived less often than expected:
  - 3rd Class: 178 survived (expected = 228)
  - Crew: 212 survived (expected = 286)
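Conclusion: Survival was strongly associated with passenger class (a rough proxy for socioeconomic status). First-class passengers survived at a higher rate than expected under independence, and third-class passengers and crew at a lower rate.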
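The expected counts quoted above can be checked by hand. The following is a minimal Python sketch that uses only the standard library, not the lab's own code. The survivor counts for 1st class, 3rd class, and crew, and the first-class deaths, come from the text; the remaining cells are assumed from the classic Titanic table of 2,201 people. In practice a library routine such as `scipy.stats.chi2_contingency` would usually be the more convenient choice.

```python
from math import erfc, exp, pi, sqrt

# Observed counts as (survived, died) per class.
# Assumption: cells not quoted in the text are taken from the classic Titanic table.
observed = {
    "1st": (203, 122),
    "2nd": (118, 167),
    "3rd": (178, 528),
    "Crew": (212, 673),
}

n = sum(s + d for s, d in observed.values())
col_totals = (
    sum(s for s, _ in observed.values()),  # total survived
    sum(d for _, d in observed.values()),  # total died
)

chi2 = 0.0
expected = {}
for cls, cells in observed.items():
    row_total = sum(cells)
    # Expected count under independence: row total * column total / n
    expected[cls] = tuple(row_total * ct / n for ct in col_totals)
    chi2 += sum((o - e) ** 2 / e for o, e in zip(cells, expected[cls]))

df = (len(observed) - 1) * (len(col_totals) - 1)  # (4 - 1) * (2 - 1) = 3

# Closed-form upper tail of the chi-square distribution, valid for df = 3 only.
p_value = erfc(sqrt(chi2 / 2)) + sqrt(2 * chi2 / pi) * exp(-chi2 / 2)

for cls, (expected_survived, _) in expected.items():
    print(f"{cls}: observed survivors = {observed[cls][0]}, expected = {expected_survived:.0f}")
print(f"chi2 = {chi2:.1f}, df = {df}, p = {p_value:.3g}")
```

Under these assumed counts, the statistic comes out at roughly 190 on 3 degrees of freedom, and the expected survivor counts round to the 105, 228, and 286 cited in the interpretation.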
3. Part II: Goodness of Fit
Example 2: Diamond Color Distribution
Test whether diamond colors in the diamonds dataset match a vendor’s claim (30% G, 20% E, 20% F, 15% H, and 15% for the other colors, i.e. 5% each for D, I, and J).
Interpretation:
- The very large test statistic (\(\chi^2 = 10{,}642\)) and tiny p-value (\(p < 0.001\)) lead us to reject \(H_0\).
- Notable deviations (expected count = total sample size × claimed proportion, with 53,940 diamonds in total):
  - Color E: Observed = 9,797 (expected = 53,940 × 0.20 = 10,788), underrepresented.
  - Color G: Observed = 11,292 (expected = 53,940 × 0.30 = 16,182), heavily underrepresented.
  - Color J: Observed = 2,808 (expected = 53,940 × 0.05 = 2,697), slightly overrepresented.
Conclusion: The data are inconsistent with the vendor’s claimed distribution. Colors E and G are less common than advertised, while J is close to its claimed 5% share.
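The goodness-of-fit arithmetic can be sketched the same way in standard-library Python. This is an illustration under stated assumptions, not the lab's own code. The observed counts are assumed to be the color counts of the ggplot2 `diamonds` data (the text quotes only E, G, and J), and the 15% "others" share is assumed to be split evenly as 5% each for D, I, and J. `scipy.stats.chisquare` would be the usual library alternative.

```python
from math import exp

# Assumption: color counts of the ggplot2 `diamonds` data.
observed = {"D": 6775, "E": 9797, "F": 9542, "G": 11292, "H": 8304, "I": 5422, "J": 2808}

# Vendor's claim; the even 5% split of "others" over D, I, J is an assumption.
claimed = {"D": 0.05, "E": 0.20, "F": 0.20, "G": 0.30, "H": 0.15, "I": 0.05, "J": 0.05}
assert abs(sum(claimed.values()) - 1.0) < 1e-9  # proportions must sum to 1

n = sum(observed.values())
expected = {color: n * p for color, p in claimed.items()}

# Contribution of each category to the statistic: (O - E)^2 / E
contrib = {c: (observed[c] - expected[c]) ** 2 / expected[c] for c in observed}
chi2 = sum(contrib.values())
df = len(observed) - 1  # 7 categories -> 6 degrees of freedom

# Closed-form upper tail of the chi-square distribution, valid for df = 6 only.
# For a statistic this large the result underflows to 0.0.
half = chi2 / 2
p_value = exp(-half) * (1 + half + half ** 2 / 2)

for c in sorted(observed):
    print(f"{c}: observed = {observed[c]:>6}, expected = {expected[c]:>8.0f}, "
          f"contribution = {contrib[c]:>7.1f}")
print(f"chi2 = {chi2:.0f}, df = {df}, p = {p_value:.3g}")
```

Printing the per-category contributions shows which colors drive the result. Under these assumptions, most of the statistic comes from D and I, whose observed counts are far above a 5% share.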
4. Part III: Small Samples & Fisher’s Exact Test
Example 3: Arthritis Treatment Efficacy
Test whether a small drug trial (hypothetical data) shows an association between treatment and improvement.
Interpretation:
- The significant p-value (\(p = 0.035\)) suggests rejecting \(H_0\) at α = 0.05.
- The estimated odds ratio indicates that drug recipients had about 15 times the odds of improvement of placebo recipients.
- Caution: The extremely wide confidence interval (1.01–1049.79) reflects low precision due to small sample size (\(n = 16\)).
Conclusion: While statistically significant, practical conclusions require larger studies due to uncertainty in effect size.
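To show how an exact test of this kind works, here is a minimal standard-library sketch of a two-sided Fisher's exact test on a 2×2 table. The counts are invented for illustration (chosen only so that \(n = 16\)) and should not be read as the lab's data. The variable names are likewise hypothetical. `scipy.stats.fisher_exact` provides a library implementation.

```python
from math import comb

# Hypothetical 2x2 table with n = 16, invented for this sketch (not the lab's data).
# Rows: drug, placebo. Columns: improved, not improved.
table = [[4, 1],
         [2, 9]]

(a, b), (c, d) = table
row1, col1, col2 = a + b, a + c, b + d
n = a + b + c + d

# Smallest expected count under independence; below 5 the chi-square
# approximation is doubtful, which motivates the exact test.
min_expected = min(r * col / n for r in (a + b, c + d) for col in (col1, col2))

# With all margins fixed, the top-left cell k follows a hypergeometric
# distribution; weights[k] counts the tables that have that value of k.
k_values = range(max(0, row1 - col2), min(row1, col1) + 1)
weights = {k: comb(col1, k) * comb(col2, row1 - k) for k in k_values}

# Two-sided p-value: total probability of tables no more likely than the observed one.
p_value = sum(w for w in weights.values() if w <= weights[a]) / sum(weights.values())

# Sample (cross-product) odds ratio; undefined if b or c is zero.
odds_ratio = (a * d) / (b * c)

print(f"min expected count = {min_expected:.3f}")
print(f"two-sided p = {p_value:.4f}, sample odds ratio = {odds_ratio:.1f}")
```

The sketch does not compute a confidence interval, since exact intervals for the odds ratio require the noncentral hypergeometric distribution. Some software (for example R's `fisher.test`) reports a conditional maximum-likelihood odds ratio rather than the cross-product ratio used here, so the two estimates need not agree.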