Chi-Square Tests

Chi-Square Lab: Tests of Independence & Goodness of Fit

1. Introduction

Chi-square tests are foundational tools for analyzing categorical data. Developed by Karl Pearson in 1900, they remain widely used to:

  1. Test independence between two categorical variables
  2. Assess goodness of fit between observed frequencies and a theoretical distribution

Hypotheses:

  • Independence Test:
    \(H_0\): The two variables are independent (no association).
    \(H_a\): The variables are associated (dependent).

  • Goodness of Fit:
    \(H_0\): Observed frequencies follow the hypothesized distribution.
    \(H_a\): Observed frequencies deviate from the hypothesized distribution.
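Both tests use Pearson's statistic, which compares observed counts \(O\) with the counts \(E\) expected under \(H_0\). Only the expected counts and the degrees of freedom differ. Here \(R_i\) and \(C_j\) are row and column totals, \(N\) is the total count, and \(p_i\) are the hypothesized proportions:

```latex
\chi^2 = \sum_{\text{cells}} \frac{(O - E)^2}{E}

\text{Independence } (r \times c \text{ table}): \quad E_{ij} = \frac{R_i \, C_j}{N}, \qquad df = (r - 1)(c - 1)

\text{Goodness of fit } (k \text{ categories}): \quad E_i = N \, p_i, \qquad df = k - 1
```

Under \(H_0\) the statistic approximately follows a chi-square distribution with \(df\) degrees of freedom; large values signal disagreement with \(H_0\). The approximation is only reliable when expected counts are not too small.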

Assumptions:

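  1. Observations are independent (each subject is counted once; no repeated measures).
  2. Expected counts are at least 5 in every cell (a common rule of thumb). If this fails in a small contingency table, use Fisher's exact test.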
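The expected-count assumption can be checked before running the test. This is a minimal sketch; `expected_counts` and `meets_min_expected` are hypothetical helper names, and the 2×2 table is hypothetical data chosen only to show a failing case:

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_min_expected(table, minimum=5.0):
    """True if every expected count is at least `minimum` (rule of thumb)."""
    return all(e >= minimum for row in expected_counts(table) for e in row)


# Hypothetical 2x2 table (rows: drug, placebo; columns: improved, not improved).
small = [[5, 1],
         [2, 8]]
print(expected_counts(small))     # [[2.625, 3.375], [4.375, 5.625]]
print(meets_min_expected(small))  # False: prefer Fisher's exact test
```

Three of the four expected counts fall below 5 here, so the chi-square approximation would be questionable.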

2. Part I: Test of Independence

Example 1: Titanic Survival by Passenger Class

Test whether survival on the Titanic was independent of class (1st, 2nd, 3rd, Crew).

Interpretation:
- The extremely small p-value (\(p < 0.001\)) provides overwhelming evidence to reject \(H_0\).
- Survival differed significantly by class. First-class passengers survived far more often than independence predicts: 203 survived and 122 perished, against about 105 expected survivors.
- Third-class passengers and crew survived less often than expected:
  - 3rd Class: 178 survived (expected ≈ 228)
  - Crew: 212 survived (expected ≈ 286)

Conclusion: Survival was strongly associated with class. First-class passengers were the most likely to survive, while third-class passengers and crew were the least likely.
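The test can be reproduced without external libraries. This is a sketch in Python, not the lab's own code; `chi2_sf` and `chi2_independence` are hypothetical helper names, and the counts are the class × survival margins of R's built-in `Titanic` table (they reproduce the expected survivors of about 105, 228, and 286 quoted above). The p-value uses the standard recurrence for the regularized upper incomplete gamma function, valid for integer degrees of freedom:

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # The answer is Q(df/2, y), the regularized upper incomplete gamma function.
    # Start at Q(1, y) = exp(-y) for even df, or Q(1/2, y) = erfc(sqrt(y)) for odd df,
    # then step upward with Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q


def chi2_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# (survived, perished) by class, from R's built-in Titanic table.
titanic = {
    "1st": (203, 122),
    "2nd": (118, 167),
    "3rd": (178, 528),
    "Crew": (212, 673),
}
stat, df, p = chi2_independence(list(titanic.values()))
print(f"chi2 = {stat:.1f}, df = {df}, p = {p:.2e}")
```

With these counts the statistic is about 190.4 on 3 degrees of freedom, far beyond any conventional critical value. In practice, `scipy.stats.chi2_contingency` performs the same test and also returns the expected-count table.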

3. Part II: Goodness of Fit

Example 2: Diamond Color Distribution

Test whether the color distribution in the diamonds dataset (53,940 diamonds) matches a vendor's claim: 30% G, 20% E, 20% F, 15% H, and 15% others (5% each for D, I, and J).

Interpretation:

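  • The very large test statistic (\(χ² ≈ 10,642\), df = 6) and tiny p-value (\(p < 0.001\)) reject \(H_0\).
  • Notable deviations (expected = 53,940 × claimed proportion):
    • Color D: Observed = 6,775 (expected = 53,940 × 0.05 = 2,697), heavily overrepresented.
    • Color I: Observed = 5,422 (expected = 53,940 × 0.05 = 2,697), overrepresented.
    • Color G: Observed = 11,292 (expected = 53,940 × 0.30 = 16,182), underrepresented.
    • Color E: Observed = 9,797 (expected = 53,940 × 0.20 = 10,788), slightly underrepresented.
    • Color J: Observed = 2,808 (expected = 53,940 × 0.05 = 2,697), close to the claim.

Conclusion: The data contradict the vendor's claimed distribution. Colors D and I are far more common than claimed, G (and to a lesser degree E and F) is less common, and J roughly matches the claim.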
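The per-color breakdown behind the interpretation can be computed directly. This is a sketch, not the lab's code: `chi2_goodness_of_fit` is a hypothetical helper, the counts are those of the `diamonds` data shipped with ggplot2, and splitting "15% others" evenly across D, I, and J is an assumption (it is the split that reproduces χ² ≈ 10,642). Instead of a p-value it compares the statistic with the tabulated critical value for α = 0.001 and 6 degrees of freedom:

```python
import math

# Color counts in the diamonds data shipped with ggplot2 (n = 53,940).
observed = {"D": 6775, "E": 9797, "F": 9542, "G": 11292,
            "H": 8304, "I": 5422, "J": 2808}

# Vendor's claim. ASSUMPTION: "15% others" means 5% each for D, I and J.
claimed = {"D": 0.05, "E": 0.20, "F": 0.20, "G": 0.30,
           "H": 0.15, "I": 0.05, "J": 0.05}

# Tabulated chi-square critical value for alpha = 0.001 with 6 degrees of freedom.
CRITICAL_0_001_DF6 = 22.458


def chi2_goodness_of_fit(observed, probs):
    """Pearson goodness-of-fit statistic, df, and per-category breakdown."""
    if not math.isclose(sum(probs.values()), 1.0):
        raise ValueError("claimed proportions must sum to 1")
    n = sum(observed.values())
    rows = []
    for category, obs in observed.items():
        expected = n * probs[category]
        rows.append((category, obs, expected, (obs - expected) ** 2 / expected))
    stat = sum(row[3] for row in rows)
    return stat, len(observed) - 1, rows


stat, df, rows = chi2_goodness_of_fit(observed, claimed)
for color, obs, expected, contribution in rows:
    print(f"{color}: observed {obs:>6}, expected {expected:>8.1f}, "
          f"contribution {contribution:>7.1f}")
print(f"chi2 = {stat:.1f}, df = {df}, "
      f"reject H0 at alpha = 0.001: {stat > CRITICAL_0_001_DF6}")
```

The per-category contributions make the source of the large statistic visible: D and I alone account for most of it. `scipy.stats.chisquare(f_obs, f_exp)` computes the same statistic when SciPy is available.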

4. Part III: Small Samples & Fisher’s Exact Test

Example 3: Arthritis Treatment Efficacy

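Test whether treatment (drug vs. placebo) is associated with improvement in a small drug trial (hypothetical data). With only 16 subjects, at least one expected count in the 2×2 table must fall below 5, so Fisher's exact test is used instead of the chi-square test.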

Interpretation:
- Fisher's exact test gives \(p = 0.035\), so \(H_0\) is rejected at α = 0.05.
- The estimated odds ratio of about 15 means the odds of improvement were roughly 15 times higher for drug recipients than for placebo recipients.
- Caution: The extremely wide confidence interval (1.01–1049.79) reflects low precision due to small sample size (\(n = 16\)).

Conclusion: The association is statistically significant, but the effect size is too uncertain to support practical conclusions; larger studies are needed.
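Fisher's exact test conditions on the table margins and sums hypergeometric probabilities. The sketch below implements the two-sided rule documented for R's `fisher.test` (sum the probabilities of all tables with the same margins that are no more likely than the observed one). `fisher_exact_2x2` is a hypothetical helper, and the table is illustrative data, not the lab's trial data, which are not listed here:

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher exact test for [[a, b], [c, d]]; returns (odds ratio, p)."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance keeps tables tied with the observed one in the sum.
    p_value = sum(prob(x) for x in range(lo, hi + 1)
                  if prob(x) <= p_obs * (1 + 1e-7))
    # Sample (unconditional) odds ratio; undefined when both products are zero.
    if b * c:
        odds_ratio = (a * d) / (b * c)
    else:
        odds_ratio = float("inf") if a * d else float("nan")
    return odds_ratio, min(p_value, 1.0)


# Hypothetical trial: rows = drug, placebo; columns = improved, not improved.
trial = [[5, 1],
         [2, 8]]
odds_ratio, p = fisher_exact_2x2(trial)
print(f"sample odds ratio = {odds_ratio:.1f}, two-sided p = {p:.3f}")
```

Note that R's `fisher.test` reports a conditional maximum-likelihood odds ratio with its confidence interval, so its estimate differs from the sample odds ratio computed here; `scipy.stats.fisher_exact` returns the sample odds ratio.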