16. Analysis of contingency tables#

本节需要的包:

require(s20x)
Hide code cell output
Loading required package: s20x

16.1. Introduction#

分类统计表

AP.df <- read.table("../data/AttendPass.txt", header = T)
head(AP.df, 8)
A data.frame: 8 × 3
SubjectPassAttend
<int><chr><chr>
11passattend
22passattend
33passattend
44passattend
55passattend
66failnot.attend
77passnot.attend
88failattend
AP.tbl <- with(AP.df, table(Attend, Pass))
AP.tbl
            Pass
Attend       fail pass
  attend       17   83
  not.attend   27   19
plot(AP.tbl, main = "", las = 1)
_images/3aad3f2db152416de692ae86a2bfa15aee3ee4d8c11fc041e46d59e435cb149e.png
barplot(t(AP.tbl), legend = T)
_images/6cab23a35f03d19fe94348eade2bfaeb2c3a064d594cfe20afa6cb8dff754aab.png

16.2. The binomial approach to contingency table analysis#

Freqs.df <- data.frame(
    Attend = c("not.attend", "attend"),
    Fail = c(27, 17), Pass = c(19, 83)
)
Freqs.df <- transform(Freqs.df, Attend = factor(Attend))
Freqs.df
A data.frame: 2 × 3
AttendFailPass
<fct><dbl><dbl>
not.attend2719
attend 1783
AP.binom <- glm(cbind(Pass, Fail) ~ Attend, data = Freqs.df, family = binomial)
summary(AP.binom)
Call:
glm(formula = cbind(Pass, Fail) ~ Attend, family = binomial, 
    data = Freqs.df)

Deviance Residuals: 
[1]  0  0

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)        1.5856     0.2662   5.956 2.58e-09 ***
Attendnot.attend  -1.9370     0.4007  -4.834 1.34e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance:  2.5162e+01  on 1  degrees of freedom
Residual deviance: -3.5527e-15  on 0  degrees of freedom
AIC: 12.756

Number of Fisher Scoring iterations: 3
exp(confint(AP.binom))[2, ]
Waiting for profiling to be done...
2.5 %
0.0642983964738621
97.5 %
0.311134072466483

16.3. The Poisson approach to contingency table analysis#

library(dplyr)
AP.df <- read.table("../data/AttendPass.txt", header = T)
AP.df <- transform(AP.df, Pass = factor(Pass), Attend = factor(Attend))
Freqs2.df <- AP.df %>%
    group_by(Attend, Pass) %>%
    summarize(freq = n()) %>%
    data.frame()
Freqs2.df
Error in library(dplyr): there is no package called ‘dplyr’
Traceback:

1. library(dplyr)
Freqs2.df$Attend <- relevel(Freqs2.df$Attend, ref = "not.attend")
AP.pois <- glm(freq ~ Attend * Pass, family = poisson, data = Freqs2.df)
summary(AP.pois)
A matrix: 4 × 4 of type dbl
EstimateStd. Errorz valuePr(>|z|)
(Intercept) 3.29583690.192450117.1256719.550242e-66
Attendattend-0.46262350.3096136-1.4941971.351243e-01
Passpass-0.35139790.2994472-1.1734892.405999e-01
Attendattend:Passpass 1.93702520.4006749 4.8344071.335434e-06
Attendattend:Passpass: 6.93808049535602

Null deviance 叫做零模型自由度,Residual deviance 叫做残差自由度。

残差自由度等于零时,参数个数等于该数据的行数。由此可以推出原数据有 4 行。

exp(confint(AP.pois))[4, ]
Waiting for profiling to be done...
2.5 %
3.21404850350281
97.5 %
15.552487384471

16.4. Equivalence of the binomial and Poisson approaches#

Freqs.df
exp(confint(AP.pois))[4, ]
A data.frame: 2 × 3
AttendFailPass
<fct><dbl><dbl>
not.attend2719
attend 1783
Waiting for profiling to be done...
2.5 %
3.21404850350281
97.5 %
15.552487384471
predict(AP.pois, type = "response")
1
17.0000000000001
2
83.0000000000001
3
27
4
19
Freqs.df
coef(summary(AP.pois))
exp(coef(AP.pois))[4]
A data.frame: 2 × 3
AttendFailPass
<fct><dbl><dbl>
not.attend2719
attend 1783
A matrix: 4 × 4 of type dbl
EstimateStd. Errorz valuePr(>|z|)
(Intercept) 3.29583690.192450117.1256719.550242e-66
Attendattend-0.46262350.3096136-1.4941971.351243e-01
Passpass-0.35139790.2994472-1.1734892.405999e-01
Attendattend:Passpass 1.93702520.4006749 4.8344071.335434e-06
Attendattend:Passpass: 6.93808049535602
options(digits = 4)
OR <- 27 * 83 / (17 * 19)
OR