This illustration is based on the following paper and its supplementary material:
Bernal, J. L., Cummins, S., & Gasparrini, A. (2017). Interrupted time series regression for the evaluation of public health interventions: a tutorial. International journal of epidemiology, 46(1), 348-355.
https://academic.oup.com/ije/article/46/1/348/2622842
library(lmtest)
library(vcd)
library(Epi)
library(tsModel)
library(splines)
library(tidyverse)
data <- read.csv("sicily.csv")
data %>%
knitr::kable()
year | month | aces | time | smokban | pop | stdpop |
---|---|---|---|---|---|---|
2002 | 1 | 728 | 1 | 0 | 364277.4 | 379875.3 |
2002 | 2 | 659 | 2 | 0 | 364277.4 | 376495.5 |
2002 | 3 | 791 | 3 | 0 | 364277.4 | 377040.8 |
2002 | 4 | 734 | 4 | 0 | 364277.4 | 377116.4 |
2002 | 5 | 757 | 5 | 0 | 364277.4 | 377383.4 |
2002 | 6 | 726 | 6 | 0 | 364277.4 | 374113.1 |
2002 | 7 | 760 | 7 | 0 | 364277.4 | 379513.3 |
2002 | 8 | 740 | 8 | 0 | 364277.4 | 376295.5 |
2002 | 9 | 720 | 9 | 0 | 364277.4 | 374653.2 |
2002 | 10 | 814 | 10 | 0 | 364277.4 | 378485.6 |
2002 | 11 | 795 | 11 | 0 | 364277.4 | 375955.5 |
2002 | 12 | 858 | 12 | 0 | 364277.4 | 378349.7 |
2003 | 1 | 887 | 13 | 0 | 363350.8 | 376762.4 |
2003 | 2 | 766 | 14 | 0 | 363350.8 | 379032.3 |
2003 | 3 | 851 | 15 | 0 | 363350.8 | 379360.4 |
2003 | 4 | 769 | 16 | 0 | 363350.8 | 376162.0 |
2003 | 5 | 781 | 17 | 0 | 363350.8 | 377972.4 |
2003 | 6 | 756 | 18 | 0 | 363350.8 | 381830.7 |
2003 | 7 | 766 | 19 | 0 | 363350.8 | 379888.6 |
2003 | 8 | 752 | 20 | 0 | 363350.8 | 380872.2 |
2003 | 9 | 765 | 21 | 0 | 363350.8 | 380966.9 |
2003 | 10 | 831 | 22 | 0 | 363350.8 | 381240.4 |
2003 | 11 | 879 | 23 | 0 | 363350.8 | 382104.9 |
2003 | 12 | 928 | 24 | 0 | 363350.8 | 381802.7 |
2004 | 1 | 914 | 25 | 0 | 364700.4 | 381656.3 |
2004 | 2 | 808 | 26 | 0 | 364700.4 | 383680.0 |
2004 | 3 | 937 | 27 | 0 | 364700.4 | 383504.2 |
2004 | 4 | 840 | 28 | 0 | 364700.4 | 386462.9 |
2004 | 5 | 916 | 29 | 0 | 364700.4 | 383783.1 |
2004 | 6 | 828 | 30 | 0 | 364700.4 | 380836.8 |
2004 | 7 | 845 | 31 | 0 | 364700.4 | 383483.0 |
2004 | 8 | 818 | 32 | 0 | 364700.4 | 380906.2 |
2004 | 9 | 860 | 33 | 0 | 364700.4 | 382926.8 |
2004 | 10 | 839 | 34 | 0 | 364700.4 | 384052.4 |
2004 | 11 | 887 | 35 | 0 | 364700.4 | 384449.6 |
2004 | 12 | 886 | 36 | 0 | 364700.4 | 383428.4 |
2005 | 1 | 831 | 37 | 1 | 364420.8 | 388153.2 |
2005 | 2 | 796 | 38 | 1 | 364420.8 | 388373.2 |
2005 | 3 | 833 | 39 | 1 | 364420.8 | 386470.1 |
2005 | 4 | 820 | 40 | 1 | 364420.8 | 386033.2 |
2005 | 5 | 877 | 41 | 1 | 364420.8 | 383686.4 |
2005 | 6 | 758 | 42 | 1 | 364420.8 | 385509.3 |
2005 | 7 | 767 | 43 | 1 | 364420.8 | 385901.9 |
2005 | 8 | 738 | 44 | 1 | 364420.8 | 386516.6 |
2005 | 9 | 781 | 45 | 1 | 364420.8 | 388436.5 |
2005 | 10 | 843 | 46 | 1 | 364420.8 | 383255.2 |
2005 | 11 | 850 | 47 | 1 | 364420.8 | 390148.7 |
2005 | 12 | 908 | 48 | 1 | 364420.8 | 385874.9 |
2006 | 1 | 1021 | 49 | 1 | 363832.6 | 391613.6 |
2006 | 2 | 859 | 50 | 1 | 363832.6 | 391750.4 |
2006 | 3 | 976 | 51 | 1 | 363832.6 | 394005.6 |
2006 | 4 | 888 | 52 | 1 | 363832.6 | 391364.9 |
2006 | 5 | 962 | 53 | 1 | 363832.6 | 391664.6 |
2006 | 6 | 838 | 54 | 1 | 363832.6 | 389022.3 |
2006 | 7 | 810 | 55 | 1 | 363832.6 | 391878.5 |
2006 | 8 | 876 | 56 | 1 | 363832.6 | 388575.3 |
2006 | 9 | 843 | 57 | 1 | 363832.6 | 392989.0 |
2006 | 10 | 936 | 58 | 1 | 363832.6 | 390018.8 |
2006 | 11 | 912 | 59 | 1 | 363832.6 | 390712.3 |
This dataset includes the following variables:
Examining the data is an important first step. Looking at the pre-intervention trend can give an indication of how stable the trend is over time, whether a linear model is likely to be appropriate, and whether there appears to be a seasonal trend.
data$rate <- with(data, aces/stdpop*10^5)
plot(data$rate,type="n",ylim=c(00,300),xlab="Year", ylab="Std rate x 10,000", bty="l",xaxt="n")
rect(36,0,60,300,col=grey(0.9),border=F)
points(data$rate[data$smokban==0],cex=0.7)
axis(1,at=0:5*12,labels=F)
axis(1,at=0:4*12+6,tick=F,labels=2002:2006)
title("Sicily, 2002-2006")
data %>%
gather(Variable, Value) %>%
group_by(Variable) %>%
summarize(n = n(),
Mean = mean(Value),
SD = sd(Value),
Median = median(Value),
IQR = IQR(Value),
Min = min(Value),
Max = max(Value)) %>%
knitr::kable()
Variable | n | Mean | SD | Median | IQR | Min | Max |
---|---|---|---|---|---|---|---|
aces | 59 | 8.290508e+02 | 72.4139022 | 831.0000 | 111.50000 | 659.0000 | 1021.0000 |
month | 59 | 6.406780e+00 | 3.4347044 | 6.0000 | 5.50000 | 1.0000 | 12.0000 |
pop | 59 | 3.641212e+05 | 481.2691260 | 364277.4000 | 588.20000 | 363350.8000 | 364700.4000 |
rate | 59 | 2.160942e+02 | 17.3402530 | 215.4118 | 26.38489 | 175.0353 | 260.7162 |
smokban | 59 | 3.898305e-01 | 0.4918981 | 0.0000 | 1.00000 | 0.0000 | 1.0000 |
stdpop | 59 | 3.834644e+05 | 5245.7738584 | 383428.4000 | 7898.05000 | 374113.1000 | 394005.6000 |
time | 59 | 3.000000e+01 | 17.1755640 | 30.0000 | 29.00000 | 1.0000 | 59.0000 |
year | 59 | 2.003966e+03 | 1.4138002 | 2004.0000 | 2.00000 | 2002.0000 | 2006.0000 |
data %>%
mutate(Period = case_when(smokban == 0 ~ "1. Before Ban",
smokban == 1 ~ "2. After Ban")) %>%
select(Period, aces, rate) %>%
gather(Variable, Value, -Period) %>%
group_by(Period, Variable) %>%
summarize(n = n(),
Mean = mean(Value),
SD = sd(Value),
Median = median(Value),
IQR = IQR(Value),
Min = min(Value),
Max = max(Value)) %>%
knitr::kable()
Period | Variable | n | Mean | SD | Median | IQR | Min | Max |
---|---|---|---|---|---|---|---|---|
1. Before Ban | aces | 36 | 810.8611 | 67.76521 | 811.0000 | 99.25000 | 659.0000 | 937.0000 |
1. Before Ban | rate | 36 | 213.2719 | 16.81020 | 213.1061 | 24.62515 | 175.0353 | 244.3259 |
2. After Ban | aces | 23 | 857.5217 | 71.62394 | 843.0000 | 83.00000 | 738.0000 | 1021.0000 |
2. After Ban | rate | 23 | 220.5118 | 17.59869 | 217.8656 | 21.43912 | 190.9362 | 260.7162 |
In step 2 (main paper) we chose a step change model and we also used a Poisson model as we are using count data. In order to do this we model the count data directly (rather than the rate which doesn’t follow a Poisson distribution), using the population (log transformed) as an offset variable in order to transform back to rates.
model1 <- glm(aces ~ offset(log(stdpop)) + smokban + time, family=poisson, data)
summary(model1)
Call:
glm(formula = aces ~ offset(log(stdpop)) + smokban + time, family = poisson,
data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.4336 -1.4432 -0.4431 1.2735 5.0701
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.2431509 0.0111894 -557.955 < 2e-16 ***
smokban -0.1116284 0.0172248 -6.481 9.13e-11 ***
time 0.0049450 0.0004992 9.905 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 308.52 on 58 degrees of freedom
Residual deviance: 197.31 on 56 degrees of freedom
AIC: 708.03
Number of Fisher Scoring iterations: 3
summary(model1)$dispersion
[1] 1
round(ci.lin(model1,Exp=T),3)
Estimate StdErr z P exp(Est.) 2.5% 97.5%
(Intercept) -6.243 0.011 -557.955 0 0.002 0.002 0.002
smokban -0.112 0.017 -6.481 0 0.894 0.865 0.925
time 0.005 0.000 9.905 0 1.005 1.004 1.006
datanew <- data.frame(stdpop = mean(data$stdpop),
smokban = rep(c(0, 1), c(360, 240)),
time = 1:600/10,
month = rep(1:120/10, 5))
pred1 <- predict(model1, type="response", datanew) / mean(data$stdpop) * 10^5
plot(data$rate, type="n", ylim=c(0,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 0, 60, 300, col=grey(0.9), border=F)
points(data$rate,cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines((1:600/10), pred1, col=2)
title("Sicily, 2002-2006")
datanew <- data.frame(stdpop=mean(data$stdpop),smokban=0,time=1:600/10,
month=rep(1:120/10,5))
pred1b <- predict(model1, datanew, type="response") / mean(data$stdpop) * 10^5
plot(data$rate, type="n", ylim=c(0,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 0, 60, 300, col=grey(0.9), border=F)
points(data$rate,cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines((1:600/10), pred1, col=2)
lines(datanew$time, pred1b, col=2, lty=2)
title("Sicily, 2002-2006")
datanew <- data.frame(stdpop = mean(data$stdpop),
smokban = rep(c(0, 1), c(360, 240)),
time = 1:600/10,
month = rep(1:120/10, 5))
In the model above we have not allowed for overdispersion - in order to do this we can use a quasipoisson model, which allows the variance to be proportional rather than equal to the mean.
model2 <- glm(aces ~ offset(log(stdpop)) + smokban + time,
family=quasipoisson, data)
summary(model2)
Call:
glm(formula = aces ~ offset(log(stdpop)) + smokban + time, family = quasipoisson,
data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.4336 -1.4432 -0.4431 1.2735 5.0701
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.2431509 0.0211096 -295.749 < 2e-16 ***
smokban -0.1116284 0.0324959 -3.435 0.00112 **
time 0.0049450 0.0009418 5.250 2.43e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 3.559195)
Null deviance: 308.52 on 58 degrees of freedom
Residual deviance: 197.31 on 56 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 3
summary(model2)$dispersion
[1] 3.559195
round(ci.lin(model2,Exp=T),3)
Estimate StdErr z P exp(Est.) 2.5% 97.5%
(Intercept) -6.243 0.021 -295.749 0.000 0.002 0.002 0.002
smokban -0.112 0.032 -3.435 0.001 0.894 0.839 0.953
time 0.005 0.001 5.250 0.000 1.005 1.003 1.007
res2 <- residuals(model2,type="deviance")
plot(data$time, res2, ylim=c(-5,10), pch=19, cex=0.7, col=grey(0.6), main="Residuals over time", ylab="Deviance residuals", xlab="Date")
abline(h=0, lty=2, lwd=2)
acf(res2)
pacf(res2)
There are various ways of adjusting for seasonality - here we use harmonic terms specifying the number of sin and cosine pairs to include (in this case 2) and the length of the period (12 months)
model3 <- glm(aces ~ offset(log(stdpop)) + smokban + time + harmonic(month,2,12),
family=quasipoisson, data)
summary(model3)
Call:
glm(formula = aces ~ offset(log(stdpop)) + smokban + time + harmonic(month,
2, 12), family = quasipoisson, data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.5292 -1.2634 -0.0972 1.2571 3.6882
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.2523723 0.0175689 -355.877 < 2e-16 ***
smokban -0.1220868 0.0270311 -4.517 3.64e-05 ***
time 0.0053892 0.0007955 6.774 1.13e-08 ***
harmonic(month, 2, 12)1 0.0370967 0.0100241 3.701 0.000520 ***
harmonic(month, 2, 12)2 -0.0182122 0.0096435 -1.889 0.064537 .
harmonic(month, 2, 12)3 0.0381608 0.0096848 3.940 0.000244 ***
harmonic(month, 2, 12)4 0.0147634 0.0097221 1.519 0.134935
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 2.262186)
Null deviance: 308.52 on 58 degrees of freedom
Residual deviance: 117.27 on 52 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 3
summary(model3)$dispersion
[1] 2.262186
round(ci.lin(model3,Exp=T),3)
Estimate StdErr z P exp(Est.) 2.5% 97.5%
(Intercept) -6.252 0.018 -355.877 0.000 0.002 0.002 0.002
smokban -0.122 0.027 -4.517 0.000 0.885 0.839 0.933
time 0.005 0.001 6.774 0.000 1.005 1.004 1.007
harmonic(month, 2, 12)1 0.037 0.010 3.701 0.000 1.038 1.018 1.058
harmonic(month, 2, 12)2 -0.018 0.010 -1.889 0.059 0.982 0.964 1.001
harmonic(month, 2, 12)3 0.038 0.010 3.940 0.000 1.039 1.019 1.059
harmonic(month, 2, 12)4 0.015 0.010 1.519 0.129 1.015 0.996 1.034
ci.lin(model3,Exp=T)["smokban",5:7]
exp(Est.) 2.5% 97.5%
0.8850715 0.8394010 0.9332270
exp(coef(model3)["time"]*12)
time
1.066808
res3 <- residuals(model3,type="deviance")
plot(res3,ylim=c(-5,10),pch=19,cex=0.7,col=grey(0.6),main="Residuals over time",
ylab="Deviance residuals",xlab="Date")
abline(h=0,lty=2,lwd=2)
acf(res3)
pacf(res3)
pred3 <- predict(model3, type="response", datanew) / mean(data$stdpop) * 10^5
plot(data$rate, type="n", ylim=c(120,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 120, 60, 300, col=grey(0.9), border=F)
points(data$rate, cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines(1:600/10, pred3, col=2)
title("Sicily, 2002-2006")
It is sometimes difficult to clearly see the change graphically in the seasonally adjusted model, therefore it can be useful to plot a straight line representing a ‘deseasonalised’ trend this can be done by predicting all the observations for the same month, in this case we use June.
pred3b <- predict(model3, type="response", transform(datanew,month=6)) / mean(data$stdpop) * 10^5
plot(data$rate, type="n", ylim=c(120,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 120, 60, 300, col=grey(0.9), border=F)
points(data$rate, cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines(1:600/10, pred3, col=2)
lines(1:600/10, pred3b, col=2, lty=2)
title("Sicily, 2002-2006")
model4 <- glm(aces ~ offset(log(stdpop)) + smokban*time + harmonic(month,2,12),
family=quasipoisson, data)
summary(model4)
Call:
glm(formula = aces ~ offset(log(stdpop)) + smokban * time + harmonic(month,
2, 12), family = quasipoisson, data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.7028 -1.1977 -0.1371 1.1837 3.6172
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.2469039 0.0188768 -330.930 < 2e-16 ***
smokban -0.1852194 0.0835610 -2.217 0.031138 *
time 0.0051026 0.0008745 5.835 3.72e-07 ***
harmonic(month, 2, 12)1 0.0383261 0.0101687 3.769 0.000427 ***
harmonic(month, 2, 12)2 -0.0176326 0.0096966 -1.818 0.074872 .
harmonic(month, 2, 12)3 0.0383558 0.0097142 3.948 0.000242 ***
harmonic(month, 2, 12)4 0.0149653 0.0097524 1.535 0.131081
smokban:time 0.0014822 0.0018546 0.799 0.427879
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 2.274563)
Null deviance: 308.52 on 58 degrees of freedom
Residual deviance: 115.81 on 51 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 3
round(ci.lin(model4,Exp=T),3)
Estimate StdErr z P exp(Est.) 2.5% 97.5%
(Intercept) -6.247 0.019 -330.930 0.000 0.002 0.002 0.002
smokban -0.185 0.084 -2.217 0.027 0.831 0.705 0.979
time 0.005 0.001 5.835 0.000 1.005 1.003 1.007
harmonic(month, 2, 12)1 0.038 0.010 3.769 0.000 1.039 1.019 1.060
harmonic(month, 2, 12)2 -0.018 0.010 -1.818 0.069 0.983 0.964 1.001
harmonic(month, 2, 12)3 0.038 0.010 3.948 0.000 1.039 1.020 1.059
harmonic(month, 2, 12)4 0.015 0.010 1.535 0.125 1.015 0.996 1.035
smokban:time 0.001 0.002 0.799 0.424 1.001 0.998 1.005
pred4b <- predict(model4, type="response", transform(datanew, month=6)) / mean(data$stdpop) * 10^5
plot(data$rate, type="n", ylim=c(120,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 120, 60, 300, col=grey(0.9), border=F)
points(data$rate, cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines(1:600/10, pred3b, col=2)
lines(1:600/10, pred4b, col=4)
title("Sicily, 2002-2006")
legend("topleft",
c("Step-change only", "Step-change + change-in-slope"),
lty=1, col=c(2,4), inset=0.05, bty="n", cex=0.7)
Test if the change-in-slope improve the fit the selected test here is an F-test, which accounts for the overdispersion, while in other cases a likelihood ratio or wald test can be applied
anova(model3, model4, test="F") %>%
knitr::kable()
Resid. Df | Resid. Dev | Df | Deviance | F | Pr(>F) |
---|---|---|---|---|---|
52 | 117.2661 | NA | NA | NA | NA |
51 | 115.8131 | 1 | 1.452971 | 0.6387915 | 0.4278555 |