This illustration is based on the following paper and its supplementary material:
Bernal, J. L., Cummins, S., & Gasparrini, A. (2017). Interrupted time series regression for the evaluation of public health interventions: a tutorial. International journal of epidemiology, 46(1), 348-355.
https://academic.oup.com/ije/article/46/1/348/2622842
library(lmtest)
library(vcd)
library(Epi)
library(tsModel)
library(splines)
library(tidyverse)data <- read.csv("sicily.csv")
data %>%
  knitr::kable()| year | month | aces | time | smokban | pop | stdpop | 
|---|---|---|---|---|---|---|
| 2002 | 1 | 728 | 1 | 0 | 364277.4 | 379875.3 | 
| 2002 | 2 | 659 | 2 | 0 | 364277.4 | 376495.5 | 
| 2002 | 3 | 791 | 3 | 0 | 364277.4 | 377040.8 | 
| 2002 | 4 | 734 | 4 | 0 | 364277.4 | 377116.4 | 
| 2002 | 5 | 757 | 5 | 0 | 364277.4 | 377383.4 | 
| 2002 | 6 | 726 | 6 | 0 | 364277.4 | 374113.1 | 
| 2002 | 7 | 760 | 7 | 0 | 364277.4 | 379513.3 | 
| 2002 | 8 | 740 | 8 | 0 | 364277.4 | 376295.5 | 
| 2002 | 9 | 720 | 9 | 0 | 364277.4 | 374653.2 | 
| 2002 | 10 | 814 | 10 | 0 | 364277.4 | 378485.6 | 
| 2002 | 11 | 795 | 11 | 0 | 364277.4 | 375955.5 | 
| 2002 | 12 | 858 | 12 | 0 | 364277.4 | 378349.7 | 
| 2003 | 1 | 887 | 13 | 0 | 363350.8 | 376762.4 | 
| 2003 | 2 | 766 | 14 | 0 | 363350.8 | 379032.3 | 
| 2003 | 3 | 851 | 15 | 0 | 363350.8 | 379360.4 | 
| 2003 | 4 | 769 | 16 | 0 | 363350.8 | 376162.0 | 
| 2003 | 5 | 781 | 17 | 0 | 363350.8 | 377972.4 | 
| 2003 | 6 | 756 | 18 | 0 | 363350.8 | 381830.7 | 
| 2003 | 7 | 766 | 19 | 0 | 363350.8 | 379888.6 | 
| 2003 | 8 | 752 | 20 | 0 | 363350.8 | 380872.2 | 
| 2003 | 9 | 765 | 21 | 0 | 363350.8 | 380966.9 | 
| 2003 | 10 | 831 | 22 | 0 | 363350.8 | 381240.4 | 
| 2003 | 11 | 879 | 23 | 0 | 363350.8 | 382104.9 | 
| 2003 | 12 | 928 | 24 | 0 | 363350.8 | 381802.7 | 
| 2004 | 1 | 914 | 25 | 0 | 364700.4 | 381656.3 | 
| 2004 | 2 | 808 | 26 | 0 | 364700.4 | 383680.0 | 
| 2004 | 3 | 937 | 27 | 0 | 364700.4 | 383504.2 | 
| 2004 | 4 | 840 | 28 | 0 | 364700.4 | 386462.9 | 
| 2004 | 5 | 916 | 29 | 0 | 364700.4 | 383783.1 | 
| 2004 | 6 | 828 | 30 | 0 | 364700.4 | 380836.8 | 
| 2004 | 7 | 845 | 31 | 0 | 364700.4 | 383483.0 | 
| 2004 | 8 | 818 | 32 | 0 | 364700.4 | 380906.2 | 
| 2004 | 9 | 860 | 33 | 0 | 364700.4 | 382926.8 | 
| 2004 | 10 | 839 | 34 | 0 | 364700.4 | 384052.4 | 
| 2004 | 11 | 887 | 35 | 0 | 364700.4 | 384449.6 | 
| 2004 | 12 | 886 | 36 | 0 | 364700.4 | 383428.4 | 
| 2005 | 1 | 831 | 37 | 1 | 364420.8 | 388153.2 | 
| 2005 | 2 | 796 | 38 | 1 | 364420.8 | 388373.2 | 
| 2005 | 3 | 833 | 39 | 1 | 364420.8 | 386470.1 | 
| 2005 | 4 | 820 | 40 | 1 | 364420.8 | 386033.2 | 
| 2005 | 5 | 877 | 41 | 1 | 364420.8 | 383686.4 | 
| 2005 | 6 | 758 | 42 | 1 | 364420.8 | 385509.3 | 
| 2005 | 7 | 767 | 43 | 1 | 364420.8 | 385901.9 | 
| 2005 | 8 | 738 | 44 | 1 | 364420.8 | 386516.6 | 
| 2005 | 9 | 781 | 45 | 1 | 364420.8 | 388436.5 | 
| 2005 | 10 | 843 | 46 | 1 | 364420.8 | 383255.2 | 
| 2005 | 11 | 850 | 47 | 1 | 364420.8 | 390148.7 | 
| 2005 | 12 | 908 | 48 | 1 | 364420.8 | 385874.9 | 
| 2006 | 1 | 1021 | 49 | 1 | 363832.6 | 391613.6 | 
| 2006 | 2 | 859 | 50 | 1 | 363832.6 | 391750.4 | 
| 2006 | 3 | 976 | 51 | 1 | 363832.6 | 394005.6 | 
| 2006 | 4 | 888 | 52 | 1 | 363832.6 | 391364.9 | 
| 2006 | 5 | 962 | 53 | 1 | 363832.6 | 391664.6 | 
| 2006 | 6 | 838 | 54 | 1 | 363832.6 | 389022.3 | 
| 2006 | 7 | 810 | 55 | 1 | 363832.6 | 391878.5 | 
| 2006 | 8 | 876 | 56 | 1 | 363832.6 | 388575.3 | 
| 2006 | 9 | 843 | 57 | 1 | 363832.6 | 392989.0 | 
| 2006 | 10 | 936 | 58 | 1 | 363832.6 | 390018.8 | 
| 2006 | 11 | 912 | 59 | 1 | 363832.6 | 390712.3 | 
This dataset includes the following variables:
Examining the data is an important first step. Looking at the pre-intervention trend can give an indication of how stable the trend is over time, whether a linear model is likely to be appropriate, and whether there appears to be a seasonal trend.
data$rate <- with(data, aces/stdpop*10^5)
plot(data$rate,type="n",ylim=c(00,300),xlab="Year", ylab="Std rate x 10,000", bty="l",xaxt="n")
rect(36,0,60,300,col=grey(0.9),border=F)
points(data$rate[data$smokban==0],cex=0.7)
axis(1,at=0:5*12,labels=F)
axis(1,at=0:4*12+6,tick=F,labels=2002:2006)
title("Sicily, 2002-2006")data %>%
  gather(Variable, Value) %>%
  group_by(Variable) %>%
  summarize(n = n(),
            Mean = mean(Value),
            SD = sd(Value),
            Median = median(Value),
            IQR = IQR(Value),
            Min = min(Value),
            Max = max(Value)) %>%
  knitr::kable()| Variable | n | Mean | SD | Median | IQR | Min | Max | 
|---|---|---|---|---|---|---|---|
| aces | 59 | 8.290508e+02 | 72.4139022 | 831.0000 | 111.50000 | 659.0000 | 1021.0000 | 
| month | 59 | 6.406780e+00 | 3.4347044 | 6.0000 | 5.50000 | 1.0000 | 12.0000 | 
| pop | 59 | 3.641212e+05 | 481.2691260 | 364277.4000 | 588.20000 | 363350.8000 | 364700.4000 | 
| rate | 59 | 2.160942e+02 | 17.3402530 | 215.4118 | 26.38489 | 175.0353 | 260.7162 | 
| smokban | 59 | 3.898305e-01 | 0.4918981 | 0.0000 | 1.00000 | 0.0000 | 1.0000 | 
| stdpop | 59 | 3.834644e+05 | 5245.7738584 | 383428.4000 | 7898.05000 | 374113.1000 | 394005.6000 | 
| time | 59 | 3.000000e+01 | 17.1755640 | 30.0000 | 29.00000 | 1.0000 | 59.0000 | 
| year | 59 | 2.003966e+03 | 1.4138002 | 2004.0000 | 2.00000 | 2002.0000 | 2006.0000 | 
data %>%
  mutate(Period = case_when(smokban == 0 ~ "1. Before Ban",
                            smokban == 1 ~ "2. After Ban")) %>%
  select(Period, aces, rate) %>%
  gather(Variable, Value, -Period) %>%
  group_by(Period, Variable) %>%
    summarize(n = n(),
            Mean = mean(Value),
            SD = sd(Value),
            Median = median(Value),
            IQR = IQR(Value),
            Min = min(Value),
            Max = max(Value)) %>%
  knitr::kable()| Period | Variable | n | Mean | SD | Median | IQR | Min | Max | 
|---|---|---|---|---|---|---|---|---|
| 1. Before Ban | aces | 36 | 810.8611 | 67.76521 | 811.0000 | 99.25000 | 659.0000 | 937.0000 | 
| 1. Before Ban | rate | 36 | 213.2719 | 16.81020 | 213.1061 | 24.62515 | 175.0353 | 244.3259 | 
| 2. After Ban | aces | 23 | 857.5217 | 71.62394 | 843.0000 | 83.00000 | 738.0000 | 1021.0000 | 
| 2. After Ban | rate | 23 | 220.5118 | 17.59869 | 217.8656 | 21.43912 | 190.9362 | 260.7162 | 
In step 2 (main paper) we chose a step change model and we also used a Poisson model as we are using count data. In order to do this we model the count data directly (rather than the rate which doesn’t follow a Poisson distribution), using the population (log transformed) as an offset variable in order to transform back to rates.
model1 <- glm(aces ~ offset(log(stdpop)) + smokban + time, family=poisson, data)
summary(model1)
Call:
glm(formula = aces ~ offset(log(stdpop)) + smokban + time, family = poisson, 
    data = data)
Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.4336  -1.4432  -0.4431   1.2735   5.0701  
Coefficients:
              Estimate Std. Error  z value Pr(>|z|)    
(Intercept) -6.2431509  0.0111894 -557.955  < 2e-16 ***
smokban     -0.1116284  0.0172248   -6.481 9.13e-11 ***
time         0.0049450  0.0004992    9.905  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
    Null deviance: 308.52  on 58  degrees of freedom
Residual deviance: 197.31  on 56  degrees of freedom
AIC: 708.03
Number of Fisher Scoring iterations: 3summary(model1)$dispersion[1] 1round(ci.lin(model1,Exp=T),3)            Estimate StdErr        z P exp(Est.)  2.5% 97.5%
(Intercept)   -6.243  0.011 -557.955 0     0.002 0.002 0.002
smokban       -0.112  0.017   -6.481 0     0.894 0.865 0.925
time           0.005  0.000    9.905 0     1.005 1.004 1.006datanew <- data.frame(stdpop = mean(data$stdpop),
                      smokban = rep(c(0, 1), c(360, 240)),
                      time = 1:600/10,
                      month = rep(1:120/10, 5))
pred1 <- predict(model1, type="response", datanew) / mean(data$stdpop) * 10^5
plot(data$rate, type="n", ylim=c(0,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 0, 60, 300, col=grey(0.9), border=F)
points(data$rate,cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines((1:600/10), pred1, col=2)
title("Sicily, 2002-2006")datanew <- data.frame(stdpop=mean(data$stdpop),smokban=0,time=1:600/10,
  month=rep(1:120/10,5))
pred1b <- predict(model1, datanew, type="response") / mean(data$stdpop) * 10^5
plot(data$rate, type="n", ylim=c(0,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 0, 60, 300, col=grey(0.9), border=F)
points(data$rate,cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines((1:600/10), pred1, col=2)
lines(datanew$time, pred1b, col=2, lty=2)
title("Sicily, 2002-2006")datanew <- data.frame(stdpop = mean(data$stdpop),
                      smokban = rep(c(0, 1), c(360, 240)),
                      time = 1:600/10,
                      month = rep(1:120/10, 5))In the model above we have not allowed for overdispersion - in order to do this we can use a quasipoisson model, which allows the variance to be proportional rather than equal to the mean.
model2 <- glm(aces ~ offset(log(stdpop)) + smokban + time, 
              family=quasipoisson, data)
summary(model2)
Call:
glm(formula = aces ~ offset(log(stdpop)) + smokban + time, family = quasipoisson, 
    data = data)
Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.4336  -1.4432  -0.4431   1.2735   5.0701  
Coefficients:
              Estimate Std. Error  t value Pr(>|t|)    
(Intercept) -6.2431509  0.0211096 -295.749  < 2e-16 ***
smokban     -0.1116284  0.0324959   -3.435  0.00112 ** 
time         0.0049450  0.0009418    5.250 2.43e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 3.559195)
    Null deviance: 308.52  on 58  degrees of freedom
Residual deviance: 197.31  on 56  degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 3summary(model2)$dispersion[1] 3.559195round(ci.lin(model2,Exp=T),3)            Estimate StdErr        z     P exp(Est.)  2.5% 97.5%
(Intercept)   -6.243  0.021 -295.749 0.000     0.002 0.002 0.002
smokban       -0.112  0.032   -3.435 0.001     0.894 0.839 0.953
time           0.005  0.001    5.250 0.000     1.005 1.003 1.007res2 <- residuals(model2,type="deviance")
plot(data$time, res2, ylim=c(-5,10), pch=19, cex=0.7, col=grey(0.6), main="Residuals over time", ylab="Deviance residuals", xlab="Date")
abline(h=0, lty=2, lwd=2)acf(res2)pacf(res2)There are various ways of adjusting for seasonality - here we use harmonic terms specifying the number of sin and cosine pairs to include (in this case 2) and the length of the period (12 months)
model3 <- glm(aces ~ offset(log(stdpop)) + smokban + time + harmonic(month,2,12), 
              family=quasipoisson, data)
summary(model3)
Call:
glm(formula = aces ~ offset(log(stdpop)) + smokban + time + harmonic(month, 
    2, 12), family = quasipoisson, data = data)
Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.5292  -1.2634  -0.0972   1.2571   3.6882  
Coefficients:
                          Estimate Std. Error  t value Pr(>|t|)    
(Intercept)             -6.2523723  0.0175689 -355.877  < 2e-16 ***
smokban                 -0.1220868  0.0270311   -4.517 3.64e-05 ***
time                     0.0053892  0.0007955    6.774 1.13e-08 ***
harmonic(month, 2, 12)1  0.0370967  0.0100241    3.701 0.000520 ***
harmonic(month, 2, 12)2 -0.0182122  0.0096435   -1.889 0.064537 .  
harmonic(month, 2, 12)3  0.0381608  0.0096848    3.940 0.000244 ***
harmonic(month, 2, 12)4  0.0147634  0.0097221    1.519 0.134935    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 2.262186)
    Null deviance: 308.52  on 58  degrees of freedom
Residual deviance: 117.27  on 52  degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 3summary(model3)$dispersion[1] 2.262186round(ci.lin(model3,Exp=T),3)                        Estimate StdErr        z     P exp(Est.)  2.5% 97.5%
(Intercept)               -6.252  0.018 -355.877 0.000     0.002 0.002 0.002
smokban                   -0.122  0.027   -4.517 0.000     0.885 0.839 0.933
time                       0.005  0.001    6.774 0.000     1.005 1.004 1.007
harmonic(month, 2, 12)1    0.037  0.010    3.701 0.000     1.038 1.018 1.058
harmonic(month, 2, 12)2   -0.018  0.010   -1.889 0.059     0.982 0.964 1.001
harmonic(month, 2, 12)3    0.038  0.010    3.940 0.000     1.039 1.019 1.059
harmonic(month, 2, 12)4    0.015  0.010    1.519 0.129     1.015 0.996 1.034ci.lin(model3,Exp=T)["smokban",5:7]exp(Est.)      2.5%     97.5% 
0.8850715 0.8394010 0.9332270 exp(coef(model3)["time"]*12)    time 
1.066808 res3 <- residuals(model3,type="deviance")
plot(res3,ylim=c(-5,10),pch=19,cex=0.7,col=grey(0.6),main="Residuals over time",
  ylab="Deviance residuals",xlab="Date")
abline(h=0,lty=2,lwd=2)acf(res3)pacf(res3)pred3 <- predict(model3, type="response", datanew) / mean(data$stdpop) * 10^5
plot(data$rate, type="n", ylim=c(120,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 120, 60, 300, col=grey(0.9), border=F)
points(data$rate, cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines(1:600/10, pred3, col=2)
title("Sicily, 2002-2006")It is sometimes difficult to clearly see the change graphically in the seasonally adjusted model, therefore it can be useful to plot a straight line representing a ‘deseasonalised’ trend this can be done by predicting all the observations for the same month, in this case we use June.
pred3b <- predict(model3, type="response", transform(datanew,month=6)) / mean(data$stdpop) * 10^5plot(data$rate, type="n", ylim=c(120,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 120, 60, 300, col=grey(0.9), border=F)
points(data$rate, cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines(1:600/10, pred3, col=2)
lines(1:600/10, pred3b, col=2, lty=2)
title("Sicily, 2002-2006")model4 <- glm(aces ~ offset(log(stdpop)) + smokban*time + harmonic(month,2,12),
  family=quasipoisson, data)
summary(model4)
Call:
glm(formula = aces ~ offset(log(stdpop)) + smokban * time + harmonic(month, 
    2, 12), family = quasipoisson, data = data)
Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.7028  -1.1977  -0.1371   1.1837   3.6172  
Coefficients:
                          Estimate Std. Error  t value Pr(>|t|)    
(Intercept)             -6.2469039  0.0188768 -330.930  < 2e-16 ***
smokban                 -0.1852194  0.0835610   -2.217 0.031138 *  
time                     0.0051026  0.0008745    5.835 3.72e-07 ***
harmonic(month, 2, 12)1  0.0383261  0.0101687    3.769 0.000427 ***
harmonic(month, 2, 12)2 -0.0176326  0.0096966   -1.818 0.074872 .  
harmonic(month, 2, 12)3  0.0383558  0.0097142    3.948 0.000242 ***
harmonic(month, 2, 12)4  0.0149653  0.0097524    1.535 0.131081    
smokban:time             0.0014822  0.0018546    0.799 0.427879    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 2.274563)
    Null deviance: 308.52  on 58  degrees of freedom
Residual deviance: 115.81  on 51  degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 3round(ci.lin(model4,Exp=T),3)                        Estimate StdErr        z     P exp(Est.)  2.5% 97.5%
(Intercept)               -6.247  0.019 -330.930 0.000     0.002 0.002 0.002
smokban                   -0.185  0.084   -2.217 0.027     0.831 0.705 0.979
time                       0.005  0.001    5.835 0.000     1.005 1.003 1.007
harmonic(month, 2, 12)1    0.038  0.010    3.769 0.000     1.039 1.019 1.060
harmonic(month, 2, 12)2   -0.018  0.010   -1.818 0.069     0.983 0.964 1.001
harmonic(month, 2, 12)3    0.038  0.010    3.948 0.000     1.039 1.020 1.059
harmonic(month, 2, 12)4    0.015  0.010    1.535 0.125     1.015 0.996 1.035
smokban:time               0.001  0.002    0.799 0.424     1.001 0.998 1.005pred4b <- predict(model4, type="response", transform(datanew, month=6)) / mean(data$stdpop) * 10^5
plot(data$rate, type="n", ylim=c(120,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 120, 60, 300, col=grey(0.9), border=F)
points(data$rate, cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines(1:600/10, pred3b, col=2)
lines(1:600/10, pred4b, col=4)
title("Sicily, 2002-2006")
legend("topleft", 
       c("Step-change only", "Step-change + change-in-slope"),
       lty=1, col=c(2,4), inset=0.05, bty="n", cex=0.7)Test if the change-in-slope improve the fit the selected test here is an F-test, which accounts for the overdispersion, while in other cases a likelihood ratio or wald test can be applied
anova(model3, model4, test="F") %>%
  knitr::kable()| Resid. Df | Resid. Dev | Df | Deviance | F | Pr(>F) | 
|---|---|---|---|---|---|
| 52 | 117.2661 | NA | NA | NA | NA | 
| 51 | 115.8131 | 1 | 1.452971 | 0.6387915 | 0.4278555 |