This illustration is based on the following paper and its supplementary material:

Bernal, J. L., Cummins, S., & Gasparrini, A. (2017). Interrupted time series regression for the evaluation of public health interventions: a tutorial. International journal of epidemiology, 46(1), 348-355.

https://academic.oup.com/ije/article/46/1/348/2622842

Load Required R Packages

library(lmtest)
library(vcd)
library(Epi)
library(tsModel)
library(splines)
library(tidyverse)

Read in Comma-Delimited Data

data <- read.csv("sicily.csv")
data %>%
  knitr::kable()
year month aces time smokban pop stdpop
2002 1 728 1 0 364277.4 379875.3
2002 2 659 2 0 364277.4 376495.5
2002 3 791 3 0 364277.4 377040.8
2002 4 734 4 0 364277.4 377116.4
2002 5 757 5 0 364277.4 377383.4
2002 6 726 6 0 364277.4 374113.1
2002 7 760 7 0 364277.4 379513.3
2002 8 740 8 0 364277.4 376295.5
2002 9 720 9 0 364277.4 374653.2
2002 10 814 10 0 364277.4 378485.6
2002 11 795 11 0 364277.4 375955.5
2002 12 858 12 0 364277.4 378349.7
2003 1 887 13 0 363350.8 376762.4
2003 2 766 14 0 363350.8 379032.3
2003 3 851 15 0 363350.8 379360.4
2003 4 769 16 0 363350.8 376162.0
2003 5 781 17 0 363350.8 377972.4
2003 6 756 18 0 363350.8 381830.7
2003 7 766 19 0 363350.8 379888.6
2003 8 752 20 0 363350.8 380872.2
2003 9 765 21 0 363350.8 380966.9
2003 10 831 22 0 363350.8 381240.4
2003 11 879 23 0 363350.8 382104.9
2003 12 928 24 0 363350.8 381802.7
2004 1 914 25 0 364700.4 381656.3
2004 2 808 26 0 364700.4 383680.0
2004 3 937 27 0 364700.4 383504.2
2004 4 840 28 0 364700.4 386462.9
2004 5 916 29 0 364700.4 383783.1
2004 6 828 30 0 364700.4 380836.8
2004 7 845 31 0 364700.4 383483.0
2004 8 818 32 0 364700.4 380906.2
2004 9 860 33 0 364700.4 382926.8
2004 10 839 34 0 364700.4 384052.4
2004 11 887 35 0 364700.4 384449.6
2004 12 886 36 0 364700.4 383428.4
2005 1 831 37 1 364420.8 388153.2
2005 2 796 38 1 364420.8 388373.2
2005 3 833 39 1 364420.8 386470.1
2005 4 820 40 1 364420.8 386033.2
2005 5 877 41 1 364420.8 383686.4
2005 6 758 42 1 364420.8 385509.3
2005 7 767 43 1 364420.8 385901.9
2005 8 738 44 1 364420.8 386516.6
2005 9 781 45 1 364420.8 388436.5
2005 10 843 46 1 364420.8 383255.2
2005 11 850 47 1 364420.8 390148.7
2005 12 908 48 1 364420.8 385874.9
2006 1 1021 49 1 363832.6 391613.6
2006 2 859 50 1 363832.6 391750.4
2006 3 976 51 1 363832.6 394005.6
2006 4 888 52 1 363832.6 391364.9
2006 5 962 53 1 363832.6 391664.6
2006 6 838 54 1 363832.6 389022.3
2006 7 810 55 1 363832.6 391878.5
2006 8 876 56 1 363832.6 388575.3
2006 9 843 57 1 363832.6 392989.0
2006 10 936 58 1 363832.6 390018.8
2006 11 912 59 1 363832.6 390712.3

This dataset includes the following variables:

  • year
  • month
  • time = elapsed time since the start of the study
  • aces = count of acute coronary episodes in Sicily per month (the outcome)
  • smokban = smoking ban (the intervention) coded 0 before intervention, 1 after
  • pop = the population of Sicily (in 10000s)
  • stdpop = age standardised population

Descriptive Analysis

Examining the data is an important first step. Looking at the pre-intervention trend can give an indication of how stable the trend is over time, whether a linear model is likely to be appropriate, and whether there appears to be a seasonal trend.

Scatter plot

data$rate <- with(data, aces/stdpop*10^5)

plot(data$rate,type="n",ylim=c(00,300),xlab="Year", ylab="Std rate x 10,000", bty="l",xaxt="n")
rect(36,0,60,300,col=grey(0.9),border=F)
points(data$rate[data$smokban==0],cex=0.7)
axis(1,at=0:5*12,labels=F)
axis(1,at=0:4*12+6,tick=F,labels=2002:2006)
title("Sicily, 2002-2006")

Summary Statistics

data %>%
  gather(Variable, Value) %>%
  group_by(Variable) %>%
  summarize(n = n(),
            Mean = mean(Value),
            SD = sd(Value),
            Median = median(Value),
            IQR = IQR(Value),
            Min = min(Value),
            Max = max(Value)) %>%
  knitr::kable()
Variable n Mean SD Median IQR Min Max
aces 59 8.290508e+02 72.4139022 831.0000 111.50000 659.0000 1021.0000
month 59 6.406780e+00 3.4347044 6.0000 5.50000 1.0000 12.0000
pop 59 3.641212e+05 481.2691260 364277.4000 588.20000 363350.8000 364700.4000
rate 59 2.160942e+02 17.3402530 215.4118 26.38489 175.0353 260.7162
smokban 59 3.898305e-01 0.4918981 0.0000 1.00000 0.0000 1.0000
stdpop 59 3.834644e+05 5245.7738584 383428.4000 7898.05000 374113.1000 394005.6000
time 59 3.000000e+01 17.1755640 30.0000 29.00000 1.0000 59.0000
year 59 2.003966e+03 1.4138002 2004.0000 2.00000 2002.0000 2006.0000

Acute Coronary Event Rates Before and After Smoking Ban

data %>%
  mutate(Period = case_when(smokban == 0 ~ "1. Before Ban",
                            smokban == 1 ~ "2. After Ban")) %>%
  select(Period, aces, rate) %>%
  gather(Variable, Value, -Period) %>%
  group_by(Period, Variable) %>%
    summarize(n = n(),
            Mean = mean(Value),
            SD = sd(Value),
            Median = median(Value),
            IQR = IQR(Value),
            Min = min(Value),
            Max = max(Value)) %>%
  knitr::kable()
Period Variable n Mean SD Median IQR Min Max
1. Before Ban aces 36 810.8611 67.76521 811.0000 99.25000 659.0000 937.0000
1. Before Ban rate 36 213.2719 16.81020 213.1061 24.62515 175.0353 244.3259
2. After Ban aces 23 857.5217 71.62394 843.0000 83.00000 738.0000 1021.0000
2. After Ban rate 23 220.5118 17.59869 217.8656 21.43912 190.9362 260.7162

Poisson Regression Model

In step 2 (main paper) we chose a step change model and we also used a Poisson model as we are using count data. In order to do this we model the count data directly (rather than the rate which doesn’t follow a Poisson distribution), using the population (log transformed) as an offset variable in order to transform back to rates.

model1 <- glm(aces ~ offset(log(stdpop)) + smokban + time, family=poisson, data)
summary(model1)

Call:
glm(formula = aces ~ offset(log(stdpop)) + smokban + time, family = poisson, 
    data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.4336  -1.4432  -0.4431   1.2735   5.0701  

Coefficients:
              Estimate Std. Error  z value Pr(>|z|)    
(Intercept) -6.2431509  0.0111894 -557.955  < 2e-16 ***
smokban     -0.1116284  0.0172248   -6.481 9.13e-11 ***
time         0.0049450  0.0004992    9.905  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 308.52  on 58  degrees of freedom
Residual deviance: 197.31  on 56  degrees of freedom
AIC: 708.03

Number of Fisher Scoring iterations: 3
summary(model1)$dispersion
[1] 1
round(ci.lin(model1,Exp=T),3)
            Estimate StdErr        z P exp(Est.)  2.5% 97.5%
(Intercept)   -6.243  0.011 -557.955 0     0.002 0.002 0.002
smokban       -0.112  0.017   -6.481 0     0.894 0.865 0.925
time           0.005  0.000    9.905 0     1.005 1.004 1.006
  • The exponentiated intercept is the ACE rate in December 2001, and in this example is about 200 per 10,000.
  • The exponentiated smokban effect is a rate ratio, and in this example the ban multiplied the ACE rate by 0.89 (a reduction).
  • The expected ACE rate after the ban would be 0.89 * 200 = 178 per 10,000.

Add predictions from the Poisson model to the graph

datanew <- data.frame(stdpop = mean(data$stdpop),
                      smokban = rep(c(0, 1), c(360, 240)),
                      time = 1:600/10,
                      month = rep(1:120/10, 5))

pred1 <- predict(model1, type="response", datanew) / mean(data$stdpop) * 10^5

plot(data$rate, type="n", ylim=c(0,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 0, 60, 300, col=grey(0.9), border=F)
points(data$rate,cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines((1:600/10), pred1, col=2)
title("Sicily, 2002-2006")

To plot the counterfactual scenario we create a data frame as if smokban (the intervention) had never been implemented

datanew <- data.frame(stdpop=mean(data$stdpop),smokban=0,time=1:600/10,
  month=rep(1:120/10,5))

pred1b <- predict(model1, datanew, type="response") / mean(data$stdpop) * 10^5

plot(data$rate, type="n", ylim=c(0,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 0, 60, 300, col=grey(0.9), border=F)
points(data$rate,cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines((1:600/10), pred1, col=2)
lines(datanew$time, pred1b, col=2, lty=2)
title("Sicily, 2002-2006")

Return the data frame to the scenario including the intervention

datanew <- data.frame(stdpop = mean(data$stdpop),
                      smokban = rep(c(0, 1), c(360, 240)),
                      time = 1:600/10,
                      month = rep(1:120/10, 5))

Methodological Issues

Overdispersion: Quasi-Poisson model

In the model above we have not allowed for overdispersion - in order to do this we can use a quasipoisson model, which allows the variance to be proportional rather than equal to the mean.

model2 <- glm(aces ~ offset(log(stdpop)) + smokban + time, 
              family=quasipoisson, data)
summary(model2)

Call:
glm(formula = aces ~ offset(log(stdpop)) + smokban + time, family = quasipoisson, 
    data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.4336  -1.4432  -0.4431   1.2735   5.0701  

Coefficients:
              Estimate Std. Error  t value Pr(>|t|)    
(Intercept) -6.2431509  0.0211096 -295.749  < 2e-16 ***
smokban     -0.1116284  0.0324959   -3.435  0.00112 ** 
time         0.0049450  0.0009418    5.250 2.43e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 3.559195)

    Null deviance: 308.52  on 58  degrees of freedom
Residual deviance: 197.31  on 56  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 3
summary(model2)$dispersion
[1] 3.559195
round(ci.lin(model2,Exp=T),3)
            Estimate StdErr        z     P exp(Est.)  2.5% 97.5%
(Intercept)   -6.243  0.021 -295.749 0.000     0.002 0.002 0.002
smokban       -0.112  0.032   -3.435 0.001     0.894 0.839 0.953
time           0.005  0.001    5.250 0.000     1.005 1.003 1.007

Model checking and autocorrelation

Check the residuals by plotting against time

res2 <- residuals(model2,type="deviance")

plot(data$time, res2, ylim=c(-5,10), pch=19, cex=0.7, col=grey(0.6), main="Residuals over time", ylab="Deviance residuals", xlab="Date")

abline(h=0, lty=2, lwd=2)

Further check for autocorrelation by examining the autocorrelation and partial autocorrelation functions

acf(res2)

pacf(res2)

Adjusting for seasonality

There are various ways of adjusting for seasonality - here we use harmonic terms specifying the number of sin and cosine pairs to include (in this case 2) and the length of the period (12 months)

model3 <- glm(aces ~ offset(log(stdpop)) + smokban + time + harmonic(month,2,12), 
              family=quasipoisson, data)
summary(model3)

Call:
glm(formula = aces ~ offset(log(stdpop)) + smokban + time + harmonic(month, 
    2, 12), family = quasipoisson, data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.5292  -1.2634  -0.0972   1.2571   3.6882  

Coefficients:
                          Estimate Std. Error  t value Pr(>|t|)    
(Intercept)             -6.2523723  0.0175689 -355.877  < 2e-16 ***
smokban                 -0.1220868  0.0270311   -4.517 3.64e-05 ***
time                     0.0053892  0.0007955    6.774 1.13e-08 ***
harmonic(month, 2, 12)1  0.0370967  0.0100241    3.701 0.000520 ***
harmonic(month, 2, 12)2 -0.0182122  0.0096435   -1.889 0.064537 .  
harmonic(month, 2, 12)3  0.0381608  0.0096848    3.940 0.000244 ***
harmonic(month, 2, 12)4  0.0147634  0.0097221    1.519 0.134935    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 2.262186)

    Null deviance: 308.52  on 58  degrees of freedom
Residual deviance: 117.27  on 52  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 3
summary(model3)$dispersion
[1] 2.262186
round(ci.lin(model3,Exp=T),3)
                        Estimate StdErr        z     P exp(Est.)  2.5% 97.5%
(Intercept)               -6.252  0.018 -355.877 0.000     0.002 0.002 0.002
smokban                   -0.122  0.027   -4.517 0.000     0.885 0.839 0.933
time                       0.005  0.001    6.774 0.000     1.005 1.004 1.007
harmonic(month, 2, 12)1    0.037  0.010    3.701 0.000     1.038 1.018 1.058
harmonic(month, 2, 12)2   -0.018  0.010   -1.889 0.059     0.982 0.964 1.001
harmonic(month, 2, 12)3    0.038  0.010    3.940 0.000     1.039 1.019 1.059
harmonic(month, 2, 12)4    0.015  0.010    1.519 0.129     1.015 0.996 1.034

EFFECTS

ci.lin(model3,Exp=T)["smokban",5:7]
exp(Est.)      2.5%     97.5% 
0.8850715 0.8394010 0.9332270 
  • Taking into account seasonality of the ACE rate, the smoking ban effect was a rate ratio of 0.885 (slightly bigger reduction)

TREND

exp(coef(model3)["time"]*12)
    time 
1.066808 
  • Taking into account seasonality of the ACE rate, the ACE rate was being multiplied by about 1.07 each year, a long-term temporal trend.

We again check the model and autocorrelation functions

res3 <- residuals(model3,type="deviance")
plot(res3,ylim=c(-5,10),pch=19,cex=0.7,col=grey(0.6),main="Residuals over time",
  ylab="Deviance residuals",xlab="Date")
abline(h=0,lty=2,lwd=2)

acf(res3)

pacf(res3)

Predict and plot of the seasonally adjusted model

pred3 <- predict(model3, type="response", datanew) / mean(data$stdpop) * 10^5

plot(data$rate, type="n", ylim=c(120,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 120, 60, 300, col=grey(0.9), border=F)
points(data$rate, cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines(1:600/10, pred3, col=2)
title("Sicily, 2002-2006")

It is sometimes difficult to clearly see the change graphically in the seasonally adjusted model, therefore it can be useful to plot a straight line representing a ‘deseasonalised’ trend this can be done by predicting all the observations for the same month, in this case we use June.

pred3b <- predict(model3, type="response", transform(datanew,month=6)) / mean(data$stdpop) * 10^5

This can then be added to the plot as a dashed line

plot(data$rate, type="n", ylim=c(120,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 120, 60, 300, col=grey(0.9), border=F)
points(data$rate, cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines(1:600/10, pred3, col=2)
lines(1:600/10, pred3b, col=2, lty=2)
title("Sicily, 2002-2006")

Additional Material

Add a change-in-slope

We parameterize it as an interaction between time and the ban indicator

model4 <- glm(aces ~ offset(log(stdpop)) + smokban*time + harmonic(month,2,12),
  family=quasipoisson, data)
summary(model4)

Call:
glm(formula = aces ~ offset(log(stdpop)) + smokban * time + harmonic(month, 
    2, 12), family = quasipoisson, data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.7028  -1.1977  -0.1371   1.1837   3.6172  

Coefficients:
                          Estimate Std. Error  t value Pr(>|t|)    
(Intercept)             -6.2469039  0.0188768 -330.930  < 2e-16 ***
smokban                 -0.1852194  0.0835610   -2.217 0.031138 *  
time                     0.0051026  0.0008745    5.835 3.72e-07 ***
harmonic(month, 2, 12)1  0.0383261  0.0101687    3.769 0.000427 ***
harmonic(month, 2, 12)2 -0.0176326  0.0096966   -1.818 0.074872 .  
harmonic(month, 2, 12)3  0.0383558  0.0097142    3.948 0.000242 ***
harmonic(month, 2, 12)4  0.0149653  0.0097524    1.535 0.131081    
smokban:time             0.0014822  0.0018546    0.799 0.427879    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 2.274563)

    Null deviance: 308.52  on 58  degrees of freedom
Residual deviance: 115.81  on 51  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 3
round(ci.lin(model4,Exp=T),3)
                        Estimate StdErr        z     P exp(Est.)  2.5% 97.5%
(Intercept)               -6.247  0.019 -330.930 0.000     0.002 0.002 0.002
smokban                   -0.185  0.084   -2.217 0.027     0.831 0.705 0.979
time                       0.005  0.001    5.835 0.000     1.005 1.003 1.007
harmonic(month, 2, 12)1    0.038  0.010    3.769 0.000     1.039 1.019 1.060
harmonic(month, 2, 12)2   -0.018  0.010   -1.818 0.069     0.983 0.964 1.001
harmonic(month, 2, 12)3    0.038  0.010    3.948 0.000     1.039 1.020 1.059
harmonic(month, 2, 12)4    0.015  0.010    1.535 0.125     1.015 0.996 1.035
smokban:time               0.001  0.002    0.799 0.424     1.001 0.998 1.005

Predict and plot the ‘deseasonalised’ trend and compare it with the step-change only model

pred4b <- predict(model4, type="response", transform(datanew, month=6)) / mean(data$stdpop) * 10^5
plot(data$rate, type="n", ylim=c(120,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 120, 60, 300, col=grey(0.9), border=F)
points(data$rate, cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines(1:600/10, pred3b, col=2)
lines(1:600/10, pred4b, col=4)
title("Sicily, 2002-2006")
legend("topleft", 
       c("Step-change only", "Step-change + change-in-slope"),
       lty=1, col=c(2,4), inset=0.05, bty="n", cex=0.7)

Test if the change-in-slope improve the fit the selected test here is an F-test, which accounts for the overdispersion, while in other cases a likelihood ratio or wald test can be applied

anova(model3, model4, test="F") %>%
  knitr::kable()
Resid. Df Resid. Dev Df Deviance F Pr(>F)
52 117.2661 NA NA NA NA
51 115.8131 1 1.452971 0.6387915 0.4278555
  • Not surprisingly, the p-value is similar to that of the interaction term.
  • The ban reduced the ACE rate, but did not change the longer-term temporal trend in the rate (it continued to grow over time).