This illustration is based on the following paper and its supplementary material:

Bernal, J. L., Cummins, S., & Gasparrini, A. (2017). Interrupted time series regression for the evaluation of public health interventions: a tutorial. International journal of epidemiology, 46(1), 348-355.

https://academic.oup.com/ije/article/46/1/348/2622842

Load Required R Packages

library(lmtest)
library(vcd)
library(Epi)
library(tsModel)
library(splines)
library(tidyverse)

Read in Comma-Delimited Data

data <- read.csv("sicily.csv")
data %>%
  knitr::kable()

year	month	aces	time	smokban	pop	stdpop
2002	1	728	1	0	364277.4	379875.3
2002	2	659	2	0	364277.4	376495.5
2002	3	791	3	0	364277.4	377040.8
2002	4	734	4	0	364277.4	377116.4
2002	5	757	5	0	364277.4	377383.4
2002	6	726	6	0	364277.4	374113.1
2002	7	760	7	0	364277.4	379513.3
2002	8	740	8	0	364277.4	376295.5
2002	9	720	9	0	364277.4	374653.2
2002	10	814	10	0	364277.4	378485.6
2002	11	795	11	0	364277.4	375955.5
2002	12	858	12	0	364277.4	378349.7
2003	1	887	13	0	363350.8	376762.4
2003	2	766	14	0	363350.8	379032.3
2003	3	851	15	0	363350.8	379360.4
2003	4	769	16	0	363350.8	376162.0
2003	5	781	17	0	363350.8	377972.4
2003	6	756	18	0	363350.8	381830.7
2003	7	766	19	0	363350.8	379888.6
2003	8	752	20	0	363350.8	380872.2
2003	9	765	21	0	363350.8	380966.9
2003	10	831	22	0	363350.8	381240.4
2003	11	879	23	0	363350.8	382104.9
2003	12	928	24	0	363350.8	381802.7
2004	1	914	25	0	364700.4	381656.3
2004	2	808	26	0	364700.4	383680.0
2004	3	937	27	0	364700.4	383504.2
2004	4	840	28	0	364700.4	386462.9
2004	5	916	29	0	364700.4	383783.1
2004	6	828	30	0	364700.4	380836.8
2004	7	845	31	0	364700.4	383483.0
2004	8	818	32	0	364700.4	380906.2
2004	9	860	33	0	364700.4	382926.8
2004	10	839	34	0	364700.4	384052.4
2004	11	887	35	0	364700.4	384449.6
2004	12	886	36	0	364700.4	383428.4
2005	1	831	37	1	364420.8	388153.2
2005	2	796	38	1	364420.8	388373.2
2005	3	833	39	1	364420.8	386470.1
2005	4	820	40	1	364420.8	386033.2
2005	5	877	41	1	364420.8	383686.4
2005	6	758	42	1	364420.8	385509.3
2005	7	767	43	1	364420.8	385901.9
2005	8	738	44	1	364420.8	386516.6
2005	9	781	45	1	364420.8	388436.5
2005	10	843	46	1	364420.8	383255.2
2005	11	850	47	1	364420.8	390148.7
2005	12	908	48	1	364420.8	385874.9
2006	1	1021	49	1	363832.6	391613.6
2006	2	859	50	1	363832.6	391750.4
2006	3	976	51	1	363832.6	394005.6
2006	4	888	52	1	363832.6	391364.9
2006	5	962	53	1	363832.6	391664.6
2006	6	838	54	1	363832.6	389022.3
2006	7	810	55	1	363832.6	391878.5
2006	8	876	56	1	363832.6	388575.3
2006	9	843	57	1	363832.6	392989.0
2006	10	936	58	1	363832.6	390018.8
2006	11	912	59	1	363832.6	390712.3

This dataset includes the following variables:

year
month
time = elapsed time since the start of the study
aces = count of acute coronary episodes in Sicily per month (the outcome)
smokban = smoking ban (the intervention) coded 0 before intervention, 1 after
pop = the population of Sicily (in 10000s)
stdpop = age standardised population

Descriptive Analysis

Examining the data is an important first step. Looking at the pre-intervention trend can give an indication of how stable the trend is over time, whether a linear model is likely to be appropriate, and whether there appears to be a seasonal trend.

Scatter plot

data$rate <- with(data, aces/stdpop*10^5)

plot(data$rate,type="n",ylim=c(00,300),xlab="Year", ylab="Std rate x 10,000", bty="l",xaxt="n")
rect(36,0,60,300,col=grey(0.9),border=F)
points(data$rate[data$smokban==0],cex=0.7)
axis(1,at=0:5*12,labels=F)
axis(1,at=0:4*12+6,tick=F,labels=2002:2006)
title("Sicily, 2002-2006")

Summary Statistics

data %>%
  gather(Variable, Value) %>%
  group_by(Variable) %>%
  summarize(n = n(),
            Mean = mean(Value),
            SD = sd(Value),
            Median = median(Value),
            IQR = IQR(Value),
            Min = min(Value),
            Max = max(Value)) %>%
  knitr::kable()

Variable	n	Mean	SD	Median	IQR	Min	Max
aces	59	8.290508e+02	72.4139022	831.0000	111.50000	659.0000	1021.0000
month	59	6.406780e+00	3.4347044	6.0000	5.50000	1.0000	12.0000
pop	59	3.641212e+05	481.2691260	364277.4000	588.20000	363350.8000	364700.4000
rate	59	2.160942e+02	17.3402530	215.4118	26.38489	175.0353	260.7162
smokban	59	3.898305e-01	0.4918981	0.0000	1.00000	0.0000	1.0000
stdpop	59	3.834644e+05	5245.7738584	383428.4000	7898.05000	374113.1000	394005.6000
time	59	3.000000e+01	17.1755640	30.0000	29.00000	1.0000	59.0000
year	59	2.003966e+03	1.4138002	2004.0000	2.00000	2002.0000	2006.0000

Acute Coronary Event Rates Before and After Smoking Ban

data %>%
  mutate(Period = case_when(smokban == 0 ~ "1. Before Ban",
                            smokban == 1 ~ "2. After Ban")) %>%
  select(Period, aces, rate) %>%
  gather(Variable, Value, -Period) %>%
  group_by(Period, Variable) %>%
    summarize(n = n(),
            Mean = mean(Value),
            SD = sd(Value),
            Median = median(Value),
            IQR = IQR(Value),
            Min = min(Value),
            Max = max(Value)) %>%
  knitr::kable()

Period	Variable	n	Mean	SD	Median	IQR	Min	Max
1. Before Ban	aces	36	810.8611	67.76521	811.0000	99.25000	659.0000	937.0000
1. Before Ban	rate	36	213.2719	16.81020	213.1061	24.62515	175.0353	244.3259
2. After Ban	aces	23	857.5217	71.62394	843.0000	83.00000	738.0000	1021.0000
2. After Ban	rate	23	220.5118	17.59869	217.8656	21.43912	190.9362	260.7162

Poisson Regression Model

In step 2 (main paper) we chose a step change model and we also used a Poisson model as we are using count data. In order to do this we model the count data directly (rather than the rate which doesn’t follow a Poisson distribution), using the population (log transformed) as an offset variable in order to transform back to rates.

model1 <- glm(aces ~ offset(log(stdpop)) + smokban + time, family=poisson, data)
summary(model1)


Call:
glm(formula = aces ~ offset(log(stdpop)) + smokban + time, family = poisson, 
    data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.4336  -1.4432  -0.4431   1.2735   5.0701  

Coefficients:
              Estimate Std. Error  z value Pr(>|z|)    
(Intercept) -6.2431509  0.0111894 -557.955  < 2e-16 ***
smokban     -0.1116284  0.0172248   -6.481 9.13e-11 ***
time         0.0049450  0.0004992    9.905  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 308.52  on 58  degrees of freedom
Residual deviance: 197.31  on 56  degrees of freedom
AIC: 708.03

Number of Fisher Scoring iterations: 3

summary(model1)$dispersion

[1] 1

round(ci.lin(model1,Exp=T),3)

            Estimate StdErr        z P exp(Est.)  2.5% 97.5%
(Intercept)   -6.243  0.011 -557.955 0     0.002 0.002 0.002
smokban       -0.112  0.017   -6.481 0     0.894 0.865 0.925
time           0.005  0.000    9.905 0     1.005 1.004 1.006

The exponentiated intercept is the ACE rate in December 2001, and in this example is about 200 per 10,000.
The exponentiated smokban effect is a rate ratio, and in this example the ban multiplied the ACE rate by 0.89 (a reduction).
The expected ACE rate after the ban would be 0.89 * 200 = 178 per 10,000.

Add predictions from the Poisson model to the graph

datanew <- data.frame(stdpop = mean(data$stdpop),
                      smokban = rep(c(0, 1), c(360, 240)),
                      time = 1:600/10,
                      month = rep(1:120/10, 5))

pred1 <- predict(model1, type="response", datanew) / mean(data$stdpop) * 10^5

plot(data$rate, type="n", ylim=c(0,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 0, 60, 300, col=grey(0.9), border=F)
points(data$rate,cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines((1:600/10), pred1, col=2)
title("Sicily, 2002-2006")

To plot the counterfactual scenario we create a data frame as if smokban (the intervention) had never been implemented

datanew <- data.frame(stdpop=mean(data$stdpop),smokban=0,time=1:600/10,
  month=rep(1:120/10,5))

pred1b <- predict(model1, datanew, type="response") / mean(data$stdpop) * 10^5

plot(data$rate, type="n", ylim=c(0,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 0, 60, 300, col=grey(0.9), border=F)
points(data$rate,cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines((1:600/10), pred1, col=2)
lines(datanew$time, pred1b, col=2, lty=2)
title("Sicily, 2002-2006")

Return the data frame to the scenario including the intervention

datanew <- data.frame(stdpop = mean(data$stdpop),
                      smokban = rep(c(0, 1), c(360, 240)),
                      time = 1:600/10,
                      month = rep(1:120/10, 5))

Methodological Issues

Overdispersion: Quasi-Poisson model

In the model above we have not allowed for overdispersion - in order to do this we can use a quasipoisson model, which allows the variance to be proportional rather than equal to the mean.

model2 <- glm(aces ~ offset(log(stdpop)) + smokban + time, 
              family=quasipoisson, data)
summary(model2)


Call:
glm(formula = aces ~ offset(log(stdpop)) + smokban + time, family = quasipoisson, 
    data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.4336  -1.4432  -0.4431   1.2735   5.0701  

Coefficients:
              Estimate Std. Error  t value Pr(>|t|)    
(Intercept) -6.2431509  0.0211096 -295.749  < 2e-16 ***
smokban     -0.1116284  0.0324959   -3.435  0.00112 ** 
time         0.0049450  0.0009418    5.250 2.43e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 3.559195)

    Null deviance: 308.52  on 58  degrees of freedom
Residual deviance: 197.31  on 56  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 3

summary(model2)$dispersion

[1] 3.559195

round(ci.lin(model2,Exp=T),3)

            Estimate StdErr        z     P exp(Est.)  2.5% 97.5%
(Intercept)   -6.243  0.021 -295.749 0.000     0.002 0.002 0.002
smokban       -0.112  0.032   -3.435 0.001     0.894 0.839 0.953
time           0.005  0.001    5.250 0.000     1.005 1.003 1.007

Model checking and autocorrelation

Check the residuals by plotting against time

res2 <- residuals(model2,type="deviance")

plot(data$time, res2, ylim=c(-5,10), pch=19, cex=0.7, col=grey(0.6), main="Residuals over time", ylab="Deviance residuals", xlab="Date")

abline(h=0, lty=2, lwd=2)

Further check for autocorrelation by examining the autocorrelation and partial autocorrelation functions

acf(res2)

pacf(res2)

Adjusting for seasonality

There are various ways of adjusting for seasonality - here we use harmonic terms specifying the number of sin and cosine pairs to include (in this case 2) and the length of the period (12 months)

model3 <- glm(aces ~ offset(log(stdpop)) + smokban + time + harmonic(month,2,12), 
              family=quasipoisson, data)
summary(model3)


Call:
glm(formula = aces ~ offset(log(stdpop)) + smokban + time + harmonic(month, 
    2, 12), family = quasipoisson, data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.5292  -1.2634  -0.0972   1.2571   3.6882  

Coefficients:
                          Estimate Std. Error  t value Pr(>|t|)    
(Intercept)             -6.2523723  0.0175689 -355.877  < 2e-16 ***
smokban                 -0.1220868  0.0270311   -4.517 3.64e-05 ***
time                     0.0053892  0.0007955    6.774 1.13e-08 ***
harmonic(month, 2, 12)1  0.0370967  0.0100241    3.701 0.000520 ***
harmonic(month, 2, 12)2 -0.0182122  0.0096435   -1.889 0.064537 .  
harmonic(month, 2, 12)3  0.0381608  0.0096848    3.940 0.000244 ***
harmonic(month, 2, 12)4  0.0147634  0.0097221    1.519 0.134935    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 2.262186)

    Null deviance: 308.52  on 58  degrees of freedom
Residual deviance: 117.27  on 52  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 3

summary(model3)$dispersion

[1] 2.262186

round(ci.lin(model3,Exp=T),3)

                        Estimate StdErr        z     P exp(Est.)  2.5% 97.5%
(Intercept)               -6.252  0.018 -355.877 0.000     0.002 0.002 0.002
smokban                   -0.122  0.027   -4.517 0.000     0.885 0.839 0.933
time                       0.005  0.001    6.774 0.000     1.005 1.004 1.007
harmonic(month, 2, 12)1    0.037  0.010    3.701 0.000     1.038 1.018 1.058
harmonic(month, 2, 12)2   -0.018  0.010   -1.889 0.059     0.982 0.964 1.001
harmonic(month, 2, 12)3    0.038  0.010    3.940 0.000     1.039 1.019 1.059
harmonic(month, 2, 12)4    0.015  0.010    1.519 0.129     1.015 0.996 1.034

EFFECTS

ci.lin(model3,Exp=T)["smokban",5:7]

exp(Est.)      2.5%     97.5% 
0.8850715 0.8394010 0.9332270

Taking into account seasonality of the ACE rate, the smoking ban effect was a rate ratio of 0.885 (slightly bigger reduction)

TREND

exp(coef(model3)["time"]*12)

    time 
1.066808

Taking into account seasonality of the ACE rate, the ACE rate was being multiplied by about 1.07 each year, a long-term temporal trend.

We again check the model and autocorrelation functions

res3 <- residuals(model3,type="deviance")
plot(res3,ylim=c(-5,10),pch=19,cex=0.7,col=grey(0.6),main="Residuals over time",
  ylab="Deviance residuals",xlab="Date")
abline(h=0,lty=2,lwd=2)

acf(res3)

pacf(res3)

Predict and plot of the seasonally adjusted model

pred3 <- predict(model3, type="response", datanew) / mean(data$stdpop) * 10^5

plot(data$rate, type="n", ylim=c(120,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 120, 60, 300, col=grey(0.9), border=F)
points(data$rate, cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines(1:600/10, pred3, col=2)
title("Sicily, 2002-2006")

It is sometimes difficult to clearly see the change graphically in the seasonally adjusted model, therefore it can be useful to plot a straight line representing a ‘deseasonalised’ trend this can be done by predicting all the observations for the same month, in this case we use June.

pred3b <- predict(model3, type="response", transform(datanew,month=6)) / mean(data$stdpop) * 10^5

This can then be added to the plot as a dashed line

plot(data$rate, type="n", ylim=c(120,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 120, 60, 300, col=grey(0.9), border=F)
points(data$rate, cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines(1:600/10, pred3, col=2)
lines(1:600/10, pred3b, col=2, lty=2)
title("Sicily, 2002-2006")

Additional Material

Add a change-in-slope

We parameterize it as an interaction between time and the ban indicator

model4 <- glm(aces ~ offset(log(stdpop)) + smokban*time + harmonic(month,2,12),
  family=quasipoisson, data)
summary(model4)


Call:
glm(formula = aces ~ offset(log(stdpop)) + smokban * time + harmonic(month, 
    2, 12), family = quasipoisson, data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.7028  -1.1977  -0.1371   1.1837   3.6172  

Coefficients:
                          Estimate Std. Error  t value Pr(>|t|)    
(Intercept)             -6.2469039  0.0188768 -330.930  < 2e-16 ***
smokban                 -0.1852194  0.0835610   -2.217 0.031138 *  
time                     0.0051026  0.0008745    5.835 3.72e-07 ***
harmonic(month, 2, 12)1  0.0383261  0.0101687    3.769 0.000427 ***
harmonic(month, 2, 12)2 -0.0176326  0.0096966   -1.818 0.074872 .  
harmonic(month, 2, 12)3  0.0383558  0.0097142    3.948 0.000242 ***
harmonic(month, 2, 12)4  0.0149653  0.0097524    1.535 0.131081    
smokban:time             0.0014822  0.0018546    0.799 0.427879    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 2.274563)

    Null deviance: 308.52  on 58  degrees of freedom
Residual deviance: 115.81  on 51  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 3

round(ci.lin(model4,Exp=T),3)

                        Estimate StdErr        z     P exp(Est.)  2.5% 97.5%
(Intercept)               -6.247  0.019 -330.930 0.000     0.002 0.002 0.002
smokban                   -0.185  0.084   -2.217 0.027     0.831 0.705 0.979
time                       0.005  0.001    5.835 0.000     1.005 1.003 1.007
harmonic(month, 2, 12)1    0.038  0.010    3.769 0.000     1.039 1.019 1.060
harmonic(month, 2, 12)2   -0.018  0.010   -1.818 0.069     0.983 0.964 1.001
harmonic(month, 2, 12)3    0.038  0.010    3.948 0.000     1.039 1.020 1.059
harmonic(month, 2, 12)4    0.015  0.010    1.535 0.125     1.015 0.996 1.035
smokban:time               0.001  0.002    0.799 0.424     1.001 0.998 1.005

Predict and plot the ‘deseasonalised’ trend and compare it with the step-change only model

pred4b <- predict(model4, type="response", transform(datanew, month=6)) / mean(data$stdpop) * 10^5
plot(data$rate, type="n", ylim=c(120,300), xlab="Year", ylab="Std rate x 10,000", bty="l", xaxt="n")
rect(36, 120, 60, 300, col=grey(0.9), border=F)
points(data$rate, cex=0.7)
axis(1, at=0:5*12, labels=F)
axis(1, at=0:4*12+6, tick=F, labels=2002:2006)
lines(1:600/10, pred3b, col=2)
lines(1:600/10, pred4b, col=4)
title("Sicily, 2002-2006")
legend("topleft", 
       c("Step-change only", "Step-change + change-in-slope"),
       lty=1, col=c(2,4), inset=0.05, bty="n", cex=0.7)

Test if the change-in-slope improve the fit the selected test here is an F-test, which accounts for the overdispersion, while in other cases a likelihood ratio or wald test can be applied

anova(model3, model4, test="F") %>%
  knitr::kable()

Resid. Df	Resid. Dev	Df	Deviance	F	Pr(>F)
52	117.2661	NA	NA	NA	NA
51	115.8131	1	1.452971	0.6387915	0.4278555

Not surprisingly, the p-value is similar to that of the interaction term.
The ban reduced the ACE rate, but did not change the longer-term temporal trend in the rate (it continued to grow over time).

Interrupted Times Series Regression

Chuck Cleland

June 08, 2018