In a time series, change point detection tries to identify abrupt changes. Different approaches have been proposed and here I show one of them based on a simple rolling t-test.

For this example, I use ggplot2 and so you need to enter:

rm(list = ls()) 
if (!require(ggplot2)) install.packages('ggplot2')
## Loading required package: ggplot2

Let's generate some data : 50 points where the first (last) 25 points originate from a normal distribution with mean 2 (8) and variance 0.25 (0.25). The change point therefore occurs at index 26 (or 25 depending on how you define it).

# generate data
set.seed(10)
data<-c(rnorm(25,2,0.5),rnorm(25,8,0.5))

There are many papers discussing change point detection with a t-test. Here, I have chosen to cite a paper from Seol et al. that basically applies this method to Magnetic Tweezers time traces. The idea is quite simple: for each point (i), we select 2 different windows of length (n-1):

x i - n , . . . , x i - 1 x i + 1 , . . . , x i + n

We then perform a t-test on this 2 samples and return 0 if the p-value smaller than some chosen value (say 0.05) or the test's statistics (its absolute value) otherwise. In my opinion the paper of Seol and collaborators is over-complicating things (giving analytic formulas for the cumulative distribution) but it shows some graphics that help the reader to understand.

The t-test is performed for all possible datapoints (given that both windows have a size of n-1). Note that I perform here an unequal variances t-test (the default argument is var.equal = FALSE) because we expect large differences in variances (when close to the change point).

# test
alpha<-0.05
n<-10 # length of window 
t<-1:(length(data)-2*n+2) # define a vector for sapply (rolling)
values<-sapply(t, function(z){
  w1 <- data[z:(n + z-2)] # window1
  w2 <- data[(n+z):(2*n + z-2)] #window2
  res<-t.test(w1, w2)  # 0 for H0, t for H1 (see below)
  if (res$p.value< alpha) {
    t<-abs(res$statistic)
  }
  else {
    t<-0
  }
})

That is basically it. We have calculated 32 values (50 minus 2 times n-1, where n is 10) that correspond to indexes:

values_x<-seq(n,length(data)-n+1,1)
values_x
##  [1] 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
## [26] 35 36 37 38 39 40 41

A graph is always good to see what's happening. Because I will make 3 graphs using the same script, I first create a function that creates 1 graphs composed of 2 panels: the first one displays the data (with 2 rectangles indicating the 2 windows for index i); the second one shows the output of the sapply calculation (for index i and below) :

# a simple function to plot
my_plot<-function(i)
{
  par(mfrow=c(2,1))
  i<-i-n+1
  j<-i-2
  plot(data,pch=19,cex=1,cex.main=1.5, cex.lab=1.5, cex.axis=1.5)
  rect(j+2,-n,n+j,100, col= 'blue', density=5)
  rect(n+j+2,-50,2*n+j,100, col= 'black', density=5)
  plot(values_x[1:(i)],values[1:(i)],
       ylim=c(0,10),xlim=c(0,50),
       pch=19,cex=1, cex.main=1.5, cex.lab=1.5, cex.axis=1.5,
       xlab="index",ylab="test statistics (H1, abs)")
  par(mfrow=c(1,1))
}
my_plot(10)

my_plot(20)

my_plot(41)

As you can see, the change point corresponds to the maximum of the test's statistics (as defined above).

values_x[which.max(values)]
## [1] 25

Note (i) that some averaging is performed in the original paper (I did not do it here as I guess it is always best to work on raw data) and (ii) that data have to be normally distributed (t-test assumptions).


Work done with Iryna Riabko