In a time series, change point detection tries to identify abrupt changes. Different approaches have been proposed and here I show one of them based on maximum likelihood estimates. I also recommend to have a look at this website (which is a wonderful introduction to change point detection using Likelihood ratio tests), this R package from R. Killick and I.A. Eckley and this paper (for a real-word application).

Data

Let's start and define some parameters (I will come back to this latter) and generate some data.

rm(list = ls())
# parameters
penalty<-10 # Delta AIC penalty
display<-1 # display all graphs during calculations (good to see what is going on)
sd<-1 # sd is known and set here as 1
# data
set.seed(50)
values<-c(rnorm(200,10,sd),rnorm(200,2.1,sd))

We can have a look at the data and see that there is a change point at 200 (or 201 depending on how you define it).

plot(values)


Maximum likelihood estimates

Assuming no change point in the time series, and i.i.d. observations (originating from a a normally distributed population with mean (expectation) ${\mu }_{0}$ and a known standard deviation $\sigma$), we can compute the product of individual density function as (see for instance this Wikipedia page):

${L}_{0}\left({\mu }_{0}\right)=\frac{1}{{\sqrt{2\pi {\sigma }^{2}}}^{N}}\prod _{i=1}^{N}\mathrm{exp}\left[-\frac{\left({x}_{i}-{\mu }_{0}{\right)}^{2}}{2{\sigma }^{2}}\right]$

which is nothing else than the likelihood to have no changepoint (all observations are i.i.d.) given the observations. Similarly, we can calculate

${L}_{1}\left({\mu }_{1},{\mu }_{2}\right)=\frac{1}{{\sqrt{2\pi {\sigma }^{2}}}^{N}}\prod _{i=1}^{\tau }\mathrm{exp}\left[-\frac{\left({x}_{i}-{\mu }_{1}{\right)}^{2}}{2{\sigma }^{2}}\right]x\prod _{j=\tau +1}^{N}\mathrm{exp}\left[-\frac{\left({x}_{j}-{\mu }_{2}{\right)}^{2}}{2{\sigma }^{2}}\right]$

which is the likelihood to have a changepoint at $\tau$ (given the observations). Here, we assume the data to have the same (known) standard deviation $\sigma$ but the first points (up to $\tau$) originate from a population with mean ${\mu }_{1}$ and the remaining points originate from a population with mean ${\mu }_{2}$. Obviously ${L}_{1}$ also depends on $\tau$.

We can then calculate maximum likelihood estimates:

${\stackrel{^}{L}}_{0}={L}_{0}\left({\stackrel{^}{\mu }}_{0}\right)$ and ${\stackrel{^}{L}}_{1}={L}_{1}\left({\stackrel{^}{\mu }}_{1},{\stackrel{^}{\mu }}_{2}\right)$

using, for instance :

${\stackrel{^}{\mu }}_{2}=\frac{1}{\left(N-\tau \right)}\sum _{j=\tau +1}^{N}{x}_{j}$

which is an (unbiased) maximum likelihood estimator for ${\mu }_{2}$ (this is simply the geometrical mean and this has been obtained by calculating the partial derivative of the log-likelihood with respect to ${\mu }_{2}$) (see this page).

Comparing Maximum likelihood estimates (MLE)

We cannot compare the MLE directly as we have to take into account the number of estimated parameters $k$ (here 1 versus 2). One possibility is to calculate the difference in AIC (Akaike information criterion) estimates:

$AIC=2k-2ln\left(\stackrel{^}{L}\right)$

Here, we have to keep in mind that the value of $\tau$ is not known. This means that we have to calculate $N$ different ${\stackrel{^}{L}}_{1}$ values ($N$ different $AI{C}_{1}$) that corresponds to all possible $\tau$ values. Among those values, we will chose the lowest one and compare it to $AI{C}_{0}$.

Let's do this in R !

#main function
likelihood <- function (data,display){
# Log likelihood calculation
# R_1 is just the log of the cumulative product (w/o -1/(2*sd^2) )
n<-length(data)
R_1 <- sapply (1:n,function(t){
part_inf<-(data[1:t]-mean(data[1:t]))^2
part_sup<-(data[(t+1):n]-mean(data[(t+1):n]))^2
cumsum(part_inf)[length(part_inf)]+cumsum(part_sup)[length(part_sup)]
}  )
R_0 <- cumsum((data-mean(data))^2)[n]
#Log \hat{L}_1 (just need to add -(n/2)*log (2 * pi * sd^2)) and multiply -1/(2*sd^2)
norm_sum<-(-n/2)*(log(2*pi*sd^2))
Log_H1<-norm_sum+(-1/(2*sd^2)) * R_1 # This is the *log* MLE for Hypothesis 1
#Log \hat{L}_0 (just need to add -(n/2)*log (2 * pi * sd^2)) and multiply -1/(2*sd^2)
Log_H0<-norm_sum+(-1/(2*sd^2)) * R_0 # This is the *log* MLE for Hypothesis 0
#AIC_1 (2 parameters in the model, so k=2)
AIC_1 <- 2*2-2*Log_H1
#AIC_0 (1 parameter in the model, so k=1)
AIC_0 <- 2*1-2*Log_H0
# Penalty and display
if ((AIC_0-min(AIC_1,na.rm = TRUE)) <= penalty | which.min(AIC_1) ==1 | which.min(AIC_1) == n)
{hyp<-0
}else{hyp<-1
}
if (display==1) {
par(mfrow=c(2,1))
plot(data,xlab="Index (relative to chunk start)")
if (hyp ==0)
{title(main= paste("No changepoint"))
}else{title(main= paste("Change point :",which.min(AIC_1)))
abline(v=which.min(AIC_1), col="red",lty=2,lwd=3)
}
plot(AIC_1,ylab="Akaike information criterion",xlab="Index (relative to chunk start)")
abline(h=AIC_0, col="blue",lty=2,lwd=2)
}
return<-c(hyp,which.min(AIC_1))
}

As you can see the function above (likelihood) not only caculate AIC values but also allows to compare models based on their difference as discussed here and in this publication. Here, we will consider the alternative hypothesis (there is indeed a changepoint) to be valid is the difference betwenn the lowest $AI{C}_{1}$ and $AI{C}_{0}$ is 10 (see penalty) or above (for models with identical number of free parameters, this would correspond to a relative probability of 0.67% that the null hypothesis is a better description).

Let's apply this function to our values (using a display of 1 to see the plots)

return_lik<-likelihood(values,display)

As you can see, we have successfully found a change point at 200 (the lower graph shows all AIC values including the one calculated for no change point, blue dashed line). I will not compare this method the one I described in this post but the MLE method is actually able to detect minute change (with virtually no free parameters). Look at the result below that corresponds to a Signal to Noise Ratio of about 1.

set.seed(50)
values2<-c(rnorm(200,3.1,sd),rnorm(200,2.0,sd))
return_lik<-likelihood(values2,display)

That's it for now. Head to this post to learn how to identify multiple change points and states.

Thanks to Rebecca Killick for useful discussion

Work done with Iryna Riabko