As an example of non-linear fitting, I will perform a Michaelis Menten fit on some synthetic data. I am using 3 vectors, which represent the Substrate concentration (S), the rate of product formation (v) as well as the error on v (dv).
# data (use c to create a vector and combine elements of identical types) v<-c(0.004507692,0.004192308,0.00355384,0.002576923,0.001661538,0.001064286) S<-c(3.6000,1.8000,0.9000,0.4800,0.2250,0.1125) dv<-c(0.00012, 0.00008, 0.00012, 0.00010, 0.00007, 0.00005)
In the following, I assume the error on S to be negligible. Here, dv is the standard error of the mean. Having performed n measurements at a given substrate concentration, the standard error of the mean is the sample standard deviation divided by the square root of the sample size (n). In others words, a 0.95 estimate of the mean would be [v- 1.96*dv, v + 1.96*dv] if n is large enough.
The function nls is used to perform a non-linear fitting using Non-Linear Least Squares. One important assumption is that the errors are assumed to be normally distributed (be careful when transforming your data !). Non-Linear Least Squares tends to minimize the following quantity (a demonstration can be found in Econometrics by G. S. Maddala (1976) using a Maximum likelihood estimation):
where is the number of experimental points, are the actual experimental values (here v), are estimates (what we get from the fit) of those values and are the errors on the data points (here dv).
Performing the fit just consists of:
where we have to give initial parameters for the function (here an hyperbola with 2 parameters A and B). Note also the argument weight that represents the weight associated to the data points. This is just the denominator of the and this has a large effect on the sum. As a small error ( or dv) yields a large weight, the difference between the experimental point and the estimated one has to be small (otherwise the will increase) when the error is small. To plot the experimental data, add the error bars as well as the predicted y values (predict(nlmodel), ), I use:
# fit nlmodel <- nls(v ~ A*S/(B + S), start = list(A = 0.004, B = 1), weights = (1/dv^2))
# Plot plot(S,v) # scatter plot arrows(S, v-dv,S, v+dv, length=0.05, angle=90, code=3) # error bars (hack) lines(S,predict(nlmodel)) # add the predicted curve
The plot above does not look that nice. Instead of smoothing the fit, we will create a new vector that has far more data points using seq. First, we will extract the coefficients A and B.
# extract estimated coefficients and create extra values for the x and y axis A_fit<-summary(nlmodel)$coefficients # A B_fit<-summary(nlmodel)$coefficients # B S_xtra<-seq(min(S),max(S),length.out = 50) # create a dummy S vector (50 values here) vfit_xtra<- A_fit*S_xtra/(B_fit + S_xtra) # calculate new v values (according to those 50 new values) # new plot plot(S,v) arrows(S, v-dv,S, v+dv, length=0.05, angle=90, code=3) lines(S_xtra,vfit_xtra) grid() # add a grid
Having performed the fit, we need to see if the fit is good. Because we do not favour any of the data points, we expect the theoretical curve (that calculated from the estimates of A and B) to be at a distance of about of every experimental points. In that case: and so equals .
For simplicity (not to provide the number of data points), we might divide by . However, we also have to take into account that there are 2 degrees of freedom (2 fitting parameters A and B here) and so divide by minus 2 (here (length(v)-2)). This quantity is the reduced .
chi2R<-sum(with(my_data, (v-predict(nlmodel))^2/(dv^2))/(length(v)-2)) chi2R
##  0.9413936
A value close to 1 indicates that the theoretical model can successfully reproduce the experimental trends (which is the case here !). For non-linear models, I shall finally mention that the coefficient of determination (R-squared) should not be used to evaluate the goodness of fit (REF).Remark 1
The plot below shows the result of the previous fit (black) together with a second fit where I have decreased the error on the very last point (located at 3.6000 on the x axis) by a factor of about 10 (0.00012 versus 0.00001, red curve). As you can see (and as explained below), the last point has quite a big weight and so the prediced curve has to be as close as possible to the experimental data point).
There is a slight increase in the reduced (now at 1.35), which might be attributed to the fact that the predicted curve is farther away from the fourth experimental point.Remark 2
The plot below shows the result of the fit (black) where I have decreased the error on the very last point (located at 3.6000 on the x axis) by a factor of about 10 (0.00012 versus 0.00001) and have changed its value from 0.004507692 to 0.004007692. As before, this point has quite a big weight and so the prediced curve has to be as close as possible to the experimental data point. Here, howevever, the redcued reduced is about 12.5, which indicates that our theroretical model (hyperbola) cannot reproduce the experimental trends.
The plot below shows the result of the fit (black) where I have decreased the error on the very last point (located at 3.6000 on the x axis) by a factor of about 10 (0.00012 versus 0.00001) and have changed its value from 0.004507692 to 0.004007692. The difference, however, is that I have considered the errors unknown.
Because the error are unknown (but do exist), we have to assume that they are identical. An estimator for can be found (as shown in Econometrics by G. S. Maddala (1976) and many other books and lectures); there, however, we cannot calculate the goodness of fit using the reduced . All we can do is to compare a theoretical model versus some other ones (e.g. using a metric such as the Akaike information criterion). I will come back to this in a new blog post.
nlmodel <- nls(v ~ A*S/(B + S), start = list(A = 0.004, B = 1))
When having errror bars, you could also use ggplot2. This is a powerful library but not so easy to use for beginners. In case you would like to reproduce those plots, you need to enter:
if (!require(ggplot2)) install.packages('ggplot2') # This installs the library if not installed and this loads the library
## Loading required package: ggplot2
ggplot()+ geom_errorbar(data=my_data,aes(ymin=v-dv, ymax=v+dv,x=S), width=.1)+ geom_point(data=my_data,aes(y=v,x=S),size=3)+ geom_line(data = fit_xtra,aes(y=vfit_xtra,x=S_xtra),size=0.2)+theme_bw()