For this example, I use ggplot2 and so you need to enter:

rm(list = ls()) 
if (!require(ggplot2)) install.packages('ggplot2')
## Loading required package: ggplot2

This basically installs the Library ggplot2 if it is not in your system and then loads the required functions.

The next few lines are to generate artificial data points (about 10,000) using an arbitrary function (having 2 parameters a1 and a2 set at 1 and 50, respectively).

set.seed(1) # fix random
interval<-0.01 # Sampling frequency
max_F<-100 # max frequency
# define Lorentzian Function parameters
# Lorentzian function
y<- a1/((Freq)^2 + a2^2)

Here, I use a log-log plot and so add an extra argument.


To simulate a real experiment, I add some noise (using the function jitter but this could be any type of noise).


Obviously fitting over 10,000 points is a demanding task. Moreover, the distribution of the error is not known and so a Least-Square fit (that assumes normally distributed errors) cannot be used. To solve those issues, we will average over some points (say n). When n is sufficiently large (say 30), the distribution of the sample mean (calculated over those n points) follows a normal distribution (independent of the distribution of those n points). Look here for more details.

N ( μ , σ 2 / n ) An estimate for the μ is obtained from the mean of the n values. An estimate for σ2 is obtained from (see also here): i n ( x i x ¯ ) 2 n 1

Below, I just select n (50 here). Because n is not a multiple of the number of experimental data points, I have to truncate the data.

# block data (every n points)
reminder_F<-length(Freq)%%n # %% is the arithmetic operator for modulus
Freq<-Freq[1:length_N] # I use [ ] to select some elements of the vector
y_noise<-y_noise[1:length_N] # Do not use ( ) as this is not a function

Next, to estimate the mean, variance and x values (what I call Freq above) (over n points), I use:

temp_group<-(rep(1:(length_N/n), each=n))
Y <- data.frame(Freq, y_noise, temp_group )

mean_val<-tapply(Y$y_noise, Y$temp_group, FUN = mean)
sd_val<-tapply(Y$y_noise, Y$temp_group, FUN = sd)
Freq_N<-tapply(Y$Freq, Y$temp_group, FUN = mean)

Here, I use the function tapply, which allows to perform some operations (e.g. calculating the mean) on values (e.g. Y$y_noise) depending on a certain factor (Y$temp_group). Hence, those new vectors (e.g. mean_val) will have a length equal to the number of factor levels.

I then create a data.frame where I store all those three vectors and a vector for the error on the mean (assuming a 0.95 confidence interval).

IC<-1.96*sd_val/sqrt(n) # 1.96 as n is large

Finally, I fit thoses data using:

# Fit
my_fit<-nls(formula= mean_val ~  a1/((Freq_N)^2 + a2^2)
    , data=Y_N, start=list(a1=1,a2=1),
    weights = c(prop.table(my_weights)) )

Here, instead of using the weights directly (see for instance this page), I use the function prop.table.This allows to to normalise the weights (when they are large R gives an error). We can plot everything using:

# Plot
p1<-ggplot()+geom_point(data=Y_N, aes(x=Freq_N,y=mean_val))+
  geom_errorbar(data=Y_N,aes(ymin=mean_val-(1/1.96)*IC, ymax=mean_val+(1/1.96)*IC,x=Freq_N), width=.05,col='blue',lwd=1)+
  geom_line(aes(x=Freq,y=y_noise),lwd=0.1,col='grey',alpha = 1/1.5)+

The results of the fit (parameters estimates) can be obtained using summary(my_fit) and the quality of the fit from the reduced χ 2 (see also here):

## Formula: mean_val ~ a1/((Freq_N)^2 + a2^2)
## Parameters:
##     Estimate Std. Error t value Pr(>|t|)    
## a1  1.002145   0.001626   616.2   <2e-16 ***
## a2 50.073147   0.050559   990.4   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 1.083e-07 on 197 degrees of freedom
## Number of iterations to convergence: 10 
## Achieved convergence tolerance: 7.197e-07
chi2R<-sum(with(Y_N, my_weights*(mean_val-fit)^2/(length(mean_val)-2)))
## [1] 0.8725221