We have previously performed some 2-sample *t*-tests. When the p-value was smaller than 0.05, we concluded that the 2 samples originate from 2 populations with *different* means.
Obviously, the probability of wrongly concluding that 2 samples (originating from populations with different means) have equal means depends on the
sample size, the actual difference in means and the type-I error. That is what we will investigate here.

Let us define some parameters:

```
delta<-0.5 # difference in means
sd<-1.5 # standard deviation in populations
mu1<-20 # mean of 1st population
mu2<-mu1+delta # mean of 2nd population
n<-40 # number (n) of elements in each sample
```

Under the null hypothesis, we assume both samples originate from 2 populations with equal means:

$${H}_{0}:{\mu}_{1}={\mu}_{2}$$ As for any test, I calculate a critical value (*stat_c*) above which I reject the null hypothesis (here with a type-I error of 0.05):

```
# Let us calculate some parameters using the formula
s_d<-sd*sqrt(2/n) # standard error of the mean difference
stat_c<-qnorm((1-0.05/2),0,s_d) # critical value
stat_c
```

`## [1] 0.6573919`
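As a sanity check (a small sketch using the same parameters as above), this critical value is just the familiar standard-normal quantile, about 1.96, scaled by the standard error of the difference:

```
# the critical value is the 97.5% standard-normal quantile times s_d
s_d<-1.5*sqrt(2/40) # same standard error as above
qnorm(0.975)*s_d # about 0.657, identical to stat_c
```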

Let us plot a few things:

- the distribution of the observed difference in means under the null hypothesis (black, equal means in both populations),
- the distribution of the observed difference in means assuming a difference in means (in populations) of $\delta $ (thick black line),
- the type-I error (α, gray area) [the probability to reject the null hypothesis given it is true, a false positive],
- the type-II error (β, hatched area, lines at 135°) [the probability of failing to reject the null hypothesis given the alternative hypothesis is true, a false negative] and
- two vertical lines that show the location of *stat_c* and -*stat_c* (two-tailed test).

Here, I have defined the alternative hypothesis as: $${H}_{1}:{\mu}_{2}-{\mu}_{1}=\delta $$ where $\delta $ is set at 0.5 here.

```
# plot. Using non-standardized variables
x<-seq(-1.2,1.8,0.001)
y<-dnorm(x,0,s_d)
plot(x,y,type='l',ylab='density',xlab='difference in means',lwd=1.5,ylim=c(0,1.35))
abline(v=c(-stat_c,stat_c),lty=2,lwd=2)
abline(h=0)
# annotate
text(-0.35, 1, expression("H"[0]),cex=1.5)
text(-0.75,0.05,expression(alpha/2),cex=0.8)
text(+0.74,0.05,expression(alpha/2),cex=0.8)
text(0.95, 1,expression("H"[1]),cex=1.5)
text(0.3,0.4,expression(beta),cex=1.5)
text(0.8,0.4,expression(Power),cex=1)
text(delta/2,1.35,expression(mu[2]-mu[1]),cex=1)
segments(0, 1.3,delta,1.3)
segments(0, 1.1,0,1.3)
segments(delta, 1.1,delta,1.3)
# just fill some areas
polygon(c(x[x>=stat_c], stat_c), c(y[x>=stat_c], y[x==max(x)]), col="#BEBEBE80") # area under curve
polygon(c(-stat_c,x[x<=-stat_c]), c(y[x==min(x)], y[x<=-stat_c]), col="#BEBEBE80") # area under curve
x1<-x+(delta)
y1<-y
lines(x1,y1,lwd=2)
polygon(c(stat_c,x1[x1<=stat_c]), c(y[x1==min(x1)], y[x1<=stat_c]), border = gray(.1),density = 10,angle = 135,col = "#BEBEBE80")
polygon(c(x1[x1>=stat_c], stat_c), c(y[x1>=stat_c], y[x1==max(x1)]), border = gray(.1), density = 5, angle = 45,col ="#BEBEBE80")
```

We define the power as:

$$P=1-\beta $$ This is the probability of rejecting the null hypothesis (because the observed difference in means is, in absolute value, larger than *stat_c*) given the alternative hypothesis is true.
To calculate this value, we just have to use *pnorm* and set the mean at $\delta $.

`(1-pnorm(stat_c,delta,s_d)) # calculate Power assuming normal distribution`

`## [1] 0.3194448`
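Strictly speaking, for a two-tailed test the power also includes the probability that the observed difference falls below -*stat_c*; here that lower-tail contribution is negligible, as this sketch (reusing *delta*, *s_d* and *stat_c* as defined above) shows:

```
# two-tailed power: upper tail plus the (tiny) lower tail
# (delta, s_d and stat_c as defined above)
pnorm(-stat_c,delta,s_d)+(1-pnorm(stat_c,delta,s_d)) # about 0.3197
```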

Here, given a type-I error of 0.05, a sample size of 40, a difference in means (in populations) of 0.5 and a variance (in populations) of 2.25 (the square of 1.5), we will 'see' a difference in means (reject the null hypothesis) only 32 percent of the time. In other words, we will incorrectly conclude that the means are equal 68 percent of the time! This value can also be obtained using the function *power.t.test*:

`power.t.test(n, delta, sd)$power # calculate Power using a t.test`

`## [1] 0.3129206`

The difference between the 2 values can be attributed to the fact that *power.t.test* performs a *t*-test and not a *z*-test. A third possibility is to perform a simulation (it might take a few tens of seconds):

```
# mini-simulation
n_sim<-10^6
all<-sapply(1:n_sim,function(z){
  abs(mean(rnorm(n,mu2,sd))-mean(rnorm(n,mu1,sd)))>stat_c
})
mean(all) # proportion of simulations where the difference exceeds stat_c
```

`## [1] 0.320129`

Here, we have calculated 1,000,000 times the difference in observed means (for 2 datasets sampled from 2 normal distributions) and have checked whether the difference (in absolute value) was larger than *stat_c*.
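To recover the slightly smaller *t*-test value, one can also simulate actual *t*-tests instead of comparing the raw difference with *stat_c*. A minimal sketch (reusing *n*, *mu1*, *mu2* and *sd* from above, with fewer repetitions to keep it fast):

```
# simulate t-tests and count how often p < 0.05
set.seed(1)
p_vals<-replicate(10^4,t.test(rnorm(n,mu2,sd),rnorm(n,mu1,sd))$p.value)
mean(p_vals<0.05) # close to the power.t.test value of about 0.31
```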

In summary, we should always specify the statistical power when performing *t*-tests. Here, assuming that your test was non-significant (because your p-value was, by chance, larger than 0.05), you should write: **I do not see any difference in means, but I shall mention that my test would detect a true difference of 0.5 only about 30 percent of the time**. Alternatively, you might state that the smallest difference you might detect with a power of 80 percent (which is considered a good statistical power) is about 1 (see code below):

```
delta_c<-seq(0.1,2,0.1)
p_all<-power.t.test(n, delta_c, sd)$power
plot(delta_c,p_all,ylab="Power [type-I error: 0.05, n=40]",xlab="Difference in means")
abline(h=0.8,lwd=2)
grid()
```
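The detectable difference at a power of 80 percent can also be obtained exactly by letting *power.t.test* solve for *delta* (the function solves for whichever of its parameters is left unspecified):

```
# smallest detectable difference for a power of 0.8 (n=40, sd=1.5)
power.t.test(n=40,sd=1.5,power=0.8)$delta # about 0.95
```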

Finally, a power analysis also explains why the type-I error is often set at 0.05: a value of 0.01 would increase the type-II error and, in turn, decrease the statistical power.
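To illustrate this trade-off with the parameters used above: lowering the type-I error to 0.01 markedly reduces the power.

```
# same test, but with a type-I error of 0.01 instead of 0.05
power.t.test(n=40,delta=0.5,sd=1.5,sig.level=0.01)$power # about 0.13
```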
