As usual, we check for some packages

# Check for Packages
if (!require(ggsignif)) install.packages('ggsignif')
## Loading required package: ggsignif
if (!require(ggplot2)) install.packages('ggplot2')
## Loading required package: ggplot2

I re-use the data from the previous post and create a data.frame (same code as before). There are, however, 4 additional lines of script. Using the function combn, I determine all possible combinations of 2 factor levels. Note that I have converted levels to numerics (e.g. "T1" is 1, "T2" is 2).

# data
T1<-c(20,17,18,22,20,22,24,25)
T2<-c(14,12,12,13,16,14,18,11,17,14)
T3<-c(19,20,18,21,21,17,22)
T4<-c(27,24,22,30,26,29,26,28)
# data.frame
values<-c(T1,T2,T3,T4)
exp<-as.factor(c(rep('T1',length(T1)),rep('T2',length(T2)),rep('T3',length(T3)),rep('T4',length(T4))))
all<-data.frame(values,exp) # just create a data.frame with values and factor


#  Combinations
all$factor_num<-as.numeric(all$exp) # I need to convert those levels to numeric 
v_factors<- 1:tail(all$factor_num,n=1)
all_pairwise<-combn(v_factors,2)
all_pairwise # this calculates all combinations 
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    1    1    2    2    3
## [2,]    2    3    4    3    4    4

There are indeed 6 possibilities (the first one is level T1 with level T2, the second one is level T1 with level T3). I need now to perform 6 t-tests. This is basically what the function main does: for z ranging from 1 to n_pairwise, it will extract (from the data.frame) correct values (e.g. values corresponding to T1 and T2 for the first t-test) and perform the test (I use an extra argument var.equal to perform a t-test and not a Welch's t-test). Note finally that I am using the function sapply to perform all those operations (instead of using a for loop).

n_pairwise<-length(all_pairwise[1,]) # will return 6 here

# The function below calculate all t.test (for all combinations)
main<-sapply(1:n_pairwise, function (z) {
  data1<-all_pairwise[1,z]
  data2<-all_pairwise[2,z]
  t.test(all[all[,3]==data1,1],all[all[,3]==data2,1],var.equal = TRUE)
}
)
main[1,]

Let us have a look at main. First we look at main[1,], which outputs all t-tests statistics.

## [[1]]
##        t 
## 5.792583 
## 
## [[2]]
##        t 
## 1.045278 
## 
## [[3]]
##         t 
## -4.075272 
## 
## [[4]]
##         t 
## -5.417693 
## 
## [[5]]
##         t 
## -10.73389 
## 
## [[6]]
##         t 
## -5.756966

main[,1] gives all output parameters for a given combination.

main[,1]
## $statistic
##        t 
## 5.792583 
## 
## $parameter
## df 
## 16 
## 
## $p.value
## [1] 2.749152e-05
## 
## $conf.int
## [1] 4.374814 9.425186
## attr(,"conf.level")
## [1] 0.95
## 
## $estimate
## mean of x mean of y 
##      21.0      14.1 
## 
## $null.value
## difference in means 
##                   0 
## 
## $stderr
## [1] 1.191179
## 
## $alternative
## [1] "two.sided"
## 
## $method
## [1] " Two Sample t-test"
## 
## $data.name
## [1] "all[all[, 3] == data1, 1] and all[all[, 3] == data2, 1]"

Notice the [[ ]], which indicates that main is actually a list (a structure that may contain elements of different types). FInally, we create a new data.frame (ALL), which will contain all results (in a readable format).

# We create a new data.frame where we put the results of mainALL<-as.data.frame(t(as.matrix(all_pairwise)))
ALL[,3]<-unlist(main[1,])
ALL[,4]<-unlist(main[3,])
ALL[,5]<-p.adjust(ALL[,4], method = "fdr", n = length(ALL[,4]))
colnames(ALL)<-c("Exp_1", "Exp_2","statistics","Naive p value","FDR")
ALL # Let's have a look 
##   Exp_1 Exp_2 statistics Naive p value          FDR
## 1     1     2   5.792583  2.749152e-05 8.247457e-05
## 2     1     3   1.045278  3.149419e-01 3.149419e-01
## 3     1     4  -4.075272  1.135702e-03 1.362842e-03
## 4     2     3  -5.417693  7.126807e-05 1.069021e-04
## 5     2     4 -10.733894  1.016466e-08 6.098795e-08
## 6     3     4  -5.756966  6.632425e-05 1.069021e-04

Look at the fifth column where we have calculated new p-values using a FDR correction (see also my previous post). You might notice that those values are slightly different from that calculated here. The reason is simple: pairwise.t.test has default parameters. Using:

# first let us re-do a pairwise calculation 
pairwise.t.test(all[, 1],all[, 2], p.adjust.method = "fdr", pool.sd=FALSE, var.equal = TRUE)
## 
##  Pairwise comparisons using t tests with non-pooled SD 
## 
## data:  all[, 1] and all[, 2] 
## 
##    T1      T2      T3     
## T2 8.2e-05 -       -      
## T3 0.31494 0.00011 -      
## T4 0.00136 6.1e-08 0.00011
## 
## P value adjustment method: fdr

We get identical values. Last but not least, we use the geom_signif function (with appropriate arguments) to plot corrected p-values (ALL[,5]) for all combinations (asplit(all_pairwise,2)).

#ggplot
p<-ggplot(all, aes(x=exp, y=values,)) +
       geom_boxplot( col= c("#00AFBB", "#E7B800", "#FC4E07","#080808"),size=1.) +
       geom_jitter()+
       geom_signif(annotations=signif(ALL[,5],digits=1), comparisons=asplit(all_pairwise,2),step_increase = 0.1)+
       labs(x="Treatment", y="PSA Concentration (ng/ml)",title="Pairwise comparisons using FDR correction for p-values")+
       theme_classic()
p # display

BTW, I also can save the plot in a svg format, using:

ggsave(file="test.svg", plot=p, width=10) # this saves automatically. You need install the svglite package
## Saving 10 x 7.61 in image
Comments

Description