As usual, we check for some packages
rm(list = ls())
# Check for Packages
if (!require(ggsignif)) install.packages('ggsignif')
## Loading required package: ggsignif
if (!require(ggplot2)) install.packages('ggplot2')
## Loading required package: ggplot2
I re-use the data from the previous post and create a data.frame (same code as before). There are, however, 4 additional lines of script. Using the function combn, I determine all possible combinations of 2 factor levels. Note that I have converted levels to numerics (e.g. "T1" is 1, "T2" is 2).
# data
T1<-c(20,17,18,22,20,22,24,25)
T2<-c(14,12,12,13,16,14,18,11,17,14)
T3<-c(19,20,18,21,21,17,22)
T4<-c(27,24,22,30,26,29,26,28)
# data.frame
values<-c(T1,T2,T3,T4)
exp<-as.factor(c(rep('T1',length(T1)),rep('T2',length(T2)),rep('T3',length(T3)),rep('T4',length(T4))))
all<-data.frame(values,exp) # just create a data.frame with values and factor
# Combinations
all$factor_num<-as.numeric(all$exp) # I need to convert those levels to numeric
v_factors<- 1:tail(all$factor_num,n=1)
all_pairwise<-combn(v_factors,2)
all_pairwise # this calculates all combinations
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 1 1 2 2 3
## [2,] 2 3 4 3 4 4
There are indeed 6 possibilities (the first one is level T1 with level T2, the second one is level T1 with level T3). I need now to perform 6 t-tests. This is basically what the function main does: for z ranging from 1 to n_pairwise, it will extract (from the data.frame) correct values (e.g. values corresponding to T1 and T2 for the first t-test) and perform the test (I use an extra argument var.equal to perform a t-test and not a Welch's t-test). Note finally that I am using the function sapply to perform all those operations (instead of using a for loop).
n_pairwise<-length(all_pairwise[1,]) # will return 6 here
# The function below calculate all t.test (for all combinations)
main<-sapply(1:n_pairwise, function (z) {
data1<-all_pairwise[1,z]
data2<-all_pairwise[2,z]
t.test(all[all[,3]==data1,1],all[all[,3]==data2,1],var.equal = TRUE)
}
)
main[1,]
Let us have a look at main. First we look at main[1,], which outputs all t-tests statistics.
## [[1]]
## t
## 5.792583
##
## [[2]]
## t
## 1.045278
##
## [[3]]
## t
## -4.075272
##
## [[4]]
## t
## -5.417693
##
## [[5]]
## t
## -10.73389
##
## [[6]]
## t
## -5.756966
main[,1] gives all output parameters for a given combination.
main[,1]
## $statistic
## t
## 5.792583
##
## $parameter
## df
## 16
##
## $p.value
## [1] 2.749152e-05
##
## $conf.int
## [1] 4.374814 9.425186
## attr(,"conf.level")
## [1] 0.95
##
## $estimate
## mean of x mean of y
## 21.0 14.1
##
## $null.value
## difference in means
## 0
##
## $stderr
## [1] 1.191179
##
## $alternative
## [1] "two.sided"
##
## $method
## [1] " Two Sample t-test"
##
## $data.name
## [1] "all[all[, 3] == data1, 1] and all[all[, 3] == data2, 1]"
Notice the [[ ]], which indicates that main is actually a list (a structure that may contain elements of different types). FInally, we create a new data.frame (ALL), which will contain all results (in a readable format).
# We create a new data.frame where we put the results of mainALL<-as.data.frame(t(as.matrix(all_pairwise)))
ALL[,3]<-unlist(main[1,])
ALL[,4]<-unlist(main[3,])
ALL[,5]<-p.adjust(ALL[,4], method = "fdr", n = length(ALL[,4]))
colnames(ALL)<-c("Exp_1", "Exp_2","statistics","Naive p value","FDR")
ALL # Let's have a look
## Exp_1 Exp_2 statistics Naive p value FDR
## 1 1 2 5.792583 2.749152e-05 8.247457e-05
## 2 1 3 1.045278 3.149419e-01 3.149419e-01
## 3 1 4 -4.075272 1.135702e-03 1.362842e-03
## 4 2 3 -5.417693 7.126807e-05 1.069021e-04
## 5 2 4 -10.733894 1.016466e-08 6.098795e-08
## 6 3 4 -5.756966 6.632425e-05 1.069021e-04
Look at the fifth column where we have calculated new p-values using a FDR correction (see also my previous post). You might notice that those values are slightly different from that calculated here. The reason is simple: pairwise.t.test has default parameters. Using:
# first let us re-do a pairwise calculation
pairwise.t.test(all[, 1],all[, 2], p.adjust.method = "fdr", pool.sd=FALSE, var.equal = TRUE)
##
## Pairwise comparisons using t tests with non-pooled SD
##
## data: all[, 1] and all[, 2]
##
## T1 T2 T3
## T2 8.2e-05 - -
## T3 0.31494 0.00011 -
## T4 0.00136 6.1e-08 0.00011
##
## P value adjustment method: fdr
We get identical values. Last but not least, we use the geom_signif function (with appropriate arguments) to plot corrected p-values (ALL[,5]) for all combinations (asplit(all_pairwise,2)).
#ggplot
p<-ggplot(all, aes(x=exp, y=values,)) +
geom_boxplot( col= c("#00AFBB", "#E7B800", "#FC4E07","#080808"),size=1.) +
geom_jitter()+
geom_signif(annotations=signif(ALL[,5],digits=1), comparisons=asplit(all_pairwise,2),step_increase = 0.1)+
labs(x="Treatment", y="PSA Concentration (ng/ml)",title="Pairwise comparisons using FDR correction for p-values")+
theme_classic()
p # display
BTW, I also can save the plot in a svg format, using:
ggsave(file="test.svg", plot=p, width=10) # this saves automatically. You need install the svglite package
## Saving 10 x 7.61 in image