I have the feeling that students sometimes think R is just for statistics. But R is also a perfect environment for organizing and displaying data without doing any fancy statistics. That is what I show here, using real data and taking advantage of the popular and powerful tidyverse package.

rm(list = ls())
if (!require(tidyverse)) {
  install.packages('tidyverse')
  library(tidyverse)  # attach the package once it has been installed
}
## Loading required package: tidyverse
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.4     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
Data

Here we will look at a .csv file, which looks like this in Excel (see below for a screenshot). These data were obtained by 6 different students performing 2 different experiments. Students 5 and 7 performed the experiment at a high concentration of AMP (an allosteric activator), so their columns are labelled AMPP5 and AMPP7, where 'P' stands for plus. Students 1, 2, 3 and 4 performed the experiment at a low AMP concentration, so their columns are labelled AMPM1, AMPM2, AMPM3 and AMPM4, where 'M' stands for minus. In both experiments they measured the initial reaction rate (vi) at different substrate concentrations (Pi). Notice that some points are missing and therefore have NA values.

#################################
##########  data  ###############
#################################
all<-read_delim("https://biosoft-ipcms.fr/files/Files_BLOG/Blog_Data/ALL.csv",delim=";")
## 
## -- Column specification --------------------------------------------------------
## cols(
##   Pi = col_double(),
##   AMPP5 = col_double(),
##   AMPP7 = col_double(),
##   AMPM2 = col_double(),
##   AMPM1 = col_double(),
##   AMPM3 = col_double(),
##   AMPM4 = col_double()
## )

I exported this .csv file with semicolons as separators, so I pass delim = ";" to the read_delim function. This function comes from the readr package, one of the packages included in tidyverse, which is a collection of packages designed for data science.

all
## # A tibble: 20 x 7
##       Pi AMPP5 AMPP7 AMPM2 AMPM1 AMPM3 AMPM4
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1     1     4    12     3    NA     3    11
##  2     2     6    18     2     3     5     8
##  3     3     7    10     4     5     7    28
##  4     4     7    17     4     4     7    40
##  5     5     8    28     7    NA    11    22
##  6     6     9    22     5    NA    11    13
##  7     7    13    24     8     9    13    15
##  8     8    15    34     8    39     3    NA
##  9     9    16    41    10    NA    21    55
## 10    10    26    33    11    14     4    54
## 11    20    27    44    22    46    10    45
## 12    30    28    58    38    37     9    48
## 13    40    NA    92    44    62    15    70
## 14    50    37    82    58    73    26    69
## 15    60    39    76    74    70    16    82
## 16    70    40    77   122    69    18    84
## 17    80    41    78    92    NA    28    79
## 18    90    41    NA    98    86    24    NA
## 19   100    41    85    97    99    21   113
## 20   150    42    87   129   135    21    86
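If the URL above is unreachable, the same call can be tried on a local file. This minimal sketch writes the first three rows of the table (copied from the printed tibble) to a temporary semicolon-delimited file and reads them back; the temporary file and the object name small are placeholders, not part of the original data. Setting col_types explicitly only silences the column-specification message.

```r
library(readr)

# Hypothetical local copy: first three rows of ALL.csv, values taken from the printed tibble
tmp <- tempfile(fileext = ".csv")
writeLines(c("Pi;AMPP5;AMPP7;AMPM2;AMPM1;AMPM3;AMPM4",
             "1;4;12;3;NA;3;11",
             "2;6;18;2;3;5;8",
             "3;7;10;4;5;7;28"), tmp)

# delim = ";" tells readr how fields are separated; "NA" is read as a missing value by default
small <- read_delim(tmp, delim = ";", col_types = cols(.default = col_double()))
print(small)
```

readr also provides read_csv2(), which assumes ";" as the delimiter and "," as the decimal mark, a combination common in locales that use a decimal comma; read_delim() with delim = ";" keeps "." as the decimal mark.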

Re-organizing data

The first thing you might notice is that tidyverse uses tibbles, its modern take on the data.frames of base R. The way the data are arranged in all (one column per student) is not very convenient, so we re-organize the values (initial reaction rates) into a long format with one row per measurement, keeping in mind that all students used the same substrate concentrations. To do that, we use gather and print the first 30 rows of the new tibble all_1.

#################################
##########  organize ############
#################################

# re-arrange columns so that student's values (vi) have the same Pi
all_1<-gather(all,key = student, value = vi, -Pi)
print(all_1, n = 30)
## # A tibble: 120 x 3
##       Pi student    vi
##    <dbl> <chr>   <dbl>
##  1     1 AMPP5       4
##  2     2 AMPP5       6
##  3     3 AMPP5       7
##  4     4 AMPP5       7
##  5     5 AMPP5       8
##  6     6 AMPP5       9
##  7     7 AMPP5      13
##  8     8 AMPP5      15
##  9     9 AMPP5      16
## 10    10 AMPP5      26
## 11    20 AMPP5      27
## 12    30 AMPP5      28
## 13    40 AMPP5      NA
## 14    50 AMPP5      37
## 15    60 AMPP5      39
## 16    70 AMPP5      40
## 17    80 AMPP5      41
## 18    90 AMPP5      41
## 19   100 AMPP5      41
## 20   150 AMPP5      42
## 21     1 AMPP7      12
## 22     2 AMPP7      18
## 23     3 AMPP7      10
## 24     4 AMPP7      17
## 25     5 AMPP7      28
## 26     6 AMPP7      22
## 27     7 AMPP7      24
## 28     8 AMPP7      34
## 29     9 AMPP7      41
## 30    10 AMPP7      33
## # ... with 90 more rows
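gather() still works, but since tidyr 1.0.0 it is superseded by pivot_longer(), which performs the same wide-to-long reshaping with more explicit argument names. A minimal sketch on a hypothetical excerpt of all (three rows, three students, values copied from the printed tibble) shows that both produce the same rows, only in a different order: gather() stacks the student columns one after the other, whereas pivot_longer() keeps the rows for each Pi together.

```r
library(tidyr)
library(dplyr)

# Hypothetical excerpt of `all`: three Pi values, three student columns
wide <- tibble(Pi    = c(1, 2, 3),
               AMPP5 = c(4, 6, 7),
               AMPP7 = c(12, 18, 10),
               AMPM1 = c(NA, 3, 5))

# gather(): every column except Pi is stacked, column after column
long_gather <- gather(wide, key = student, value = vi, -Pi)

# pivot_longer(): same reshaping, rows come out grouped by Pi
long_pivot <- pivot_longer(wide, cols = -Pi, names_to = "student", values_to = "vi")

# Each wide row yields one long row per student column: 3 x 3 = 9 rows
nrow(long_pivot)

# Once sorted the same way, both results hold the same values (NA included)
identical(arrange(long_gather, student, Pi)$vi, arrange(long_pivot, student, Pi)$vi)
```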

As you can see, gather also lets us name the new columns (here student and vi). Next, we create a new column that specifies which experiment (AMPM or AMPP) was performed. This is easy with mutate (here I also extract the first four characters of each student label using substr).

# create a new column (we have 2 different experiments)
all_2<-mutate(all_1, exp =  substr(student,1,4))
all_2
## # A tibble: 120 x 4
##       Pi student    vi exp
##    <dbl> <chr>   <dbl> <chr>
##  1     1 AMPP5       4 AMPP
##  2     2 AMPP5       6 AMPP
##  3     3 AMPP5       7 AMPP
##  4     4 AMPP5       7 AMPP
##  5     5 AMPP5       8 AMPP
##  6     6 AMPP5       9 AMPP
##  7     7 AMPP5      13 AMPP
##  8     8 AMPP5      15 AMPP
##  9     9 AMPP5      16 AMPP
## 10    10 AMPP5      26 AMPP
## # ... with 110 more rows
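substr() works here because every label is four letters followed by the student number. If the student number is also needed as a column of its own, tidyr's separate() can cut the label at a fixed character position. A sketch on a hypothetical two-row excerpt of all_1; split_long is an illustrative name:

```r
library(tidyr)
library(dplyr)

# Hypothetical excerpt of all_1 (long format)
long <- tibble(student = c("AMPP5", "AMPM3"), Pi = c(1, 1), vi = c(4, 3))

# sep = 4 splits each label after its 4th character ("AMPP5" -> "AMPP" + "5");
# remove = FALSE keeps the original column, convert = TRUE turns id into a number
split_long <- separate(long, student, into = c("exp", "id"), sep = 4,
                       remove = FALSE, convert = TRUE)
split_long

# The experiment labels match the substr() approach
identical(split_long$exp, substr(long$student, 1, 4))
```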

Finally, I reorder the columns by type, character columns first (just because it is easier to read), using relocate.

# re-order
all_3<-relocate(all_2,where(is.character), .before = where(is.numeric))
print(all_3, n = 30)
## # A tibble: 120 x 4
##    student exp      Pi    vi
##    <chr>   <chr> <dbl> <dbl>
##  1 AMPP5   AMPP      1     4
##  2 AMPP5   AMPP      2     6
##  3 AMPP5   AMPP      3     7
##  4 AMPP5   AMPP      4     7
##  5 AMPP5   AMPP      5     8
##  6 AMPP5   AMPP      6     9
##  7 AMPP5   AMPP      7    13
##  8 AMPP5   AMPP      8    15
##  9 AMPP5   AMPP      9    16
## 10 AMPP5   AMPP     10    26
## 11 AMPP5   AMPP     20    27
## 12 AMPP5   AMPP     30    28
## 13 AMPP5   AMPP     40    NA
## 14 AMPP5   AMPP     50    37
## 15 AMPP5   AMPP     60    39
## 16 AMPP5   AMPP     70    40
## 17 AMPP5   AMPP     80    41
## 18 AMPP5   AMPP     90    41
## 19 AMPP5   AMPP    100    41
## 20 AMPP5   AMPP    150    42
## 21 AMPP7   AMPP      1    12
## 22 AMPP7   AMPP      2    18
## 23 AMPP7   AMPP      3    10
## 24 AMPP7   AMPP      4    17
## 25 AMPP7   AMPP      5    28
## 26 AMPP7   AMPP      6    22
## 27 AMPP7   AMPP      7    24
## 28 AMPP7   AMPP      8    34
## 29 AMPP7   AMPP      9    41
## 30 AMPP7   AMPP     10    33
## # ... with 90 more rows
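For comparison, the same column order can be obtained with select() and everything(); both approaches keep the original relative order of the columns within each type. The one-row tibble x is hypothetical and only mimics the column types of all_2.

```r
library(dplyr)

# Hypothetical one-row tibble with the same columns and types as all_2
x <- tibble(Pi = 1, student = "AMPP5", vi = 4, exp = "AMPP")

# relocate(): character columns are moved in front of the first numeric column
by_relocate <- relocate(x, where(is.character), .before = where(is.numeric))

# select(): character columns first, then everything else in its current order
by_select <- select(x, where(is.character), everything())

names(by_relocate)
names(by_select)
```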

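This all works nicely, but we can be more concise with the pipe operator %>%, which passes the result of each step as the first argument of the next function, so that a sequence of operations can be applied to a primary object without storing intermediate results.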

# one-ish line with pipe %>%
all_t <- all %>% # select tibble
  gather(key = student, value = vi, -Pi)  %>%  
  mutate(exp =  substr(student,1,4)) %>%  
  relocate(where(is.character), .before = where(is.numeric))

Both tibbles are indeed identical.

all_equal(all_t,all_3)
## [1] TRUE
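Since R 4.1.0, base R also has a native pipe, |>, which behaves the same way for these calls. The sketch below, on a hypothetical two-row excerpt of all, compares the step-by-step version with a |> chain using identical(). Note that recent dplyr releases (1.1.0 and later) deprecate all_equal() in favour of base functions such as all.equal() or identical().

```r
library(dplyr)
library(tidyr)

# Hypothetical two-row excerpt of the wide tibble `all`
wide <- tibble(Pi = c(1, 2), AMPP5 = c(4, 6), AMPM2 = c(3, 2))

# Step by step, with intermediate objects (as with all_1, all_2, all_3)
s1 <- gather(wide, key = student, value = vi, -Pi)
s2 <- mutate(s1, exp = substr(student, 1, 4))
s3 <- relocate(s2, where(is.character), .before = where(is.numeric))

# Same chain with the native pipe (R >= 4.1.0): each result becomes
# the first argument of the next call
piped <- wide |>
  gather(key = student, value = vi, -Pi) |>
  mutate(exp = substr(student, 1, 4)) |>
  relocate(where(is.character), .before = where(is.numeric))

identical(s3, piped)
```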
Graphs

We will use ggplot2 (also part of tidyverse) to create graphs based on the groups defined previously (students and experiments). With ggplot2 it is easy to map groups to specific colours and/or shapes, and to plot only selected groups, as the scripts below show.

#################################
##########  graphs  #############
#################################

# plot: 2 experiments = 2 shapes, 6 students = 6 grey levels
p1<-ggplot(data=all_t) +
  geom_point(aes(x=Pi, y=vi, color=student , shape = exp),size=3) +
  scale_color_grey()+
  theme_light(base_size=14)
p1
## Warning: Removed 9 rows containing missing values (geom_point).

# plot: all students having performed exp AMPP
p2<-ggplot(filter(all_t, exp == 'AMPP')) +
  geom_point(aes(x=Pi, y=vi, color=student),size=3) +
  scale_color_grey()+
  theme_light(base_size=14)
p2
## Warning: Removed 2 rows containing missing values (geom_point).

# plot: all students having performed exp AMPM
p3<-ggplot(filter(all_t, exp == 'AMPM')) +
  geom_point(aes(x=Pi, y=vi, color=student),size=3) +
  scale_color_grey()+
  theme_light(base_size=14)
p3
## Warning: Removed 7 rows containing missing values (geom_point).
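Instead of filtering each experiment and building one plot per subset, facet_wrap() can draw one panel per experiment from the whole long tibble in a single call. A sketch on a hypothetical six-row excerpt of all_t (values copied from the tables above); p_facet is an illustrative name:

```r
library(ggplot2)

# Hypothetical excerpt of all_t in long format
d <- data.frame(
  student = rep(c("AMPP5", "AMPM2"), each = 3),
  exp     = rep(c("AMPP", "AMPM"), each = 3),
  Pi      = rep(c(1, 2, 3), times = 2),
  vi      = c(4, 6, 7, 3, 2, 4)
)

# facet_wrap(~ exp): one panel per experiment, replacing separate filter() + ggplot() calls
p_facet <- ggplot(d, aes(x = Pi, y = vi, color = student)) +
  geom_point(size = 3) +
  facet_wrap(~ exp) +
  scale_color_grey() +
  theme_light(base_size = 14)

# layer_data() builds the plot without drawing it; PANEL gives each point's facet
table(layer_data(p_facet)$PANEL)
```

One design difference: the colour legend is shared by all panels, whereas p2 and p3 each get their own. If the two experiments span very different ranges, facet_wrap(~ exp, scales = "free_y") lets each panel use its own y axis.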

Basic Operations

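The first thing we can do is average over several columns within each row, i.e. average the students' values at each Pi. There are different ways to do that; here, I have chosen the rowwise function. As you can see below, it is easy to select either all columns satisfying a certain condition (here, names starting with "AMPM" or "AMPP") or specific columns only (in the second example, AMPM1, which has the most missing values, is left out).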

#################################
##########  Basic Op ############
#################################

Mean_all <- all  %>% # select tibble all
  rowwise(Pi) %>% # row-by-row Operations (here I select Pi)
  mutate(MeanAMMM = mean(c_across(starts_with("AMPM")))) %>%
  mutate(MeanAMMP = mean(c_across(starts_with("AMPP"))))
Mean_all
## # A tibble: 20 x 9
## # Rowwise:  Pi
##       Pi AMPP5 AMPP7 AMPM2 AMPM1 AMPM3 AMPM4 MeanAMMM MeanAMMP
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
##  1     1     4    12     3    NA     3    11       NA        8
##  2     2     6    18     2     3     5     8      4.5       12
##  3     3     7    10     4     5     7    28       11      8.5
##  4     4     7    17     4     4     7    40     13.8       12
##  5     5     8    28     7    NA    11    22       NA       18
##  6     6     9    22     5    NA    11    13       NA     15.5
##  7     7    13    24     8     9    13    15     11.2     18.5
##  8     8    15    34     8    39     3    NA       NA     24.5
##  9     9    16    41    10    NA    21    55       NA     28.5
## 10    10    26    33    11    14     4    54     20.8     29.5
## 11    20    27    44    22    46    10    45     30.8     35.5
## 12    30    28    58    38    37     9    48       33       43
## 13    40    NA    92    44    62    15    70     47.8       NA
## 14    50    37    82    58    73    26    69     56.5     59.5
## 15    60    39    76    74    70    16    82     60.5     57.5
## 16    70    40    77   122    69    18    84     73.2     58.5
## 17    80    41    78    92    NA    28    79       NA     59.5
## 18    90    41    NA    98    86    24    NA       NA       NA
## 19   100    41    85    97    99    21   113     82.5       63
## 20   150    42    87   129   135    21    86     92.8     64.5

Mean_select <- all  %>% # select tibble all
  rowwise(Pi) %>% # row-by-row Operations (here I select Pi)
  mutate(MeanAMMM = mean(c(AMPM2,AMPM3,AMPM4))) %>%
  mutate(MeanAMMP = mean(c_across(starts_with("AMPP"))))
Mean_select
## # A tibble: 20 x 9
## # Rowwise:  Pi
##       Pi AMPP5 AMPP7 AMPM2 AMPM1 AMPM3 AMPM4 MeanAMMM MeanAMMP
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
##  1     1     4    12     3    NA     3    11     5.67        8
##  2     2     6    18     2     3     5     8        5       12
##  3     3     7    10     4     5     7    28       13      8.5
##  4     4     7    17     4     4     7    40       17       12
##  5     5     8    28     7    NA    11    22     13.3       18
##  6     6     9    22     5    NA    11    13     9.67     15.5
##  7     7    13    24     8     9    13    15       12     18.5
##  8     8    15    34     8    39     3    NA       NA     24.5
##  9     9    16    41    10    NA    21    55     28.7     28.5
## 10    10    26    33    11    14     4    54       23     29.5
## 11    20    27    44    22    46    10    45     25.7     35.5
## 12    30    28    58    38    37     9    48     31.7       43
## 13    40    NA    92    44    62    15    70       43       NA
## 14    50    37    82    58    73    26    69       51     59.5
## 15    60    39    76    74    70    16    82     57.3     57.5
## 16    70    40    77   122    69    18    84     74.7     58.5
## 17    80    41    78    92    NA    28    79     66.3     59.5
## 18    90    41    NA    98    86    24    NA       NA       NA
## 19   100    41    85    97    99    21   113       77       63
## 20   150    42    87   129   135    21    86     78.7     64.5
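The NA values in the Mean columns come from mean() returning NA as soon as one value in the row is missing. If averaging only the available values is acceptable for your analysis (each point then rests on fewer students), na.rm = TRUE changes that, and base rowMeans() gives the same result without rowwise(). A sketch on the first two rows of the AMPM columns; MeanAMPM and the rw_* objects are illustrative names:

```r
library(dplyr)

# First two rows of the AMPM columns of `all` (values copied from the printed tibble)
rows <- tibble(Pi    = c(1, 2),
               AMPM2 = c(3, 2), AMPM1 = c(NA, 3),
               AMPM3 = c(3, 5), AMPM4 = c(11, 8))

# Default mean(): one missing value makes the whole row mean NA
rw_default <- rows %>%
  rowwise(Pi) %>%
  mutate(MeanAMPM = mean(c_across(starts_with("AMPM"))))

# na.rm = TRUE: only the values that are present are averaged
rw_narm <- rows %>%
  rowwise(Pi) %>%
  mutate(MeanAMPM = mean(c_across(starts_with("AMPM")), na.rm = TRUE))

# rowMeans(): same averages for all rows at once, no rowwise() needed
rows$MeanAMPM <- rowMeans(select(rows, starts_with("AMPM")), na.rm = TRUE)

rw_default$MeanAMPM
rw_narm$MeanAMPM
rows$MeanAMPM
```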

Fitting

Let's first keep only the averaged values (together with Pi) and reshape the tibble into long format.

#################################
##########  Fitting  ############
#################################

mean_reshape<-Mean_select%>% 
  select(starts_with("Mean"),Pi) %>% 
  gather(key = exp_type, value = vi,-Pi) 
print(mean_reshape,n=30)
## # A tibble: 40 x 3
##       Pi exp_type    vi
##    <dbl> <chr>    <dbl>
##  1     1 MeanAMMM  5.67
##  2     2 MeanAMMM     5
##  3     3 MeanAMMM    13
##  4     4 MeanAMMM    17
##  5     5 MeanAMMM  13.3
##  6     6 MeanAMMM  9.67
##  7     7 MeanAMMM    12
##  8     8 MeanAMMM    NA
##  9     9 MeanAMMM  28.7
## 10    10 MeanAMMM    23
## 11    20 MeanAMMM  25.7
## 12    30 MeanAMMM  31.7
## 13    40 MeanAMMM    43
## 14    50 MeanAMMM    51
## 15    60 MeanAMMM  57.3
## 16    70 MeanAMMM  74.7
## 17    80 MeanAMMM  66.3
## 18    90 MeanAMMM    NA
## 19   100 MeanAMMM    77
## 20   150 MeanAMMM  78.7
## 21     1 MeanAMMP     8
## 22     2 MeanAMMP    12
## 23     3 MeanAMMP   8.5
## 24     4 MeanAMMP    12
## 25     5 MeanAMMP    18
## 26     6 MeanAMMP  15.5
## 27     7 MeanAMMP  18.5
## 28     8 MeanAMMP  24.5
## 29     9 MeanAMMP  28.5
## 30    10 MeanAMMP  29.5
## # ... with 10 more rows

To fit both averaged data sets at once, I use a plain data.frame (it is possible to do it with tidyverse, but I think it is a bit more difficult). I have chosen to fit the 2 averaged data sets with a hyperbola, vi = a*Pi/(Pi + b) (which, by the way, is not the correct function for those data), using the method I described in a previous post. In the nls formula, a[exp_type] and b[exp_type] give each experiment its own pair of parameters. Also notice that I draw the fitted values with the geom_smooth function.
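Written out, the nls() call solves a single least-squares problem containing one hyperbola per experiment. Here k runs over the rows of df, and g(k) is the integer code of exp_type in row k (with the default alphabetical levels from as.factor(), 1 = MeanAMMM and 2 = MeanAMMP). In this form a is the plateau that vi approaches at high Pi, and b is the Pi at which vi reaches a/2.

```latex
v_k = \frac{a_{g(k)}\,\mathrm{Pi}_k}{\mathrm{Pi}_k + b_{g(k)}}, \qquad g(k) \in \{1, 2\}

(\hat a_1, \hat a_2, \hat b_1, \hat b_2)
  = \arg\min_{a_1, a_2, b_1, b_2} \sum_k
    \left( v_k - \frac{a_{g(k)}\,\mathrm{Pi}_k}{\mathrm{Pi}_k + b_{g(k)}} \right)^2
```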

df<-data.frame(mean_reshape) # it is easier to do it with a data.frame
df$exp_type<-as.factor(df$exp_type) # do not forget to convert characters to factors: nls() uses the factor codes to index a and b
df<-na.omit(df)
fitexp<-nls(vi ~ (Pi*a[exp_type])/(Pi+b[exp_type]), data = df , start=list(a=c(1,1),b=c(1,1)))

p4<-ggplot(df) +
  geom_point(aes(x=Pi, y=vi, color=exp_type),size=3) +
  geom_smooth(aes(x=Pi, y=predict(fitexp), colour=exp_type), method = "gam")+
  scale_color_grey()+
  theme_light(base_size=14)
p4
## `geom_smooth()` using formula 'y ~ s(x, bs = "cs")'
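geom_smooth(method = "gam") does not draw the fitted hyperbola itself: it fits a GAM smoother through the predicted values at the measured Pi (hence the formula message above), and its ribbon describes that smoother, not the nls fit. An alternative is to evaluate the fitted model on a fine grid with predict() and draw it with geom_line(). The sketch below uses synthetic data generated from two known hyperbolas plus noise, not the students' measurements; the group names, true parameters, noise level and start values are assumptions chosen for illustration.

```r
library(ggplot2)

# Synthetic data (NOT the post's measurements): two groups from known hyperbolas plus noise
set.seed(1)
conc <- c(1:10, seq(20, 100, by = 10), 150)
df <- data.frame(
  Pi       = rep(conc, times = 2),
  exp_type = factor(rep(c("groupA", "groupB"), each = length(conc)))
)
true_a <- c(80, 65)
true_b <- c(30, 15)
g <- as.integer(df$exp_type)
df$vi <- true_a[g] * df$Pi / (df$Pi + true_b[g]) + rnorm(nrow(df), sd = 2)

# One nls() call for both groups: factor codes 1/2 pick a[1], b[1] or a[2], b[2]
fitexp <- nls(vi ~ (Pi * a[exp_type]) / (Pi + b[exp_type]), data = df,
              start = list(a = c(70, 70), b = c(20, 20)))
coef(fitexp)  # named a1, a2, b1, b2

# Evaluate the fitted curves on a fine grid of Pi for each group
grid <- expand.grid(Pi = seq(1, 150, length.out = 200),
                    exp_type = levels(df$exp_type))
grid$exp_type <- factor(grid$exp_type, levels = levels(df$exp_type))
grid$vi <- predict(fitexp, newdata = grid)

# Points = data, lines = fitted model (aesthetics inherited from ggplot())
p5 <- ggplot(df, aes(x = Pi, y = vi, color = exp_type)) +
  geom_point(size = 3) +
  geom_line(data = grid) +
  scale_color_grey() +
  theme_light(base_size = 14)
p5
```

Start values in the neighbourhood of the plateau and of the half-saturation Pi usually make nls() converge more reliably than starting every parameter at 1.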