Posted on

A week ago, I posted a post attempting to speed up FDR filtering in R with Rcpp. However, that function requires manual sorting the of the p-values before throwing into the function.

As I did some research on sorting algorithms and index sorting in c++, I optimized my code with std::pair and std::sort in c++. And surprisingly (or not), this questions has been asked a lot of times on stackoverflow and other websites.

So testing again:

library(data.table)
library(dplyr)
library(rbenchmark)
library(Rcpp)

sourceCpp('~/scripts/R/Rcpp/FDRcontrol.cpp')

r_equivalent <- function(pv,alpha){
	padj = p.adjust(pv,method='BH')
	return(ifelse(padj < alpha,1,0))
}

dat <- fread('~/scripts/R/Rcpp/pvalues.tsv')

alpha <- 0.01
p <- dat$p



result <- dat %>% 
		mutate(r = r_equivalent(p,alpha),
				cpp = FDRcontrol(p,alpha))
result
##                   p r cpp
##     1: 2.305163e-03 1   1
##     2: 6.429712e-01 0   0
##     3: 1.505003e-06 1   1
##     4: 1.060704e-04 1   1
##     5: 2.014278e-06 1   1
##    ---                   
## 40502: 7.241562e-01 0   0
## 40503: 5.023386e-02 0   0
## 40504: 6.178446e-01 0   0
## 40505: 8.729983e-01 0   0
## 40506: 4.176789e-01 0   0
result %>%
	filter(r!=cpp)
## Empty data.table (0 rows) of 3 cols: p,r,cpp

Result is the same, so benchmarking as usual,

benchmark(r_equivalent(p,alpha),FDRcontrol(p,alpha))
##                     test replications elapsed relative user.self sys.self
## 2   FDRcontrol(p, alpha)          100   0.578     1.00      0.56    0.018
## 1 r_equivalent(p, alpha)          100   4.653     8.05      4.51    0.133
##   user.child sys.child
## 2          0         0
## 1          0         0

Apparently, R and matlab/octave did a great job on easing the job for user-end work. Some dat-to-day functions, such as sort and return the indices, are so easily done in high-level languages. As a biologist, I have never thought about many of these questions until I start to hack the codes. And it's always the best way to understand a concept when I start coding it out.