Tuesday, November 27, 2012
Fast Clustering in R and byte compilation.
Ran into a problem with having to cluster a large data set for creating heatmaps in R. Normally hclust() is not bad for small datasets (fewer than 1,000 rows), but it rapidly gets time-consuming as the data size grows. A quick dig into the complexity reveals hclust() to be Θ(N³).
Found an R package ("fastcluster") which does significantly better for most clustering methods (single, complete, average, weighted), being Θ(N²). Now that's a pretty decent improvement.
Still, with a dataset of 45,000 rows by 200 columns, I'm looking at a few hours… :(
After some digging, found that R shipped a byte-code compiler (the "compiler" package) in version 2.13.0. It essentially lets you compile your function into byte code before execution, which can speed things up as much as fivefold.
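A rough sketch of what the swap looks like, since fastcluster's hclust() takes the same arguments as the one in stats. The matrix m here is made-up toy data, not my real dataset, and hclust_fn is just a helper name I'm using so the snippet still runs when fastcluster isn't installed.

```r
# Toy data standing in for the real matrix (assumption: rows are the items to cluster).
set.seed(1)
m <- matrix(rnorm(500 * 20), nrow = 500)

# Use fastcluster's hclust() when the package is installed,
# otherwise fall back to stats::hclust() so the sketch still runs.
hclust_fn <- if (requireNamespace("fastcluster", quietly = TRUE)) {
  fastcluster::hclust
} else {
  stats::hclust
}

d  <- dist(m)                             # pairwise Euclidean distances between rows
hc <- hclust_fn(d, method = "complete")   # same call signature as stats::hclust()

print(class(hc))
```

Either way the result is an ordinary "hclust" object, so it can be handed to heatmap() via Rowv = as.dendrogram(hc) instead of letting heatmap() redo the clustering itself.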
http://dirk.eddelbuettel.com/blog/2011/04/12/
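The implementation is pretty straightforward.
library(compiler)  # ships with R since 2.13.0
# myFunction is a placeholder; substitute the function you want to speed up
myFunction <- function(x) sum(x^2)
# cmpfun() returns a byte-compiled copy that behaves the same way
cmyFunction <- cmpfun(myFunction)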
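To check whether compiling actually buys anything, a quick timing comparison helps. This is only a sketch: sum_squares is a hypothetical loop-heavy function I made up for the test, and the numbers will vary by machine. One caveat I'm sure of: since R 3.4.0 the JIT byte-compiles functions automatically, so on a recent R the snippet switches the JIT off first to keep the plain version truly uncompiled.

```r
library(compiler)

# Hypothetical example function: a deliberately loop-heavy sum of squares.
sum_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i * i
  }
  total
}

old_jit <- enableJIT(0)               # disable the JIT; returns the previous level
csum_squares <- cmpfun(sum_squares)   # byte-compiled copy of the same function

t_plain    <- system.time(r1 <- sum_squares(1e6))
t_compiled <- system.time(r2 <- csum_squares(1e6))
enableJIT(old_jit)                    # restore the previous JIT level

# Show both timings side by side.
print(rbind(plain = t_plain, compiled = t_compiled))
```

Functions that mostly call into already-compiled code (like hclust() itself) will probably see little benefit; the gain tends to show up in R-level loops like the one above.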