Monday, 28 March 2011

doRedis: Redis as a dispatcher for parallel R

Redis is an open source, advanced key-value store. It is often referred to as a data structure server, since values can be strings, hashes, lists, sets and sorted sets. Clients are available for C, C++, C#, Objective-C, Clojure, Common Lisp, Erlang, Go, Haskell, Io, Lua, Perl, Python, PHP, R, Ruby, Scala, Smalltalk, Tcl and others.

B. W. Lewis has developed a parallel backend for the foreach package that allows a cluster of workers to obtain workloads from a Redis server.

The Redis binding for R is the rredis package. Its basic calls are:

  • redisConnect() # connect to redis store
  • redisSet('x',runif(5)) # store a value
  • redisGet('x') # retrieve value from store
  • redisClose() # close connection
  • redisAuth(pwd) # simple authentication
  • redisConnect()
  • redisLPush('x',1) # push numbers into list
  • redisLPush('x',2)
  • redisLPush('x',3)
  • redisLRange('x',0,2) # retrieve list
Using this Redis interface for R, the fine R library doRedis allows Redis to act as a dispatcher for parallel R commands on a cluster. The usage is fairly easy and closely resembles that of the doSNOW clustering library:
  • First, start a Redis server on one of the machines; it acts as the cluster master.
  • Then connect as many R workers as you want to the Redis master.
  • Finally, start an R interpreter, connect to the Redis master and submit the parallel computation job using the foreach package.
The workload is then distributed to the workers (which can reside on machines other than the master) and the result is gathered back into the interactive R interpreter. Here is an example of how to use the package:

Start the Redis server:

./redis-server

Start as many workers as you like:

echo "require('doRedis');redisWorker('jobs')" | R --no-save -q &

Start an R interpreter, register the doRedis backend for the 'jobs' queue and submit the job:

library(doRedis)
registerDoRedis('jobs')
foreach(j=1:1000, .combine=sum) %dopar% sum(runif(10000000))

If you want to minimize communication with the Redis server, use setChunkSize so that tasks are sent to each R worker in chunks rather than one at a time.
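As a rough sketch of what chunking buys you, the snippet below wraps the job from above in a helper that sets a chunk size first. The chunk size of 50, the variable names and the helper run_chunked are illustrative assumptions, not part of doRedis; it also assumes a Redis server on localhost and workers already listening on the 'jobs' queue. The chunk count is simple arithmetic and only approximates how the backend groups tasks.

```r
# Sketch only: chunk_size = 50 is an arbitrary illustrative value.
n_tasks    <- 1000
chunk_size <- 50

# With chunking, workers fetch roughly n_tasks / chunk_size work units
# instead of n_tasks individual tasks.
n_chunks <- ceiling(n_tasks / chunk_size)

# Hypothetical helper; needs a running Redis server and workers on 'jobs'.
run_chunked <- function() {
  library(doRedis)
  registerDoRedis('jobs')
  setChunkSize(chunk_size)   # number of tasks handed out per fetch
  foreach(j = seq_len(n_tasks), .combine = sum) %dopar% sum(runif(1e6))
}
```

Calling run_chunked() from the interactive session should return the combined sum just as before. Larger chunks mean fewer round trips to Redis, but also coarser load balancing across workers.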

1 comment:

NPHard said…

I posted a vignette by B. W. Lewis for Rforeach and doRedis on my blog.

http://bigcomputing.blogspot.com/2011/04/doredis-parallel-back-end-for-rforeach.html