GPU Parallel Processing

GPU Parallel Programming

One of the issues facing epidemiology is the analysis of large datasets.  As more and more data is collected and collated, such as combined diagnostic codes, laboratory results, pharmacy records, vital signs (heart rate, blood pressure, etc.) and many other subsets of data, it is becoming increasingly difficult to analyze this data using single-thread, single-process, single-core computing.

GPU computing is one potential method for reducing the time it takes to process this data.  We have several areas of interest:

  • What statistical and machine learning methods can be utilized with GPUs?
  • Evaluation and performance metrics of using R with GPUs
  • Which epidemiological and health services research problems are best suited to GPUs?
  • How best to develop code that makes use of GPU processors and their associated memory

Nvidia Tesla (NvidiaEpi01)


We (Dr. Douglas R Hamilton, Dr. Chip Doig and Dean Yergens) were fortunate to receive a hardware donation of two Nvidia Tesla C2075 cards from Nvidia (described below).

We configured a computer to host these two C2075 cards.  We decided to use an Nvidia-branded case (pictured).  The hardware configuration is as follows:

  • NvidiaEpi01 (Intel Core i7-2600K 3.4 GHz, 16 GB memory, 1 TB hard drive, 2 × Tesla C2075 (6 GB + 448 CUDA cores each))

Two other computers have since been built:

  • NvidiaEpi02 (AMD 8-core, 32 GB memory, 256 GB SSD + 3 TB hard drive, GTX 260 (896 MB + 192 CUDA cores))
  • NvidiaEpi03 (AMD 8-core, 32 GB memory, 256 GB SSD + 3 TB hard drive, Tesla C1060 (4 GB + 240 CUDA cores) + Nvidia GeForce 640 (4 GB + 384 cores))

Since we were working with R, we needed to set up the computer with Ubuntu Linux, as one of the R packages we wanted to use (GPUTools) was only supported on Linux.
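
As a rough sketch of the software setup (assuming the CUDA toolkit is already installed under /usr/local/cuda; the configure argument and the chooseGpu helper may differ between GPUTools versions), installing the package and selecting a GPU from R looks something like this:

    ## Hedged setup sketch: install gputools from CRAN on Ubuntu Linux.
    ## Assumes the CUDA toolkit lives under /usr/local/cuda; adjust the path
    ## (and the configure argument, which varies by package version) as needed.
    install.packages("gputools",
                     configure.args = "--with-cuda-home=/usr/local/cuda")

    library(gputools)
    chooseGpu(0)   # select the first CUDA device (device id 0)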


Nvidia Tesla C2075 cards 


The Nvidia Tesla C2075 card has the following configuration:

  • 448 CUDA cores
  • 6 GB memory

The C2075 is based on the Fermi architecture, with 6 GB of memory, 14 multiprocessors, and 32 cores per multiprocessor (14 × 32 = 448 CUDA cores).  It connects to the host over PCIe and has a memory bandwidth of 144 GB/s.


R Packages that work with Nvidia Tesla Cards

  • GPUTools (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2796814/pdf/btp608.pdf)
  • CudaBayesReg (http://journal.r-project.org/archive/2010-2/RJournal_2010-2_Ferreira~da~Silva.pdf and http://www.jstatsoft.org/v44/i04/paper)
  • CUtil (http://web.warwick.ac.uk/statsdept/useR-2011/abstracts/290311-user2011_abstract_rev3.pdf)
  • ROpenCL (http://repos.openanalytics.eu/html/ROpenCL.html)

Other References:

  • http://web.warwick.ac.uk/statsdept/user2011/TalkSlides/Contributed/16Aug_1115_FocusI_3-HighPerfComp_1-Ligtenberg.pdf
  • High Performance Data Mining Using R on Heterogeneous Platforms (http://users.eecs.northwestern.edu/~choudhar/Publications/HighPerformanceDataMiningUsingRHeterogeneousPlatforms.pdf)
  • A Short Note on Gaussian Process Modeling for Large Datasets using Graphics Processing Units (http://arxiv.org/pdf/1203.1269v2.pdf)
  • OBANSoft: integrated software for Bayesian statistics and high performance computing with R (http://web.warwick.ac.uk/statsdept/useR-2011/abstracts/300311-user2011_obansoft.pdf)
  • Data Analysis using the R Project for Statistical Computing (https://www.nersc.gov/assets/DataAnalytics/2011/TutorialR2011.pdf)
  • Writing Efficient and Parallel Code in R (http://www.statistik.tu-dortmund.de/~ligges/useR2012/Writing_efficient_and_parallel_code_in_R.pdf)

R Function          Package        Package Function    Description                                  System
----------          -------        ----------------    -----------                                  ------
hclust              GPUTools       gpuHclust                                                        Linux
svm                 GPUTools       gpuSvmTrain         Support Vector Machine of package e1071     Linux
cor                 GPUTools       gpuCor                                                           Linux
granger.test        GPUTools       gpuGranger          granger.test of package MSBVAR              Linux
                    GPUTools       gpuMi                                                            Linux
rhierLinearModel    CudaBayesReg   rhierLinearModel    bayesm package                               Linux
                    ROpenCL                                                                         Linux
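
As an example of how these map onto everyday R code, here is a minimal sketch comparing cor with gpuCor on random data (assuming GPUTools is installed and a CUDA-capable card is available; gpuCor returns a list rather than a bare matrix, so the sketch pulls the coefficient matrix out of it):

    ## Compare base R's cor() with GPUTools' gpuCor() on random data,
    ## timing both to get a rough performance metric.
    library(gputools)

    set.seed(42)
    x <- matrix(rnorm(10000 * 200), nrow = 10000, ncol = 200)   # 200 variables, 10,000 rows

    cpu.time <- system.time(cpu.cor <- cor(x, method = "pearson"))
    gpu.time <- system.time(gpu.cor <- gpuCor(x, method = "pearson"))

    print(cpu.time)
    print(gpu.time)

    ## gpuCor() returns a list; the correlation matrix is typically in the
    ## 'coefficients' element (inspect str(gpu.cor) for your gputools version).
    all.equal(cpu.cor, gpu.cor$coefficients, tolerance = 1e-6)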

GPU R Tutorials 

http://www.r-tutor.com/gpu-computing
Note: Linux only.

Installing GPU Packages in R
http://www.r-tutor.com/gpu-computing/rpud-installation

Hierarchical Cluster Analysis
http://www.r-tutor.com/gpu-computing/clustering/hierarchical-cluster-analysis

SVM with R and GPU
http://www.r-tutor.com/gpu-computing/svm/rpusvm-1
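
Along the same lines as the clustering tutorial above (which is based on the rpud package), the sketch below does GPU-accelerated hierarchical clustering with GPUTools' gpuDist and gpuHclust; the function names and arguments are taken from the gputools documentation and assume a working CUDA setup:

    ## Hierarchical clustering on the GPU: compute the distance matrix and the
    ## clustering with gputools, then plot the resulting dendrogram as usual.
    library(gputools)

    set.seed(1)
    pts <- matrix(rnorm(5000 * 10), nrow = 5000, ncol = 10)   # 5,000 points in 10 dimensions

    ## CPU reference
    hc.cpu <- hclust(dist(pts, method = "euclidean"), method = "complete")

    ## GPU version (gpuDist returns a 'dist' object, gpuHclust an 'hclust' object)
    d.gpu  <- gpuDist(pts, method = "euclidean")
    hc.gpu <- gpuHclust(d.gpu, method = "complete")

    plot(hc.gpu, labels = FALSE, main = "GPU hierarchical clustering")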


GPU Databases


Other GPU Machine Learning (non-R)

GPUMLib 
GPUMLib is an open-source Graphics Processing Unit (GPU) machine learning library.
http://gpumlib.sourceforge.net/
Version 0.1.9 includes:

  • Back-Propagation (BP)
  • Multiple Back-Propagation (MBP)
  • Non-Negative Matrix Factorization (NMF)
  • Radial Basis Function Networks (RBF)
  • Autonomous Training System (ATS) for creating BP and MBP networks
  • Neural Selective Input Model (NSIM) for BP and MBP (NSIM allows neural networks to handle missing values directly)
  • Restricted Boltzmann Machines (RBM)
  • Deep Belief Networks (DBN)

Python Random Forests on GPU (CudaTree)
http://blog.explainmydata.com/2013/10/training-random-forests-in-python-using.html

Anaconda Accelerate
Fast Python for GPUs and multi-core with NumbaPro and MKL Optimizations
https://store.continuum.io/cshop/accelerate/

Neural Networks on GPU

  • http://www.neuroinformatics2011.org/abstracts/speeding-25-fold-neural-network-simulations-with-gpu-processing
  • http://arstechnica.com/science/2011/07/running-high-performance-neural-networks-on-a-gamer-gpu