Transparent runtime parallelization of the R scripting language

Jiangtian Li, Xiaosong Ma, Srikanth Yoginath, Guruprasad Kora, Nagiza F. Samatova

Research output: Contribution to journal > Article

10 Citations (Scopus)

Abstract

Scripting languages such as R and Matlab are widely used in scientific data processing. As the data volume and the complexity of analysis tasks both grow, sequential data processing using these tools often becomes the bottleneck in scientific workflows. We describe pR, a runtime framework for automatic and transparent parallelization of the popular R language used in statistical computing. Recognizing scripting languages' interpreted nature and data analysis codes' use pattern, we propose several novel techniques: (1) applying parallelizing compiler technology to runtime, whole-program dependence analysis of scripting languages, (2) incremental code analysis assisted with evaluation results, and (3) runtime parallelization of file accesses. Our framework does not require any modification to either the source code or the underlying R implementation. Experimental results demonstrate that pR can exploit both task and data parallelism transparently and overall has better performance as well as scalability compared to an existing parallel R package that requires code modification.
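
The dependence analysis mentioned in technique (1) can be illustrated with a small sketch. This is not pR's implementation, only an assumed, simplified model: each top-level statement is described by hypothetical read and write sets, and statements with no flow, anti, or output dependence between them are grouped into levels that could run as parallel tasks. The names `Stmt`, `depends_on`, and `parallel_levels`, and the R-like statement strings, are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Stmt:
    """One top-level statement plus the variables it reads and writes (hypothetical model)."""
    name: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def depends_on(later: Stmt, earlier: Stmt) -> bool:
    """True if `later` must wait for `earlier`."""
    return bool(
        (earlier.writes & later.reads)      # flow dependence (read after write)
        or (earlier.reads & later.writes)   # anti dependence (write after read)
        or (earlier.writes & later.writes)  # output dependence (write after write)
    )


def parallel_levels(stmts):
    """Group statements into levels; statements within a level are mutually independent."""
    level = []
    for i, s in enumerate(stmts):
        # A statement runs one level after the latest statement it depends on.
        preds = [level[j] for j in range(i) if depends_on(s, stmts[j])]
        level.append(max(preds) + 1 if preds else 0)
    groups = [[] for _ in range(max(level, default=-1) + 1)]
    for s, lv in zip(stmts, level):
        groups[lv].append(s.name)
    return groups


# Illustrative script: two independent load-and-fit chains, then a join.
script = [
    Stmt("a <- read('x.dat')", writes={"a"}),
    Stmt("b <- read('y.dat')", writes={"b"}),
    Stmt("m <- fit(a)", reads={"a"}, writes={"m"}),
    Stmt("n <- fit(b)", reads={"b"}, writes={"n"}),
    Stmt("r <- compare(m, n)", reads={"m", "n"}, writes={"r"}),
]

for depth, group in enumerate(parallel_levels(script)):
    print(depth, group)
```

In this toy script the two reads land in the first level, the two fits in the second, and the comparison in the third, so each of the first two levels exposes two tasks that could run concurrently. A real runtime analysis would have to derive the read and write sets from the interpreter's parse tree rather than take them as input.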

Original language: English
Pages (from-to): 157-168
Number of pages: 12
Journal: Journal of Parallel and Distributed Computing
Volume: 71
Issue number: 2
DOI: 10.1016/j.jpdc.2010.08.013
Publication status: Published - 1 Feb 2011
Externally published: Yes

Keywords

  • Incremental analysis
  • Runtime parallelization
  • Scripting languages

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Hardware and Architecture
  • Software
  • Theoretical Computer Science

Cite this

Transparent runtime parallelization of the R scripting language. / Li, Jiangtian; Ma, Xiaosong; Yoginath, Srikanth; Kora, Guruprasad; Samatova, Nagiza F.

In: Journal of Parallel and Distributed Computing, Vol. 71, No. 2, 01.02.2011, p. 157-168.

@article{1cdfe28ee4864934b6fb35bab8ba98d6,
title = "Transparent runtime parallelization of the R scripting language",
keywords = "Incremental analysis, Runtime parallelization, Scripting languages",
author = "Jiangtian Li and Xiaosong Ma and Srikanth Yoginath and Guruprasad Kora and Samatova, {Nagiza F.}",
year = "2011",
month = "2",
day = "1",
doi = "10.1016/j.jpdc.2010.08.013",
language = "English",
volume = "71",
pages = "157--168",
journal = "Journal of Parallel and Distributed Computing",
issn = "0743-7315",
publisher = "Academic Press Inc.",
number = "2",
}

TY  - JOUR
T1  - Transparent runtime parallelization of the R scripting language
AU  - Li, Jiangtian
AU  - Ma, Xiaosong
AU  - Yoginath, Srikanth
AU  - Kora, Guruprasad
AU  - Samatova, Nagiza F.
PY  - 2011/2/1
Y1  - 2011/2/1
KW  - Incremental analysis
KW  - Runtime parallelization
KW  - Scripting languages
UR  - http://www.scopus.com/inward/record.url?scp=78650418266&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=78650418266&partnerID=8YFLogxK
U2  - 10.1016/j.jpdc.2010.08.013
DO  - 10.1016/j.jpdc.2010.08.013
M3  - Article
VL  - 71
SP  - 157
EP  - 168
JO  - Journal of Parallel and Distributed Computing
JF  - Journal of Parallel and Distributed Computing
SN  - 0743-7315
IS  - 2
ER  -