Towards unified secure on- and off-line analytics at scale
Journal article   Peer reviewed

P. Coetzee, M. Leeke and S. Jarvis
Parallel Computing, Vol. 40(10), pp. 738-753
01/12/2014

Abstract

Subject categories: Computer Science; Computer Science, Theory & Methods; Science & Technology; Technology
Data scientists have applied various analytic models and techniques to address the oft-cited problems of large volume, high-velocity data rates and diversity in semantics. Such approaches have traditionally employed analytic techniques in either a streaming or a batch processing paradigm. This paper presents CRUCIBLE, a first-in-class framework for the analysis of large-scale datasets that exploits both streaming and batch paradigms in a unified manner. The CRUCIBLE framework includes a domain-specific language for describing analyses as a set of communicating sequential processes, a common runtime model for analytic execution in multiple streamed and batch environments, and an approach to automating the management of cell-level security labelling that is applied uniformly across runtimes. This paper shows the applicability of CRUCIBLE to a variety of state-of-the-art analytic environments, and compares a range of runtime models for their scalability and performance against a series of native implementations. The work demonstrates the significant impact of runtime model selection, including improvements of between 2.3x and 480x between runtime models, with an average performance gap of just 14x between CRUCIBLE and a suite of equivalent native implementations. © 2014 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/3.0/).
URL: https://doi.org/10.1016/j.parco.2014.07.004
Published (Version of record) Open
