Hybrid.poly: An Interactive Large-scale In-memory Analytical Polystore

Maksim Podkorytov, Dylan Soderman, Michael Gubanov

November 2017

Abstract:

Anecdotal evidence suggests that the variety of Big data is one of the most challenging problems in Computer Science research today [Stonebraker, 2012], [Ou et al., 2017], [Guo et al., 2016], [Bai et al., 2016]. First, Big data arrives from a myriad of data sources, so its shape and flavor vary. Second, the hundreds of data management systems that handle Big data support different APIs and storage/indexing schemes, and each exposes data to users through the lens of its own data model. These differences can impede users who simply want an accessible interface to the relevant unstructured data stored in a back-end system. Such discrepancies in formats, sizes, and shapes also complicate the development of analytical algorithms on top of large-scale, heterogeneous datasets. [Gubanov, 2017b] introduced a consolidated polystore engine designed to seamlessly ingest and query any type of large-scale data. In this paper, we describe a variety of complex analytical workloads that such a polystore can process, as well as the associated research challenges.

Full text:

Please refer to the IEEE publication.