Handling Cassandra Datasets with Hadoop-Streaming

Akash Suryawanshi; Pratik Patil

doi:10.17148/IJARCCE.2017.61106

Volume 6, Issue 11, November 2017

Abstract: The ongoing change in the nature of both scientific and industrial datasets has been the main driving force behind the development of, and research interest in, the NoSQL model. Loosely structured data poses a challenge to traditional data store systems, and when working with the NoSQL model, these systems are often considered impractical and costly. As the quantity and quality of unstructured data grow, so does the demand for a processing pipeline that can seamlessly combine the NoSQL storage model with a "Big Data" processing platform such as MapReduce. Although MapReduce is the paradigm of choice for data-intensive computing, Java-based frameworks such as Hadoop require users to write MapReduce code in Java, while the Hadoop Streaming module allows users to define non-Java executables as map and reduce operations. When confronted with legacy C/C++ applications and other non-Java executables, a further need arises to give NoSQL data stores access to the features of Hadoop Streaming. We present approaches to the challenge of integrating NoSQL data stores with MapReduce in non-Java application scenarios, along with the advantages and disadvantages of each approach. We compare Hadoop Streaming with our own streaming framework, MARISSA, to show the performance implications of coupling NoSQL data stores such as Cassandra with MapReduce frameworks that normally rely on file-system-based data stores. Our experiments also include Hadoop-C*, a setup in which a Hadoop cluster is co-located with a Cassandra cluster in order to process data.
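To make the Hadoop Streaming contract concrete, the following is a minimal sketch, assuming Python and a word-count job chosen purely for illustration. It is not MARISSA, Hadoop-C*, or the authors' code. Under Hadoop Streaming, the map and reduce executables only read text lines on stdin and write tab-separated `key<TAB>value` lines on stdout, and the framework sorts map output by key before the reduce step. Because of this, data held in Cassandra must first be presented to the executable as plain text. The helpers `rows_to_lines` and `run_locally`, the `key`/`text` row fields, and the sample rows are all hypothetical.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def rows_to_lines(rows: Iterable[dict]) -> Iterator[str]:
    """Flatten hypothetical Cassandra-style rows (dicts with 'key' and 'text')
    into 'rowkey<TAB>text' lines, because a streaming executable sees only text."""
    for row in rows:
        yield f"{row['key']}\t{row['text']}"


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map step: for each 'rowkey<TAB>text' line, emit 'word<TAB>1' per word.
    Lines without a tab have an empty text part and emit nothing."""
    for line in lines:
        _, _, text = line.rstrip("\n").partition("\t")
        for word in text.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Reduce step: input is sorted by key, so equal keys are adjacent and
    groupby can sum each key's counts in a single pass."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(rows: Iterable[dict]) -> list[str]:
    """Simulate the framework in-process: map, sort by key (the shuffle), reduce."""
    mapped = sorted(mapper(rows_to_lines(rows)), key=lambda l: l.split("\t", 1)[0])
    return list(reducer(mapped))


if __name__ == "__main__":
    # Run with "map" or "reduce" to act as a streaming executable over stdin;
    # with no argument, run a small local demo on made-up rows.
    stage = sys.argv[1] if len(sys.argv) > 1 else "demo"
    if stage == "map":
        sys.stdout.writelines(line + "\n" for line in mapper(sys.stdin))
    elif stage == "reduce":
        sys.stdout.writelines(line + "\n" for line in reducer(sys.stdin))
    else:
        sample = [
            {"key": "r1", "text": "Cassandra stores rows"},
            {"key": "r2", "text": "Hadoop streams rows"},
        ]
        for line in run_locally(sample):
            print(line)
```

On a cluster, the same script would be passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options. The local `run_locally` path only imitates the sort-by-key shuffle and is meant for reasoning about the data flow, not for measuring performance.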

Keywords: Cassandra, Hadoop.

How to Cite:

[1] Akash Suryawanshi, Pratik Patil, "Handling Cassandra Datasets with Hadoop-Streaming," International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), vol. 6, no. 11, November 2017, DOI: 10.17148/IJARCCE.2017.61106