Document Type
Honors Project - Open Access
Abstract
As the quantity of structured and unstructured data increases, data processing
experts have turned to systems that analyze data using many computers in parallel.
This study looks at two systems designed for these needs: MapReduce and parallel
databases. In the MapReduce programming model, users express their problem in
terms of a map function and a reduce function. Parallel databases organize data as a
system of tables representing entities and relationships between them. Previous
comparison studies have focused on performance, concluding that these two
systems are complementary. Parallel databases scored high on performance and
MapReduce scored high on flexibility in handling unstructured data. Both systems
offer a querying language: Pig Latin for MapReduce systems and SQL for parallel
databases. This study compares the operations, query structure, and support for
user-defined functions in these languages. The findings offer data processing
experts insights into how data organization and query structure affect data
analysis.
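As an illustration of the map/reduce model described above, the following is a minimal single-process sketch of a word count in Python. It is not taken from the project itself: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and the grouping loop only stands in for the shuffle step that a real MapReduce framework would distribute across many machines.

```python
from collections import defaultdict


def map_fn(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce step: combine all values that were emitted for one key."""
    return word, sum(counts)


def run_mapreduce(lines):
    """Hypothetical local driver: map each line, group by key, then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)  # stand-in for the framework's shuffle
    return dict(reduce_fn(key, values) for key, values in groups.items())


word_counts = run_mapreduce(["big data big systems", "data systems"])
print(word_counts)
```

In a table-based system, roughly the same aggregation would be expressed declaratively, for example as a SQL `GROUP BY` with `COUNT(*)` over a table of words, which is the kind of contrast in query structure the study examines.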
Recommended Citation
Mchome, Miriam Lawrence, "Comparison Study between MapReduce (MR) and Parallel Data Management Systems (DBMs) in Large Scale Data Analysis" (2011). Mathematics, Statistics, and Computer Science Honors Projects. 21.
https://digitalcommons.macalester.edu/mathcs_honors/21
© Copyright is owned by author of this document