Mathematics, Statistics, and Computer Science Honors Projects

Comparison Study between MapReduce (MR) and Parallel Data Management Systems (DBMs) in Large Scale Data Analysis

Miriam Lawrence Mchome, Macalester College

Document Type

Honors Project - Open Access

Abstract

As the quantity of structured and unstructured data increases, data processing
experts have turned to systems that analyze data using many computers in parallel.
This study looks at two systems designed for these needs: MapReduce and parallel
databases. In the MapReduce programming model, users express their problem in
terms of a map function and a reduce function. Parallel databases organize data as a
system of tables representing entities and the relationships between them. Previous
comparison studies have focused on performance, concluding that these two
systems are complementary. Parallel databases scored high on performance, and
MapReduce scored high on flexibility in handling unstructured data. Both systems
offer a query language: Pig Latin for MapReduce systems and SQL for parallel
databases. This study compares the operations, query structure, and support for
user-defined functions in these languages. The findings offer data processing
experts insights into how data organization and query structure affect data
analysis.
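
The map and reduce functions mentioned in the abstract can be illustrated with a minimal word-count sketch. It runs in a single process and in memory, so it only shows the shape of the programming model, not how a distributed MapReduce system schedules or partitions work. The names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical and are not taken from the project or from any MapReduce library.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: combine all values emitted for one key into a total."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # stands in for the shuffle phase
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


lines = ["map reduce map", "parallel databases", "reduce"]
word_counts = run_mapreduce(lines, map_words, reduce_counts)
print(word_counts)
```

In a relational system the same aggregation would typically be a single declarative query that groups rows by word and counts them, which is the kind of difference in query structure the study examines.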

Recommended Citation

Mchome, Miriam Lawrence, "Comparison Study between MapReduce (MR) and Parallel Data Management Systems (DBMs) in Large Scale Data Analysis" (2011). Mathematics, Statistics, and Computer Science Honors Projects. 21.
https://digitalcommons.macalester.edu/mathcs_honors/21

Included in

Mathematics Commons
