Document Type

Honors Project - Open Access

Abstract

As the quantity of structured and unstructured data increases, data processing
experts have turned to systems that analyze data using many computers in parallel.
This study looks at two systems designed for these needs: MapReduce and parallel
databases. In the MapReduce programming model, users express their problem in
terms of a map function and a reduce function. Parallel databases organize data as a
system of tables representing entities and relationships between them. Previous
comparison studies have focused on performance, concluding that these two
systems are complementary. Parallel databases scored high on performance and
MapReduce scored high on flexibility in handling unstructured data. Both systems
offer a querying language: Pig Latin for MapReduce systems and SQL for parallel
databases. This study compares the operations, query structure, and support for
user-defined functions in these languages. The findings offer data processing
experts insights into how data organization and querying structure affect data
analysis.
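
As a minimal illustration of the map/reduce programming model mentioned in the abstract, the sketch below counts words in memory. It is not taken from the study: the names map_fn, reduce_fn and run_mapreduce are hypothetical, and a real MapReduce system would distribute the map, grouping and reduce steps across many machines.

```python
from collections import defaultdict


def map_fn(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_fn(word, counts):
    """Reduce step: combine all values emitted for one key into a total."""
    return word, sum(counts)


def run_mapreduce(lines):
    """Hypothetical single-process driver (an assumption for illustration).

    It applies map_fn to each line, groups the emitted values by key,
    and then applies reduce_fn to each group.
    """
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


result = run_mapreduce(["a b a", "b c"])
print(result)
```

In a relational setting the same aggregation would typically be written declaratively, as a grouped count over a table of words, rather than as two user-supplied functions.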

Included in

Mathematics Commons

© Copyright is owned by author of this document