
 
===Restricted programming framework===
MapReduce tasks must be written as acyclic dataflow programs, i.e. a stateless mapper followed by a stateless reducer, that are executed by a batch job scheduler. This paradigm makes repeated querying of datasets difficult and imposes limitations that are felt in fields such as [[graph (abstract data type)|graph]] processing,<ref>{{cite conference |url=https://csc.csudh.edu/btang/seminar/papers/BigD399.pdf |title=Map-Based Graph Analysis on MapReduce |last1=Gupta |first1=Upa |last2=Fegaras |first2=Leonidas |date=2013-10-06 |publisher=[[IEEE]] |book-title=Proceedings: 2013 IEEE International Conference on Big Data |pages=24–30 |location=[[Santa Clara, California]] |conference=2013 IEEE International Conference on Big Data}}</ref> where iterative algorithms that revisit a single [[working set]] multiple times are the norm. When the data reside on [[Hard disk drive|disk]]-based storage with high [[Latency (engineering)#Mechanics|latency]], the limitation is also felt in [[machine learning]], where algorithms require multiple passes through the data even though each pass can tolerate serial access.<ref>{{cite conference |first1=Matei |last1=Zaharia |first2=Mosharaf |last2=Chowdhury |first3=Michael |last3=Franklin |first4=Scott |last4=Shenker |first5=Ion |last5=Stoica |title=Spark: Cluster Computing with Working Sets |title-link=Spark (cluster computing framework) |url=https://amplab.cs.berkeley.edu/wp-content/uploads/2011/06/Spark-Cluster-Computing-with-Working-Sets.pdf |conference=HotCloud 2010 |date=June 2010}}</ref>
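The acyclic dataflow constraint can be illustrated with a minimal, single-process sketch of the map/shuffle/reduce pipeline (the helper names here are illustrative, not from any particular framework; real systems distribute each phase across machines):

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """One acyclic pass: stateless map, shuffle by key, stateless reduce."""
    # Map phase: each record is processed independently, with no shared state.
    intermediate = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            intermediate[key].append(value)  # shuffle: group values by key
    # Reduce phase: each key's value list is reduced independently.
    return {key: reducer(key, values) for key, values in intermediate.items()}

# Word count, the canonical example.
def mapper(line):
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    return sum(counts)

lines = ["the quick brown fox", "the lazy dog"]
print(run_mapreduce(lines, mapper, reducer))
# An iterative algorithm (e.g. PageRank) must launch a fresh job like this
# for every iteration, re-reading its input from storage each time, because
# the mapper-then-reducer dataflow graph cannot loop back on itself.
```

Each iteration of an iterative algorithm therefore pays the full job-scheduling and I/O cost again, which is the overhead that in-memory systems such as Spark were designed to avoid.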
 
==See also==