What do you think the answers to the questions quoted below (from that site) are?
Are the given answers right or wrong?
Question 4
In the standard word count MapReduce algorithm, why might using a combiner reduce the overall job running time?
A. Because combiners perform local aggregation of word counts, thereby allowing the mappers to process input data faster.
B. Because combiners perform local aggregation of word counts, thereby reducing the number of mappers that need to run.
C. Because combiners perform local aggregation of word counts, and then transfer that data to reducers without writing the intermediate data to disk.
D. Because combiners perform local aggregation of word counts, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers.
Answer: A
Question 3
What happens in a MapReduce job when you set the number of reducers to one?
A. A single reducer gathers and processes all the output from all the mappers. The output is written in as many separate files as there are mappers.
B. A single reducer gathers and processes all the output from all the mappers. The output is written to a single file in HDFS.
C. Setting the number of reducers to one creates a processing bottleneck, and since the number of reducers as specified by the programmer is used as a reference value only, the MapReduce runtime provides a default setting for the number of reducers.
D. Setting the number of reducers to one is invalid, and an exception is thrown.
Answer: A

Based on my understanding of the questions above:
Question 4: D
Question 3: B

Update
You have user profile records in your OLTP database that you want to join with weblogs you have already ingested into HDFS. How will you obtain these user records?
Options
A. HDFS commands
B. Pig load
C. Sqoop import
D. Hive
Answer: B

For the updated question, I am unsure between B and C.
Edit

Correct answer: Sqoop.

Posted on 2014-09-29 11:43:41
As I understand it, both of the given answers are wrong.

I haven't worked with combiners much, but everywhere I've looked, they operate on the output of the mappers. The answer to Question 4 should be D.

Again, from practical experience, I've found that the number of output files always equals the number of reducers, so the answer to Question 3 should be B. That may not hold when using MultipleOutputs, but that isn't common.

Finally, I don't think Apache would lie about how MapReduce works (exceptions do happen :)). The answers to both questions can be found on their wiki pages. Take a look.
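The point behind answer D can be sketched with a minimal simulation (plain Python standing in for the Hadoop flow, not the real API; the input splits are made up for illustration): local aggregation shrinks the number of intermediate pairs that must cross the network, while leaving the final counts unchanged.

```python
from collections import Counter

# Each "mapper" emits one (word, 1) pair per word in its input split,
# exactly as in the standard word-count example.
splits = [
    "the quick brown fox the fox",
    "the lazy dog the dog the dog",
]
map_outputs = [[(word, 1) for word in split.split()] for split in splits]

# Without a combiner, every intermediate pair is shuffled to the reducers.
pairs_without = sum(len(out) for out in map_outputs)

# With a combiner, each mapper's output is aggregated locally first, so at
# most one pair per distinct word leaves each mapper over the network.
combined_outputs = [
    list(Counter(word for word, _ in out).items()) for out in map_outputs
]
pairs_with = sum(len(out) for out in combined_outputs)

def reduce_counts(per_mapper_pairs):
    """Merge all shuffled pairs, as the reduce phase would."""
    totals = Counter()
    for pairs in per_mapper_pairs:
        for word, count in pairs:
            totals[word] += count
    return dict(totals)

print(pairs_without)  # 13 pairs shuffled without the combiner
print(pairs_with)     # 7 pairs shuffled with it
# The final word counts are identical either way.
print(reduce_counts(map_outputs) == reduce_counts(combined_outputs))
```

The mappers run at the same speed (ruling out A) and their number is unchanged (ruling out B); only the shuffled volume shrinks, which is exactly option D.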
By the way, I love the "100% pass guarantee or your money back!" line on the link you provided ;-)
Edit

Not sure about the question in the update, since I know very little about Pig & Sqoop. But Hive could certainly also be used, by creating an external table over the HDFS data and then joining.
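That Hive approach might look roughly like the following sketch (table names, columns, and the HDFS path are all hypothetical; it also assumes the user profiles have somehow been made available to Hive, which is the weak point of this answer):

```sql
-- Hypothetical sketch: expose the ingested weblogs as an external table.
CREATE EXTERNAL TABLE weblogs (
  user_id STRING,
  url     STRING,
  ts      STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/weblogs';

-- Join against a user_profiles table on the shared key.
SELECT p.user_id, w.url
FROM user_profiles p
JOIN weblogs w ON p.user_id = w.user_id;
```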
Update

After the comments from user milk3422 and the owner, I did some searching and realized that Hive is the wrong answer to the last question, since another OLTP database is involved. The correct answer should be C, because Sqoop is designed to transfer data between HDFS and relational databases.
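A typical Sqoop import of the user table into HDFS might look like this sketch (the JDBC URL, username, table name, and target directory are all placeholders):

```shell
# Hypothetical sketch: pull the user profile table from the OLTP database
# into HDFS so it can be joined with the weblogs already stored there.
sqoop import \
  --connect jdbc:mysql://oltp-host/appdb \
  --username etl_user -P \
  --table user_profiles \
  --target-dir /data/user_profiles \
  --num-mappers 4
```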
Posted on 2014-09-30 16:04:39
The answers to Question 4 and Question 3 look correct to me. For Question 4 it makes sense: when a combiner is used, the map outputs are first collected and processed locally, and the buffer is flushed when it fills up. To back this up, here is a link: http://wiki.apache.org/hadoop/HadoopMapReduce

It clearly explains there why a combiner speeds up the process.

I also think the answer to Question 3 is correct, since in general that is the basic configuration and also the default one. To back that up, here is another informative link: https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-7/mapreduce-types
https://stackoverflow.com/questions/26098412