Q: Hadoop concepts

Stack Overflow user

Asked 2017-03-30 20:04:25

Answers: 1, Views: 42, Followers: 0, Votes: 0

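I am using Hadoop to process video with HVPI, an open-source interface. However, the input split implementation returns false from the isSplitable(JobContext context, Path file) method. By default this method returns true, but the current implementation has a reason to return false. If this method returns false, I will have only one map task. If I remember correctly, Hadoop allocates one container for each input split; the container corresponds to the computing resources of some node in the network that runs the map task, preferably a node that holds the data to be processed. If the method returns false, I will have only one input split and therefore only one map task, and that map task will run on only one node of the cluster. The big question is: how can that single map task use the CPU resources of the whole cluster, rather than just a single container on a single node?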
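To make the relationship between isSplitable and the number of map tasks concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API; it is a simplified model of the split-counting loop in FileInputFormat, assuming a single file of positive length and an already-computed split size. The class name SplitCountSketch and the method countSplits are hypothetical, chosen only for illustration.

```java
final class SplitCountSketch {
    // FileInputFormat lets the final split run up to 10% over the split size
    // instead of creating a tiny trailing split.
    static final double SPLIT_SLOP = 1.1;

    /**
     * Simplified model: how many input splits (and therefore map tasks)
     * one file of positive length produces.
     */
    static int countSplits(long fileLength, long splitSize, boolean splittable) {
        if (fileLength <= 0 || splitSize <= 0) {
            throw new IllegalArgumentException("this sketch only models positive sizes");
        }
        if (!splittable) {
            // isSplitable returned false: the whole file becomes one split.
            return 1;
        }
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the last, possibly slightly oversized, split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mib = 1L << 20;
        System.out.println(countSplits(1024 * mib, 128 * mib, true));
        System.out.println(countSplits(1024 * mib, 128 * mib, false));
    }
}
```

With a 1 GiB file and a 128 MiB split size, the splittable case yields 8 splits and the non-splittable case yields 1, which is the situation described in the question: one split, one map task, one container.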

hadoop

mapreduce

1 Answer

Stack Overflow user

Answered 2017-04-13 21:39:26

Please go through:

http://bytepadding.com/big-data/map-reduce/understanding-map-reduce-the-missing-guide/

Let's try to understand what the problem is.
1. A file is divided into file splits.
2. Each split is consumed by one mapper.
3. How do you make sure a record in the file is not split across two file splits?
4. A record can't be ignored or read partially.
5. An InputFormat takes care of carefully splitting the file and handling the case where a record straddles the boundary between two file splits.
6. Hadoop has various input formats, such as TextInputFormat and KeyValueTextInputFormat.
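The boundary handling in points 3 to 5 can be sketched in plain Java without any Hadoop dependency. This is a simplified model of the convention used for newline-delimited text: a reader that does not start at offset 0 skips its first (possibly partial) line, and every reader keeps going past its end offset to finish the last line it started. The class SplitBoundaryDemo and the method readSplit are hypothetical names for illustration, not Hadoop APIs, and a video format would need its own notion of a record.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class SplitBoundaryDemo {
    /** Returns the newline-terminated records owned by the split [start, end). */
    static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Skip the first line: the reader of the previous split finishes it.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline
        }
        List<String> records = new ArrayList<>();
        // A record belongs to this split if it begins at or before 'end';
        // reading may run past 'end' to complete that record.
        while (pos <= end && pos < data.length) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            records.add(new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8));
            pos++; // step past the newline
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\ndelta\n".getBytes(StandardCharsets.UTF_8);
        int splitSize = 10; // deliberately cuts through the middle of records
        for (int start = 0; start < data.length; start += splitSize) {
            int end = Math.min(start + splitSize, data.length);
            System.out.println("[" + start + ", " + end + ") -> " + readSplit(data, start, end));
        }
    }
}
```

In this example the 10-byte splits cut through "bravo" and "charlie", yet the first reader returns alpha and bravo, the second returns charlie and delta, and the third returns nothing, so every record is read exactly once and none is read partially.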

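Try to find an input format that can be used for video files, or write one yourself. FileInputFormat is the base class for all file-based input formats.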

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/43117370
