Hadoop concepts (Hadoop, MapReduce) - Fatal编程技术网

Hadoop concepts

hadoop mapreduce

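I am processing a video with Hadoop, using the open-source interface HVPI. More precisely, the input format's isSplitable(JobContext context, Path file) method returns false. By default this method returns true, but in the current implementation there is a reason to return false. If this method returns false, I will have only one map task. If I remember correctly, Hadoop allocates a container for each input split; that container corresponds to the compute resources of the cluster node that executes the map task, preferably a node that holds the data to be processed. With false, I get only one input split, therefore only one map task, and that map task runs on only one node of the cluster. The big question is: how can a single map task use the CPU resources of the whole cluster, rather than just a single container on a single node?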

Please explain in detail.

Try to find an input format that works with video files, or write your own. FileInputFormat is the base class for all file-based input formats.
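To make the reasoning in the question concrete, here is a minimal sketch of how the splittable flag determines the number of splits, and therefore the number of map tasks. It is a simplified model, not Hadoop's actual code: the class SplitSketch and the method computeSplits are hypothetical names, the split size is assumed to equal the block size, and the minimum/maximum split-size settings that a real FileInputFormat also consults are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of split planning (not Hadoop's implementation).
class SplitSketch {

    /**
     * Returns the planned splits as {start, length} pairs.
     * Assumption: splitSize stands in for the HDFS block size.
     */
    static List<long[]> computeSplits(long fileLength, long splitSize, boolean splittable) {
        if (fileLength <= 0 || splitSize <= 0) {
            throw new IllegalArgumentException("fileLength and splitSize must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        if (!splittable) {
            // Non-splittable input: the whole file becomes one split, so one map task.
            splits.add(new long[] {0, fileLength});
            return splits;
        }
        // Splittable input: one split per splitSize-sized chunk; the last one may be shorter.
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new long[] {start, Math.min(splitSize, fileLength - start)});
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("splittable:     " + computeSplits(300 * mb, 128 * mb, true).size() + " split(s)");
        System.out.println("not splittable: " + computeSplits(300 * mb, 128 * mb, false).size() + " split(s)");
    }
}
```

In this model a non-splittable file always yields exactly one split, whatever the cluster size, which is why the job in the question gets a single map task.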

Let's try to understand what the problem is.
1. A file is divided into file splits.
2. Each split is consumed by one mapper.
3. How do you make sure a record in the file is not split across two file splits?
4. A record can be neither ignored nor read partially.
5. An InputFormat takes care of splitting the file carefully and of handling the case where a record straddles the boundary between two file splits.
6. Hadoop has various input formats, such as TextInputFormat and KeyValueTextInputFormat.
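The boundary problem in points 3 to 5 can be illustrated with newline-delimited records. The sketch below is a simplified model of the convention line-oriented readers typically follow, not Hadoop's own code: the class BoundarySketch and its methods are hypothetical, and the file is held in an in-memory String for brevity. A reader whose split does not start at offset 0 discards its first line, and every reader keeps reading until it has consumed the line that starts at or before its end offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of record-boundary handling for newline-delimited records.
class BoundarySketch {

    /** Returns the records owned by the split [start, end) of data. */
    static List<String> readSplit(String data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Not the first split: discard everything up to and including the next
            // newline. The previous reader consumes that line, because its loop
            // below still runs when pos == end.
            int nl = data.indexOf('\n', start);
            pos = (nl < 0) ? data.length() : nl + 1;
        }
        // Read every line that starts at or before `end`, even if it finishes
        // beyond the split boundary.
        while (pos <= end && pos < data.length()) {
            int nl = data.indexOf('\n', pos);
            int lineEnd = (nl < 0) ? data.length() : nl;
            records.add(data.substring(pos, lineEnd));
            pos = lineEnd + 1;
        }
        return records;
    }

    /** Cuts data into fixed-size splits and concatenates what each reader returns. */
    static List<String> readAll(String data, int splitSize) {
        List<String> all = new ArrayList<>();
        for (int start = 0; start < data.length(); start += splitSize) {
            int end = Math.min(start + splitSize, data.length());
            all.addAll(readSplit(data, start, end));
        }
        return all;
    }

    public static void main(String[] args) {
        String data = "alpha\nbeta\ngamma\ndelta\n";
        // Splits of 8 bytes cut through "beta" and "gamma"; each record still appears once.
        System.out.println(readAll(data, 8));
    }
}
```

Under these two rules each record is read exactly once wherever the boundaries fall. A video format would need an analogous rule based on its own record structure, which is the kind of logic a custom input format has to supply.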