Counting the files in an HDFS directory from Java


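From Java code, I want to connect to a directory in HDFS, find out how many files it contains, get their names, and read them. I can already read the files, but I can't figure out how to count the files in a directory and get the file names the way I would for an ordinary directory.

To read the files, I use DFSClient and open each file as an InputStream.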

count

Usage: hadoop fs -count [-q] <paths>

Example:

hadoop fs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2
hadoop fs -count -q hdfs://nn1.example.com/file1

Exit code:

Returns 0 on success and -1 on error.
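
Since the question asks for Java rather than the shell, here is a minimal sketch of getting the same kind of count through the Java API. It relies on `FileSystem.getContentSummary`, which as far as I know is also the call behind `hadoop fs -count`. The class name `HdfsContentSummaryCount` and the method `countFiles` are made up for illustration, and the Hadoop client libraries are assumed to be on the classpath.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class, not part of Hadoop.
public class HdfsContentSummaryCount {

    /** Returns the number of files under dir, counted recursively. */
    public static long countFiles(Configuration conf, String dir) throws IOException {
        Path path = new Path(dir);
        // Resolve the file system from the path's own scheme (hdfs://, file://, ...).
        FileSystem fs = path.getFileSystem(conf);
        ContentSummary summary = fs.getContentSummary(path);
        return summary.getFileCount();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is a directory URI, e.g. an hdfs://... path like the ones above.
        Configuration conf = new Configuration();
        System.out.println(countFiles(conf, args[0]));
    }
}
```

`ContentSummary` also exposes `getDirectoryCount()` and `getLength()`, which correspond to the other columns that `-count` prints.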

You can simply use FileSystem and iterate over the files under the path. Here is some sample code:

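// Needs org.apache.hadoop.fs.{FileSystem, LocatedFileStatus, Path, RemoteIterator}.
// getConf() assumes the enclosing class extends org.apache.hadoop.conf.Configured.
int count = 0;
FileSystem fs = FileSystem.get(getConf());
boolean recursive = false; // set to true to descend into subdirectories
RemoteIterator<LocatedFileStatus> ri = fs.listFiles(new Path("hdfs://my/path"), recursive);
while (ri.hasNext()) {
    ri.next(); // listFiles returns files only, so every entry is one file
    count++;
}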
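
The question also asks for the file names and for reading the files. A rough sketch of that part, using `FileSystem.listStatus` and `FileSystem.open`, could look like the following. The class `HdfsListAndRead` and its method names are invented for this example, the files are assumed to contain UTF-8 text, and the Hadoop client libraries are assumed to be on the classpath.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class, not part of Hadoop.
public class HdfsListAndRead {

    /** Returns the paths of the plain files directly inside dir (no recursion). */
    public static List<Path> listFilePaths(FileSystem fs, Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        for (FileStatus status : fs.listStatus(dir)) {
            if (status.isFile()) { // skip subdirectories
                files.add(status.getPath());
            }
        }
        return files;
    }

    /** Reads the first line of a file, assuming it holds UTF-8 text. */
    public static String readFirstLine(FileSystem fs, Path file) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = new Path(args[0]); // directory URI passed on the command line
        FileSystem fs = dir.getFileSystem(new Configuration());
        List<Path> files = listFilePaths(fs, dir);
        System.out.println("file count: " + files.size());
        for (Path file : files) {
            System.out.println(file.getName() + ": " + readFirstLine(fs, file));
        }
    }
}
```

Here the size of the returned list is the file count, and `Path.getName()` gives the bare file name without the parent directory.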

On the command line, you can do it as follows:

 hdfs dfs -ls $parentdirectory | awk '{system("hdfs dfs -count " $6) }'

For a quick and simple count, you can also try the following one-liner:

hdfs dfs -ls -R /path/to/your/directory/ | grep -E '^-' | wc -l

Quick explanation:

grep -E '^-' or egrep '^-': keeps only the files. In the listing, lines for files start with '-', while lines for directories start with 'd'.

wc -l: counts the lines.
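
If the listing is captured from Java (for example by reading the output of the command above), the same filter is easy to express in code. This is only an illustrative sketch: the class `LsOutputFileCounter` is made up, and the sample lines are invented data in the usual `hdfs dfs -ls -R` layout.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class that mirrors `grep -E '^-' | wc -l`.
public class LsOutputFileCounter {

    /** Counts listing lines that describe plain files (permission string starts with '-'). */
    public static long countFiles(List<String> lsLines) {
        return lsLines.stream()
                .filter(line -> line.startsWith("-"))
                .count();
    }

    public static void main(String[] args) {
        // Invented sample listing: one directory and two files.
        List<String> sample = Arrays.asList(
            "drwxr-xr-x   - alice supergroup          0 2020-01-01 10:00 /data/sub",
            "-rw-r--r--   3 alice supergroup       1024 2020-01-01 10:01 /data/sub/a.txt",
            "-rw-r--r--   3 alice supergroup       2048 2020-01-01 10:02 /data/b.txt");
        System.out.println(countFiles(sample)); // prints 2
    }
}
```
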
hadoop fs -du [-s] [-h] [-x] URI [URI ...]

Displays the sizes of the files and directories contained in the given directory, or the length of the file if the path is just a file.

Options:

The -s option will result in an aggregate summary of file lengths being displayed, rather than the individual files. Without the -s option, the calculation is done by going one level deep from the given path.
The -h option will format file sizes in a "human-readable" fashion (e.g. 64.0m instead of 67108864).
The -x option will exclude snapshots from the result calculation. Without the -x option (default), the result is always calculated from all INodes, including all snapshots under the given path.

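You can check the file count in that particular directory with the following command:

hadoop fs -count /directoryPath/* | awk '{print $2}' | wc -l

count: counts the directories, files, and bytes under each path.

awk '{print $2}': prints the second column of the output, which is the file count. (The original post wrote a bare print $2, which is not a command by itself.)

wc -l: counts the output lines, that is, one line per entry matched by /directoryPath/*. This equals the number of files only when the directory contains no subdirectories.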
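
To make the difference between counting output lines and adding up the file-count column concrete, here is a small sketch that parses `-count` output in Java and sums the second column. The class `CountOutputParser` is hypothetical and the sample lines are invented; the column order assumed is DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for parsing `hadoop fs -count` output.
public class CountOutputParser {

    /** Sums the FILE_COUNT column (second field) over all `-count` output lines. */
    public static long totalFileCount(List<String> countLines) {
        long total = 0;
        for (String line : countLines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length >= 4) { // DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
                total += Long.parseLong(fields[1]);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Invented sample: one subdirectory holding 3 files, plus one plain file.
        List<String> sample = Arrays.asList(
            "           1            3               3072 /directoryPath/sub",
            "           0            1               1024 /directoryPath/b.txt");
        System.out.println(sample.size());          // prints 2 (what wc -l reports)
        System.out.println(totalFileCount(sample)); // prints 4 (files in total)
    }
}
```
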
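
It can be done recursively.

How do I do it recursively?

@User2486495 Sorry, I didn't mention this in my post: I want to do this in Java code.

Hi. If an answer is not relevant, just edit it instead of posting two answers. On Stack Exchange sites, multiple outdated posts are not a good thing, and they are usually downvoted, flagged, or deleted. This answer fits that description. Please consider deleting it or merging it with your other post.

One thing to be careful about with getContentSummary().getFileCount(), the call that the hdfs dfs -count command uses: it includes symlinks in the count, which may give an inaccurate number of files, depending on what you need.

Look, this question is five years old, and file size was never what it asked about.

I had to use $8 in the awk command, because $6 holds the time (this may just be an HDFS version difference).

The question is about the Java API.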