Is there any way to list files from Hadoop HDFS and store only the file names locally, not the actual files themselves?


file hadoop


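Is there any way to list files from Hadoop HDFS and store only the file names locally?

For example:

I have a file india_20210517_20210523.csv. I am currently using the copyToLocal command to copy files from HDFS to local, but copying them is very time-consuming because the files are huge. All I need is the file names stored in a .txt file, so that I can perform cut operations on them with a bash script. Please help me.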

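If you want to do this programmatically, you can use the FileSystem and FileStatus objects from Hadoop (the classes used in the code below) to:

list the contents of your (current or other) target directory,

check whether each record of this directory is a file or another directory, and

write the name of each file as a new line to a locally stored file.

The code for such an application can look like this: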

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

import java.io.File;
import java.io.PrintWriter;


public class Dir_ls
{
    public static void main(String[] args) throws Exception 
    {
        // get input directory as a command-line argument
        Path inputDir = new Path(args[0]);  

        Configuration conf = new Configuration();

        FileSystem fs = FileSystem.get(conf);

        if(fs.exists(inputDir))
        {
            // list directory's contents
            FileStatus[] fileList = fs.listStatus(inputDir);

            // create file and its writer
            PrintWriter pw = new PrintWriter(new File("output.txt"));

            // scan each record of the contents of the input directory
            for(FileStatus file : fileList)
            {
                if(!file.isDirectory()) // only take into account files
                {
                    System.out.println(file.getPath().getName());
                    pw.write(file.getPath().getName() + "\n");
                }
            }

            pw.close();
        }
        else
            System.out.println("Directory named \"" + args[0] + "\" doesn't exist.");
    }
}

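So, suppose we want to list the files in the root directory of HDFS, and that directory contains both directories and text files. The application prints the name of each file (skipping the directories) to the command line, and writes the same names, one per line, to the locally stored output.txt text file.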

The simplest way is to use the command below.

hdfs dfs -ls /path/fileNames | awk '{print $8}' | xargs -n 1 basename > Output.txt

How it works:

hdfs dfs -ls : lists all the information about the path

Hope this answers your question.
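As a sketch of a variation on the pipeline above: the listing also contains a "Found N items" header line and entries for directories, so one option is to keep only the lines whose permission string starts with "-" and take the base name inside awk instead of calling basename. The sample listing, its paths, and the function names sample_listing and names_from_listing are made up for illustration; real input would come from hdfs dfs -ls. Like the original pipeline, this assumes paths without spaces.

```shell
#!/usr/bin/env bash

# Hypothetical sample of `hdfs dfs -ls` output (illustrative paths only).
sample_listing() {
  cat <<'EOF'
Found 3 items
drwxr-xr-x   - hadoop supergroup          0 2021-05-24 10:00 /data/archive
-rw-r--r--   3 hadoop supergroup 1048576000 2021-05-24 10:05 /data/india_20210517_20210523.csv
-rw-r--r--   3 hadoop supergroup       2048 2021-05-24 10:06 /data/notes.txt
EOF
}

# Keep only regular files (first field starts with "-"), then split the
# eighth field (the full path) on "/" and print its last component.
names_from_listing() {
  awk '$1 ~ /^-/ { n = split($8, parts, "/"); print parts[n] }'
}

sample_listing | names_from_listing
```

Against a real cluster, the same filter could be used as hdfs dfs -ls /path/fileNames | names_from_listing > Output.txt.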

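You can redirect the output of the hdfs list command, e.g.

hdfs dfs -ls -C hdfs/path/You/want/files/from > file_list.out

Recent versions of Hadoop added the -C option, which prints only the path and name of each entry, so all that wizardry isn't needed.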
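Note that -C prints the full path of each entry rather than the bare file name. If only the names are wanted, the directory part can be stripped afterwards. The following is a minimal sketch under that assumption; sample_paths stands in for the output of hdfs dfs -ls -C, and both function names and the paths are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for `hdfs dfs -ls -C <dir>`: one full path per line.
sample_paths() {
  printf '%s\n' \
    '/data/india_20210517_20210523.csv' \
    '/data/notes.txt'
}

# Remove everything up to and including the last "/" on each line,
# leaving only the final path component.
strip_dirs() {
  sed 's|.*/||'
}

sample_paths | strip_dirs
```

With a real listing, this would become hdfs dfs -ls -C followed by the directory, piped through strip_dirs and redirected to the output file.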

The remaining parts of the hdfs dfs -ls | awk | xargs pipeline shown earlier work as follows:

awk '{print $8}' : prints the eighth column of each listing line, which holds the full path

xargs -n 1 basename : gets the file names alone, excluding the path

> Output.txt : stores the file names in a text file