Hadoop: Deleting directories that start with a specific name from HDFS in Java

hadoop apache-spark

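I am trying to delete Hive staging files from Spark using the code below. This code can delete a single, fully specified directory, but I want to delete everything whose name starts with ".hive-staging_hive".

How can I delete directories that start with a specific text?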

Configuration conf = new Configuration();
System.out.println("560");
Path output = new Path("hdfs://abcd/apps/hive/warehouse/mytest.db/cdri/.hive-staging_hive_2017-06-08_20-45-20_776_7391890064363958834-1/");
FileSystem hdfs = FileSystem.get(conf);

System.out.println("564");

// delete existing directory
if (hdfs.exists(output)) {
    System.out.println("568");
    hdfs.delete(output, true);
    System.out.println("570");
}

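The simple way is to run a process from the Java program and use a wildcard to delete everything in the directory whose name starts with ".hive-staging_hive". This requires the hadoop command to be available on the machine that runs the program.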

// -r is needed because the staging entries are directories
String command = "hadoop fs -rm -r pathToDirectory/.hive-staging_hive*";
int exitValue = -1;
try {
    Process process = Runtime.getRuntime().exec(command);
    // waitFor() blocks until the command finishes and returns its exit code
    exitValue = process.waitFor();
} catch (Exception e) {
    System.out.println("Cannot run command");
    e.printStackTrace();
}
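As a variation on the snippet above, here is a minimal sketch that builds the command as an argument list and runs it with ProcessBuilder. The class name HadoopRmCommand and the hadoopBin parameter are hypothetical; hadoopBin is only there so that a full path to the hadoop launcher can be supplied when it is not on the PATH. It still assumes a Hadoop client is installed on the machine running the program.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class HadoopRmCommand {

    // Builds the argument list. hadoopBin is a hypothetical parameter: either
    // "hadoop" (found via PATH) or a full path to the hadoop launcher script.
    public static List<String> buildCommand(String hadoopBin, String directory, String prefix) {
        // No local shell is involved, so the trailing * reaches "hadoop fs"
        // unchanged and is expanded there against the target file system.
        return Arrays.asList(hadoopBin, "fs", "-rm", "-r", directory + "/" + prefix + "*");
    }

    // Runs the command and returns its exit code (0 conventionally means success).
    public static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // forward the command's stdout/stderr to this JVM's console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        List<String> command = buildCommand("hadoop", "pathToDirectory", ".hive-staging_hive");
        System.out.println(String.join(" ", command));
        // Uncomment on a machine where the hadoop client is installed:
        // System.out.println("exit code: " + run(command));
    }
}
```

If the launcher cannot be found, start() throws an IOException, which is easier to diagnose than a silently ignored failure.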

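The other approach is to list everything in the directory, keep the entries whose names start with ".hive-staging_hive", and delete them. This uses the Hadoop FileSystem API directly, so it does not depend on the hadoop command being installed locally.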

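import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
Path path = new Path("hdfs://localhost:9000/tmp");
// Resolve the file system from the path's URI instead of the default one
FileSystem fs = FileSystem.get(path.toUri(), conf);
FileStatus[] fileStatus = fs.listStatus(path);

// Collect every entry whose name starts with the staging prefix
List<FileStatus> filesToDelete = new ArrayList<FileStatus>();
for (FileStatus file : fileStatus) {
    if (file.getPath().getName().startsWith(".hive-staging_hive")) {
        filesToDelete.add(file);
    }
}

// Delete recursively, since the staging entries are directories
for (FileStatus file : filesToDelete) {
    fs.delete(file.getPath(), true);
}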
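If the names need to match a regular expression rather than a fixed prefix, the startsWith check could be swapped for a java.util.regex match. The sketch below shows only that selection step on plain strings so it runs without a cluster; the class StagingNameFilter and the sample names are made up for illustration, and in the listing code the names would come from file.getPath().getName().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class StagingNameFilter {

    // Returns the names that match the whole regular expression.
    public static List<String> select(List<String> names, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> selected = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Made-up names standing in for file.getPath().getName() values.
        List<String> names = Arrays.asList(
                ".hive-staging_hive_2017-06-08_20-45-20_776_1-1",
                "part_dt=2017-06-08",
                "report_my_1",
                "report_other");
        // The leading dot is escaped so it matches a literal "."
        System.out.println(select(names, "\\.hive-staging_hive.*"));
        System.out.println(select(names, ".*_my.*"));
    }
}
```

For simple wildcard patterns, Hadoop's FileSystem also provides globStatus(Path), which returns the entries matching a glob such as a path ending in ".hive-staging_hive*".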

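Thanks Shankar. The second approach you described is useful. I tried it, but it did not pick up the ".hive-staging_hive" directories; I only got the regular partition directories. Do you know why the staging directories are not being listed? When I tried the first approach earlier, I got a "hadoop command not found" error. My Spark cluster is outside the Hadoop cluster, so perhaps my Spark program is not submitting the "hadoop fs" command to the Hadoop cluster.

Hi @AKC, I have updated the second part of the answer; this should work. I also tested it locally and it works for me.

Thanks. For the second approach, could you mention how to use it to delete multiple files matching a regular expression (such as _my)?

I think you could do this easily with a shell script. Would you accept a bash solution?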