How to extract a password-protected zip file using Apache Spark (Scala)?
I have written the code below to read a zip file that has no password:
import java.util.zip.ZipInputStream
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.input.PortableDataStream
import scala.io.Source

val sparkConf = new SparkConf().setMaster("local[2]").setAppName("CSVProcessSPark") // create a new Spark config
val sc = new SparkContext(sparkConf)
sc.binaryFiles("hdfs://localhost:8020/user/cloudera/HWZ.zip", 1) // make an RDD from *.zip files in HDFS
  .flatMap((file: (String, PortableDataStream)) => { // flatMap to unzip each file
    val zipStream = new ZipInputStream(file._2.open) // open a java.util.zip.ZipInputStream (no support for encrypted entries)
    zipStream.getNextEntry() // position the stream at the first entry
    val iter = Source.fromInputStream(zipStream).getLines // place the entry's lines into an iterator
    if (iter.hasNext) iter.next() // pop off the iterator's first (header) line, if there is one
    iter // return the iterator
  })
  .saveAsTextFile("hdfs://localhost:8020/user/cloudera/result.csv") // note: this writes a directory of part files
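I tried setting the password as a local property on the Spark context, but I still cannot read the password-protected zip file.
Please suggest a way to read a password-protected zip file with Apache Spark.
Thanks in advance.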
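The JDK's java.util.zip.ZipInputStream used above has no password support and throws a ZipException when it reaches an encrypted entry. SparkContext.setLocalProperty only attaches properties (such as a scheduler pool) to jobs submitted from the current thread, so nothing in that pipeline ever reads a password from it. A minimal sketch, assuming the third-party zip4j 2.x library (net.lingala.zip4j:zip4j) is added to the application's dependencies, swaps in zip4j's net.lingala.zip4j.io.inputstream.ZipInputStream, which takes the password in its constructor. The object name EncryptedZipToText, the helper readZipLines, and passing the password as args(0) are hypothetical. Unlike the original code, the helper reads every file entry, not only the first.

```scala
// Sketch, assuming zip4j 2.x (net.lingala.zip4j:zip4j) and Spark Core are on the classpath.
import java.io.InputStream
import net.lingala.zip4j.io.inputstream.ZipInputStream
import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.mutable.ListBuffer
import scala.io.Source

object EncryptedZipToText { // hypothetical object name

  /** Decrypts every file entry of a password-protected zip stream and returns its
    * lines, dropping the first (header) line of each entry as the original code does.
    * Lines are buffered in memory so the stream can be closed before returning. */
  def readZipLines(in: InputStream, password: Array[Char]): List[String] = {
    val zipStream = new ZipInputStream(in, password) // zip4j's stream, not java.util.zip's
    val out = ListBuffer.empty[String]
    try {
      var header = zipStream.getNextEntry() // LocalFileHeader, or null after the last entry
      while (header != null) {
        if (!header.isDirectory) {
          // read() returns -1 at the end of the current entry, so getLines stops there.
          // The Source is deliberately not closed: that would close the whole zip stream.
          val lines = Source.fromInputStream(zipStream, "UTF-8").getLines()
          if (lines.hasNext) lines.next() // drop the header line
          out ++= lines
        }
        header = zipStream.getNextEntry()
      }
    } finally zipStream.close()
    out.toList
  }

  def main(args: Array[String]): Unit = {
    val password = args(0) // hypothetical: password supplied as the first program argument
    val conf = new SparkConf().setMaster("local[2]").setAppName("CSVProcessSPark")
    val sc = new SparkContext(conf)
    try {
      sc.binaryFiles("hdfs://localhost:8020/user/cloudera/HWZ.zip", 1)
        .flatMap { case (_, stream) => readZipLines(stream.open(), password.toCharArray) }
        .saveAsTextFile("hdfs://localhost:8020/user/cloudera/result.csv")
    } finally sc.stop()
  }
}
```

A zip archive is not splittable, so each archive is still decrypted by a single task. In this sketch readZipLines also holds one archive's lines in memory. For very large archives, a lazily consumed iterator that closes the stream after the last entry would avoid that buffering, at the cost of more careful resource handling.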