Does the Spark Java API have classes like Hadoop's MultipleOutputs/FSDataOutputStream?


I'm trying to output some specific records in the reduce step, depending on the values of the key-value records. In Hadoop MapReduce I can do that with code like this:

private FSDataOutputStream hdfsOutWriter; // side-output stream, one file per reduce task
private String fileName;                  // output path prefix, defined elsewhere

public void setup(Context context) throws IOException, InterruptedException {
  super.setup(context);
  Configuration conf = context.getConfiguration();
  FileSystem fs = FileSystem.get(conf);
  int taskID = context.getTaskAttemptID().getTaskID().getId();
  hdfsOutWriter = fs.create(new Path(fileName + taskID), true); // FSDataOutputStream
}

public void reduce(Text key, Iterable<Text> value, Context context) throws IOException, InterruptedException {
  boolean isSpecificRecord = false;
  ArrayList<String> valueList = new ArrayList<String>();
  for (Text val : value) {
    String element = val.toString();
    if (filterFunction(element)) return;   // drop this key entirely
    if (specificFunction(element)) isSpecificRecord = true;
    valueList.add(element);
  }
  String returnValue = anyFunction(valueList);
  String specificInfo = anyFunction2(valueList);
  if (isSpecificRecord) hdfsOutWriter.writeBytes(key.toString() + "\t" + specificInfo);
  context.write(key, new Text(returnValue));
}

I want to run this process on a Spark cluster. Can the Spark Java API do something like the code above?

Just an idea of how to simulate it:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.TaskContext

yoursRDD.mapPartitions(iter => {
   // one side-output file per partition, suffixed with the partition id
   val fs = FileSystem.get(new Configuration())
   val ds = fs.create(new Path("outfileName_" + TaskContext.get.partitionId))
   ds.writeBytes("Put yours results")
   ds.close()
   iter
})
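
Since the question asks about the Java API specifically, here is a rough Java sketch of the same idea. Names like SideOutputSketch, pairRDD, and fileName are placeholders I introduced for illustration, not part of any Spark API; it assumes your data is already a JavaPairRDD<String, String> after the reduce-style aggregation, and the "SPECIFIC" check stands in for your own specificFunction predicate.

import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.TaskContext;
import org.apache.spark.api.java.JavaPairRDD;

import scala.Tuple2;

public class SideOutputSketch {
  // pairRDD holds the already-aggregated data; fileName is an output path prefix
  public static void writeSpecificRecords(JavaPairRDD<String, String> pairRDD, String fileName) {
    pairRDD.foreachPartition((Iterator<Tuple2<String, String>> iter) -> {
      // one side-output file per partition, suffixed with the partition id (like the task ID above)
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      FSDataOutputStream out = fs.create(new Path(fileName + TaskContext.get().partitionId()), true);
      while (iter.hasNext()) {
        Tuple2<String, String> kv = iter.next();
        if (kv._2().contains("SPECIFIC")) {   // placeholder for your specificFunction check
          out.writeBytes(kv._1() + "\t" + kv._2() + "\n");
        }
      }
      out.close();
    });
  }
}

As in the Scala sketch, this simply writes through the Hadoop FileSystem API from inside a partition-level operation; Spark's RDD API has no direct MultipleOutputs counterpart, so this approach (or filtering the RDD and calling saveAsTextFile on the filtered part) is the usual workaround.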