How does the Java split method create a PairFunction?


The code below contains a PairFunction that splits each string in a list into a key and a value:

private static PairFunction<String,String,Integer> getNameAndAgePair() {
  return (PairFunction<String,String,Integer>) s -> 
    new Tuple2<>(
      s.split(" ")[0],
      Integer.valueOf(s.split(" ")[1]));
}
Could someone please explain what exactly is going on here? I can't understand the split. I know they are trying to create keys and values, but why split on a space at both indices?

Tuple2<>(s.split(" ")[0], Integer.valueOf(s.split(" ")[1]));}
This code converts a JavaRDD into a JavaPairRDD, and it is written with the Java Spark API. Basically, it shows how to create a pair RDD from a regular RDD:

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public class PairRDDFromRegularRDD {

    public static void main(String[] args) {

        SparkConf conf = new SparkConf()
            .setAppName("Pair Rdd From Regular RDD")
            .setMaster("local[*]");

        JavaSparkContext sc = new JavaSparkContext(conf);

        // Each element is "name age", separated by a single space
        List<String> inputString = Arrays.asList("Lily 23", "jack 29", "mary 29", "James 8");

        JavaRDD<String> regularRDDs = sc.parallelize(inputString);

        JavaPairRDD<String, Integer> pairRDD =
            regularRDDs.mapToPair(getNameAndAgePair());
    }

    private static PairFunction<String, String, Integer> getNameAndAgePair() {
        // Word before the space becomes the key, the number after it the value
        return (PairFunction<String, String, Integer>) s ->
            new Tuple2<>(s.split(" ")[0], Integer.valueOf(s.split(" ")[1]));
    }
}
If you don't mind, I'll explain it using the spark-shell:

scala> val s = "hello world"
s: String = hello world

// In Java you'd use s.split(" ")[0]
scala> val key = s.split(" ")(0)
key: String = hello

scala> val value = s.split(" ")(1)
value: String = world
In other words, the split method breaks the string into chunks at the given separator (a single space in your case). Accessing chunk 0 and chunk 1 gives you the key and the value, respectively.
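The same idea in plain Java (no Spark needed): split returns a String array, and the chunks are accessed with [] instead of Scala's ():

```java
public class SplitChunks {
    public static void main(String[] args) {
        String s = "hello world";

        // Break the string into chunks at each space
        String[] chunks = s.split(" ");

        String key = chunks[0];
        String value = chunks[1];

        System.out.println(key);   // hello
        System.out.println(value); // world
    }
}
```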


Actually, this is more about Java/Scala than about Spark.
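As a side note, the lambda in the question calls split twice on the same string; splitting once into a local array is equivalent and avoids the repeated work. A minimal plain-Java sketch of the lambda body (the sample input is one of the strings from the question):

```java
public class SplitOnce {
    public static void main(String[] args) {
        String s = "Lily 23";

        // Split once and reuse the resulting array
        String[] parts = s.split(" ");

        String key = parts[0];                     // "Lily"
        Integer value = Integer.valueOf(parts[1]); // 23

        System.out.println(key + " -> " + value);
    }
}
```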

Thanks a lot for the reply, @Jacek. So what would the result be if it were s.split(",")? In other words, is the space just there as the separator? And would there be any difference if I used s.split("") — with no space inside the double quotes? — If you change the separator, you change the tokens (the output).
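To answer those follow-ups concretely, a small plain-Java sketch: split(",") finds no comma in "hello world", so the whole string comes back as a single chunk, while split("") breaks the string into individual characters (behavior as of Java 8):

```java
import java.util.Arrays;

public class SplitDelimiters {
    public static void main(String[] args) {
        String s = "hello world";

        // No comma in the string: the whole string is one chunk
        String[] byComma = s.split(",");
        System.out.println(Arrays.toString(byComma)); // [hello world]

        // Empty delimiter: one chunk per character (Java 8+)
        String[] byEmpty = s.split("");
        System.out.println(byEmpty.length); // 11
        System.out.println(byEmpty[0]);     // h
    }
}
```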