
Creating a DataFrame and JavaRDD from literal values


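I am writing a Spark application in Java and would like to know how to create a DataFrame and/or a JavaRDD from literal values.

For example, I have three integers, say (784512, 35, 40), which correspond to the fields/columns (id, m_count, f_count).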

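You want to create a JavaRDD and then build a DataFrame from it. A plain RDD of integers looks like this:

  JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4));

If you want to parallelize a list of objects that each carry three values, you need: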

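  import java.io.Serializable;
  import java.util.Arrays;

  import org.apache.spark.api.java.JavaRDD;
  import org.apache.spark.api.java.JavaSparkContext;
  import org.apache.spark.sql.DataFrame;
  import org.apache.spark.sql.SQLContext;
  import org.junit.Test;

  @Test
  public void test() {
      JavaSparkContext sc =  ...
      SQLContext sqlContext = new SQLContext(sc);

      // One Counter object becomes one RDD element and, later, one DataFrame row.
      JavaRDD<Counter> counters = sc.parallelize(Arrays.asList(new Counter(784512, 35, 40)));
      // The columns are inferred from the public getters of the bean class.
      DataFrame countersDF = sqlContext.createDataFrame(counters, Counter.class);

      System.out.println(counters.collect());
      System.out.println(countersDF.collectAsList());
  }

  // Serializable so that Spark can ship instances to the executors.
  public static class Counter implements Serializable {
      private final int id;
      private final int m_count;
      private final int f_count;

      Counter(int id, int m_count, int f_count) {
          this.id = id;
          this.m_count = m_count;
          this.f_count = f_count;
      }

      @Override
      public String toString() {
          return id + " " + m_count + " " + f_count;
      }

      // Getters: the DataFrame columns id, m_count and f_count come from these.
      public int getId() { return id; }
      public int getM_count() { return m_count; }
      public int getF_count() { return f_count; }
  }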
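The getters on `Counter` matter: `createDataFrame(rdd, Counter.class)` treats the class as a JavaBean and takes its columns from the readable properties it finds by reflection, not from the private fields. The sketch below does not use Spark at all. It only uses the JDK's `java.beans.Introspector` to show which property names a bean exposes, which approximates what Spark gets to see. The names `BeanColumns`, `FieldsOnly` and `readableProperties` are made up for this illustration, and the result is sorted here only to make the output predictable (Spark itself makes no promise about column order for beans).

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BeanColumns {

    // Same shape as the Counter class in the answer, with getters written out.
    public static class Counter implements java.io.Serializable {
        private final int id;
        private final int m_count;
        private final int f_count;

        public Counter(int id, int m_count, int f_count) {
            this.id = id;
            this.m_count = m_count;
            this.f_count = f_count;
        }

        public int getId() { return id; }
        public int getM_count() { return m_count; }
        public int getF_count() { return f_count; }
    }

    // Hypothetical counter-example: same fields, but no getters.
    public static class FieldsOnly implements java.io.Serializable {
        private final int id = 0;
        private final int m_count = 0;
        private final int f_count = 0;
    }

    // Lists the bean properties that have a getter, skipping the built-in "class" property.
    static List<String> readableProperties(Class<?> beanClass) {
        List<String> names = new ArrayList<>();
        try {
            for (PropertyDescriptor pd : Introspector.getBeanInfo(beanClass).getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && !"class".equals(pd.getName())) {
                    names.add(pd.getName());
                }
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Could not introspect " + beanClass, e);
        }
        Collections.sort(names); // sorted only so the printed order is stable
        return names;
    }

    public static void main(String[] args) {
        System.out.println(readableProperties(Counter.class));    // [f_count, id, m_count]
        System.out.println(readableProperties(FieldsOnly.class)); // []
    }
}
```

If the getters are left out, as in `FieldsOnly`, there are no readable properties to turn into columns, so it is worth writing them out even though `toString()` alone is enough for printing the RDD.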