
Hadoop: Converting JSON keys to columns in Spark


I wrote some code that reads the data and picks the second element from each tuple. That second element happens to be JSON. The code that fetches the JSON:

import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.conf.Configuration
import com.amazon.traffic.emailautomation.cafe.purchasefilter.util.CodecAwareManifestFileSystem
import com.amazon.traffic.emailautomation.cafe.purchasefilter.util.CodecAwareManifestInputFormat
import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.io.Text
import amazon.emr.utils.manifest.input.ManifestItemFileSystem
import amazon.emr.utils.manifest.input.ManifestInputFormat
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import scala.Tuple2

// Set up the manifest-based input format, read (LongWritable, Text) pairs,
// and keep only the second element of each tuple (the JSON string).
val configuration = new Configuration(sc.hadoopConfiguration)
ManifestItemFileSystem.setImplementation(configuration)
ManifestInputFormat.setInputFormatImpl(configuration, classOf[TextInputFormat])
val linesRdd1 = sc.newAPIHadoopFile("location", classOf[ManifestInputFormat[LongWritable, Text]],
  classOf[LongWritable], classOf[Text], configuration).map(tuple2 => tuple2._2.toString)
Here is an example record:

{"data":   {"marketplaceId":7,"customerId":123,"eventTime":1471206800000,"asin":"4567","type":"OWN","region":"NA"},"uploadedDate":1471338703958}

Now I want to create a DataFrame where the JSON keys (such as marketplaceId, customerId, and so on) become columns and the rows hold their values. I am not sure how to do this. Can anyone give me some pointers?

You can use this link to create a Scala object to marshal/unmarshal JSON,
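For reference, a minimal Jackson-based helper along those lines might look like the sketch below. The JsonUtil name and the readValue signature are assumptions chosen to match the usage in the code further down, not necessarily the linked article's exact code, and it requires the jackson-module-scala dependency:

import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
import com.fasterxml.jackson.module.scala.DefaultScalaModule

// Hypothetical helper matching the JsonUtil.readValue call used below:
// a thin wrapper around Jackson's ObjectMapper with the Scala module
// registered so that case classes can be deserialized directly.
object JsonUtil {
  private val mapper = new ObjectMapper()
  mapper.registerModule(DefaultScalaModule)
  mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)

  def readValue[T](json: String)(implicit m: Manifest[T]): T =
    mapper.readValue(json, m.runtimeClass.asInstanceOf[Class[T]])

  def writeValue(value: Any): String = mapper.writeValueAsString(value)
}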

and then use that object to read the JSON data with case classes in Scala:

import org.apache.spark.{SparkConf, SparkContext}

object stackover {
  // Case classes mirroring the JSON structure of each record
  case class Data(
                   marketplaceId: Double,
                   customerId: Double,
                   eventTime: Double,
                   asin: String,
                   `type`: String,
                   region: String
                 )
  case class R00tJsonObject(
                             data: Data,
                             uploadedDate: Double
                           )

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf(true)
    conf.setAppName("example")
    conf.setMaster("local[*]")

    val sc = new SparkContext(conf)
    val data = sc.textFile("test1.json")
    // Deserialize each JSON line into the case-class hierarchy
    val parsed = data.map(row => JsonUtil.readValue[R00tJsonObject](row))

    parsed.map(rec => (rec.data, rec.uploadedDate, rec.data.customerId,
      rec.data.marketplaceId)).collect.foreach(println)
  }
}
Output:

(Data(7.0,123.0,1.4712068E12,4567,OWN,NA),1.471338703958E12,123.0,7.0)
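Note that this prints plain tuples rather than building a DataFrame. If you specifically want the JSON keys as DataFrame columns, Spark SQL's built-in JSON reader can infer the schema directly from an RDD of JSON strings such as linesRdd1 above. A minimal sketch, assuming a Spark 1.x SQLContext (in Spark 2.x you would go through SparkSession and a Dataset[String] instead):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Let Spark infer the schema from the JSON strings themselves; the keys
// nested under "data" become nested columns that select can flatten.
val df = sqlContext.read.json(linesRdd1)
df.printSchema()
df.select("data.marketplaceId", "data.customerId", "data.eventTime",
  "data.asin", "data.type", "data.region", "uploadedDate").show()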