Apache Spark: how to pass a list of values (JSON, PySpark)

apache-spark pyspark

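 >>> from pyspark.sql import SQLContext
 >>> sqlContext = SQLContext(sc)
 >>> rdd = sqlContext.jsonFile("tmp.json")
 >>> rdd_new = rdd.map(lambda x: (x.name, x.age))

This works fine. But I have a list of field names, list1 = ["name", "age", "gene", "xyz", ...]. When I pass them in a loop,

 for each_value in list1:
     rdd_new = rdd.map(lambda x: x.each_value)

I get an error.

Can you print the error? By the way, what are you trying to do?

We have list1 = ["name", "age", "gene", "xyz", ...], and I want to pass it dynamically, i.e. rdd_new = rdd.map(lambda x: (x.name, x.age, x.gene, ...)).

We want to use collect afterwards.

Then: l1 = ["number", "string"]; s1 = r1.select(*l1); s1.collect()

I don't quite understand.

I think what you need is to pass the names of the fields you want to select. This is done with a DataFrame; note how the argument list is passed in the following: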

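    # ssc is assumed to be an existing SQLContext
    r1 = ssc.jsonFile("test.json")
    r1.printSchema()
    r1.show()

    # Field names to select; *l1 unpacks the list into separate arguments
    l1 = ['number', 'string']
    s1 = r1.select(*l1)
    s1.printSchema()
    s1.show()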

root
 |-- array: array (nullable = true)
 |    |-- element: long (containsNull = true)
 |-- boolean: boolean (nullable = true)
 |-- null: string (nullable = true)
 |-- number: long (nullable = true)
 |-- object: struct (nullable = true)
 |    |-- a: string (nullable = true)
 |    |-- c: string (nullable = true)
 |    |-- e: string (nullable = true)
 |-- string: string (nullable = true)

array                boolean null number object  string     
ArrayBuffer(1, 2, 3) true    null 123    [b,d,f] Hello World

root
 |-- number: long (nullable = true)
 |-- string: string (nullable = true)

number string     
123    Hello World
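
For anyone who wants to stay with the map approach from the question, here is a minimal sketch of why `x.each_value` fails and how a field name held in a variable can be resolved instead. It is plain Python so it runs without a cluster: `Person` is a hypothetical namedtuple standing in for a Spark Row, `pick` is a made-up helper name, and the sample records are invented for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a Spark Row so the sketch runs without a cluster.
Person = namedtuple("Person", ["name", "age", "gene"])
rows = [Person("Ann", 31, "g1"), Person("Bob", 45, "g2")]  # invented sample data

list1 = ["name", "age"]  # field names chosen at run time


def pick(*fields):
    """Return a function that extracts the named fields from one record."""
    return lambda x: tuple(getattr(x, f) for f in fields)


# x.each_value looks for a field literally called "each_value",
# not for the field whose name is stored in the variable, so it fails.
each_value = "name"
try:
    list(map(lambda x: x.each_value, rows))
    literal_lookup_failed = False
except AttributeError:
    literal_lookup_failed = True

# getattr resolves the name held in the variable; *list1 unpacks the list
# into separate arguments, the same way select(*l1) does above.
selected = list(map(pick(*list1), rows))
print(selected)
```

With a real RDD of Row objects, the same `pick(*list1)` function could presumably be passed to `rdd.map`, since Row supports attribute access by name. The DataFrame `select(*l1)` shown above is usually the simpler route.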