Python: Accessing the keys of objects in a nested array of objects in JSON using pyspark

I have a JSON file named Class.json and want to count all the entries that satisfy certain conditions.

Class.json

{
  "class": [
    {
      "class_id": "1",
      "data": {
        "lesson3": {
          "id": 3,
          "schedule": [
            {
              "schedule_id": "1",
              "schedule_date": "2017-07-11",
              "lesson_price": "USD 25",
              "status": "ONGOING"
            },
            {
              "schedule_id": "2",
              "schedule_date": "2016-09-24",
              "lesson_price": "USD 15",
              "status": "OPEN REGISTRATION"
            }
          ]
        },
        "lesson4": {
          "id": 4,
          "schedule": [
            {
              "schedule_id": "1",
              "schedule_date": "2016-12-17",
              "lesson_price": "USD 19",
              "status": "ONGOING"
            },
            {
              "schedule_id": "2",
              "schedule_date": "2015-11-12",
              "lesson_price": "USD 29",
              "status": "ONGOING"
            },
            {
              "schedule_id": "3",
              "schedule_date": "2015-11-10",
              "lesson_price": "USD 14",
              "status": "ON SCHEDULE"
            }
          ]
        }
      }
    },
    {
      "class_id": "2",
      "data": {
        "lesson1": {
          "id": 1,
          "schedule": [
            {
              "schedule_id": "1",
              "schedule_date": "2017-05-21",
              "lesson_price": "USD 50",
              "status": "CANCELLED"
            }
          ]
        },
        "lesson2": {
          "id": 2,
          "schedule": [
            {
              "schedule_id": "1",
              "schedule_date": "2017-06-04",
              "lesson_price": "USD10",
              "status": "FINISHED"
            },
            {
              "schedule_id": "5",
              "schedule_date": "2018-03-01",
              "lesson_price": "USD12",
              "status": "CLOSED"
            }
          ]
        }
      }
    }
  ]
}
I tried:

df = spark.read.json("class.json", multiLine=True)
df.show()
It shows:

+--------------------+
|               class|
+--------------------+
|[[1, [,, [3, [[US...|
+--------------------+
Then, to access the array, I tried this:
result = df.select("class").map(lambda s: s['data'])

but got the error:
AttributeError: 'DataFrame' object has no attribute 'map'

Alternatively, evaluating
df['class'][0]['data']
just returns a
Column

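Goals:

  • Count the schedules whose status is "ONGOING" and whose schedule_date is before 2017-01
  • Compute the average lesson_price of the schedules before 2017-01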
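
To pin down what the two goals should return, here is a plain-Python cross-check (not a Spark solution). It rests on a few assumptions that the question does not state: "before 2017-01" is read as schedule_date earlier than 2017-01-01, the average is taken over every schedule before that date regardless of status, and the price is the number embedded in strings such as "USD 25" or "USD10". The names parse_price and summarize_classes are hypothetical helpers.

```python
import re
from datetime import date

CUTOFF = date(2017, 1, 1)  # assumption: "before 2017-01" means before this day


def parse_price(text):
    """Pull the numeric part out of strings such as 'USD 25' or 'USD10'."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def summarize_classes(doc, cutoff=CUTOFF):
    """Return (ongoing_count, average_price) for schedules dated before cutoff."""
    ongoing = 0
    prices = []
    for cls in doc["class"]:
        # Lesson keys (lesson1, lesson3, ...) differ per class, so iterate values.
        for lesson in cls["data"].values():
            for sched in lesson["schedule"]:
                if date.fromisoformat(sched["schedule_date"]) >= cutoff:
                    continue  # not before the cutoff
                if sched["status"] == "ONGOING":
                    ongoing += 1
                price = parse_price(sched["lesson_price"])
                if price is not None:
                    prices.append(price)
    average = sum(prices) / len(prices) if prices else None
    return ongoing, average
```

Under those assumptions, passing the parsed Class.json shown above (for example via json.load) should give 2 ongoing schedules and an average price of 19.25, which is a useful reference value when checking a pyspark version.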
How can I do this with pyspark?