Python: iterate over a list of dictionaries


python python-3.x dataframe


I have the following list of dictionaries:

results = [
     {'type': 'check_datatype',
      'kwargs': {'table': 'cars', 'columns': ['car_id','index'], 'd_type': 'str'},
      'datasource_path': '/cars_dataset_ok/',
      'Result': False},
    {'type': 'check_string_consistency',
      'kwargs': {'table': 'cars', 'columns': ['car_id'], 'string_length': 6},
      'datasource_path': '/cars_dataset_ok/',
      'Result': False}
    ]

I want the output list below, where the key and value fields come from the kwargs key of each dictionary in the list above:

id | key | value | index

[[1, table, cars, null], [1, columns, car_id, 1], [1, columns, index, 2], [1, d_type, str, null], [2, table, cars, null], [2, columns, car_id, null], [2, string_length, 6, null]]

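Update: I now want one more column in the output, uniquehashcode. A unique hash code here means that dictionaries with the same keys and values must produce the same id or hash: if the key-value pairs in the "kwargs" dictionary are identical, they should get the same hash code. The output should look like this: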

[[1, table, cars, null, uniquehashcode1], [1, columns, car_id, 1, uniquehashcode1], [1, columns, index, 2, uniquehashcode1], [1, d_type, str, null, uniquehashcode1], [2, table, cars, null, uniquehashcode2], [2, columns, car_id, null, uniquehashcode2], [2, string_length, 6, null, uniquehashcode2]]

Also, if a particular uniquehashcode already exists, I don't want to insert anything into this table.
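A minimal sketch of the "same kwargs, same hash" and "skip if it already exists" requirements, assuming a content hash is acceptable as the uniquehashcode. It serializes kwargs with json.dumps(sort_keys=True) so key order does not matter, and uses hashlib.sha256 because, unlike the built-in hash(), its digest is the same in every Python process. The names kwargs_hash, explode_new and the seen set are hypothetical; in a real pipeline seen would be loaded from the existing table.

```python
import hashlib
import json


def kwargs_hash(kwargs):
    """Deterministic fingerprint of a kwargs dict (key order is ignored)."""
    canonical = json.dumps(kwargs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def explode_new(results, seen):
    """Explode each kwargs dict into [id, key, value, index, hash] rows.

    Dictionaries whose hash is already in `seen` (hypothetical store of
    previously inserted hashes) produce no rows at all.
    """
    rows = []
    for row_id, item in enumerate(results, start=1):
        h = kwargs_hash(item["kwargs"])
        if h in seen:
            continue  # this kwargs combination already exists: insert nothing
        seen.add(h)
        for key, value in item["kwargs"].items():
            if isinstance(value, list):
                if len(value) == 1:
                    # a single value gets no list index
                    rows.append([row_id, key, value[0], None, h])
                else:
                    # multiple values are indexed from 1 (empty lists add no rows)
                    for idx, v in enumerate(value, start=1):
                        rows.append([row_id, key, v, idx, h])
            else:
                rows.append([row_id, key, value, None, h])
    return rows


results = [
    {"type": "check_datatype",
     "kwargs": {"table": "cars", "columns": ["car_id", "index"], "d_type": "str"}},
    {"type": "check_string_consistency",
     "kwargs": {"table": "cars", "columns": ["car_id"], "string_length": 6}},
    {"type": "check_string_consistency",
     "kwargs": {"table": "cars", "columns": ["car_id"], "string_length": 6}},
]
seen = set()  # hypothetical: hashes already present in the target table
rows = explode_new(results, seen)
```

The third dictionary repeats the second one's kwargs, so it adds no rows, and calling explode_new again with the same seen set returns an empty list.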

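Update 2: I want to create a DataFrame with the schema below. args_id should be the same for every unique pair of (kwargs, check_name). I run the list of dictionaries above every day, so if the same (kwargs, check_name) pair shows up again on a later date, it must get the same args_id. I want to store each day's result in a DataFrame and then write it to a Delta table in Spark.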

Type|time|args_id
check_datatype|2021-03-29|0
check_string_consistency|2021-03-29|1
check_datatype|2021-03-30|0
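One way to get small, stable integer ids like the 0 and 1 in the table is a registry that maps each canonical (type, kwargs) pair to the id it was first given. This is a sketch under the assumption that the registry is persisted between daily runs (for example as a small lookup table); pair_key, daily_rows and registry are hypothetical names, and here the registry is just an in-memory dict.

```python
import json
from datetime import date


def pair_key(item):
    """Canonical text for the (check type, kwargs) pair; key order is ignored."""
    return json.dumps({"type": item["type"], "kwargs": item["kwargs"]}, sort_keys=True)


def daily_rows(results, run_date, registry):
    """Return [type, time, args_id] rows for one daily run.

    `registry` maps pair_key -> args_id and must survive between runs.
    Unseen pairs get the next free integer (assumes ids are never deleted).
    """
    rows = []
    for item in results:
        key = pair_key(item)
        if key not in registry:
            registry[key] = len(registry)
        rows.append([item["type"], run_date.isoformat(), registry[key]])
    return rows


results = [
    {"type": "check_datatype",
     "kwargs": {"table": "cars", "columns": ["car_id", "index"], "d_type": "str"}},
    {"type": "check_string_consistency",
     "kwargs": {"table": "cars", "columns": ["car_id"], "string_length": 6}},
]
registry = {}  # hypothetical: loaded from and saved back to persistent storage
day1 = daily_rows(results, date(2021, 3, 29), registry)
day2 = daily_rows(results[:1], date(2021, 3, 30), registry)
```

day1 and day2 together reproduce the three rows of the table above; the repeated check_datatype pair on 2021-03-30 gets args_id 0 again because the registry remembers it.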

So far I have been using the code below:

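from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

type_results = [[elt['type']] for elt in results]   # one single-column row per check
checkColumns = ['type']
spark = SparkSession.builder.getOrCreate()
DF = spark.createDataFrame(data=type_results, schema=checkColumns)
DF = DF.withColumn("time", F.current_timestamp())
# row_number() only numbers the rows of the current run, so args_id restarts at 1 every day
DF = DF.withColumn("args_id", F.row_number().over(Window.orderBy(F.monotonically_increasing_id())))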

You probably want something like this:

results = [
     {'type': 'check_datatype',
      'kwargs': {'table': 'cars', 'columns': ['car_id','index'], 'd_type': 'str'},
      'datasource_path': '/cars_dataset_ok/',
      'Result': False},
    {'type': 'check_string_consistency',
      'kwargs': {'table': 'cars', 'columns': ['car_id'], 'string_length': 6},
      'datasource_path': '/cars_dataset_ok/',
      'Result': False}
    ]

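result_list = []
for c, l in enumerate(results, start=1):
    for key, value in l['kwargs'].items():
        if isinstance(value, list):
            if len(value) == 1:
                # a single-element list gets no list index
                result_list.append([str(c), key, value[0], 'null'])
                continue
            # enumerate yields the position directly; value.index(i) would
            # return the first match if the list contained duplicates
            for idx, i in enumerate(value, start=1):
                result_list.append([str(c), key, i, str(idx)])
        else:
            result_list.append([str(c), key, value, 'null'])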

print(result_list)

Output:

[['1', 'table', 'cars', 'null'], ['1', 'columns', 'car_id', '1'], ['1', 'columns', 'index', '2'], ['1', 'd_type', 'str', 'null'], ['2', 'table', 'cars', 'null'], ['2', 'columns', 'car_id', 'null'], ['2', 'string_length', 6, 'null']]


For the update, you can use the maps package (pip install maps):

import maps
results = [
     {'type': 'check_datatype',
      'kwargs': {'table': 'cars', 'columns': ['car_id','index'], 'd_type': 'str'},
      'datasource_path': '/cars_dataset_ok/',
      'Result': False},
    {'type': 'check_string_consistency',
      'kwargs': {'table': 'cars', 'columns': ['car_id'], 'string_length': 6},
      'datasource_path': '/cars_dataset_ok/',
      'Result': False},
    {'type': 'check_string_consistency',
     'kwargs': {'table': 'cars', 'columns': ['car_id'], 'string_length': 6},
     'datasource_path': '/cars_dataset_ok/',
     'Result': False}
    ]
 
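result_list = []
for c, l in enumerate(results, start=1):
    # note: hash() of str values is randomized per interpreter process unless
    # PYTHONHASHSEED is fixed, so h changes between runs; the "-{c}" suffix
    # also gives identical kwargs different codes
    h = hash(maps.FrozenMap.recurse(l['kwargs']))
    for key, value in l['kwargs'].items():
        if isinstance(value, list):
            if len(value) == 1:
                result_list.append([str(c), key, value[0], 'null', f'{h}-{c}'])
                continue
            for idx, i in enumerate(value, start=1):
                result_list.append([str(c), key, i, str(idx), f'{h}-{c}'])
        else:
            result_list.append([str(c), key, value, 'null', f'{h}-{c}'])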

print(result_list)

Output:


[['1', 'table', 'cars', 'null', '-6654319495930648246-1'], ['1', 'columns', 'car_id', '1', '-6654319495930648246-1'], ['1', 'columns', 'index', '2', '-6654319495930648246-1'], ['1', 'd_type', 'str', 'null', '-6654319495930648246-1'], ['2', 'table', 'cars', 'null', '-3876605863049152209-2'], ['2', 'columns', 'car_id', 'null', '-3876605863049152209-2'], ['2', 'string_length', 6, 'null', '-3876605863049152209-2'], ['3', 'table', 'cars', 'null', '-3876605863049152209-3'], ['3', 'columns', 'car_id', 'null', '-3876605863049152209-3'], ['3', 'string_length', 6, 'null', '-3876605863049152209-3']]

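That's wrong. The expected output should have each key-value pair in kwargs as a new row, and all rows belonging to the same dictionary should share the same id. Please check my expected output. What you did is very simple.

Hi David, I just edited the last column (index) of the expected output: if a key does not have multiple values, the index should be null; if it has multiple values, the index should start at 1. Please see the edited output.

OK, I edited the answer.

Hi David, the output is a print statement, but I actually want it as a list, like [[1, table, cars, null], [1, columns, car_id, 1] … [2, 'string_length', 6, null]]

OK, I edited the answer. You can also replace 'null' with None.

Thanks David. I don't know why, but when I try to store this list in a DataFrame I get the following error: TypeError: field value: Can not merge type and
checkColumns = ['args_id', 'key', 'value', 'list_index']
df = spark.createDataFrame(data=result_list, schema=checkColumns)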
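A likely cause of that TypeError (the type names are missing from the message above, so this is an assumption) is that the value column mixes Python types: most values are str, but string_length holds the int 6, and Spark cannot merge the inferred string and integer types. The sketch below normalizes the answer's rows so every column has one type: ids and values become str, list_index becomes int, and the 'null' placeholder becomes None. spark_friendly is a hypothetical helper name.

```python
def spark_friendly(result_list):
    """Give every column a single Python type before building a DataFrame."""
    out = []
    for args_id, key, value, list_index in result_list:
        out.append([
            str(args_id),
            key,
            str(value),  # 6 -> "6", so the value column is all strings
            None if list_index == "null" else int(list_index),  # real null, not the text 'null'
        ])
    return out


# rows as produced by the answer's first snippet
result_list = [['1', 'table', 'cars', 'null'], ['1', 'columns', 'car_id', '1'],
               ['1', 'columns', 'index', '2'], ['1', 'd_type', 'str', 'null'],
               ['2', 'table', 'cars', 'null'], ['2', 'columns', 'car_id', 'null'],
               ['2', 'string_length', 6, 'null']]
fixed = spark_friendly(result_list)
```

With PySpark available, passing an explicit schema should avoid inference altogether, e.g. spark.createDataFrame(fixed, schema="args_id string, key string, value string, list_index int").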