Python: compare two 2D lists and print the rows that differ, ignoring one column
gnucashumsaetze = [
    ['2020-11-27', 'Essen', '4.53'],
    ['2020-11-27', 'Essen', '10.67'],
    ['2020-11-30', 'Essen', '4.80'],
    ['2020-11-30', 'Lebensmittel', '2.78'],
    ['2020-11-30', 'Essen', '2.31'],
    ['2020-11-30', 'Kosmetik', '5.58'],
    ['2020-12-01', 'Essen', '11.23'],
]
onlineumsaetze = [
    ['2020-11-27', 'EDEKA ERNST HAUPTBAHNH / MUENCHEN', '4.53'],
    ['2020-11-27', 'Netto Marken-Discount / Ingolstadt', '10.67'],
    ['2020-11-30', 'MUELLER GMBH & CO.KG / NUERNBERG', '4.80'],
    ['2020-11-30', 'Netto Marken-Discount / Frankfurt', '2.31'],
    ['2020-11-30', 'Rossmann 2380 / Ingolstadt', '5.58'],
    ['2020-11-30', 'ALIEXPRESS.COM / Luxembourg', '22.46'],
    ['2020-12-01', 'EDEKA BRAUN / INGOLSTADT', '11.23'],
    ['2020-12-02', 'EDEKA ERNST HAUPTBAHNH / MUENCHEN', '7.03'],
]
I want to compare the two 2D lists and output the rows that differ. The second column (row[1]) should not be compared, though. Like this:
['2020-11-30', 'ALIEXPRESS.COM / Luxembourg', '22.46']
['2020-12-01', 'EDEKA BRAUN / INGOLSTADT', '11.23']
['2020-12-02', 'EDEKA ERNST HAUPTBAHNH / MUENCHEN', '7.03']
This is what I have tried so far; unfortunately, it is a disaster:
fehlende_rows = (set((row[0] for row in onlineumsaetze),(row[2] for row in onlineumsaetze)) - set((row[0] for row in gnucashumsaetze),(row[2] for row in gnucashumsaetze)))
print(fehlende_rows)
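I find it really helpful to write out the full loop first and then condense it into a list comprehension as far as possible.
The best approach is probably to iterate over gnucashumsaetze and build a dictionary that maps each date (a string) to the set of amounts booked on that date.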
gnucashumsaetze_dict = {}
for g in gnucashumsaetze:
    date, val = g[0], g[2]
    # Maybe you want to do val = float(g[2]) instead?
    if date not in gnucashumsaetze_dict:
        gnucashumsaetze_dict[date] = set()
    gnucashumsaetze_dict[date].add(val)
gnucashumsaetze_dict is now:
{'2020-11-27': {'10.67', '4.53'},
'2020-11-30': {'2.31', '2.78', '4.80', '5.58'},
'2020-12-01': {'11.23'}}
Then iterate over each row in onlineumsaetze and append it to a new list only if it satisfies the required condition:
new_onlineumsaetze = []
for o in onlineumsaetze:
    date, val = o[0], o[2]
    # If date is not in gnucashumsaetze_dict, fall back to an empty set
    vals = gnucashumsaetze_dict.get(date, set())
    if val not in vals:
        new_onlineumsaetze.append(o)
new_onlineumsaetze is now:
[['2020-11-30', 'ALIEXPRESS.COM / Luxembourg', '22.46'],
['2020-12-02', 'EDEKA ERNST HAUPTBAHNH / MUENCHEN', '7.03']]
The row ['2020-12-01', 'EDEKA BRAUN / INGOLSTADT', '11.23'] is skipped because gnucashumsaetze contains ['2020-12-01', 'Essen', '11.23'], which has the same date and amount.
Now that you have written it as a regular for loop, it is easier to condense it into a list comprehension:
new_onlineumsaetze = [o for o in onlineumsaetze if o[2] not in gnucashumsaetze_dict.get(o[0], set())]
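The two steps above can also be put together in one self-contained script. The following is only a sketch: it uses collections.defaultdict in place of the explicit "if date not in ..." check, the name amounts_by_date is made up for illustration, and the data is a shortened copy of the lists from the question.

```python
from collections import defaultdict

# Shortened copies of the lists from the question (illustrative only).
gnucashumsaetze = [
    ['2020-11-27', 'Essen', '4.53'],
    ['2020-11-30', 'Essen', '4.80'],
    ['2020-12-01', 'Essen', '11.23'],
]
onlineumsaetze = [
    ['2020-11-27', 'EDEKA ERNST HAUPTBAHNH / MUENCHEN', '4.53'],
    ['2020-11-30', 'ALIEXPRESS.COM / Luxembourg', '22.46'],
    ['2020-12-01', 'EDEKA BRAUN / INGOLSTADT', '11.23'],
    ['2020-12-02', 'EDEKA ERNST HAUPTBAHNH / MUENCHEN', '7.03'],
]

# Map each date to the set of amounts booked on that date.
# defaultdict(set) creates the empty set on first access to a new date.
amounts_by_date = defaultdict(set)
for date, _category, amount in gnucashumsaetze:
    amounts_by_date[date].add(amount)

# .get() is used for the lookup so that unknown dates do not
# insert empty entries into the defaultdict as a side effect.
missing = [
    row for row in onlineumsaetze
    if row[2] not in amounts_by_date.get(row[0], set())
]
print(missing)
```

With this shortened data, only the ALIEXPRESS.COM row and the 2020-12-02 row remain in missing, since every other online row has a GnuCash row with the same date and amount.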
To solve this, I would use list comprehensions.
First, create two sets that use only column 0 and column 2:
gnucashumsaetze_set = set([(row[0], row[2]) for row in gnucashumsaetze])
onlineumsaetze_set = set([(row[0], row[2]) for row in onlineumsaetze])
Then take the difference of the two sets:
diff_ = onlineumsaetze_set.difference(gnucashumsaetze_set)
For the final result, look up the rows in onlineumsaetze whose column 0 and column 2 match one of the pairs in the difference:
res = [row for row in onlineumsaetze if (row[0], row[2]) in diff_]
print(res)
Result:
[['2020-11-30', 'ALIEXPRESS.COM / Luxembourg', '22.46'], ['2020-12-02', 'EDEKA ERNST HAUPTBAHNH / MUENCHEN', '7.03']]
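The same set-based idea can be wrapped in a small helper so that the compared columns are not hard-coded. This is a sketch under assumptions: the function name rows_missing_from and its key_cols parameter are hypothetical and do not come from the answers, and the data is a shortened copy of the lists from the question.

```python
def rows_missing_from(reference, candidates, key_cols=(0, 2)):
    """Return the rows of candidates whose key columns match no row of reference."""
    def key(row):
        # Build a hashable tuple from the columns that should be compared.
        return tuple(row[i] for i in key_cols)

    seen = {key(row) for row in reference}
    # Filtering candidates directly keeps their original order and full rows.
    return [row for row in candidates if key(row) not in seen]


# Shortened copies of the lists from the question (illustrative only).
gnucashumsaetze = [
    ['2020-11-27', 'Essen', '4.53'],
    ['2020-11-30', 'Essen', '4.80'],
    ['2020-12-01', 'Essen', '11.23'],
]
onlineumsaetze = [
    ['2020-11-27', 'EDEKA ERNST HAUPTBAHNH / MUENCHEN', '4.53'],
    ['2020-11-30', 'ALIEXPRESS.COM / Luxembourg', '22.46'],
    ['2020-12-01', 'EDEKA BRAUN / INGOLSTADT', '11.23'],
    ['2020-12-02', 'EDEKA ERNST HAUPTBAHNH / MUENCHEN', '7.03'],
]

result = rows_missing_from(gnucashumsaetze, onlineumsaetze)
print(result)
```

Note that the amounts are compared as strings here, so '4.8' and '4.80' would count as different values. If the two sources format amounts differently, converting them first (for example with decimal.Decimal) may be worth considering.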
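Are you allowed to change the structure of the data? For example, by converting those lists into dicts?
What do you mean by "output the rows that differ"? Do you mean that you want to output a row from onlineumsaetze only if gnucashumsaetze does not contain a row with the same date and the same value in the third column?
@Diptangsu Goswami, unfortunately not. This data was copied from an HTML web page. @Pranav Hosangadi, yes, exactly: on the first and third columns.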