Python: compare two 2D lists and print the rows that differ, ignoring one column

I have the following:

gnucashumsaetze = [
 ['2020-11-27', 'Essen', '4.53'],
 ['2020-11-27', 'Essen', '10.67'],
 ['2020-11-30', 'Essen', '4.80'],
 ['2020-11-30', 'Lebensmittel', '2.78'],
 ['2020-11-30', 'Essen', '2.31'],
 ['2020-11-30', 'Kosmetik', '5.58'],
 ['2020-12-01', 'Essen', '11.23'],
]

onlineumsaetze = [
 ['2020-11-27', 'EDEKA ERNST HAUPTBAHNH  / MUENCHEN', '4.53'],
 ['2020-11-27', 'Netto Marken-Discount  / Ingolstadt', '10.67'],
 ['2020-11-30', 'MUELLER GMBH & CO.KG  / NUERNBERG', '4.80'],
 ['2020-11-30', 'Netto Marken-Discount  / Frankfurt', '2.31'],
 ['2020-11-30', 'Rossmann 2380  / Ingolstadt', '5.58'],
 ['2020-11-30', 'ALIEXPRESS.COM  / Luxembourg', '22.46'],
 ['2020-12-01', 'EDEKA BRAUN  / INGOLSTADT', '11.23'],
 ['2020-12-02', 'EDEKA ERNST HAUPTBAHNH  / MUENCHEN', '7.03'],
]
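I want to compare the two 2D lists and output the rows that differ. However, the second column (index [1] of each row) should not be compared. Like this: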

['2020-11-30', 'ALIEXPRESS.COM  / Luxembourg', '22.46']
['2020-12-01', 'EDEKA BRAUN  / INGOLSTADT', '11.23']
['2020-12-02', 'EDEKA ERNST HAUPTBAHNH  / MUENCHEN', '7.03']
This is what I have tried so far; unfortunately, it is a disaster:

fehlende_rows = (set((row[0] for row in onlineumsaetze),(row[2] for row in onlineumsaetze)) - set((row[0] for row in gnucashumsaetze),(row[2] for row in gnucashumsaetze)))
print(fehlende_rows)

I find it really helps to write out the full loop first, and then condense it into a list comprehension where possible.

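The best approach is probably to iterate over gnucashumsaetze and build a dict that maps strings to sets, with the date as the key and the amounts as the elements of the set: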

gnucashumsaetze_dict = {}
for g in gnucashumsaetze:
    date, val = g[0], g[2]
    # Maybe you want to do val = float(g[2]) instead?
    if date not in gnucashumsaetze_dict:
        gnucashumsaetze_dict[date] = set()
    gnucashumsaetze_dict[date].add(val)
gnucashumsaetze_dict is now:

{'2020-11-27': {'10.67', '4.53'},
 '2020-11-30': {'2.31', '2.78', '4.80', '5.58'},
 '2020-12-01': {'11.23'}}
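As a possible variation on the dict-building loop above, collections.defaultdict can take over the "create an empty set if the date is new" step. This is only a sketch on a shortened copy of the data; the name amounts_by_date is made up for illustration and does not appear in the original answer.

```python
from collections import defaultdict

# Shortened copy of the question's data, for illustration only
gnucashumsaetze = [
    ['2020-11-27', 'Essen', '4.53'],
    ['2020-11-27', 'Essen', '10.67'],
    ['2020-12-01', 'Essen', '11.23'],
]

# defaultdict(set) creates an empty set the first time a date is looked up,
# so the explicit "if date not in ..." check is no longer needed
amounts_by_date = defaultdict(set)  # hypothetical name
for date, _category, amount in gnucashumsaetze:
    amounts_by_date[date].add(amount)

print(dict(amounts_by_date))
```

One thing to keep in mind: indexing a defaultdict with a missing date inserts that date, so later lookups are probably safer with .get(date, set()) or an `in` test.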
Then iterate over each row in onlineumsaetze and append it to a new list only if it meets the required condition:

new_onlineumsaetze = []
for o in onlineumsaetze:
    date, val = o[0], o[2]
    # if date is not in gnucashumsaetze_dict, return empty set
    vals = gnucashumsaetze_dict.get(date, set()) 
    if val not in vals:
        new_onlineumsaetze.append(o)
new_onlineumsaetze is now:

[['2020-11-30', 'ALIEXPRESS.COM  / Luxembourg', '22.46'],
 ['2020-12-02', 'EDEKA ERNST HAUPTBAHNH  / MUENCHEN', '7.03']]
The row ['2020-12-01', 'EDEKA BRAUN  / INGOLSTADT', '11.23'] is skipped because gnucashumsaetze contains ['2020-12-01', 'Essen', '11.23'].

Now that you have written it as a regular for loop, it is much easier to condense it into a list comprehension:

new_onlineumsaetze = [o for o in onlineumsaetze if o[2] not in gnucashumsaetze_dict.get(o[0], set())]
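The comment in the loop above asks whether the amounts should be converted to numbers. A sketch of one way to do that, assuming exact decimal matching is wanted, is to compare decimal.Decimal values instead of strings, so that '4.80' and '4.8' count as the same amount. The helper name key and the '4.8' row are hypothetical and only there to show the effect.

```python
from decimal import Decimal

# Shortened, partly hypothetical data for illustration
gnucashumsaetze = [
    ['2020-11-30', 'Essen', '4.80'],
    ['2020-12-01', 'Essen', '11.23'],
]
onlineumsaetze = [
    ['2020-11-30', 'MUELLER GMBH & CO.KG  / NUERNBERG', '4.8'],  # hypothetical: trailing zero dropped
    ['2020-11-30', 'ALIEXPRESS.COM  / Luxembourg', '22.46'],
    ['2020-12-01', 'EDEKA BRAUN  / INGOLSTADT', '11.23'],
]

def key(row):
    """Return (date, amount) with the amount parsed as an exact decimal."""
    return row[0], Decimal(row[2])

# Decimal('4.80') and Decimal('4.8') compare and hash as equal
known = {key(row) for row in gnucashumsaetze}
missing = [row for row in onlineumsaetze if key(row) not in known]
print(missing)
```

Decimal is used here rather than float because it avoids binary rounding when the amounts come in as text; whether that matters depends on how the source data is formatted.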

To solve this, I would use list comprehensions.

First, build two sets using only column 0 and column 2:

gnucashumsaetze_set = set([(row[0], row[2]) for row in gnucashumsaetze])
onlineumsaetze_set = set([(row[0], row[2]) for row in onlineumsaetze])
Then take the difference of the two sets:

diff_ = onlineumsaetze_set.difference(gnucashumsaetze_set)
For the final result, look up the rows in onlineumsaetze whose column 0 and column 2 match one of the pairs in that difference:

res = [row for row in onlineumsaetze if (row[0], row[2]) in diff_]

print(res)
Result:

[['2020-11-30', 'ALIEXPRESS.COM  / Luxembourg', '22.46'], ['2020-12-02', 'EDEKA ERNST HAUPTBAHNH  / MUENCHEN', '7.03']]
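One caveat with the set-based approaches: a set stores each (date, amount) pair only once, so two identical purchases on the same day are indistinguishable from one. If that case matters, a rough sketch using collections.Counter could match rows one-to-one instead. The duplicated row below is invented for illustration, and the names remaining and unmatched are not from the original answers.

```python
from collections import Counter

# Shortened, partly hypothetical data for illustration
gnucashumsaetze = [
    ['2020-11-27', 'Essen', '4.53'],
]
onlineumsaetze = [
    ['2020-11-27', 'EDEKA ERNST HAUPTBAHNH  / MUENCHEN', '4.53'],
    ['2020-11-27', 'EDEKA ERNST HAUPTBAHNH  / MUENCHEN', '4.53'],  # hypothetical second purchase
    ['2020-11-30', 'ALIEXPRESS.COM  / Luxembourg', '22.46'],
]

# Count how often each (date, amount) pair was booked in GnuCash
remaining = Counter((row[0], row[2]) for row in gnucashumsaetze)

unmatched = []
for row in onlineumsaetze:
    pair = (row[0], row[2])
    if remaining[pair] > 0:
        remaining[pair] -= 1   # consume one matching GnuCash booking
    else:
        unmatched.append(row)  # no booking left for this online row

print(unmatched)
```

With this data, the first 4.53 row uses up the single GnuCash booking, so the second 4.53 row and the ALIEXPRESS row are reported as unmatched.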

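Are you allowed to change the structure of the data? For example, by converting these lists to dicts?
What do you mean by "output the rows that differ"? Do you mean that you want to output a row from onlineumsaetze only if gnucashumsaetze does not contain a row with the same date and the same value in the third column?
@Diptangsu Goswami, unfortunately not. This data is copied from an HTML web page.
@Pranav Hosangadi, yes, exactly: on the first and third columns.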