Comparing two unordered lists and finding which elements do not match in pyspark


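To expand on the question in this post: I have two unordered lists and I want to check whether they are equal, taking duplicates into account but ignoring order. If they are not equal, I want to find which elements of each list are missing from the other.

Using the example from the post mentioned above, let the list on the left of the equals sign be L1 and the one on the right be L2.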

    L1                               L2
['one', 'two', 'three'] == ['one', 'two', 'three'] :  true
['one', 'two', 'three'] == ['one', 'three', 'two'] :  true
['one', 'two', 'three'] == ['one', 'two', 'three', 'three'] :  false, L1:'three'
['one', 'two', 'three'] == ['one', 'two', 'three', 'four'] :  false, L1:'four'
['one', 'two', 'three'] == ['one', 'two', 'four'] :  false, L1:'four', L2:'three'
['one', 'two', 'three'] == ['one'] :  false, L2:'two','three'

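The output does not have to look exactly as I described, but basically I want to know whether the comparison of the two lists is true or false and, if it is false, which elements of L2 are not in L1 and which elements of L1 are not in L2.

The solution provided by @Katriel uses the collections module, as follows: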

import collections
compare = lambda x, y: collections.Counter(x) == collections.Counter(y)

But it gives no information about which elements do not match.

Is there an efficient way to do this in pyspark?

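If you do not need to keep duplicate mismatches as in row 3 of your example, you can apply array_except in both directions and join the results with concat to get your output, then use when/otherwise to derive a boolean from the size of the resulting array. (Spark 2.4+)

You do not need to remove duplicates before the comparison, because array_except does that automatically.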

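Can we assume L1 stays the same?

Yes, the example above covers different scenarios of the L1/L2 comparison. For row 3, the output is false with 'three'.

Is it absolutely necessary for your use case to handle duplicates like that?

I think I can check for duplicates another way. So no, I do not need to detect duplicates like that; we can remove duplicates before comparing. Thanks!

It is amazing that array_except can identify the mismatches in one pass. Is there a way to add the originating list next to each mismatch, or in a separate column?

Glad to help. What do you mean by originating list? Like L1 and L2?

Hi, yes, basically I want to know which list the mismatch came from. So in this example, for row 4, the result could be [three:L2, four:L1] or, in another column, it could be [L2, L1].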

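from pyspark.sql import functions as F
# result = (L1 minus L2) followed by (L2 minus L1); an empty result means the lists match
df.withColumn("result", F.concat(F.array_except("L1","L2"),F.array_except("L2","L1")))\
  .withColumn("Boolean", F.when(F.size("result")==0,F.lit(True)).otherwise(F.lit(False))).show(truncate=False)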

+-----------------+------------------------+-------------+-------+
|L1               |L2                      |result       |Boolean|
+-----------------+------------------------+-------------+-------+
|[one, two, three]|[one, two, three]       |[]           |true   |
|[one, two, three]|[one, three, two]       |[]           |true   |
|[one, two, three]|[one, two, three, three]|[]           |true   |
|[one, two, three]|[one, two, three, four] |[four]       |false  |
|[one, two, three]|[one, two, four]        |[three, four]|false  |
|[one, two, three]|[one]                   |[two, three] |false  |
+-----------------+------------------------+-------------+-------+
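
As row 3 of the output shows, array_except ignores duplicates, so the extra 'three' goes unnoticed. If duplicates do matter, one possible approach is to extend the collections.Counter idea from the question: subtracting one Counter from the other keeps only the surplus elements on each side, which also tells you which list each mismatch came from. The following is a plain-Python sketch, not a Spark-native solution. The function name compare_lists and the shape of its return value are my own choices for illustration. To use it on a DataFrame it would have to run inside a UDF, which is usually slower than built-in functions.

```python
from collections import Counter

def compare_lists(l1, l2):
    """Compare two lists as multisets: order is ignored, duplicates are counted.

    Returns (equal, only_in_l1, only_in_l2), where only_in_l1 holds the
    surplus elements of l1 and only_in_l2 the surplus elements of l2.
    """
    c1, c2 = Counter(l1), Counter(l2)
    # Counter subtraction drops zero and negative counts, leaving the surplus.
    only_in_l1 = sorted((c1 - c2).elements())
    only_in_l2 = sorted((c2 - c1).elements())
    equal = not only_in_l1 and not only_in_l2
    return equal, only_in_l1, only_in_l2

# Row 3 of the example: the duplicate 'three' in L2 is reported.
print(compare_lists(['one', 'two', 'three'], ['one', 'two', 'three', 'three']))
# Row 5 of the example: one mismatch on each side.
print(compare_lists(['one', 'two', 'three'], ['one', 'two', 'four']))
```

The results are sorted only to make the output deterministic. Drop the sorted calls if the order of the mismatches does not matter.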