Pythonic way to merge two lists on multiple keys (without pandas)


python pandas


What is the pythonic way to merge two lists on multiple keys, without using dictionaries or pandas?

Given:

a = [[700,0, 'A'], [700,1, 'B'],[704,0, 'C'],[704,1, 'A'],[709,0, 'A'],[710,0, 'A'],[711,0, 'A']]
b = [[700,0, 'N'], [700,1, 'J'],[711,0, 'W']]

I want to merge a with b so that the final result is (note: the first and second items are the keys):

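Today I convert it to a DataFrame and then back to a list, but I wonder whether there is a simpler or more pythonic way to do this. A dictionary keyed on i[0] and i[1] would lead to relatively long code.

With pandas: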

import pandas as pd

df_a = pd.DataFrame(a, columns=['id', 'sq', 'value'])
df_b = pd.DataFrame(b, columns=['id', 'sq', 'ext'])
# Outer join on both key columns, then replace the NaNs of unmatched rows with ''
d = pd.merge(df_a, df_b, on=['id', 'sq'], how='outer').fillna('')
d.values.tolist()

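This gives me the desired result, but I would like to skip the pandas trick.

Sort both lists, then make one pass over them, advancing a pointer into each list, to build the merged list. The sort makes this O(n log n).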

def sortOnFirst2Cols(m):
    # Sort rows by the key formed by the first two columns
    return sorted(m, key=lambda x: (x[0], x[1]))

sorted_a = sortOnFirst2Cols(a)
sorted_b = sortOnFirst2Cols(b)

merged = []
i = 0  # pointer into sorted_a
p = 0  # pointer into sorted_b
while i < len(sorted_a) and p < len(sorted_b):
    key_a = sorted_a[i][:2]
    key_b = sorted_b[p][:2]
    if key_a < key_b:
        i += 1  # a is behind: advance the pointer in a
    elif key_a > key_b:
        p += 1  # b is behind: advance the pointer in b
    else:
        # Keys match: splat (*) the row from a and append the 3rd item of the row from b
        merged.append([*sorted_a[i], sorted_b[p][2]])
        i += 1
        p += 1

print(merged)

It's not super clean. I believe a more pythonic approach would make use of the built-in zip function or one of the tools in itertools.

Note: this solution uses the iterable unpacking operator (*), which requires Python 3.5 or later. If that isn't available, build the merged item manually with a list literal from the 0th, 1st and 2nd items of the sorted rows.
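The two-pointer pass above only keeps keys that appear in both lists (an inner join), while the question asks for every key from either list. A minimal sketch of extending the same idea to an outer merge, assuming each (id, sq) key occurs at most once per list and that a missing value is filled with '' (mirroring fillna('') in the pandas version). The function name outer_merge_sorted and the fill parameter are hypothetical:

```python
def outer_merge_sorted(a, b, fill=''):
    """Outer-merge two lists of [id, sq, value] rows on (id, sq).

    Assumes each (id, sq) key appears at most once per list.
    """
    sa = sorted(a, key=lambda x: (x[0], x[1]))
    sb = sorted(b, key=lambda x: (x[0], x[1]))
    merged = []
    i = p = 0
    while i < len(sa) or p < len(sb):
        # Once one list is exhausted, every remaining row comes from the other
        if p == len(sb) or (i < len(sa) and sa[i][:2] < sb[p][:2]):
            merged.append([*sa[i], fill])  # key only in a
            i += 1
        elif i == len(sa) or sb[p][:2] < sa[i][:2]:
            merged.append([*sb[p][:2], fill, sb[p][2]])  # key only in b
            p += 1
        else:
            merged.append([*sa[i], sb[p][2]])  # key in both lists
            i += 1
            p += 1
    return merged


a = [[700, 0, 'A'], [700, 1, 'B'], [704, 0, 'C'], [704, 1, 'A'], [709, 0, 'A'], [710, 0, 'A'], [711, 0, 'A']]
b = [[700, 0, 'N'], [700, 1, 'J'], [711, 0, 'W']]
print(outer_merge_sorted(a, b))
```

With the question's data this returns [[700, 0, 'A', 'N'], [700, 1, 'B', 'J'], [704, 0, 'C', ''], [704, 1, 'A', ''], [709, 0, 'A', ''], [710, 0, 'A', ''], [711, 0, 'A', 'W']]. A key that only exists in b gets the fill value in the a column, so every row keeps the same shape.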
You can use groupby with a custom key:

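from itertools import groupby

a = [[700,0, 'A'], [700,1, 'B'],[704,0, 'C'],[704,1, 'A'],[709,0, 'A'],[710,0, 'A'],[711,0, 'A']]
b = [[700,0, 'N'], [700,1, 'J'],[711,0, 'W']]

ab = [*a, *b]  # concatenate both lists

# Sort on the first two columns; sorted() is stable, so rows from a stay ahead of rows from b
ab = sorted(ab, key=lambda x: x[:2])

# groupby only groups adjacent rows with equal keys, hence the sort above
grouped = groupby(ab, key=lambda x: x[:2])

output = []
for k, g in grouped:
    # Keep the key once, then append the non-key values of every row in the group
    output.append(k + [x for row in g for x in row[2:]])

print(output)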

Output:

[[700, 0, 'A', 'N'],
 [700, 1, 'B', 'J'],
 [704, 0, 'C'],
 [704, 1, 'A'],
 [709, 0, 'A'],
 [710, 0, 'A'],
 [711, 0, 'A', 'W']]
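itertools.groupby only merges consecutive items that share a key, which is why the answer sorts ab before grouping. A small illustration with a few made-up rows:

```python
from itertools import groupby

rows = [[700, 0, 'A'], [711, 0, 'A'], [700, 0, 'N']]
key = lambda x: x[:2]

# Unsorted: the two [700, 0] rows are not adjacent, so they end up in separate groups
unsorted_keys = [k for k, _ in groupby(rows, key=key)]

# Sorted first: equal keys become adjacent and form a single group
sorted_keys = [k for k, _ in groupby(sorted(rows, key=key), key=key)]

print(unsorted_keys)  # [[700, 0], [711, 0], [700, 0]]
print(sorted_keys)    # [[700, 0], [711, 0]]
```

Skipping the sort therefore does not raise an error; it silently produces duplicate keys in the output.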


You can also use the same sort, then combine the sublists with a dict, and finally flatten each key and its values into the desired sublist:

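def kf(sl):
    # Sort key on the first two columns (assumed definition; kf is not defined in the original)
    return sl[0:2]

di = {}
for sl in sorted(a + b, key=kf):
    # Group rows by their (id, sq) key and collect the remaining values in order
    di.setdefault(tuple(sl[0:2]), []).extend(sl[2:])

result = [list(k) + v for k, v in di.items()]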

Result:

>>> result
[[700, 0, 'A', 'N'], [700, 1, 'B', 'J'], 
 [704, 0, 'C'], [704, 1, 'A'], 
 [709, 0, 'A'], [710, 0, 'A'], [711, 0, 'A', 'W']]

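If order matters, the dict approach needs Python 3.7+ (where dicts are guaranteed to preserve insertion order; CPython 3.6 already does so as an implementation detail), or an OrderedDict on earlier versions.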
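On versions where a plain dict does not preserve insertion order, the same grouping can be written with collections.OrderedDict. A minimal sketch, using the question's a and b and assuming the sort key is the first two columns:

```python
from collections import OrderedDict

a = [[700, 0, 'A'], [700, 1, 'B'], [704, 0, 'C'], [704, 1, 'A'], [709, 0, 'A'], [710, 0, 'A'], [711, 0, 'A']]
b = [[700, 0, 'N'], [700, 1, 'J'], [711, 0, 'W']]

di = OrderedDict()
for sl in sorted(a + b, key=lambda sl: sl[0:2]):
    # Keys are inserted in sorted order, and OrderedDict keeps that order on any version
    di.setdefault(tuple(sl[0:2]), []).extend(sl[2:])

result = [list(k) + v for k, v in di.items()]
print(result)
```

Because the rows are sorted before insertion, the keys come out in ascending (id, sq) order, giving the same result as the dict version.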


There is a simple way using defaultdict:

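from collections import defaultdict

# Each key maps to [value from a, value from b]; a missing side stays ''
res = defaultdict(lambda: ['', ''])

for i, vals in enumerate([a, b]):
    # *key captures the first two columns, v the last one
    for *key, v in vals:
        res[tuple(key)][i] = v

final_res = [[*k, *v] for k, v in res.items()]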
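The defaultdict version generalizes to any number of lists by sizing the default row from the number of inputs. A sketch under that assumption; merge_on_keys is a hypothetical helper name, and fill='' mirrors the pandas fillna(''). The loop header for *key, v in rows uses extended iterable unpacking (PEP 3132) to split each row into its key columns and its last value:

```python
from collections import defaultdict


def merge_on_keys(*lists, fill=''):
    """Outer-merge lists of [key..., value] rows, one output column per input list."""
    res = defaultdict(lambda: [fill] * len(lists))
    for i, rows in enumerate(lists):
        for *key, v in rows:
            # Everything except the last element is the key; v goes into column i
            res[tuple(key)][i] = v
    # Keys come out in first-seen order (insertion-ordered dicts, Python 3.7+)
    return [[*k, *v] for k, v in res.items()]


a = [[700, 0, 'A'], [700, 1, 'B'], [704, 0, 'C'], [704, 1, 'A'], [709, 0, 'A'], [710, 0, 'A'], [711, 0, 'A']]
b = [[700, 0, 'N'], [700, 1, 'J'], [711, 0, 'W']]
print(merge_on_keys(a, b))
```

Unlike the sort-based answers, this keeps keys in first-seen order rather than sorted order; wrap the result in sorted() if ascending keys are required.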


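Do you only want to merge them on the first two items (e.g. 700, 0)? If there is no additional value, do you really need the empty '' in the list?
@Chris nope, it can be empty or anything else, preferably something JSON-serializable (not nan).
Your code doesn't produce the same desired result. Also, as you mentioned (and I agree), it's no better than the pandas trick.
OK, it works now. It doesn't fully meet the spec, since it only finds the intersection. The other answers in this thread should be better.
Handles the case neatly! Upvoted.
@acushner thanks for the input.
+1, although the result (res) doesn't produce the same expected result (but I know how to get there from here).
@ProcolHarum good point, edited.
@acushner great! Thanks. I learned something new.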